== Low level issues can have quite odd high level symptoms (again)

Let's start with [[my tweet from yesterday https://twitter.com/thatcks/status/691420494076211200]]:

> So the recently released Fedora 22 [[libreswan https://libreswan.org]]
> update appears to have broken IPSec tunnels for me. Good
> going. Debugging this will be hell.

This was quite consistent: if I installed the latest Fedora 22 update to libreswan, [[my IPSec based point to point tunnel ../linux/IKEForPointToPointGRE]] stopped working. More specifically, my home end (running Fedora 22) could not do an IKE negotiation with my office machine. If I reverted back to the older libreswan version, everything worked. This is exactly the sort of thing that is hell to debug and hell to get as a bug report.

(Fedora's libreswan update jumped several point versions, from 3.13 to 3.16. There could be a lot of changes in there.)

Today I put some time into trying to narrow down the symptoms and what the new libreswan was doing. It was an odd problem, because _tcpdump_ was claiming that the initial [[ISAKMP https://en.wikipedia.org/wiki/Internet_Security_Association_and_Key_Management_Protocol]] packets were going out from my home machine, but I didn't see them on my office machine or even on our exterior firewall. Given [[prior experiences ../linux/IKEShuttingDownConnection]] I suspected that the new version of libreswan was setting up IPSec security associations that were blocking traffic and making _tcpdump_ mislead me about whether packets were really getting out. But I couldn't see any sign of errant {{AB:SPD:Security Policy Database}} entries and using _tcpdump_ at the {{AB:PPPoE:PPP Over Ethernet, used for DSL connections}} level suggested very strongly that my ISAKMP packets really were being transmitted. But at the same time I could flip back and forth between libreswan versions, with one working and the other not.

So in the end I did the obvious thing: I grabbed tcpdump output from a working session and a non-working session and started staring at them to see if anything looked different. Reading the packet dumps, my eyes settled on this (non-working first, then working):

.pn prewrap on
> PPPoE [ses 0xdf7] IP (tos 0x0, ttl 64, id 10253, offset 0, flags [DF], proto UDP (17), length 1464)
> X.X.X.X.isakmp > Y.Y.Y.Y.isakmp: isakmp 2.0 msgid 00000000: parent_sa ikev2_init[I]:
> [...]
>
> PPPoE [ses 0xdf7] IP (tos 0x0, ttl 64, id 32119, offset 0, flags [DF], proto UDP (17), length 1168)
> X.X.X.X.isakmp > Y.Y.Y.Y.isakmp: isakmp 2.0 msgid 00000000: parent_sa ikev2_init[I]:
> [...]

I noticed that the packet length was different. The working packet was significantly shorter and the non-working one was not too far from the 1492 byte MTU of the PPP link itself. A little light turned on in my head, and some quick tests with _ping_ later I had my answer: ~~my PPPoE PPP MTU was too high~~, and as a result something in the path between me and the outside world was dropping any too-big packets that my machine generated.

(It's probably the DSL modem and DSL hop, based on some tests with traceroute.)

The reason things broke with the newer libreswan was that the newer version added several more cipher choices, which pushed the size of the initial ISAKMP packet over the actual working MTU. With the {{AB:DF:Don't Fragment}} bit set in the UDP packet, there was basically no chance of the packet getting fragmented when it hit wherever the block was; instead it was just summarily dropped.

(I think I never saw issues with TCP connections because I'd long ago set a PPPoE option to clamp the MSS to 1412 bytes. So only UDP traffic would be affected, and of course I don't really do anything that generates large UDP packets.
On the other hand, maybe this was a factor in [[an earlier mysterious network problem ../tech/IPv6ComplicationsAgain]], which I eventually made go away by disabling SPDY in Firefox.)

What this illustrates for me, [[once again NetworkLoopsAreWeird]], is that I simply can't predict what the high level symptoms are going to be for a low level network problem. Or, more usefully, given a high level problem I can't even be sure if it's actually due to some low level network issue or if it has a high level cause of its own (like 'code changes between 3.13 and 3.16').

=== Sidebar: what happens with my office machine's ISAKMP packets

My office machine is running libreswan 3.16 too, so I immediately wondered if its initial ISAKMP packets were also getting dropped because of this (which would mean that my IPSec tunnel would only come up when my home machine initiated it). Looking into this revealed something weird: while my office machine is sending out large UDP ISAKMP packets with the DF bit set, something is stripping DF off and then fragmenting those UDP packets before they get to my home machine. Based on some experimentation, the largest inbound UDP packet I can receive un-fragmented is 1436 bytes. The DF bit gets stripped regardless of the packet size.

(I suspect that my ISP's DSL PPP touchdown points are doing this. It's an obvious thing to do, really. Interestingly, the 1436 byte size restriction is smaller than the outbound MTU I can use.)
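
As a footnote to the packet lengths in the main entry, the size arithmetic can be sketched in a few lines of Python. This is only an illustration under stated assumptions: IPv4 headers with no options (20 bytes), a 20-byte fixed TCP header, and an 8-byte ICMP echo header. The function and variable names are made up for this sketch. The entry does not state the actual working path MTU, so the code only compares the two observed packets against the configured PPP MTU and against the largest packet that MSS-clamped TCP would have produced.

```python
# Sketch of the IPv4 size arithmetic; all names here are illustrative.
IPV4_HEADER = 20   # bytes, assuming no IP options
TCP_HEADER = 20    # bytes, fixed header; TCP options come out of the MSS
ICMP_HEADER = 8    # bytes, ICMP echo request header

PPP_MTU = 1492     # configured MTU of the PPP link (from the entry)
MSS_CLAMP = 1412   # TCP MSS clamp set through the PPPoE option (from the entry)


def max_tcp_packet(mss):
    """Largest IPv4 packet a TCP connection with this MSS should send."""
    return mss + IPV4_HEADER + TCP_HEADER


def ping_payload_for(ip_length):
    """ICMP echo payload size that yields an IPv4 packet of ip_length bytes."""
    return ip_length - IPV4_HEADER - ICMP_HEADER


# IPv4 total lengths reported by tcpdump in the entry.
observed = {"non-working": 1464, "working": 1168}

tcp_ceiling = max_tcp_packet(MSS_CLAMP)
for label, length in observed.items():
    print(f"{label}: {length} bytes; "
          f"fits configured PPP MTU: {length <= PPP_MTU}; "
          f"within clamped TCP ceiling ({tcp_ceiling}): {length <= tcp_ceiling}; "
          f"same-size ping payload: {ping_payload_for(length)}")
```

Under these assumptions, both packets fit the configured 1492 byte MTU, but only the working one is no larger than anything clamped TCP would have sent, which is consistent with TCP never showing the problem. On Linux, iputils _ping_ can set DF with <code>-M do</code> and takes the payload size with <code>-s</code>, so the payload figures above are what such a test would use; the entry does not say which _ping_ options were actually used.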