2011-11-07
An IPSec mystery with dropped packets
I'm going to break one of my normal rules in this entry. To put it one way, I normally write about answers, not questions, but this time I have a mystery instead of a solution. In part I'm writing this entry for myself, so that I have everything written down in one spot for later reference.
My home machine has a long-standing GRE over IPSec tunnel to my work machine; this lets it masquerade as a machine on the local internal network. Since I migrated my home machine to Fedora 15, I've been experiencing relatively frequent problems with establishing new SSH connections over the link. When the problem is happening, new SSH connections get partway through the initial SSH protocol negotiations and then stall. The problem is not constant; sometimes new SSH connections work fine.
(Recent evidence suggests that this is not specific to SSH; it looks like any TCP connection that hits the magic circumstances will stall in a similar way.)
The problem is not a general stall of either network traffic or VPN traffic; during a stall, both external connections and existing VPN connections continue to work without problems. The problem is definitely happening at the IPSec/GRE level; I have captured tcpdump traces of both the GRE tunnel and the underlying DSL PPP link, and I can see TCP packets being transmitted on the GRE tunnel but not being transmitted on the DSL PPP link. And at the same time, ping packets are going through fine. A typical dropped packet is:
IP 128.100.3.52.47123 > 128.100.3.51.ssh: Flags [.], seq 22:522, ack 806, win 103, options [nop,nop,TS val 140991656 ecr 201734236], length 500
In fact I have tcpdump traces from both sides of the GRE tunnel that show that the initial version of this packet is dropped even in a stream of other packets that pass through fine. (And it is not an MTU issue; the link passes larger packets, among other signs.)
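For later reference, here is a rough Python sketch of how the two GRE-side traces could be compared mechanically. It assumes both traces are tcpdump text output in the format shown above and that both print sequence numbers the same way (for example, both captured with tcpdump's -S option for absolute sequence numbers). The function names are my own invention, not part of any tool.

```python
import re

# Patterns for the tcpdump text format shown in this entry.
HEAD_RE = re.compile(r"IP (\S+) > (\S+): Flags \[([^\]]*)\]")
SEQ_RE = re.compile(r"\bseq (\d+(?::\d+)?)")
TSVAL_RE = re.compile(r"\bTS val (\d+)")
LEN_RE = re.compile(r"\blength (\d+)\s*$")


def parse_line(line):
    """Parse one tcpdump text line; return None if it does not look like TCP."""
    head = HEAD_RE.search(line)
    length = LEN_RE.search(line)
    if head is None or length is None:
        return None
    seq = SEQ_RE.search(line)
    tsval = TSVAL_RE.search(line)
    return {
        "src": head.group(1),
        "dst": head.group(2),
        "seq": seq.group(1) if seq else None,
        "tsval": int(tsval.group(1)) if tsval else None,
        "length": int(length.group(1)),
    }


def packet_key(pkt):
    # A retransmission normally carries a newer TS val, so including it
    # keeps the original packet and its retries distinct.
    return (pkt["src"], pkt["dst"], pkt["seq"], pkt["tsval"])


def missing_packets(sent_lines, received_lines):
    """Return data packets seen in the sender's trace but not the receiver's."""
    received = set()
    for line in received_lines:
        pkt = parse_line(line)
        if pkt is not None:
            received.add(packet_key(pkt))
    missing = []
    for line in sent_lines:
        pkt = parse_line(line)
        # Only consider packets that carry data; bare ACKs are skipped.
        if pkt is not None and pkt["length"] > 0 and packet_key(pkt) not in received:
            missing.append(pkt)
    return missing
```

Feeding it the home-side trace as sent_lines and the work-side trace as received_lines should list exactly the transmissions that vanished in between, including which retry (if any) finally got through.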
So far every trace I've seen for the problem has been TCP packets with a reported data length of 500 octets; for example, a trace from a ttcp run shows:
IP 128.100.3.52.46585 > 128.100.3.51.5001: Flags [.], seq 1:501, ack 1, win 91, options [nop,nop,TS val 729200 ecr 979199256], length 500
The ttcp run in fact had a whole block of length-500 packets not get through (this one was the first one). But after a while one of the retries of this packet made it through, was ACK'd, and suddenly the conversation was on; 'length 500' packets flowed freely. Also, I don't know why the packets are being restricted to 500 data octets; I would have expected them to use something close to the GRE's MTU, which is 1200.
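To make that expectation concrete, here is the arithmetic as a small Python sketch. It assumes plain IPv4 with no IP options and the only TCP option being the timestamp option visible in the traces above (nop,nop,TS, which takes 12 bytes); the function name is just for illustration.

```python
IPV4_HEADER = 20        # bytes, assuming no IP options
TCP_HEADER = 20         # bytes, before TCP options
TCP_TIMESTAMP_OPT = 12  # nop,nop,TS: 2 padding bytes + 10-byte timestamp option


def tcp_payload(mtu, timestamps=True):
    """Largest TCP data length that fits in one unfragmented packet."""
    options = TCP_TIMESTAMP_OPT if timestamps else 0
    return mtu - IPV4_HEADER - TCP_HEADER - options


# The GRE tunnel MTU from this entry.
print("GRE tunnel MTU 1200:", tcp_payload(1200), "data octets per packet")
```

Under these assumptions a full-sized packet on the GRE tunnel would carry 1148 data octets, which is a long way from the 500 that the traces show.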
Initially I thought that this problem was due to NetworkManager, which is one reason that I forcefully turned it off. However the problem has now happened multiple times without NM running. The problem started in the Fedora 15 kernel; reverting to the Fedora 14 kernel on my home machine makes things work (although it has other issues). Both the Fedora 16 kernel (to be) and the current 3.1.0 git head also have the problem.
(I suppose now I get to write a problem report to the Linux kernel netdev mailing list and see if anything happens.)
Sidebar: the MTUs involved
The DSL PPPoE link has an MTU of 1492. The GRE tunnel has an MTU of 1200 (on both ends). The remote target has the standard Ethernet MTU of 1500. As far as I can see both the PPPoE link and the GRE tunnel will pass maximum-sized packets in either direction.
However, at the same time tracepath reports that the GRE tunnel has a path MTU of 854 only in the home to work direction; for work to home, the tracepath reported path MTU is the full 1200 bytes. I don't know where this limit is coming from. In addition, this doesn't seem to be the case on the Fedora 14 kernel without the problem; on that kernel, the reported path MTU is the full 1200 bytes.
(Okay, 'ip route show table cache' is somewhat helpful here, but I don't know why the kernel has decided to crank down the path MTU. The obvious MTU-related /proc/sys/net/ipv4 settings are the same between the two kernel versions as far as I can see.)
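Rather than eyeballing the settings, a more thorough comparison could be scripted. This is only a sketch in Python: it snapshots every readable file under /proc/sys/net/ipv4 into a dictionary and reports what differs between two snapshots. The function names are made up for this example, and saving a snapshot (say, as JSON) while booted into each kernel is left out.

```python
import os


def read_settings(root="/proc/sys/net/ipv4"):
    """Return {relative path: contents} for every readable file under root."""
    settings = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, errors="replace") as f:
                    value = f.read().strip()
            except OSError:
                # Some entries are write-only or otherwise unreadable.
                continue
            settings[os.path.relpath(path, root)] = value
    return settings


def diff_settings(old, new):
    """Return {name: (old value, new value)} for settings that differ.

    A setting that exists in only one snapshot shows up with None
    on the other side.
    """
    changes = {}
    for key in sorted(set(old) | set(new)):
        if old.get(key) != new.get(key):
            changes[key] = (old.get(key), new.get(key))
    return changes
```

Running read_settings() under the Fedora 14 kernel and again under the Fedora 15 kernel, then passing both results to diff_settings(), would show any changed defaults as well as settings that exist on only one of the two kernels.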