A weird new IKE IPSec problem that I just had on Fedora 21's latest kernel
Back when I first wrote up my IKE configuration for my point-to-point GRE tunnel, I restricted the IKE IPSec configuration so that it would only apply IPSec to the GRE traffic, with:
conn cksgre
    [...]
    leftprotoport=gre
    rightprotoport=gre
    [...]
I only added this restriction out of caution and to match my old manual configuration. A while later I decided that it was a little silly; although I basically didn't send any unencrypted traffic to the special GRE touchdown IP address I use at the work end, I might as well fully protect that traffic since it was basically free. So I took the *protoport restrictions out, slightly increasing my security, and things worked fine for quite some time.
Today this change quietly blew up in my face. The symptoms were that often (although not always) a TCP connection specifically between my home machine and the GRE touchdown IP would stall after it transferred some number of bytes (it's possible that the transfer direction mattered but I haven't tested extensively). Once I narrowed down what was going on from the initial problems I saw, reproduction was pretty consistent: if I did 'ssh -v touchdown-IP' from home I could see it stall during key exchange.
I don't know what's going on here, but it seems specific to running the latest Fedora 21 kernel on both ends; I updated my work machine to kernel 3.19.3-200.fc21 a couple of days ago and did not have this problem, but I updated my home machine to 3.19.3-200.fc21 a few hours ago and started seeing this almost immediately (although it took some time and frustration to diagnose just what the problem was).
(I thought I had some evidence from tcpdump output, but in retrospect I'm not sure it meant what I thought it meant.)
(I had problems years ago with MTU collapse in the face of recursive GRE tunnel routing, but that was apparently fixed back in 2012 and anyways this is kind of the inverse of that problem, since this is TCP connections flowing outside my GRE tunnel. Still, it feels like a related issue. I did not try various ways of looking at connection MTUs and so on; by the time I realized this was related to IPSec instead of other potential problems it was late enough that I just wanted the whole thing fixed.)