GRE is a translucent tunnel
I normally expect IP tunnels to be opaque, that is, to act as if they were physical links: the packet is sent down one end and pops out the other end unchanged, just as if it went over a connection between two routers, and the tunnel itself is indifferent to the details of the packets it is transporting. (The technical ISO way of describing this is that I expect IP tunnels to act entirely as layer 2 entities.)
However, one of the peculiarities of GRE is that it is a translucent tunnel, where some of the bits of the packet being tunneled show through (and are affected). In particular, GRE uses the real packet TTL.
More precisely, GRE-encapsulated packets and the underlying real packets reuse each other's TTL. By default, the encapsulated packet starts out with the TTL that the real packet had when it reached the start of the tunnel, and at the end of the tunnel the de-encapsulated packet gets whatever TTL the encapsulated packet arrived with.
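The TTL reuse described above can be modeled in a few lines. This is a toy sketch under stated assumptions, not real kernel behavior: every router decrements the TTL by one, the tunnel start point copies the real packet's remaining TTL into the outer header and uses its own address as the outer source, and the end point hands whatever TTL is left back to the de-encapsulated packet. All router names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TunnelPath:
    """A hypothetical path: sender -> before -> start ==(inside)==> end -> after."""
    before: list[str] = field(default_factory=list)  # ordinary routers before the tunnel
    start: str = "gre-start"                          # GRE tunnel start point
    inside: list[str] = field(default_factory=list)  # routers on the underlying path
    end: str = "gre-end"                              # GRE tunnel end point
    after: list[str] = field(default_factory=list)   # ordinary routers after the tunnel


def probe(path: TunnelPath, ttl: int, sender: str = "tracer") -> Optional[tuple[str, str]]:
    """Return (router where the TTL ran out, ICMP destination), or None on arrival."""
    # Ordinary forwarding, including the start point: the real packet's TTL drops.
    for router in path.before + [path.start]:
        ttl -= 1
        if ttl == 0:
            return router, sender
    # Encapsulated (assumed model): the outer header inherits the remaining TTL,
    # but its source address is the tunnel start point, so Time Exceeded goes there.
    for router in path.inside:
        ttl -= 1
        if ttl == 0:
            return router, path.start
    # De-encapsulated: the real packet carries on with whatever TTL is left.
    for router in [path.end] + path.after:
        ttl -= 1
        if ttl == 0:
            return router, sender
    return None


def traceroute(path: TunnelPath, max_ttl: int, sender: str = "tracer") -> list[str]:
    """Per TTL: the router's name if its reply reached the sender, else '*'."""
    hops = []
    for ttl in range(1, max_ttl + 1):
        result = probe(path, ttl, sender)
        if result is None:
            hops.append("destination")
            break
        router, icmp_dst = result
        hops.append(router if icmp_dst == sender else "*")
    return hops
```

For a path with one LAN router, three routers between the tunnel endpoints and a destination just past the far end, `traceroute(TunnelPath(before=["lan-router"], start="gre-gw", inside=["isp-a", "isp-b", "isp-c"], end="far-end"), 10)` returns `['lan-router', 'gre-gw', '*', '*', '*', 'far-end', 'destination']`. The stars are the probes whose TTL ran out on the underlying path; their Time Exceeded messages were addressed to `gre-gw`, not to the machine running the trace.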
I noticed this when I tried to do a traceroute over a GRE tunnel. Since
traceroute uses the packet TTL and GRE temporarily rewrites the origin
IP address, everything after my GRE gateway went blank: the TTL was
expiring inside the tunnel, and the ICMP Time Exceeded message about it
was going to my GRE gateway instead of to the system running traceroute.
Given that there are about 20 hops between the endpoints of the GRE
tunnel, I wouldn't be surprised if this was also affecting the general
reachability of the far end of the tunnel.
IPSec in transport mode uses (and alters) the regular packet headers, so GRE over transport mode IPSec is also affected by this. Tunnel mode IPSec is an opaque tunnel, so GRE over tunnel mode IPSec does not have this issue. As a result, I now have a very small IPSec tunnel.
(That tunnels temporarily rewrite the source IP address has interesting consequences for path MTU discovery: if the packet is larger than the MTU of the path the tunnel currently takes, the ICMP error packet will go to the tunnel's source endpoint instead of to the real source. I don't know whether kernels are generally smart enough to rewrite the ICMP message a bit and pass it on to the real source, whether they update the tunnel MTU, or whether the ICMP packet just gets dropped.)
Things I have learned while doing GRE tunnels on Linux
In no particular order:
- point-to-point GRE tunnels have to be symmetric, where each end is a
mirror image of the other. Otherwise the destination kernel will
reject the inbound GRE packets, which makes sense from a security
perspective once you think about it.
- GRE tunnels require a local IP address before you can point routes at them; I suspect that this is generic behavior and is so that the kernel knows what default origin IP address to put on packets going out through them.
- because GRE tunnels are network devices, you can give them a distinct
local IP address, which becomes the default source IP address for anything
routed over the tunnel.
- GRE over IPSec over PPPoE requires a significantly smaller MTU than you might think for reliable operation. Do not assume that the kernel will get it right for you. (Especially if it doesn't know that the other end is using PPPoE.)
- it helps to make sure that both ends are using the same MTU. Unlike
with PPP, nothing automates this for you. (At least I think PPP
automates this for you.)
- because GRE tunnels provide an explicit source address and device,
you can play some really peculiar routing tricks. You don't even seem
to need policy based routing. My current trick is routing the
subnet that the target of the tunnel is on over the tunnel itself,
which makes my head hurt.
- reading the documentation for the
ip command is really useful; there's all sorts of powerful tricks lurking there. (And simple policy based routing is not as scary as it looks, honest.)
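The symmetry, local-address and matching-MTU points in the list above are easiest to get right by generating both ends' configuration from a single description. A minimal sketch that only builds iproute2 command strings (it runs nothing); the device name `gre1`, the addresses (documentation ranges) and the 1400-byte MTU are hypothetical placeholders. It also sets a fixed `ttl 255`: ip-tunnel(8) documents `ttl` as an option whose default for IPv4 tunnels is to inherit the inner packet's TTL, which is the translucent behavior described in the previous entry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    outer: str  # real address the GRE packets are sent from and to (hypothetical)
    inner: str  # local IP address given to the tunnel device (hypothetical)


def gre_commands(dev: str, me: Endpoint, peer: Endpoint, mtu: int, ttl: int = 255) -> list[str]:
    """iproute2 commands for one end of a point-to-point GRE tunnel."""
    return [
        # local/remote must be the mirror image of the other end's settings;
        # a fixed ttl stops the outer header inheriting the inner packet's TTL.
        f"ip tunnel add {dev} mode gre local {me.outer} remote {peer.outer} ttl {ttl}",
        # The tunnel needs its own local address before routes can point at it;
        # that address becomes the default source for traffic routed over it.
        f"ip addr add {me.inner} peer {peer.inner} dev {dev}",
        # Nothing negotiates the MTU, so both ends are given the same value here.
        f"ip link set {dev} mtu {mtu}",
        f"ip link set {dev} up",
    ]


def gre_pair(dev: str, a: Endpoint, b: Endpoint, mtu: int) -> tuple[list[str], list[str]]:
    """Generate both ends from one description so they cannot drift apart."""
    return gre_commands(dev, a, b, mtu), gre_commands(dev, b, a, mtu)
```

With `a = Endpoint("192.0.2.10", "10.254.0.1")` and `b = Endpoint("198.51.100.20", "10.254.0.2")`, the first command for `a` is `ip tunnel add gre1 mode gre local 192.0.2.10 remote 198.51.100.20 ttl 255` and the first for `b` swaps the two outer addresses.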
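How much smaller the MTU of GRE over IPSec over PPPoE has to be can be worked out by adding up the headers. Every overhead figure below is an assumption to adjust for a real setup: PPPoE costs 8 bytes, the outer IPv4 header 20, a GRE header without key, checksum or sequence number 4, and ESP 8 bytes of SPI and sequence number plus an IV, padding up to the cipher block size, 2 trailer bytes and an ICV. The defaults assume AES-CBC (16-byte IV and block) with HMAC-SHA1-96 (12-byte ICV); other algorithms or NAT traversal change the result.

```python
def gre_esp_max_inner(link_mtu: int = 1500, pppoe: bool = True, tunnel_mode: bool = False,
                      iv: int = 16, block: int = 16, icv: int = 12) -> int:
    """Largest packet the GRE device can send unfragmented (a candidate device MTU)."""
    mtu = link_mtu - (8 if pppoe else 0)  # PPPoE header (6) + PPP protocol field (2)

    def on_wire(inner: int) -> int:
        protected = 4 + inner  # GRE base header + the packet being tunneled
        if tunnel_mode:
            protected += 20    # tunnel mode also wraps the GRE packet's own IPv4 header
        # ESP pads payload + pad-length + next-header up to a multiple of the block size.
        padded = -(-(protected + 2) // block) * block
        return 20 + 8 + iv + padded + icv  # outer IPv4 + SPI/sequence + IV + payload + ICV

    inner = mtu
    while inner > 0 and on_wire(inner) > mtu:
        inner -= 1
    return inner
```

Under these assumptions `gre_esp_max_inner()` gives 1418 bytes for GRE over transport-mode ESP over PPPoE and `gre_esp_max_inner(tunnel_mode=True)` gives 1398, well below the 1476 bytes that plain GRE leaves on a 1500-byte Ethernet link.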
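One way that routing the far end's subnet over its own tunnel can avoid a routing loop is plain longest-prefix matching: a host route to the tunnel endpoint via the real link is more specific than the subnet route via the tunnel, so the encapsulated packets themselves still leave through the physical interface. This is an assumed setup for illustration, not necessarily the one described above (ip-tunnel(8) also documents a `dev` option that binds a tunnel to an underlying device); the prefixes and device names are hypothetical.

```python
import ipaddress

# Hypothetical routing table: (destination prefix, outgoing device).
ROUTES = [
    ("0.0.0.0/0", "ppp0"),          # default route out the real link
    ("198.51.100.0/24", "gre1"),    # the tunnel target's whole subnet, over the tunnel
    ("198.51.100.20/32", "ppp0"),   # ...except the tunnel endpoint itself
]


def lookup(dst: str, routes: list[tuple[str, str]] = ROUTES) -> str:
    """Pick the outgoing device by longest-prefix match."""
    addr = ipaddress.ip_address(dst)
    matches = [(ipaddress.ip_network(prefix), dev) for prefix, dev in routes
               if addr in ipaddress.ip_network(prefix)]
    # The default route guarantees at least one match for any IPv4 address.
    return max(matches, key=lambda match: match[0].prefixlen)[1]
```

Here `lookup("198.51.100.20")` returns `"ppp0"` (GRE packets to the endpoint use the real link), while `lookup("198.51.100.7")` returns `"gre1"`.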
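For comparison, simple source-based policy routing with the ip command comes down to two commands: a separate routing table holding a single route, and a rule that selects that table for packets from a given source address. A sketch that only builds the command strings; the table number 100, the device and the address are hypothetical.

```python
def source_route_commands(src: str, dev: str, table: int = 100) -> list[str]:
    """Send traffic *from* src out dev, whatever its destination."""
    return [
        f"ip route add default dev {dev} table {table}",  # a private table with one route
        f"ip rule add from {src} table {table}",           # consult it for packets from src
    ]
```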
Now all I have to do is figure out the best way to automate all of this
so that it happens on system boot. This may be kind of
tricky, because I am using a totally manual IPSec setup (complete
with direct invocation of
setkey and fixed keys), and I only want
to IPSec my GRE traffic, not anything else between the two endpoints.
(ifup-ipsec will do everything except GRE-only IPSec;
it wants to IPSec all traffic. I prefer not to do that, because that way I
have an out if something goes wrong with IPSec.)
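The GRE-only part can be expressed as setkey(8) security policies whose upper-layer protocol is `gre` (protocol 47, looked up in /etc/protocols), so no other traffic between the two endpoints matches a policy. A minimal sketch that generates the policy lines for either transport or tunnel mode; it deliberately leaves out the `add` lines carrying the fixed keys, and the addresses are hypothetical.

```python
def gre_only_spd(local: str, remote: str, mode: str = "transport") -> str:
    """setkey(8) input that requires ESP for GRE between two hosts, and nothing else."""
    if mode == "transport":
        out_req = in_req = "esp/transport//require"
    elif mode == "tunnel":
        # Tunnel mode names the endpoints, in the direction of travel.
        out_req = f"esp/tunnel/{local}-{remote}/require"
        in_req = f"esp/tunnel/{remote}-{local}/require"
    else:
        raise ValueError(f"unknown IPSec mode: {mode}")
    return "\n".join([
        "spdflush;",
        # Only protocol 'gre' is selected; other traffic between the hosts
        # matches no policy and is sent in the clear.
        f"spdadd {local} {remote} gre -P out ipsec {out_req};",
        f"spdadd {remote} {local} gre -P in ipsec {in_req};",
    ]) + "\n"
```

The output could be saved to a file and loaded with `setkey -f` from a boot script before the GRE device is brought up; where such a script lives depends on the distribution.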