When iptables SNAT and routing happen, and how this is annoying
Per this famous iptables tutorial (via), and also this more recent documentation, locally generated IP packets go through multiple processing steps, both in iptables and elsewhere in the kernel:
- packets are given an initial routing, which assigns the source IP among other effects
- iptables OUTPUT chain for the raw, mangle and then nat tables
- packets are re-routed if iptables changed something here, although I believe their source IP is never changed
- iptables OUTPUT chain for the (default) filter table
- iptables POSTROUTING chain for the mangle and then nat tables
- packet is transmitted, at least in a logical sense (I think IPsec magic may happen here)
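As a rough illustration, the order in the list above can be written down as data. This is a toy Python model, not kernel code. The step names `route`, `reroute` and `transmit` are made up for the sketch. The helper counts how many routing passes still lie ahead of a given chain. That count is what matters for the rest of this entry: a change made in the nat OUTPUT chain can still affect routing, while a change made in nat POSTROUTING cannot.

```python
# Toy model (not kernel code) of the path a locally generated packet takes,
# following the processing order listed above. Each step is (hook, table);
# table is None for steps that are not iptables chains. Step names are made up.
LOCAL_OUTPUT_PATH = [
    ("route", None),            # initial routing; assigns the source IP
    ("OUTPUT", "raw"),
    ("OUTPUT", "mangle"),
    ("OUTPUT", "nat"),          # DNAT only
    ("reroute", None),          # second routing pass, if something changed
    ("OUTPUT", "filter"),
    ("POSTROUTING", "mangle"),
    ("POSTROUTING", "nat"),     # SNAT happens here
    ("transmit", None),
]

ROUTING_STEPS = {"route", "reroute"}


def routing_passes_after(hook, table):
    """Count the routing passes that still lie ahead of the given chain."""
    i = LOCAL_OUTPUT_PATH.index((hook, table))
    return sum(1 for h, _ in LOCAL_OUTPUT_PATH[i + 1:] if h in ROUTING_STEPS)


if __name__ == "__main__":
    for step, (hook, table) in enumerate(LOCAL_OUTPUT_PATH, start=1):
        print(step, hook if table is None else f"{table} {hook}")
    print("routing passes after nat OUTPUT:", routing_passes_after("OUTPUT", "nat"))
    print("routing passes after nat POSTROUTING:",
          routing_passes_after("POSTROUTING", "nat"))
```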
Actually, my description is not quite accurate. Iptables has two sorts of NAT: SNAT (for the source IP address) and DNAT (for the destination address). The nat table's OUTPUT chain can only do DNAT. SNAT can only be done in POSTROUTING, which happens, well, after routing (and which applies to all packets leaving the machine, not just locally generated ones).
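The restriction can be summarized as a small lookup table. What follows is a toy Python check, not iptables itself. It deliberately covers only the SNAT and DNAT targets in the chains discussed here, and it leaves out other NAT targets and chains. `check_nat_rule` is a hypothetical helper invented for the illustration.

```python
# Toy check (not iptables itself): which of the two NAT targets discussed
# here the nat table accepts in each of these chains. Other targets and
# chains are intentionally left out. check_nat_rule is a hypothetical helper.
ALLOWED_NAT_TARGETS = {
    "PREROUTING": {"DNAT"},
    "OUTPUT": {"DNAT"},        # locally generated packets: DNAT only
    "POSTROUTING": {"SNAT"},   # SNAT only happens after routing
}


def check_nat_rule(chain, target):
    """Raise ValueError if `target` may not be used in the nat table's `chain`."""
    if target not in ALLOWED_NAT_TARGETS.get(chain, set()):
        raise ValueError(f"{target} is not valid in the nat table's {chain} chain")
    return True


if __name__ == "__main__":
    for chain, target in [("OUTPUT", "DNAT"), ("OUTPUT", "SNAT"),
                          ("POSTROUTING", "SNAT")]:
        try:
            check_nat_rule(chain, target)
            print(f"nat {chain} -j {target}: ok")
        except ValueError as err:
            print(f"nat {chain} -j {target}: rejected ({err})")
```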
Under normal circumstances this is perfectly fine, because under normal circumstances routing is only affected by the destination address, and that is changed by DNAT in the OUTPUT chain, before the second routing pass. However, if you are doing policy-based routing, you actually do want to make routing decisions based on the source IP, and by the time you can change it, all of the routing decisions have already been made. You must do a two-stage change to get the same result, assuming it works.
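The problem can be shown with a toy Python simulation. This is not real netfilter code. It models a machine with two hypothetical interfaces, eth0 and eth1, where policy routing should send traffic from 192.0.2.10 out eth1. The interface names, addresses and the `mark` field are all assumptions made for illustration. The sketch also assumes that a "two-stage change" looks like the common firewall-mark approach: the packet is marked in the OUTPUT chain so that the second routing pass picks eth1, and it is then SNAT'd in POSTROUTING.

```python
from dataclasses import dataclass
from typing import Optional

# Toy model, not netfilter: interface names, addresses and marks are hypothetical.


@dataclass
class Packet:
    src: Optional[str]
    dst: str
    mark: int = 0
    out_if: Optional[str] = None


# Policy routing rules, checked in order (loosely like `ip rule`):
# (match, outgoing interface, source address to assign on first routing).
RULES = [
    (lambda p: p.mark == 1, "eth1", "192.0.2.10"),           # route by mark
    (lambda p: p.src == "192.0.2.10", "eth1", "192.0.2.10"),  # route by source
    (lambda p: True, "eth0", "198.51.100.5"),                 # default route
]


def route(p: Packet) -> Packet:
    """Pick an outgoing interface; only the initial pass assigns a source IP."""
    for match, iface, pref_src in RULES:
        if match(p):
            p.out_if = iface
            if p.src is None:
                p.src = pref_src
            return p
    raise LookupError("no route")


def send(p, output_rules=(), postrouting_rules=()):
    route(p)                        # initial routing
    for rule in output_rules:       # OUTPUT chains: can mark or DNAT, not SNAT
        rule(p)
    route(p)                        # second routing pass
    for rule in postrouting_rules:  # nat POSTROUTING: SNAT, after all routing
        rule(p)
    return p


def snat_to_eth1_addr(p):
    p.src = "192.0.2.10"


def mark_for_eth1(p):
    p.mark = 1


if __name__ == "__main__":
    # SNAT alone: routed out eth0 first, then given eth1's address.
    print(send(Packet(None, "203.0.113.7"), postrouting_rules=[snat_to_eth1_addr]))
    # Two stages: the mark steers the second routing pass, then SNAT fixes the source.
    print(send(Packet(None, "203.0.113.7"),
               output_rules=[mark_for_eth1],
               postrouting_rules=[snat_to_eth1_addr]))
```

In this model, the SNAT-only packet leaves eth0 carrying eth1's address, which is the wrong-interface case. The source-based rule never gets a chance to fire. The two-stage packet ends up on eth1 with the right source, but only because the mark, not the source IP, drove the routing decision.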
(I wrote about this long ago, but at the time I hadn't read anything on the processing order and so on. Now I have a motivation, so I'm starting to dig. It's interesting to see my old guess that packets were being routed twice is in fact correct, although it's humbling to know that if I'd just read the tutorial I could have known that back then. My failure to read documentation if I'm bored or irritated is not a new thing.)
I can't confidently assert that this limitation on where SNAT can be used is unnecessary, but it certainly seems that way to me. If SNAT or some other method of changing or forcing the source IP could be used in the OUTPUT chain, policy-based routing would be simpler and more powerful. You'd force the source IP to whatever you needed, and then the second routing pass would do all of the work, with much less chance of packets going out an interface with the wrong source address attached to them.
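The difference can be sketched in the same toy style. This is a Python model, not netfilter, and the interface names and addresses are hypothetical. `send_with_output_snat` imagines a source rewrite in the OUTPUT chain, which iptables does not actually allow. It is contrasted with SNAT in POSTROUTING, using a purely source-based routing rule.

```python
from dataclasses import dataclass
from typing import Optional

# Toy model, not netfilter. Interface names and addresses are hypothetical,
# and the OUTPUT-stage source rewrite below is something iptables does NOT allow.


@dataclass
class Packet:
    src: Optional[str]
    dst: str
    out_if: Optional[str] = None


# Source-based policy routing, checked in order: (required source or None, interface).
RULES = [("192.0.2.10", "eth1"), (None, "eth0")]
IFACE_ADDR = {"eth0": "198.51.100.5", "eth1": "192.0.2.10"}


def route(p: Packet) -> Packet:
    """Pick an interface by source IP; only the first pass assigns a source."""
    for rule_src, iface in RULES:
        if rule_src is None or p.src == rule_src:
            p.out_if = iface
            if p.src is None:
                p.src = IFACE_ADDR[iface]
            return p
    raise LookupError("no route")


def send_with_postrouting_snat(p: Packet, new_src: str) -> Packet:
    route(p)          # initial routing
    route(p)          # second routing pass (nothing changed in OUTPUT)
    p.src = new_src   # SNAT in POSTROUTING: too late to affect routing
    return p


def send_with_output_snat(p: Packet, new_src: str) -> Packet:
    """HYPOTHETICAL: what a source rewrite in OUTPUT would allow."""
    route(p)          # initial routing
    p.src = new_src   # imagined OUTPUT-stage source change
    route(p)          # second routing pass sees the new source
    return p


if __name__ == "__main__":
    print(send_with_postrouting_snat(Packet(None, "203.0.113.7"), "192.0.2.10"))
    print(send_with_output_snat(Packet(None, "203.0.113.7"), "192.0.2.10"))
```

In the hypothetical version, the ordinary source-based rule does all of the work on the second pass. No marks or other indirection are needed.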
(Having to use SNAT here is already vaguely absurd, since we're firing up an entire elaborate netfilter machine for state tracking and address translation when we actually don't need it at all. But I suspect that no one has written an iptables/netfilter module that just changes the source IP without NAT'ing things, and I have to admit that uses for it are a bit obscure. I'm a special case here.)