A limitation of OpenBSD bridging NAT firewalls
We put a new SMTP frontend machine in front of our core mail server today to handle incoming email from the outside world (for the usual reasons: dealing with spam better, shielding the core server from six zillion zombies all trying to talk to it at once, etc.).
Rather than wait for updated MX entries for all our domains to propagate around and for people to switch to them, we figured we could speed up the process by having our bridging firewall send connections for the core server's port 25 off to the new frontend machine instead. (This would also give us a quick way to back out in case there were problems: just kill the redirect. In fact we could do the redirect before we did the MX switch, as a test.)
OpenBSD's pf makes this pretty simple; you just write more or less:
rdr on $INT_IF proto tcp from any to <core-smtp> port 25 -> <smtp-front>
So we put such a rule in and enabled it, and our clever plan promptly ground to a halt: it didn't work. Attempts to talk to port 25 on the core SMTP server stalled out without even connecting. The new SMTP frontend wasn't seeing any packets from the attempted connections, yet as far as we could tell the packet filter wasn't rejecting or dropping anything.
Our network topology looks something like this, from the outside world:
campus backbone → backbone router → bridging firewall → subnet A → core router → all other subnets, including subnet B.
The core mail server is on subnet A; the new SMTP frontend is on subnet B.
The problem is that the OpenBSD bridging firewall doesn't rewrite the destination Ethernet address of the packet when it NATs the destination IP address. Because the destination Ethernet address is unchanged, the packet is still going to go to whatever machine on the local subnet it was originally going to, whether or not this machine will actually accept packets for the new destination IP address.
If the original destination IP is off the subnet, the backbone router addresses the frame to our core router's Ethernet address. The core router then routes the packet according to its rewritten destination IP, and everything works out. However, if the original destination IP is on the subnet, the backbone router sends the packet straight to that machine's Ethernet address. Rewriting the destination IP address then merely causes the original destination machine to drop the packet, since the packet is no longer addressed to it.
(If the original destination IP address is entirely virtual and there is no machine answering ARP requests for it, the backbone router will never put the packet on the wire in the first place.)
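To make the mechanism concrete, here is a toy Python model of what happens to a frame as it crosses the bridge. It is a sketch under stated assumptions, not how pf is implemented. The class and function names are hypothetical, and so are the addresses: the documentation range 192.0.2.0/24 stands in for subnet A, 198.51.100.0/24 stands in for subnet B, and the MACs come from the documentation range. The model encodes one point: delivery on the local wire is decided by the Ethernet destination, while the rdr changes only the IP destination.

```python
from dataclasses import dataclass, replace

# Toy model only: names, IPs and MACs below are hypothetical.

@dataclass(frozen=True)
class Frame:
    dst_mac: str  # Ethernet destination, chosen by the backbone router via ARP
    dst_ip: str   # IP destination, which the bridge's rdr may rewrite

@dataclass
class Host:
    name: str
    mac: str
    ips: set
    ip_forwarding: bool = False  # routers forward; ordinary hosts drop

def bridge_rdr(frame: Frame, match_ip: str, new_ip: str) -> Frame:
    """Rewrite only the IP destination; the Ethernet destination is untouched."""
    if frame.dst_ip == match_ip:
        return replace(frame, dst_ip=new_ip)
    return frame

def deliver(frame: Frame, hosts_on_subnet: list) -> str:
    """Describe what the machine owning frame.dst_mac does with the frame."""
    for host in hosts_on_subnet:
        if host.mac != frame.dst_mac:
            continue  # only the owner of dst_mac ever sees the frame
        if frame.dst_ip in host.ips:
            return f"accepted by {host.name}"
        if host.ip_forwarding:
            return f"forwarded by {host.name}"
        return f"dropped by {host.name}"
    return "no machine with that Ethernet address"

core_router = Host("core router", "00:00:5e:00:53:01", {"192.0.2.1"}, ip_forwarding=True)
mail_server = Host("core mail server", "00:00:5e:00:53:19", {"192.0.2.25"})
subnet_a = [core_router, mail_server]

# The backbone router ARPed for 192.0.2.25, so the frame carries the mail server's MAC.
frame = Frame(dst_mac=mail_server.mac, dst_ip="192.0.2.25")
after = bridge_rdr(frame, match_ip="192.0.2.25", new_ip="198.51.100.25")
print(deliver(after, subnet_a))  # dropped by core mail server
```

Setting `ip_forwarding=True` on the mail server's `Host` turns the drop into a forward. That corresponds to the IP-forwarding workaround. A frame originally aimed at an off-subnet address carries the core router's MAC, so the core router forwards it whatever the rdr did to its IP destination.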
So for the future, we will have to bear in mind an important limitation of bridging NAT: you cannot easily have the pre-NAT destination IP address be on the local subnet. Fortunately we have relatively little on subnet A anyway, although it's somewhat annoying to sort of 'lose' a /24 just to be the touchdown point for the campus backbone.
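One way to act on this rule is to audit rdr targets before deploying them. The sketch below uses Python's standard `ipaddress` module to flag any rule whose pre-NAT destination falls inside the bridge's local subnet. The function name and the (pre-NAT, post-NAT) pair format are hypothetical, and the addresses are documentation ranges rather than our real ones.

```python
import ipaddress

def risky_rdr_targets(local_subnet: str, rdr_rules: list) -> list:
    """Return pre-NAT destinations that sit on the bridge's local subnet.

    rdr_rules is a hypothetical list of (pre_nat_dst, post_nat_dst) strings.
    Such destinations are reached by Ethernet address directly, so a
    bridging rdr that only rewrites the IP will not redirect them.
    """
    net = ipaddress.ip_network(local_subnet)
    return [pre for pre, _post in rdr_rules if ipaddress.ip_address(pre) in net]

rules = [
    ("192.0.2.25", "198.51.100.25"),    # core mail server on subnet A: problematic
    ("198.51.100.80", "198.51.100.25"), # off-subnet destination: goes via the core router
]
print(risky_rdr_targets("192.0.2.0/24", rules))  # ['192.0.2.25']
```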
I can think of a few ways around this:
- enable IP forwarding on the original destination machine. This should make it willing to re-send the 'misaddressed' post-NAT packet onto the wire, this time addressed to our core router. (I don't think one can do any evil with this in this specific case, as the core mail server has only one network interface.)
- persuade the backbone router that it is actually in a very tiny subnet with just it and the core router, or better yet that it has a 'point to point over Ethernet' connection to the core router. This is a little bit inefficient if we have a lot of external traffic to things on subnet A, since that traffic crosses the wire twice (once to the core router and once back out to its real destination). I am also not entirely sure that the backbone router would cope with machines outside its subnet sending it packets directly and ARPing for its Ethernet address.
- split subnet A into a bunch of sub-subnets, one of them very small and just for the backbone router. This is more complicated (more machines have to change their configuration) but 'proper', and we can still freely use most of the IP address space in subnet A. The sub-subnets can share a wire or be put on separate VLANs.
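As an illustration of the last option, here is a sketch using Python's `ipaddress` module. It carves a small touchdown block out of a /24 and covers the remainder with the fewest CIDR blocks. Several things are assumptions for the example and not a settled plan: the /29 size, the 192.0.2.0/24 prefix, and placing the touchdown block at the bottom of the range. The function name is also hypothetical. The /29 is meant to hold the backbone router plus the core router's address on it. The mail server would then live in one of the other blocks, so it would be off the backbone router's subnet.

```python
import ipaddress

def carve_touchdown(subnet_a: str, touchdown_prefixlen: int = 29):
    """Split subnet_a into a small touchdown subnet (its first block) and
    the minimal set of CIDR blocks covering everything else."""
    net = ipaddress.ip_network(subnet_a)
    touchdown = next(net.subnets(new_prefix=touchdown_prefixlen))
    # address_exclude yields the remaining space as maximal CIDR blocks.
    rest = sorted(net.address_exclude(touchdown))
    return touchdown, rest

touchdown, rest = carve_touchdown("192.0.2.0/24")
print(touchdown)  # 192.0.2.0/29
for block in rest:
    print(block)
```

Only 8 of the 256 addresses go to the touchdown block. The other 248 remain usable in the remaining blocks.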
(Another solution would be to rewrite the destination Ethernet address alongside the destination IP address, but as far as I can see from the pf.conf manpage, there's no way to do that.)