2019-09-16
The problem of 'triangular' Network Address Translation
In my entry on our use of bidirectional NAT and split horizon DNS, I mentioned that we couldn't apply our bidirectional NAT translation to all of our internal traffic in the way that we can for external traffic for two reasons, an obvious one and a subtle one. The obvious reason is our current network topology, which I'm going to discuss in a sidebar below. The more interesting subtle reason is the general problem of what I'm going to call triangular NAT.
Normally when you NAT something in a firewall or a gateway, you're in a situation where the traffic in both directions passes through you. This allows you to do a straightforward NAT implementation where you only rewrite one of the pair of IP addresses involved; either you rewrite the destination address from you to the internal IP and then send the traffic to the internal IP, or you rewrite the source address from the internal IP to you and then send the traffic to the external IP.
However, this straightforward implementation breaks down if the return traffic will not flow through you when it has its original source IP. The obvious case of this is if a client machine is trying to contact a NAT'd server that is actually on its own network. The client will send its initial packet to the public IP of the NAT'd machine, and this packet will hit your firewall, get its destination address rewritten, and then be passed to the server. However, when the server replies to the packet, it will see a destination IP on its local network and just send the reply directly to the client machine. The client machine will then go 'who are you?', because it's expecting the reply to come from the server's nominal public IP, not its internal one.
(Asymmetric routing can also create this situation, for instance if the machine you're talking to has multiple interfaces and a route to you that doesn't go out the firewall-traversing one.)
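To make the failure concrete, here is a toy Python model of one-sided (destination-only) NAT. It is only a sketch: every address and function name in it is made up for illustration and none of it reflects our actual configuration. It assumes the server sends replies for on-link destinations directly and everything else back through the firewall.

```python
import ipaddress
from typing import NamedTuple


class Packet(NamedTuple):
    src: str
    dst: str


# All addresses are hypothetical, chosen only for this illustration.
LOCAL_NET = ipaddress.ip_network("10.0.0.0/24")
SERVER_INTERNAL = "10.0.0.10"
SERVER_PUBLIC = "192.0.2.10"


def firewall_dnat(pkt: Packet) -> Packet:
    """One-sided NAT: rewrite only the destination (public IP -> internal IP)."""
    if pkt.dst == SERVER_PUBLIC:
        return pkt._replace(dst=SERVER_INTERNAL)
    return pkt


def firewall_reverse(pkt: Packet) -> Packet:
    """Undo the rewrite on replies that do pass back through the firewall."""
    if pkt.src == SERVER_INTERNAL:
        return pkt._replace(src=SERVER_PUBLIC)
    return pkt


def server_reply(request: Packet) -> Packet:
    """The server answers the source address it saw, from its internal IP."""
    return Packet(src=SERVER_INTERNAL, dst=request.src)


def deliver_reply(reply: Packet) -> Packet:
    """On-link destinations get the reply directly, skipping the firewall."""
    if ipaddress.ip_address(reply.dst) in LOCAL_NET:
        return reply
    return firewall_reverse(reply)


def client_accepts(request: Packet, reply: Packet) -> bool:
    """A client only accepts a reply from the address it originally contacted."""
    return reply.src == request.dst and reply.dst == request.src


def round_trip(client_ip: str) -> bool:
    request = Packet(src=client_ip, dst=SERVER_PUBLIC)
    reply = deliver_reply(server_reply(firewall_dnat(request)))
    return client_accepts(request, reply)
```

In this model `round_trip("198.51.100.7")` succeeds, because the reply to an outside client goes back through the firewall and gets its source restored, while `round_trip("10.0.0.5")` fails, because the reply goes straight to the local client with the server's internal IP as its source.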
In general the only way to handle triangular NAT situations is to force the return traffic to flow through your firewall by always rewriting both IP addresses. Unfortunately this has side effects, the most obvious one being that the server no longer gets the IP address of who it's really talking to; as far as it's concerned, all of the connections are coming from your firewall. This is often less than desirable.
(As an additional practical issue, not all NAT implementations are very enthusiastic about doing such two-sided rewriting.)
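A toy Python sketch of two-sided rewriting shows both the fix and its cost. As before, the addresses and names are hypothetical, and the model tracks a single flow, where a real NAT implementation would key its state on ports and protocol as well.

```python
from typing import NamedTuple, Optional, Tuple


class Packet(NamedTuple):
    src: str
    dst: str


# All addresses are hypothetical, chosen only for this illustration.
FIREWALL = "10.0.0.1"
SERVER_INTERNAL = "10.0.0.10"
SERVER_PUBLIC = "192.0.2.10"


class TwoSidedNat:
    """Toy firewall that rewrites both addresses and remembers one flow."""

    def __init__(self) -> None:
        self.client: Optional[str] = None

    def inbound(self, pkt: Packet) -> Packet:
        # Rewrite both addresses, so the server's reply must come back to us.
        self.client = pkt.src
        return Packet(src=FIREWALL, dst=SERVER_INTERNAL)

    def outbound(self, pkt: Packet) -> Packet:
        # Restore the addresses that the client expects to see.
        if self.client is None:
            raise RuntimeError("no flow recorded for this reply")
        return Packet(src=SERVER_PUBLIC, dst=self.client)


def round_trip(client_ip: str) -> Tuple[str, Packet]:
    """Return the source IP the server saw and the reply the client receives."""
    nat = TwoSidedNat()
    request = Packet(src=client_ip, dst=SERVER_PUBLIC)
    at_server = nat.inbound(request)
    # The server replies to the source it saw, which is now the firewall.
    reply = Packet(src=SERVER_INTERNAL, dst=at_server.src)
    return at_server.src, nat.outbound(reply)
```

Here `round_trip("10.0.0.5")` hands the client a reply from `192.0.2.10`, which it will accept, but the source address the server saw was `10.0.0.1`, the firewall, rather than the real client.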
Sidebar: Our obvious problem is network topology
At the moment, our network topology basically has three layers: the outside world, then (behind our perimeter firewall) our public IP subnets with various servers and firewalls, and then (behind those firewalls) our internal RFC 1918 'sandbox' subnets. Our mostly virtual BINAT subnet with the public IPs of BINAT machines basically hangs off the side of our public subnets. This creates two topology problems. The first topology problem is that there's no firewall to do NAT between our public subnets and the BINAT subnet. The larger topology problem is that if we just put a firewall in, we'd be creating a version of the triangular NAT problem, because the firewall would have to basically be a virtual one that rewrote incoming traffic and sent it back out the same interface it came in on.
To make internal BINAT work, we would have to actually add a network layer. The sandbox subnet firewalls would have to live on a separate subnet from all of our other servers, and there would have to be an additional firewall between that subnet and our other public subnets that did the NAT translation for most incoming traffic. This would impose additional network hops and bottlenecks on all internal traffic that wasn't BINAT'd (right now our firewalls deliberately live on the same subnet as our main servers).