Why we have public websites on private IPs (internally)

November 16, 2021

In yesterday's entry about how Chrome may start restricting requests to private networks, I mentioned that we have various public websites that are actually on private IPs, as far as people inside our network perimeter are concerned. You might wonder why. The too-short answer is that we don't have enough public IPs to go around, but the longer answer is that it's because of how our internal networks are organized.

As a computer science department, we have a bunch of separate research groups and professors. Many of them have their own machines and their own network needs, so in our network layout we put them on separate subnets, what we call "sandbox" subnets (and then we have some more for things like random laptops). Because we don't have anywhere near enough public IP address space, these subnets use RFC 1918 private IP address space.

Various people and research groups need or want to run public websites on their own machines. Generally they want these machines to be in their own subnets; sometimes it's actively required for various reasons. This means we can't physically put all of these servers on one subnet (with public IPs). We do have to assign them public IPs so they're reachable from the world, but then we have to somehow translate requests to the public IPs to the private IPs. We've opted to do this with NAT on our external firewall.
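The post doesn't show the actual firewall configuration, but the kind of destination NAT being described can be sketched in Linux nftables syntax. All addresses here are hypothetical: 198.51.100.80 is from the RFC 5737 documentation range standing in for one of the public IPs, and 10.0.0.80 is a made-up RFC 1918 address standing in for a web server on a sandbox subnet.

```
# Hypothetical sketch, not the actual firewall rules: translate inbound
# traffic for a public IP to the private IP of the real web server.
table ip nat {
    chain prerouting {
        type nat hook prerouting priority dstnat;
        ip daddr 198.51.100.80 tcp dport { 80, 443 } dnat to 10.0.0.80
    }
}
```

The return traffic is handled automatically by the NAT state table, so the web server's replies leave with the public IP as their source, and the original client source IPs are preserved on the way in.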

We use NAT instead of anything else, such as reverse proxies, for a variety of reasons. Some people are dealing with sensitive data that should go directly to their carefully secured server (naturally they use HTTPS). Some people are doing odd things and we don't want to worry about the potential impact of their traffic on shared servers, or for that matter any performance restrictions that a shared server in the path might create. Some people are using their own web server specifically so they don't have to get whatever web software they're running working behind our existing reverse proxy system. With NAT done by the external firewall, we can make their web servers public with as much performance and as little impact on everyone as possible (and with a minimum of our server resources used).

However, it does mean that there is no feasible way for people inside our network perimeter to talk to the public IPs. The public IPs exist only for external traffic transiting inward through our external firewall, and internal traffic doesn't do that. In fact, in a reasonably normal case the browser is on the same internal subnet as the web server (because it's a developer working on the site).

Sidebar: IPv6 was not and is not the solution

This network design and this requirement for public machines with private IPs predates usable IPv6. It works today, which leaves us with very little to gain from blowing it up and moving to IPv6, especially since it will have to keep on working for IPv4 for the foreseeable future. It's also pretty much a required feature that the source IPs of people talking to the web servers don't get changed.

(For the foreseeable future there will also be internal IPv4 only servers and clients.)


Comments on this page:

By Arnaud Gomes at 2021-11-17 04:22:01:

Why not have the public IP address on a loopback interface on the internal machine and route it there from your front-end firewall? This is not much more work, probably would not require more IP addresses (I assume all these servers need ports 80 and 443 anyway) and avoids the drawbacks of NAT.
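The routed alternative being proposed here can be sketched with iproute2 commands, again with hypothetical RFC 5737 and RFC 1918 addresses rather than anything from the post:

```
# Sketch of the routing approach (hypothetical addresses).
# On the internal web server: hold the public IP on loopback so the
# host answers for it without ARPing for it on its subnet.
ip addr add 198.51.100.80/32 dev lo

# On the firewall/router: a /32 host route sending traffic for the
# public IP inward via the server's private sandbox-subnet address.
ip route add 198.51.100.80/32 via 10.0.0.80
```

With this design nothing rewrites addresses at all; the trade-off, as discussed below, is that every such machine needs its own host route maintained at every routing point.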

I have used both approaches over the years and found routing much better than NAT on all accounts.

   -- A

Did y'all consider putting the public IP in DNS everywhere and hairpin NATing on the firewall?

Simpler, since there is consistent addressing everywhere, at the cost of a small bit of latency for internal users and more load on your firewall...

By cks at 2021-11-17 11:50:11:

One of the problems is that we couldn't do hairpin NAT without rewriting the source IP address of HTTP requests, which is undesirable. Our perimeter firewall is also a bridging firewall instead of a routing one, which makes hairpin NAT more difficult and complex.

As far as having the public IPs on the actual machines themselves, I don't see how to do this without having complicated routing of individual IPs at multiple points. Each public IP would need to be internally routed to the subnet it's on, which means a /32 routing entry on our core router and at least one firewall. This would also require updating the routing table on our core router on a regular basis as we added and removed such machines.

(It also opens up plenty of opportunities for misconfiguration on the web server hosts themselves.)

One of the problems is that we couldn't do hairpin NAT without rewriting the source IP address of HTTP requests, which is undesirable.

Is this a limitation of your firewall software/appliance? When hairpin NATing from an internal address to an internal server using the server's external IP, "w" on the server shows my internal address.

Our perimeter firewall is also a bridging firewall instead of a routing one, which makes hairpin NAT more difficult and complex.

Wow, that seems like it would make any kind of NAT difficult and complex!

By cks at 2021-11-17 14:49:42:

Unless the device doing NAT sits between the two hosts involved (which an external perimeter firewall will not for two internal machines), it must rewrite the source IP of NAT'd traffic to ensure that return traffic also travels back through it. Otherwise, internal host A will contact public IP P, have P translated to internal host B, and internal host B will then receive traffic labeled as 'from A, to B' and will reply directly to A. However, A is expecting replies from the public IP P, not from B, and will drop the 'from B to A' packets as irrelevant.
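This return-path asymmetry can be demonstrated with a toy model. Everything here is hypothetical for illustration (made-up RFC 1918 and RFC 5737 addresses, and packets modeled as simple (source, destination) pairs), not anything from the actual network:

```python
# Toy model of hairpin NAT return-path asymmetry. Internal client A
# talks to public IP P, which the firewall DNATs to internal server B.
# Without source rewriting, B replies directly to A and A drops the
# reply; with source rewriting, replies flow back through the firewall.

A = "10.0.0.5"        # internal client (hypothetical RFC 1918 address)
B = "10.0.0.80"       # internal web server (hypothetical)
P = "198.51.100.80"   # server's public IP (RFC 5737 documentation range)
FW = "10.0.0.1"       # firewall's internal address (hypothetical)

def firewall_dnat(pkt, snat=False):
    """Rewrite destination P -> B; optionally also rewrite the source."""
    src, dst = pkt
    if dst == P:
        dst = B
        if snat:
            src = FW   # hairpin: force replies back through the firewall
    return (src, dst)

def server_reply(pkt):
    """B replies directly to whatever source address it saw."""
    src, dst = pkt
    return (dst, src)

def client_accepts(reply):
    """A only accepts replies that appear to come from P."""
    return reply[0] == P

# Without source rewriting: B's reply goes straight to A, from B itself.
reply = server_reply(firewall_dnat((A, P), snat=False))
print(client_accepts(reply))   # False: 'from B to A' is dropped

# With source rewriting: the reply goes to FW, which can undo both
# rewrites and hand A a packet that appears to come from P.
reply = server_reply(firewall_dnat((A, P), snat=True))
print(reply[1] == FW)          # True: the firewall sees the reply
```

The cost of making it work, as noted above, is exactly that the web server no longer sees the client's real source IP.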

A bridging firewall on a touchdown network (as ours is) has no problem doing NAT in either direction for traffic passing through it.

(A bridging firewall on a shared network with other hosts may have problems rewriting the packet's destination MAC to go with the new destination IP. But this isn't a problem if the packet is still going to your core router on the shared network to transit through to other internal networks, and you're just changing which internal network it will wind up on. We used to have this configuration and it worked fine.)

Unless the device doing NAT sits between the two hosts involved (which an external perimeter firewall will not for two internal machines), it must rewrite the source IP of NAT'd traffic to ensure that return traffic also travels back through it.

You're absolutely right, of course, which caused me to investigate how this is actually happening on the network I am on.

It turns out that externally visible machines go in a separate subnet, presumably to allow this exact use case; the subnet is still on the same VLAN, so not much isolation is gained. This way the firewall can rewrite in both directions when it is routing between subnets.

By Arnaud Gomes at 2021-11-18 04:12:57:

Frankly, when you describe all the hoops you have to jump through in order to avoid routing individual IP addresses, I wonder if it is really worth it. In a similar case I would just set up BGP or OSPF if managing static routes is too much work (it often is), nowadays we have tools like BIRD which make it quite painless. Unless you have HA stateful firewall pairs, which can be difficult to integrate with dynamic routing, but I'm under the impression you don't.
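The dynamic-routing idea mentioned here could look roughly like the following BIRD 2 configuration, where each web server host announces its own public /32 (held on loopback) into OSPF. The interface names, the choice of OSPF, and the overall shape are assumptions for illustration, not details from the comment:

```
# Hypothetical BIRD 2 sketch: originate the public /32 on loopback
# into OSPF so no one has to maintain static host routes by hand.
protocol direct {
    ipv4;
    interface "lo";      # pick up the public /32 configured on loopback
}

protocol ospf v2 {
    ipv4 { export all; };
    area 0 {
        interface "eth0";
    };
}
```

Adding or removing a public machine then becomes a matter of configuring the address on that one host, with the routing table updating itself.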

Disclaimer: this is from the point of view of a web hoster. Which trade-off you find easier / better / worth it depends a lot on local habits and priorities.

   -- A
Last modified: Tue Nov 16 23:46:31 2021