Realizing the hidden complexity of cloud server networking

May 18, 2024

We have our first cloud server. This cloud server has a public IP address that we can talk to, which is good because we need it, and it feels straightforward; we have lots of machines with public IP addresses. This public IP address has a firewall that we have to set rules for, which feels perfectly normal; we have firewalls too. Although if I think about it, the cloud provider is working at a much bigger scale, which makes it harder and more impressive. Except that our actual cloud server has an RFC 1918 IP address and is on an internal private network segment, so what we're actually working with is a NAT firewall gateway. And the RFC 1918 address is a sufficiently straightforward /24 that it's clearly not unique to us; plenty of cloud customer servers must have their own version of the RFC 1918 /24.

That was when I realized how complex all of the infrastructure for this networking has to be behind the scenes. The cloud provider is not merely operating a carrier-grade NAT, which is already non-trivial. They're operating a CGNAT firewall system that can connect a public IP to an IP on a specific internal virtual network, where the IP (and subnet) aren't unique across all of the (internal) networks being NAT'd. I feel that I'm reasonably knowledgeable about networking and I'm not sure how I'd even approach designing a system that did that. It's different in kind from the NAT firewalls I work on, not merely in size (the way plain CGNAT sometimes feels).
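To make the difference concrete, here's a minimal Python sketch (all addresses and tenant identifiers are hypothetical) of why such a gateway can't key its translation state on the private address alone, the way a plain NAT can: with overlapping tenant subnets, the lookup has to include some per-virtual-network identifier.

```python
import ipaddress

# Hypothetical illustration: a plain NAT could map flows keyed only by the
# private address, but that breaks the moment two tenants both use
# 10.1.1.0/24. A multi-tenant cloud gateway has to key its table on
# (virtual network id, private address) instead.
nat_table = {
    # (virtual network id, private IP) -> public IP
    (1001, ipaddress.ip_address("10.1.1.5")): ipaddress.ip_address("203.0.113.10"),
    (2002, ipaddress.ip_address("10.1.1.5")): ipaddress.ip_address("203.0.113.11"),
}

def public_ip_for(vni: int, private_ip: str):
    """Look up the public address for a tenant's private address."""
    return nat_table.get((vni, ipaddress.ip_address(private_ip)))

# Two tenants using the identical RFC 1918 address get distinct public IPs:
print(public_ip_for(1001, "10.1.1.5"))  # 203.0.113.10
print(public_ip_for(2002, "10.1.1.5"))  # 203.0.113.11
```

This is only the lookup; the real difficulty is doing it in the data plane at cloud scale, which is the part I wouldn't know how to approach.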

Intellectually, I knew that cloud environments were fearsomely complex behind the scenes, with all sorts of spectacular technical underpinnings (and thus all sorts of things to go wrong). But running 'ip -br a' on our first cloud server and then thinking a bit about how it all worked was the first time it really came home to me. Things like virtual machine provisioning, replicated storage, and so on were sufficiently far outside what I work on that I just admired them from a distance. Connecting our cloud server's public IP with its actual IP was the first time I had the 'I work in this area and nothing I know of could pull that off' feeling.

(Of course if we'd all switched over to IPv6 we might not need this complex NAT environment, because in theory all of those cloud servers could have globally unique IPv6 addresses and subnets, and all you'd need would be a carrier-grade firewall system. I'm not sure that would work in practice, though, and I don't know how clouds handle IPv6 allocation for customer servers. Our cloud server didn't get assigned an IPv6 address when we set it up.)
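The arithmetic behind that "in theory" is easy to check with Python's standard library (the prefix sizes here are hypothetical; real providers vary): even a single IPv6 /32 subdivides into millions of globally unique per-customer prefixes, so nothing would ever need translating, only filtering.

```python
import ipaddress

# Sketch with hypothetical prefix sizes: a provider holding one IPv6 /32
# could hand every customer a globally unique /56, with no reuse.
provider = ipaddress.ip_network("2001:db8::/32")  # documentation prefix

# Number of distinct /56 customer allocations inside one /32:
print(2 ** (56 - 32))  # 16777216

# The first such customer prefix:
print(next(provider.subnets(new_prefix=56)))  # 2001:db8::/56
```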

Comments on this page:

From at 2024-05-19 06:49:26:

If I had to guess (with zero experience whatsoever), it probably involves a Linux VRF per customer with its own routing table (or even a network namespace per customer?), with inbound packets first getting routed to the veth interface that goes into the customer's netns, and all knowledge of the customer's private IPs and DNAT rules being confined within that netns. I don't know if it's sufficiently "cloud scale", but it seems like a plausible method.

(Most of my VMs so far are on "smaller" providers where you just directly get the public IPv4 address on eth0. Those vary in how they implement IPv6, sometimes it's an on-link "/64" that's actually a flat /48 behind the scenes, sometimes the /64 is routed via the VM's link-local address or similar.)

By chris at 2024-05-19 09:51:30:

Some cloud providers now require you to provision the NAT Gateway, if you want to have it. That could be built out using very boring stuff. And for carrying the internal packets, it's likely VXLAN (or a vendor-specific variant).
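For what VXLAN contributes here: its 8-byte header (RFC 7348) carries a 24-bit VNI, and it's that per-virtual-network number that keeps identical RFC 1918 addresses on different tenant networks from colliding in the underlay. A small Python sketch of the header layout (the VNI value is made up):

```python
import struct

def vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header from RFC 7348.

    Layout: 8 flag bits (only the I bit, 0x08, is set, marking the VNI
    as valid), 24 reserved bits, the 24-bit VNI, 8 reserved bits.
    """
    assert 0 <= vni < 2 ** 24
    return struct.pack("!II", 0x08 << 24, vni << 8)

hdr = vxlan_header(1001)  # hypothetical tenant VNI
print(len(hdr))     # 8
print(hdr.hex())    # 080000000003e900
```

With roughly 16 million possible VNIs, each tenant's overlapping 10.x.x.x traffic travels in its own numbered envelope across the provider's network.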

By jonas at 2024-05-19 12:41:21:

> And the RFC 1918 address is a sufficiently straightforward /24 that it's clear it's not unique to us; plenty of cloud customer servers must have their own version of the RFC 1918 /24.

> Of course if we'd all switched over to IPv6 we might not need this complex NAT environment

Large ISPs such as Comcast had run out of RFC 1918 space by 2005, and had started acquiring and assigning public IPv4 space for their private networks. It's why they were eager to move to IPv6: Alain Durand - IPv6 @ Comcast: Managing 100+ Million IP Addresses - NANOG 37, 2006 - PDF slides

While "the cloud" didn't really exist back then, one can imagine that the big providers would've later (and quickly) hit the same problem had they even tried to assign unique private IPv4 addresses. Similarly, RFC 6598 assigned a carrier-grade NAT address range outside of RFC 1918 space, to avoid conflicts. And it's been a huge problem when merging company networks, which informed the IPv6 "Unique Local Address" proposal (for private network numbering) and the deprecation of the original "site-local" prefix.

I recommend you find out how your provider handles IPv6 allocations, even if only as an experiment for now.
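The numbers behind this comment check out with nothing more than Python's standard library: the three RFC 1918 blocks together hold fewer than 18 million addresses, well short of 100+ million devices, and the RFC 6598 shared space (100.64.0.0/10) overlaps none of them.

```python
import ipaddress

# All RFC 1918 private space summed: 10/8 + 172.16/12 + 192.168/16.
rfc1918 = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]
total = sum(net.num_addresses for net in rfc1918)
print(total)  # 17891328 -- under 18 million

# The RFC 6598 carrier-grade NAT range sits entirely outside RFC 1918:
cgn = ipaddress.ip_network("100.64.0.0/10")
print(any(cgn.overlaps(net) for net in rfc1918))  # False
```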

By Michael Warkentin at 2024-05-19 15:45:16:

This is the best high level overview I’ve seen from AWS:

Last modified: Sat May 18 21:52:56 2024