2024-10-26
The importance of name-based virtual hosts (websites)
I recently read Geoff Huston's The IPv6 Transition, which is actually about why that transition isn't happening. A large reason is that we've found ways to cope with the shortage of IPv4 addresses, and one of the things Huston points to is the introduction of TLS Server Name Indication (SNI) as drastically reducing the demand for IPv4 addresses for web servers. This is a nice story, but in actuality, TLS SNI was late to the party. The real hero (or villain) in taming what would otherwise have been a voracious demand for IPv4 addresses for websites is the HTTP Host header and the accompanying idea of name-based virtual hosts. TLS SNI only became important much later, when a mass movement to HTTPS started, partly due to various revelations about pervasive Internet surveillance.
In what is effectively the pre-history of the web, each website had to have its own IP(v4) address (an 'IP-based virtual host', or just your web server). If a single web server was going to support multiple websites, it needed a bunch of IP aliases, one per website. You can still do this today in web servers like Apache, but it has long since been superseded by name-based virtual hosts, which require the browser to send a Host: header with the other HTTP headers in the request. HTTP Host was officially added in HTTP/1.1, but I believe that back in the day basically everything accepted it even for HTTP/1.0 requests, and various people patched it into otherwise HTTP/1.0 libraries and clients, possibly even before HTTP/1.1 was officially standardized.
(Since HTTP/1.1 dates from 1999 or so, all of this is ancient history by now.)
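As a rough illustration of what name-based virtual hosting amounts to on the server side, here is a minimal Python sketch that picks a site purely from the Host header. The site names, the SITES table, and the pick_site and VhostHandler names are all made up for this example, and it is not how Apache or any other real web server implements virtual hosts.

```python
from http.server import BaseHTTPRequestHandler

# Hypothetical site table: hostname -> body served for that site.
SITES = {
    "www.example.org": b"Welcome to example.org\n",
    "blog.example.org": b"Welcome to the blog\n",
}
# Assumed policy: unknown or missing names get the first site.
DEFAULT_SITE = "www.example.org"


def pick_site(host_header):
    """Map a raw Host header value to one of the configured site names."""
    if not host_header:
        # An HTTP/1.0 client may send no Host header at all.
        return DEFAULT_SITE
    # Normalize case and drop an optional ":port" suffix.
    # (Bracketed IPv6 literals are not handled in this sketch.)
    name = host_header.strip().lower().split(":", 1)[0]
    return name if name in SITES else DEFAULT_SITE


class VhostHandler(BaseHTTPRequestHandler):
    """Serves every site in SITES from whatever single address it listens on."""

    def do_GET(self):
        body = SITES[pick_site(self.headers.get("Host"))]
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

To try it, hand VhostHandler to http.server.HTTPServer on a single address and make requests with different Host headers (for example with curl's -H option). The point is that the choice of website happens entirely inside the HTTP request, so no extra IP addresses are involved.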
TLS SNI only came along much later. The Wikipedia timeline suggests the earliest you might have reasonably been able to use it was in 2009, and that would have required you to use a bleeding edge Apache; if you were using an Apache provided by your 'Long Term Support' Unix distribution, it would have taken years more. At the time that TLS SNI was introduced this was okay, because HTTPS (still) wasn't really seen as something that should be pervasive; instead, it was for occasional high-importance sites.
One result of this long delay for TLS SNI is that for years, you were forced to allocate extra IPv4 addresses and put extra IP aliases on your web servers in order to support multiple HTTPS websites, while you could support all of your plain-HTTP websites from a single IP. Naturally this served as a subtle extra disincentive to supporting HTTPS on what would otherwise be simple name-based virtual hosts; the only websites where it was really easy to support HTTPS were ones that already had their own IPs (sometimes because they were on separate web servers, and sometimes for historical reasons if you'd been around long enough, as we had been).
(For years we had a mixed tangle of name-based and IP-based virtual hosts, and it was often difficult to recover the history of just why something was IP-based instead of name-based. We eventually managed to reform it down to only a few web servers and a few IP addresses, but it took a while. And even today we have a few virtual hosts that are deliberately IP-based for reasons.)
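The reason HTTPS needed its own mechanism is that the TLS handshake, including the server presenting its certificate, is finished before the client sends any HTTP headers, so the Host header arrives too late to pick a certificate. SNI puts the hostname into the handshake itself. A minimal sketch of the server side using Python's ssl module follows; the hostnames, CONTEXTS, and choose_certificate are names invented for the example, and the certificate loading that a real server would need is deliberately left out.

```python
import ssl

# Hypothetical per-site TLS contexts. A real server would call
# load_cert_chain() on each one with that site's certificate and key;
# that is omitted here because any file paths would be invented.
CONTEXTS = {
    name: ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    for name in ("www.example.org", "blog.example.org")
}


def choose_certificate(ssl_socket, server_name, initial_context):
    """SNI callback: runs during the handshake, before any HTTP is read.

    server_name is None if the client sent no SNI at all, in which
    case the socket keeps the listener's own context.
    """
    site_context = CONTEXTS.get(server_name)
    if site_context is not None:
        # Switch this one connection over to the matching site's context.
        ssl_socket.context = site_context
    # Returning None tells the ssl module to continue the handshake.
    return None


# The context used for the single listening socket shared by all sites.
listener_context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
listener_context.sni_callback = choose_certificate
```

Without something like this, the only thing the server knows when it has to present a certificate is which IP address the connection arrived on, which is why each HTTPS site used to need its own IP.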