A brief history of looking up host addresses in Unix

August 1, 2022

In the beginning, back in V7 Unix and earlier, Unix didn't have networking and so the standard C library didn't have anything to look up host addresses. When BSD famously added IP networking to BSD Unix, that had to change, so BSD added C library functions to look up this sort of information, in the form of the gethost* functions, which first appeared in 4.1c BSD but are probably most widely known in the 4.2 BSD version. Because this was before DNS was really a thing, functions like gethostbyname() searched through /etc/hosts.

The next step in practice in host lookups was done by Sun, when they introduced what was then called YP (until it had to be renamed to NIS because of trademark issues). To avoid having to distribute a potentially large /etc/hosts to all machines and to speed up lookups in it, Sun made their gethost* functions able to look up host entries through YP; on the YP server, your hosts file was compiled into a database file for efficient lookups (along with all of the other YP information sources). As a fallback, the gethost* functions could still use your local /etc/hosts, which was useful to ensure that you weren't completely at sea if the YP server stopped responding to you. People who didn't use YP (which was a lot of us) still used /etc/hosts, and perhaps distributed a (large) local version to all of their machines.

(YP was not universally loved by system administrators, to put it one way.)

When DNS was introduced to the world of BSD Unix, it didn't initially get integrated into the C library. Instead, my memory is that BIND shipped with a separate library that implemented DNS-based versions of the various host lookup functions. This caused a lot of Makefiles to pick up stanzas to link things with '-lresolv'. The resolver library also contained additional functions specifically for DNS lookups, so programs like mail transport agents were soon specifically using them (MTAs care about MX lookups, which aren't exposed through the BSD gethost* functions). Later, in 4.3 BSD, nameserver lookups were directly included in the C library gethost* functions (see eg the 4.3 BSD manual page). Still later we got the idea of the Name Service Switch to actually configure how all of these lookups worked.

(My memory is that Sun integrated DNS lookups into YP, so that if you looked up hosts in YP, YP could then do DNS lookups instead of having to have everything in a static /etc/hosts. They also added direct DNS lookup support to their C library, although I'm not sure if this was only after they added support for DNS lookups through YP.)

The next thing that happened was threads. Unfortunately, the gethost* functions are not thread safe because, to quote the manual page's BUGS section:

All information is contained in a static area so it must be copied if it is to be saved. [...]
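
To make the "must be copied" caveat concrete, here is a minimal C sketch (not anything from the BSD sources; the helper name first_ipv4() is made up for illustration). It copies the first IPv4 address out of the static struct hostent into caller-owned storage, so a later gethost* call can't overwrite it.

```c
#include <sys/socket.h>
#include <netinet/in.h>
#include <netdb.h>
#include <string.h>

/* Hypothetical helper: copy the first IPv4 address for `name` into
 * caller-owned storage. Returns 0 on success, -1 on failure. */
int first_ipv4(const char *name, struct in_addr *out)
{
    /* The returned pointer refers to a static area inside the C library;
     * the next gethost* call (from any thread) may overwrite it. */
    struct hostent *he = gethostbyname(name);

    if (he == NULL || he->h_addrtype != AF_INET)
        return -1;
    if (he->h_length != (int)sizeof(*out) || he->h_addr_list[0] == NULL)
        return -1;

    /* Copy the data out now instead of keeping `he` around. */
    memcpy(out, he->h_addr_list[0], sizeof(*out));
    return 0;
}
```

Copying protects the result from later calls in the same thread, but it does nothing about two threads calling gethostbyname() at the same moment, which is the problem the reentrant versions were meant to address.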

When people started adding threads to Unix, this led to the creation of reentrant versions of these functions, such as gethostbyname_r(). Support for these reentrant versions wasn't and isn't universal; for example, OpenBSD doesn't have them. One reason for this is that another API problem came up around the same time.

The other problem for gethostbyname() was IPv6, because there's no way for you to tell it what sort of IP addresses you want and no good way for it to return a mix of IPv4 and IPv6 address types. POSIX solved both the threading problem and the IPv6 problem at once in getaddrinfo() (and getnameinfo()); see RFC 3493 for some of the history of the development of these functions. This more or less brings us to today, where you should probably use getaddrinfo() (aka 'gai') for everything. I believe that good versions of getaddrinfo() exist in basically any modern Unix that you want to use.

(An early step in trying to get gethostbyname() to deal with IPv6 was the gethostbyname2() function, which sometimes also got a reentrant _r version.)
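
As a rough illustration of how getaddrinfo() lets the caller pick the address family and get back a mixed list, here is a small C sketch. The function name print_addresses() and the output format are my own invention for this example; only getaddrinfo(), getnameinfo(), gai_strerror() and freeaddrinfo() are the standard calls.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: print every address found for `host` in the
 * requested family (AF_UNSPEC, AF_INET or AF_INET6). Returns the number
 * of addresses printed, or -1 if the lookup failed. */
int print_addresses(const char *host, int family)
{
    struct addrinfo hints, *res, *ai;
    char text[128];
    int count = 0;
    int err;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = family;        /* this is what gethostbyname() couldn't express */
    hints.ai_socktype = SOCK_STREAM; /* avoid one duplicate entry per socket type */

    err = getaddrinfo(host, NULL, &hints, &res);
    if (err != 0) {
        fprintf(stderr, "%s: %s\n", host, gai_strerror(err));
        return -1;
    }

    /* Results are a caller-owned linked list, so nothing here lives in a
     * shared static area. */
    for (ai = res; ai != NULL; ai = ai->ai_next) {
        if (getnameinfo(ai->ai_addr, ai->ai_addrlen, text, sizeof(text),
                        NULL, 0, NI_NUMERICHOST) == 0) {
            printf("%s %s\n", ai->ai_family == AF_INET6 ? "IPv6" : "IPv4", text);
            count++;
        }
    }

    freeaddrinfo(res);
    return count;
}
```

Passing AF_UNSPEC asks for whatever the system can find, which may be a mix of IPv4 and IPv6 entries; what a real hostname actually returns depends on the local resolver configuration.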

PS: Although there was a DNS specification fairly early in the 1980s (cf), it took rather a while for DNS support to appear in actual Unix systems, especially as a standard part of the C library instead of as third-party software added by the local sysadmin (which was how you often got a -lresolv back in the day; you could compile and install the BIND libraries yourself, then relink critical programs against them).

(This entry was sparked by What does it take to resolve a hostname (via).)

Written on 01 August 2022.

Last modified: Mon Aug 1 21:56:23 2022