2024-03-24
Platform peculiarities and Python (with an example)
I have a long-standing little Python tool to turn IP addresses into
verified hostnames and report what's wrong if it can't do this
(doing verified reverse DNS lookups is somewhat complicated). Recently I discovered
that socket.gethostbyaddr() on my Linux machines was only returning
a single name for an IP address
that was associated with more than one. A Fediverse thread revealed that this
reproduced for some people, but not for everyone, and that it also
happened in other programs.
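
To see what your own platform does, a minimal sketch along these lines collects every name that socket.gethostbyaddr() reports for an address. The helper name reverse_names and its lookup parameter are hypothetical, made up for this illustration; the parameter exists only so the function can be exercised without touching DNS.

```python
import socket

def reverse_names(ip, lookup=socket.gethostbyaddr):
    """Return all names the platform reports for ip, primary name first.

    `lookup` is a hypothetical hook for illustration; it defaults to the
    real socket.gethostbyaddr().
    """
    try:
        # gethostbyaddr() returns (primary_name, alias_list, address_list).
        primary, aliases, _addresses = lookup(ip)
    except OSError:
        # socket.herror and socket.gaierror are both OSError subclasses;
        # treat any lookup failure as "no names".
        return []
    return [primary, *aliases]
```

Calling reverse_names() on an IP address that has several PTR records and getting back a one-element list is the symptom described above.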
The Python socket.gethostbyaddr() documentation doesn't discuss
specific limitations like this, but the overall socket documentation does
say that the module is basically a layer over the platform's C
library APIs. However, it doesn't document exactly what APIs are
used, and in this case it matters. Glibc on Linux says that
gethostbyaddr() is deprecated in favour of getnameinfo(), so a C
program like CPython might reasonably use either to implement its
gethostbyaddr(). The C gethostbyaddr() supports returning multiple
names (at least in theory), but getnameinfo() specifically does not;
it only ever returns a single name.
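
The difference shows up in the shape of the results even from Python, which exposes both calls. As a rough sketch (the function name single_name and its name_info parameter are hypothetical, added only for illustration), socket.getnameinfo() hands back one host string, where socket.gethostbyaddr() hands back a primary name plus a list of aliases:

```python
import socket

def single_name(ip, name_info=socket.getnameinfo):
    """Return the one name getnameinfo() gives for ip.

    `name_info` is a hypothetical hook for illustration; it defaults to
    the real socket.getnameinfo().
    """
    # Port 0 is a dummy value. NI_NAMEREQD makes the call fail with
    # socket.gaierror instead of falling back to the numeric address.
    host, _service = name_info((ip, 0), socket.NI_NAMEREQD)
    return host
```

There is no alias list to look at here; the API has nowhere to put a second name.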
In practice, the current CPython on Linux will normally use
gethostbyaddr_r() (see Modules/socketmodule.c's socket_gethostbyaddr()).
This means that CPython isn't restricted to returning a single name and
instead inherits whatever peculiarities glibc has (or another libc,
for people on Linux distributions that use an alternative libc). On glibc,
it appears that this behavior depends on what NSS modules you're using,
with the default glibc 'dns' NSS module not seeming to normally return
multiple names this way, even for glibc APIs where this is possible.
Given all of this, it's not surprising that the CPython documentation doesn't say anything specific. There's not very much specific it can say, since the behavior varies in so many peculiar ways (and has probably changed over time). However, this does illustrate that platform peculiarities are visible through CPython APIs, for better or worse (and, like me, you may not even be aware of those peculiarities until you encounter them). If you want something that is certain to bypass platform peculiarities, you probably need to do it yourself (in this case, probably with dnspython).
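
As a hedged sketch of the do-it-yourself route, the following asks DNS for the PTR records directly instead of going through the C library. It assumes dnspython 2.x is installed and uses its dns.resolver.resolve(); the function name ptr_names and the resolve parameter are hypothetical, and the parameter is there only so the logic can be tried without a network. Lookup failures (for example dnspython's NXDOMAIN exception) are deliberately left to propagate to the caller.

```python
import ipaddress

def ptr_names(ip, resolve=None):
    """Return every PTR target for ip, sorted, without trailing dots.

    `resolve` is a hypothetical hook for illustration; by default the
    function uses dnspython's dns.resolver.resolve().
    """
    if resolve is None:
        # dnspython is a third-party package, so it is imported only when
        # the default resolver is actually needed.
        import dns.resolver
        resolve = dns.resolver.resolve
    # reverse_pointer turns 192.0.2.1 into '1.2.0.192.in-addr.arpa';
    # the trailing dot makes the query name absolute.
    qname = ipaddress.ip_address(ip).reverse_pointer + "."
    answer = resolve(qname, "PTR")
    # Each PTR record's target is the host name it points to.
    return sorted(str(rdata.target).rstrip(".") for rdata in answer)
```

This returns every PTR record the DNS server hands back, regardless of which libc or NSS modules the machine uses. The flip side is that it also ignores non-DNS sources such as /etc/hosts.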
(The Go documentation for a similar function does specifically say that if
it uses the C library it returns at most one result, but that's
because the Go authors know their function calls getnameinfo() and,
as mentioned, that can only return one name (at most).)