Wandering Thoughts archives

2024-03-24

Platform peculiarities and Python (with an example)

I have a long-standing little Python tool to turn IP addresses into verified hostnames and report what's wrong if it can't do this (doing verified reverse DNS lookups is somewhat complicated). Recently I discovered that socket.gethostbyaddr() on my Linux machines was only returning a single name for an IP address that had more than one name associated with it. A Fediverse thread revealed that this reproduced for some people, but not for everyone, and that it also happened in other programs.
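As a minimal sketch of what 'all the names' means at the Python level: socket.gethostbyaddr() returns a (hostname, aliaslist, ipaddrlist) tuple, so any additional names would have to show up in the alias list. The names_for_ip() helper and its lookup parameter are hypothetical, made up for illustration; they are not from my actual tool.

```python
import socket


def names_for_ip(ip, lookup=socket.gethostbyaddr):
    """Return every name the platform resolver reports for ip, primary first.

    'lookup' is a hypothetical hook so the function can be exercised
    without touching the real resolver; it defaults to the stdlib call.
    """
    try:
        # gethostbyaddr() returns (hostname, aliaslist, ipaddrlist).
        primary, aliases, _addrs = lookup(ip)
    except (socket.herror, socket.gaierror):
        # No reverse mapping, or the resolver reported a failure.
        return []
    names = [primary]
    for alias in aliases:
        # Skip duplicates; some resolvers repeat the primary name.
        if alias not in names:
            names.append(alias)
    return names
```

Calling names_for_ip() on an IP address with several PTR records is a quick way to see whether your platform hands back more than one of them. What you get depends on the platform, not on this code.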

The Python socket.gethostbyaddr() documentation doesn't discuss specific limitations like this, but the overall socket documentation does say that the module is basically a layer over the platform's C library APIs. However, it doesn't document exactly what APIs are used, and in this case it matters. Glibc on Linux says that gethostbyaddr() is deprecated in favour of getnameinfo(), so a C program like CPython might reasonably use either to implement its gethostbyaddr(). The C gethostbyaddr() supports returning multiple names (at least in theory), but getnameinfo() specifically does not; it only ever returns a single name.
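The single-name limit of getnameinfo() is visible in the shape of Python's own socket.getnameinfo(), which returns one (host, service) pair and has no alias list at all. Here is a rough sketch; the single_name() helper is a hypothetical name for illustration.

```python
import socket


def single_name(ip, flags=socket.NI_NAMEREQD):
    """Return the one name getnameinfo() gives for ip, or None.

    NI_NAMEREQD makes the call fail instead of falling back to the
    numeric address when there is no reverse mapping.
    """
    try:
        # The port is irrelevant here; NI_NUMERICSERV avoids a service lookup.
        host, _service = socket.getnameinfo((ip, 0), flags | socket.NI_NUMERICSERV)
    except socket.gaierror:
        return None
    return host
```

There is simply nowhere in this API for a second name to go, which is not true of the gethostbyaddr() result tuple.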

In practice, the current CPython on Linux will normally use gethostbyaddr_r() (see Modules/socketmodule.c's socket_gethostbyaddr()). This means that CPython isn't restricted to returning a single name and is instead inheriting whatever peculiarities glibc has (or whatever another libc has, for people on Linux distributions that use an alternative libc). On glibc, it appears that this behavior depends on what NSS modules you're using, with the default glibc 'dns' NSS module not seeming to normally return multiple names this way, even for glibc APIs where this is possible.

Given all of this, it's not surprising that the CPython documentation doesn't say anything specific. There's not very much specific it can say, since the behavior varies in so many peculiar ways (and has probably changed over time). However, this does illustrate that platform peculiarities are visible through CPython APIs, for better or worse (and, like me, you may not even be aware of those peculiarities until you encounter them). If you want something that is certain to bypass platform peculiarities, you probably need to do it yourself (in this case, probably with dnspython).
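To give an idea of what doing it yourself might look like, here is a sketch of a verified reverse lookup that asks DNS directly through dnspython, so it sees every PTR record regardless of what the C library does. This is an illustration under assumptions, not the code of my actual tool: the function names are hypothetical, dnspython 2.x is assumed (for dns.resolver.resolve()), and dnspython is imported lazily inside the lookup functions so that the checking logic can be tried out without it.

```python
import ipaddress


def ptr_names(ip):
    """Return every PTR name for ip, queried directly with dnspython."""
    import dns.exception, dns.resolver, dns.reversename  # third-party: dnspython
    try:
        answer = dns.resolver.resolve(dns.reversename.from_address(ip), "PTR")
    except dns.exception.DNSException:
        return []
    return [rdata.target.to_text() for rdata in answer]


def forward_addresses(name):
    """Return all A and AAAA addresses for name, queried with dnspython."""
    import dns.exception, dns.resolver  # third-party: dnspython
    addrs = []
    for rrtype in ("A", "AAAA"):
        try:
            answer = dns.resolver.resolve(name, rrtype)
        except dns.exception.DNSException:
            continue
        addrs.extend(rdata.address for rdata in answer)
    return addrs


def verified_names(ip, ptr_lookup=ptr_names, fwd_lookup=forward_addresses):
    """Return the PTR names for ip that resolve forward to ip again."""
    want = ipaddress.ip_address(ip)
    good = []
    for name in ptr_lookup(ip):
        # Compare parsed addresses so different IPv6 spellings still match.
        if any(ipaddress.ip_address(a) == want for a in fwd_lookup(name)):
            good.append(name)
    return good
```

This sketch quietly treats every DNS failure as 'no answer', which is too simple for a tool that wants to report what went wrong; a real version would need to keep track of why each step failed.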

(The Go documentation for a similar function does specifically say that if it uses the C library it returns at most one result, but that's because the Go authors know their function calls getnameinfo() and as mentioned, that can only return one name (at most).)

PythonAndPlatformPeculiarities written at 22:53:03;


