The cost of an API mistake in the socket module's fromfd()

February 14, 2013

Suppose that you get handed a file descriptor that is a socket and you want to turn it into a Python socket object (clearly you are on Unix). The socket module has a Unix-only fromfd() function with the argument signature:

socket.fromfd(fd, family, type[, proto])

So how do you determine the family and type of the socket file descriptor you have, since you have to supply them?

Ha ha, silly you. The helpful socket module answer is 'we're not going to help you with that'. In fact the socket module provides no direct, official way of doing this; in order to do so, you need to sneak in through two increasingly baroque back doors in just the right way.

(And at least some things may go wrong if you get the family and type wrong.)

The official Unix way of finding out the type of a socket is to issue a getsockopt(fd, SOL_SOCKET, SO_TYPE) call. Unfortunately the socket module does not allow you to do getsockopt() on file descriptors, only on actual socket objects. Fortunately the socket module does not actually care if you get the family and type right, at least as far as getsockopt goes, so:

s = socket.fromfd(fd, socket.AF_UNIX, socket.SOCK_STREAM)
styp = s.getsockopt(socket.SOL_SOCKET, socket.SO_TYPE)
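
As a minimal sketch of this first back door, assuming Python 3 on Unix: the helper name socket_type_of_fd is made up for illustration, and the demo UDP socket exists only to give us a descriptor to probe. Since fromfd() duplicates the descriptor, the probe object can be closed afterwards without touching the original.

```python
import socket

def socket_type_of_fd(fd):
    """Return the SO_TYPE of the socket behind fd (hypothetical helper)."""
    # The family and type passed here are deliberate guesses; per the text,
    # getsockopt() does not depend on them being right.
    probe = socket.fromfd(fd, socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        return probe.getsockopt(socket.SOL_SOCKET, socket.SO_TYPE)
    finally:
        # fromfd() dup()s the descriptor, so closing the probe leaves fd open.
        probe.close()

# Demo: a UDP socket, probed through its raw descriptor.
udp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
styp = socket_type_of_fd(udp.fileno())
udp_is_dgram = (styp == socket.SOCK_DGRAM)
```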

Inconveniently there is no portable getsockopt() query that will give you the family. The official Unix way of doing this is more or less to make a getsockname() call with a plain struct sockaddr and then examine the sockaddr.sa_family field afterwards. The socket module doesn't provide a direct way to make raw getsockname() calls or see sa_family, but it does have a .getsockname() method on socket objects that gives you decoded, friendly results.

When I started this exercise, I expected that calling s.getsockname() on a socket created via fromfd() with the wrong family would raise a socket.error exception. I was far, far too innocent. Depending on exactly what you do, you get either the correct getsockname() results for the actual family of socket you are dealing with or, sometimes, interestingly mangled results. On Python 3 you can also get UnicodeDecodeErrors in the right circumstances. The safest thing to do turns out to be to make your dummied-up socket be an AF_UNIX socket; you can then call s.getsockname() with reasonable safety and examine the resulting name to reverse engineer the socket family.

(It's the safest because AF_UNIX sockets have the biggest version of struct sockaddr; you've got the greatest chance that a full copy of any other socket family's sockaddr structure will fit into it. Python is presumably blindly making the getsockname() call with the sockaddr appropriate for the apparent family, then interpreting it based on the actual returned socket family. If the sockaddr structure is truncated, odd things happen.)
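
Here is a rough sketch of that reverse engineering, assuming Python 3 on Unix. The helper name guess_family_of_fd is hypothetical, and the mapping from result shape to family is my assumption about what the decoded names look like: a (host, port) pair for IPv4, a 4-tuple for IPv6, and a str or bytes path for Unix sockets. It only tries to tell those three families apart, and other families could produce similar-looking tuples.

```python
import socket

def guess_family_of_fd(fd):
    """Guess the address family behind fd from getsockname() (hypothetical helper)."""
    # Claim AF_UNIX so that the largest sockaddr buffer is used for the call.
    probe = socket.fromfd(fd, socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        name = probe.getsockname()
    finally:
        probe.close()  # only closes the duplicate made by fromfd()
    # Assumed result shapes: (host, port) for IPv4, a 4-tuple for IPv6,
    # and a str or bytes path (possibly empty) for Unix sockets.
    if isinstance(name, tuple) and name and isinstance(name[0], str):
        if len(name) == 2:
            return socket.AF_INET
        if len(name) == 4:
            return socket.AF_INET6
    if isinstance(name, (str, bytes)):
        return socket.AF_UNIX
    return None  # some other family; this sketch does not try to identify it

# Demo: an unbound TCP socket and one end of a Unix socketpair.
tcp = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
a, b = socket.socketpair()
tcp_family = guess_family_of_fd(tcp.fileno())
unix_family = guess_family_of_fd(a.fileno())
```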

What this really illustrates is that the socket module completely dropped the ball on fromfd()'s API. You should not be able to give it a family and type at all; since the rest of the socket code clearly counts on those being correct, the socket module code should determine them itself. This would be easier to use and render .getsockname() non-crazy.

(getsockname()'s implementation is completely sensible if a socket's family is always correct.)
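
To make the complaint concrete, here is a sketch of roughly what a fromfd() that took only the file descriptor could look like if you built it out of the two back doors. This is not anything the socket module provides; socket_from_fd is a hypothetical name, it assumes Python 3 on Unix, and it only recognizes IPv4, IPv6 and Unix sockets by the shape of the getsockname() result.

```python
import socket

def socket_from_fd(fd):
    """Build a socket object from fd alone (hypothetical API sketch)."""
    # Probe with a dummied-up AF_UNIX socket; fromfd() dup()s fd for us.
    probe = socket.fromfd(fd, socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        styp = probe.getsockopt(socket.SOL_SOCKET, socket.SO_TYPE)
        name = probe.getsockname()
    finally:
        probe.close()
    # Assumed mapping from the decoded name's shape to the family.
    if isinstance(name, (str, bytes)):
        family = socket.AF_UNIX
    elif isinstance(name, tuple) and len(name) == 2:
        family = socket.AF_INET
    elif isinstance(name, tuple) and len(name) == 4:
        family = socket.AF_INET6
    else:
        raise ValueError("unrecognized socket family")
    # Now make the real object with the family and type we worked out.
    return socket.fromfd(fd, family, styp)

# Demo: recover a properly labelled socket object from a UDP descriptor.
udp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
s = socket_from_fd(udp.fileno())
```

The caller never has to name a family or type, which is the whole point; the guessing stays inside one function instead of being every caller's problem.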
