2013-02-14
The cost of an API mistake in the socket module's fromfd()
Suppose that you get handed a file descriptor that is a
socket and you want to turn it into a Python socket object
(clearly you are on Unix). The socket module has a Unix-only fromfd()
function with the argument signature:
socket.fromfd(fd, family, type[, proto])
So how do you determine the family and type of the socket file descriptor you have, since you have to supply them?
Ha ha, silly you. The helpful socket module answer is 'we're not going to help you with that'. In fact the socket module provides no direct, official way of doing this; in order to do so, you need to sneak in through two increasingly baroque back doors in just the right way.
(And at least some things may go wrong if you get it wrong.)
The official Unix way of finding out the type of a socket is to
issue a getsockopt(fd, SOL_SOCKET, SO_TYPE) call. Unfortunately
the socket module does not allow you to do getsockopt() on file
descriptors, only on actual socket objects. Fortunately the socket
module does not actually care if you get the family and type right,
at least as far as getsockopt() goes, so:
s = socket.fromfd(fd, socket.AF_UNIX, socket.SOCK_STREAM)
styp = s.getsockopt(socket.SOL_SOCKET, socket.SO_TYPE)
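Wrapped up as a small helper, the trick might look something like the sketch below. The name socket_type_of_fd is made up for illustration, and the sketch leans on fromfd() duplicating the descriptor, so closing the temporary socket object should leave the original fd alone.

```python
import socket

def socket_type_of_fd(fd):
    """Return the SO_TYPE of the socket behind fd (hypothetical helper)."""
    # The family and type given here are deliberate guesses; getsockopt()
    # does not appear to care whether they match the real socket.
    s = socket.fromfd(fd, socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        return s.getsockopt(socket.SOL_SOCKET, socket.SO_TYPE)
    finally:
        # fromfd() works on a dup() of fd, so this closes only our copy.
        s.close()
```

The result is an integer that can be compared against socket.SOCK_STREAM, socket.SOCK_DGRAM and so on.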
Inconveniently there is no portable getsockopt() query that will give
you the family. The official Unix way of doing this is more or less to
make a getsockname() call with a plain struct sockaddr and then
examine the sockaddr.sa_family field afterwards. The socket module
doesn't provide a direct way to make raw getsockname() calls or see
sa_family, but it does have a .getsockname() method on socket
objects that gives you decoded, friendly results.
When I started this exercise, I expected that calling s.getsockname()
on a socket created via fromfd() with the wrong family would raise
a socket.error exception. I was far, far too innocent. Depending
on exactly what you do, you get either the correct getsockname()
results for the actual type of socket you are dealing with or,
sometimes, interestingly mangled results. On Python 3 you can also get
UnicodeDecodeErrors in the right circumstances. The safest thing to do
turns out to be to make your dummied-up socket be an AF_UNIX socket;
you can then call s.getsockname() with reasonable safety and examine
the resulting name to reverse engineer the socket family.
(It's the safest because AF_UNIX sockets have the biggest version
of struct sockaddr; you've got the greatest chance that a full
copy of any other socket family's sockaddr structure will fit into
it. Python is presumably blindly making the getsockname() call with
the sockaddr appropriate for the apparent family, then interpreting it
based on the actual returned socket family. If the sockaddr structure
is truncated, odd things happen.)
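As a rough sketch of what 'reverse engineer the socket family' can mean in practice, the helper below looks at the shape of the decoded name. The name guess_family_of_fd is invented for this example, and the sketch assumes you only care about AF_INET, AF_INET6 and AF_UNIX; other families also decode to tuples and would be misidentified here.

```python
import socket

def guess_family_of_fd(fd):
    """Guess the address family of the socket behind fd (hypothetical helper)."""
    # Claim AF_UNIX so the buffer handed to getsockname() is as large
    # as possible; the type is a guess and does not matter here.
    s = socket.fromfd(fd, socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        name = s.getsockname()
    finally:
        s.close()  # closes only the dup()ed copy made by fromfd()
    # AF_UNIX names decode to a str or bytes path (possibly empty).
    if isinstance(name, (str, bytes)):
        return socket.AF_UNIX
    # AF_INET names decode to (host, port).
    if isinstance(name, tuple) and len(name) == 2:
        return socket.AF_INET
    # AF_INET6 names decode to (host, port, flowinfo, scope_id).
    if isinstance(name, tuple) and len(name) == 4:
        return socket.AF_INET6
    # Anything else is outside what this sketch tries to recognize.
    return None
```

This is a heuristic, not a replacement for seeing sa_family directly; treat a None result, or any result for an exotic socket, with suspicion.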
What this really illustrates is that the socket module completely
dropped the ball on fromfd()'s API. You should not be able to give it
a family and type at all; since the rest of the socket code clearly
counts on those being correct, the socket module code should determine
them itself. This would be easier to use and render .getsockname()
non-crazy.
(getsockname()'s implementation is completely sensible if a socket's
family is always correct.)
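For illustration, here is roughly what a fromfd() that works things out for itself could look like from the caller's side. This is a sketch under a stated assumption: it relies on Python 3.7 and later detecting family, type and proto when socket.socket() is given only a fileno, which is not something the Python of this entry's date offers. The name socket_from_fd is hypothetical.

```python
import os
import socket

def socket_from_fd(fd):
    """Wrap fd in a socket object without being told family or type.

    Hypothetical helper; assumes Python 3.7+, where socket.socket()
    detects family, type and proto from the descriptor itself.
    """
    # dup() first so the new object owns its own descriptor, the way
    # fromfd() does, and the caller's fd is left alone.
    newfd = os.dup(fd)
    try:
        return socket.socket(fileno=newfd)
    except OSError:
        # Do not leak the duplicate if the descriptor is not a socket.
        os.close(newfd)
        raise
```

With this, the .family and .type attributes of the returned object reflect the real socket rather than whatever the caller guessed.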