The probable and prosaic explanation for a socket() API choice

June 30, 2015

It started on Twitter:

@mjdominus: Annoyed today that the BSD people had socket(2) return a single FD instead of a pair the way pipe(2) does. That necessitated shutdown(2).

@thatcks: I suspect they might have felt forced to single-FD returns by per-process and total kernel-wide FD limits back then.

I came up with this idea off the cuff, and it felt convincing when I tweeted it; after all, if you have a socket server or the like, such as inetd, moving to a two-FD model for sockets means that you've more or less doubled the number of file descriptors your process needs. Today we're used to systems that let processes have a lot of open file descriptors at once, but historically Unix had much lower limits and it's not hard to imagine inetd running into them.

It's a wonderful theory, but it immediately runs aground on the practical reality that socket() and accept() were introduced no later than 4.1c BSD, while inetd only arrived in 4.3 BSD (which was years later). Thus it seems very unlikely that the BSD developers were thinking ahead to processes that would open a lot of sockets at the time that the socket() API was designed. Instead I think that there are much simpler and more likely explanations for why the API isn't the way Mark Jason Dominus would like.

The first is that it seems clear that the BSD people were not particularly concerned about minimizing new system calls; instead BSD was already adding a ton of new system features and system calls. Between 4.0 BSD and 4.1c BSD, they went from 64 syscall table entries (not all of them real syscalls) to 149 entries. In this atmosphere, avoiding adding one more system call is not likely to have been a big motivator or in fact even very much on people's minds. Nor was networking the only source of additions; 4.1c BSD added rename(), mkdir(), and rmdir(), for example.

The second is that C makes multi-return APIs more awkward than single-return APIs. Contrast the pipe() API, where you must construct a memory area for the two file descriptors and pass a pointer to it, with the socket() API, where you simply assign the return value. Given a choice, I think a lot of people are going to design a socket()-style API rather than a pipe()-style API.

There's also the related issue that one reason the pipe() API works well with two file descriptors is that the descriptors involved almost immediately go in different 'directions' (often one goes to a sub-process); there aren't many situations where you want to pass both of them around to functions in your program. This is very much not the case in network-related programs, especially programs that use select(); if socket() et al. returned two file descriptors, one for reading and one for writing, I think you'd find that they were often passed around together. Often you'd prefer them to be one descriptor that you could use for either reading or writing depending on what you were doing at the time. Many classical network programs (and protocols) alternate between reading from and writing to the network, after all.

(Without processes that open multiple sockets, you might wonder what select() is there for. The answer is programs like telnet and rlogin (and their servers), which talk to both the network and the tty at the same time. These were already present in 4.1c BSD, at the dawn of the socket() API.)

Sidebar: The pipe() user API versus the kernel API

Before I actually looked at the 4.1c BSD kernel source code, I was also going to say that the kernel to user API makes returning more than one value awkward because your kernel code has to explicitly fish through the pointer that userland has supplied it in things like the pipe() system call. It turns out that this is false. Instead, as far back as V7 and probably further, the kernel to user API could return multiple values; specifically, it could return two values. pipe() used this to return both file descriptors without having to fish around in your user process memory, and it was up to the C library to write these two return values to your pipefd array.

I really should have expected this; in a kernel, no one wants to have to look at user process memory if they can help it. Returning two values instead of one just needs an extra register in the general assembly level syscall API and there you are.

Written on 30 June 2015.

Last modified: Tue Jun 30 01:10:44 2015