The probable and prosaic explanation for a socket() API choice
It started on Twitter:
@mjdominus: Annoyed today that the BSD people had socket(2) return a single FD instead of a pair the way pipe(2) does. That necessitated shutdown(2).
@thatcks: I suspect they might have felt forced to single-FD returns by per-process and total kernel-wide FD limits back then.
I came up with this idea off the cuff and it felt convincing at the
moment that I tweeted it; after all, if you have a socket server
or the like, such as inetd, moving to a two-FD model for sockets
means that you've just more or less doubled the number of file
descriptors your process needs. Today we're used to systems that
let processes have a lot of open file descriptors at once, but
historically Unix had much lower limits and it's not hard to imagine
inetd running into them.
It's a wonderful theory but it immediately runs aground on the
practical reality that socket() and accept() were introduced
no later than 4.1c BSD, while inetd only arrived in 4.3 BSD
(which was years later). Thus it seems
very unlikely that the BSD developers were thinking ahead to processes
that would open a lot of sockets at the time that the socket()
API was designed. Instead I think that there are much simpler and
more likely explanations for why the API isn't the way Mark Jason
Dominus would like.
The first is that it seems clear that the BSD people were not
particularly concerned about minimizing new system calls; instead
BSD was already adding a ton of new system features and system
calls. Between 4.0 BSD and 4.1c BSD, they went from 64 syscall table
entries (not all of them real syscalls) to 149 entries. In this
atmosphere, avoiding adding one more system call is not likely to have
been a big motivator or in fact even very much on people's minds. Nor
was networking the only source of additions; 4.1c BSD added rename(),
mkdir(), and rmdir(), for example.
The second is that C makes multi-return APIs more awkward than
single-return APIs. Contrast the pipe() API, where you must construct
a memory area for the two file descriptors and pass a pointer to it,
with the socket() API, where you simply assign the return value. Given
a choice, I think a lot of people are going to design a socket()-style
API rather than a pipe()-style API.
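To make the contrast concrete, here is a minimal sketch of the two call shapes in modern C. The helper names make_pipe() and make_socket() are made up for illustration, and AF_UNIX is an arbitrary choice so that the example needs no network setup; none of this is taken from the BSD sources.

```c
#include <sys/socket.h>
#include <unistd.h>

/* pipe()-style: the caller has to provide storage for two descriptors,
 * pass a pointer to it, and then check a separate status value.
 * make_pipe() is a hypothetical helper, not a real API. */
int make_pipe(int *rfd, int *wfd)
{
    int fds[2];              /* the memory area pipe() fills in */

    if (pipe(fds) < 0)
        return -1;
    *rfd = fds[0];           /* read end */
    *wfd = fds[1];           /* write end */
    return 0;
}

/* socket()-style: the descriptor simply is the return value
 * (or -1 on failure). make_socket() is also hypothetical. */
int make_socket(void)
{
    return socket(AF_UNIX, SOCK_STREAM, 0);
}
```

A caller of the second form can write `int s = make_socket();` and be done, while the first form always needs variables declared ahead of time for the results to land in.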
There's also the related issue that one reason the pipe() API
works well returning two file descriptors is that the file
descriptors involved almost immediately go in different 'directions'
(often one goes to a sub-process); there aren't very many situations
where you want to pass both file descriptors around to functions
in your program. This is very much not the case in network related
programs, especially programs that use select(); if socket()
et al returned two file descriptors, one for read and one for write,
I think that you'd find they were often passed around together.
Often you'd prefer them to be one descriptor that you could use
either for reading or writing depending on what you were doing at
the time. Many classical network programs (and protocols) alternate
reading and writing from the network, after all.
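As a rough illustration of that alternating pattern, the sketch below runs a tiny request/reply exchange in which each side both reads and writes on a single descriptor. The function name alternate_demo() and the "ping"/"pong" messages are invented for the example, and socketpair() is used only as a convenient way to get two connected sockets without any real networking.

```c
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical demo: one request/reply round trip where each side uses a
 * single descriptor for both directions. Stores the reply in out
 * (NUL-terminated) and returns 0 on success, -1 on failure. */
int alternate_demo(char *out, size_t outsz)
{
    int sv[2];
    char req[16];
    ssize_t n;
    int rc = -1;

    if (outsz == 0 || socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    /* The "client" writes a request on sv[0]. */
    if (write(sv[0], "ping", 4) != 4)
        goto done;

    /* The "server" reads it and answers on the same descriptor, sv[1]. */
    n = read(sv[1], req, sizeof req);
    if (n != 4 || write(sv[1], "pong", 4) != 4)
        goto done;

    /* The client reads the reply from the descriptor it just wrote to. */
    n = read(sv[0], out, outsz - 1);
    if (n < 0)
        goto done;
    out[n] = '\0';
    rc = 0;

done:
    close(sv[0]);
    close(sv[1]);
    return rc;
}
```

With a two-descriptor design, every function in this exchange would need to carry a read descriptor and a write descriptor together.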
(Without processes that open multiple sockets, you might wonder
what select() is there for. The answer is programs like telnet
and rlogin (and their servers), which talk to both the network
and the tty at the same time. These were already present in 4.1c
BSD, at the dawn of the socket() API.)
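The core of such a program is a wait on two descriptors at once. Here is a minimal sketch using the modern select() interface; wait_readable() is a hypothetical helper, and this is not how the 4.1c BSD telnet was actually written.

```c
#include <sys/select.h>
#include <unistd.h>

/* Block until fd_a or fd_b has data to read. Returns the readable
 * descriptor (fd_a wins if both are ready), or -1 on error.
 * In a telnet-like program, fd_a might be the network socket and
 * fd_b the tty. */
int wait_readable(int fd_a, int fd_b)
{
    fd_set rfds;
    int maxfd = fd_a > fd_b ? fd_a : fd_b;

    FD_ZERO(&rfds);
    FD_SET(fd_a, &rfds);
    FD_SET(fd_b, &rfds);

    /* No timeout: sleep until at least one descriptor is readable. */
    if (select(maxfd + 1, &rfds, NULL, NULL, NULL) < 0)
        return -1;

    return FD_ISSET(fd_a, &rfds) ? fd_a : fd_b;
}
```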
Sidebar: The pipe() user API versus the kernel API
Before I actually looked at the 4.1c BSD kernel source code, I was
also going to say that the kernel to user API makes returning more
than one value awkward because your kernel code has to explicitly
fish through the pointer that userland has supplied it in things
like the pipe() system call. It turns out that this is false.
Instead, as far back as V7 and
probably further, the kernel to user API could return multiple
values; specifically, it could return two values. pipe() used
this to return both file descriptors without having to fish around
in your user process memory, and it was up to the C library to write
these two return values to your pipefd array.
I really should have expected this; in a kernel, no one wants to have to look at user process memory if they can help it. Returning two values instead of one just needs an extra register in the general assembly level syscall API and there you are.
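The division of labor can be simulated in portable C. This is only a sketch of the idea, not historical code: struct sys_ret2, fake_pipe_trap(), and my_pipe() are all made-up names, and the "trap" here just borrows the modern pipe() to obtain real descriptors, where a real kernel would leave the two values in registers.

```c
#include <unistd.h>

/* Stand-in for the two values a system call trap could hand back. */
struct sys_ret2 {
    int r0;   /* first return value: read end, or -1 on error */
    int r1;   /* second return value: write end */
};

/* HYPOTHETICAL kernel side: returns both descriptors as values and never
 * touches the caller's memory. It calls pipe() only to get descriptors
 * for the simulation. */
static struct sys_ret2 fake_pipe_trap(void)
{
    int tmp[2];
    struct sys_ret2 r = { -1, -1 };

    if (pipe(tmp) == 0) {
        r.r0 = tmp[0];
        r.r1 = tmp[1];
    }
    return r;
}

/* HYPOTHETICAL library side: the C wrapper, not the kernel, is what
 * writes the two return values into the caller's array. */
int my_pipe(int pipefd[2])
{
    struct sys_ret2 r = fake_pipe_trap();

    if (r.r0 < 0)
        return -1;
    pipefd[0] = r.r0;
    pipefd[1] = r.r1;
    return 0;
}
```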