accept(2)'s problem of trying to return two different sorts of errors
A long time ago, I wrote about the dangers of being overly specific in the errno values you looked for, with the specific case being a daemon that exited because an accept() system call got an ECONNRESET that it didn't expect. Recently, John Wiersba left a comment on that entry asking what else the original programmer should have done, given an unexpected error from accept(). In thinking about the issues, I realized that part of the problem is that accept() is actually returning two different sorts of errors, and the Unix API doesn't give programs any good way to tell the two sorts apart.
(These days accept() is standardized to return ECONNABORTED instead of ECONNRESET in these circumstances, although this may not be universal.)
The two sorts of errors that accept() is trying to return are errors in the accept() call itself, such as a bad file descriptor (EBADF, ENOTSOCK) or a bad parameter (EFAULT), and errors in the new connection that accept() may or may not be returning (EAGAIN, ECONNABORTED, etc). One of the differences between the two is that the first sort of error is probably permanent unless the program somehow fixes it and generally indicates an internal program error, while the second sort will go away if you correctly loop through your accept() sequence again.
A sensibly behaving network daemon should definitely not exit when it gets the second sort of error; it should instead just continue on with its processing loop. However, it's perfectly sensible and probably broadly correct to exit if you get the first sort of error, especially if it's an unknown error and you have no idea how to correct it in your code. If someone has closed a file descriptor on you or it's somehow become a non-socket, continuing will generally just get you an unending stream of the same error over and over (and burn CPU, and perhaps flood logs). Exiting is a perfectly sensible way out and often really the only thing you can do.
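To make this concrete, here is a minimal C sketch of such a loop. It assumes a blocking listening socket, and the short list of "second sort" errnos as well as the names accept_error_is_transient(), handle_connection() and accept_loop() are hypothetical choices for illustration, not taken from any real daemon. Anything not on the list is treated as the first sort of error.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical, deliberately short list of errors that concern the new
   connection rather than the accept() call; going around the loop again
   should make them go away. */
static int accept_error_is_transient(int err)
{
    switch (err) {
    case EINTR:          /* interrupted by a signal */
    case EAGAIN:         /* nothing pending (non-blocking sockets) */
#if defined(EWOULDBLOCK) && EWOULDBLOCK != EAGAIN
    case EWOULDBLOCK:
#endif
    case ECONNABORTED:   /* the standardized "connection went away" error */
    case ECONNRESET:     /* what the daemon in the old entry actually got */
        return 1;
    default:
        return 0;
    }
}

/* Hypothetical connection handler; a real daemon would do its work here. */
static void handle_connection(int fd)
{
    close(fd);
}

/* Accept connections until accept() fails with an error we don't consider
   transient, then return that errno so the caller can log it and exit. */
int accept_loop(int listen_fd)
{
    for (;;) {
        int fd = accept(listen_fd, NULL, NULL);
        if (fd >= 0) {
            handle_connection(fd);
            continue;
        }
        int err = errno;
        if (accept_error_is_transient(err))
            continue;   /* second sort: just go around again */
        fprintf(stderr, "accept: %s, giving up\n", strerror(err));
        return err;     /* first sort, or simply unknown: stop */
    }
}
```

The default branch is where the policy decision lives: an errno the list doesn't know about, including one a later Unix adds, makes the daemon stop. With a non-blocking listening socket you would wait in poll() before calling accept() again rather than spinning on EAGAIN.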
However, you can't reliably distinguish between these two types of errors unless you believe you know all of the possible errnos for one or the other of them. Given the general Unix habit of adding more errno returns to system calls over time, the practical reality is that you can't. This unfortunately leaves authors of Unix network daemons sort of up in the air; they have to pick one way or the other, and either choice might give the wrong answer in some circumstances.
(Perhaps accept() should never have returned the second sort of errors, leaving them all to be discovered on a subsequent use of the file descriptor it returned. But that ship sailed a very long time ago; accept() returning these sorts of errors is even in the Single UNIX Specification for accept().)
I suspect that accept() is not the only system call with this sort of split in types of errors (although I can't think of any others off the top of my head). But thankfully I don't think there are too many others, because accept()'s pattern of operation is an unusual one.
PS: The Linux accept() manpage actually has a warning about Linux's behavior here, in the RETURN VALUE section. Linux opts to immediately return a lot of errors detected on the new socket, while other Unixes generally postpone some of them. But note that any Unix can return ECONNABORTED.
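For TCP/IP, that Linux manpage names the pending network errors an application should treat like EAGAIN and simply retry: ENETDOWN, EPROTO, ENOPROTOOPT, EHOSTDOWN, ENONET, EHOSTUNREACH, EOPNOTSUPP and ENETUNREACH. A sketch of a helper that recognizes that list might look like the following; the function name is hypothetical, and the #ifdef guards are there because EHOSTDOWN and ENONET are not POSIX errno values.

```c
#include <errno.h>

/* Hypothetical helper: the TCP/IP errors that the Linux accept(2) manpage
   says may be pending on the new socket and should be retried like EAGAIN. */
int linux_pending_network_error(int err)
{
    switch (err) {
    case ENETDOWN:
    case EPROTO:
    case ENOPROTOOPT:
#ifdef EHOSTDOWN
    case EHOSTDOWN:
#endif
#ifdef ENONET
    case ENONET:     /* Linux-specific */
#endif
    case EHOSTUNREACH:
    case EOPNOTSUPP:
    case ENETUNREACH:
        return 1;
    default:
        return 0;
    }
}
```

The same manpage also lists EOPNOTSUPP under ERRORS as meaning the listening socket is not of type SOCK_STREAM, which is squarely the first sort of error, so even a single errno value can fall on either side of the split.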