accept(2)'s problem of trying to return two different sorts of errors

January 6, 2019

A long time ago, I wrote about the dangers of being overly specific in the errno values you looked for, with the specific case being a daemon that exited because an accept() system call got an ECONNRESET that it didn't expect. Recently, John Wiersba left a comment on that entry asking what else the original programmer should have done, given an unexpected error from accept(). In thinking about the issues, I realized that part of the problem is that accept() is actually returning two different sorts of errors and the Unix API doesn't provide it any good way to let people tell the two different sorts apart.

(These days accept() is standardized to return ECONNABORTED instead of ECONNRESET in these circumstances, although this may not be universal.)

The two sorts of errors that accept() is trying to return are errors in the accept() call, such as a bad file descriptor (EBADF, ENOTSOCK) or a bad parameter (EFAULT), and errors in the new connection that accept() may or may not be returning (EAGAIN, ECONNABORTED, etc). One of the differences between the two is that errors of the first sort are probably permanent unless the program fixes them somehow, and generally indicate an internal program error, while errors of the second sort will go away if you correctly loop through your accept() sequence again.
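As a rough sketch of that split, the errno values named above can be sorted into the two groups. The enum and function names here are made up for illustration, and neither list should be taken as complete:

```c
#include <errno.h>

/* Hypothetical labels for the two sorts of accept() errors. */
enum accept_error_sort {
    ERR_IN_CALL,        /* accept() itself was misused; likely permanent */
    ERR_IN_CONNECTION,  /* the pending connection went away; retry */
    ERR_UNCLASSIFIED    /* an errno this code does not know about */
};

/* Sort an errno from a failed accept() into one of the groups.
 * Only the values mentioned in the text are listed; real systems
 * can return others. */
enum accept_error_sort classify_accept_errno(int err)
{
    switch (err) {
    case EBADF:
    case ENOTSOCK:
    case EFAULT:
        return ERR_IN_CALL;
    case EAGAIN:
    case ECONNABORTED:
    case ECONNRESET:
        return ERR_IN_CONNECTION;
    default:
        return ERR_UNCLASSIFIED;
    }
}
```

Anything that falls through to the default case is an errno the code cannot place in either group.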

A sensibly behaving network daemon should definitely not exit when it gets the second sort of error; it should instead just continue on with its processing loop. However, it's perfectly sensible and probably broadly correct to exit if you get the first sort of error, especially if it's an unknown error and you have no idea how to correct it in your code. If someone has closed a file descriptor on you or it's become a non-socket somehow, continuing will generally just get you an unending stream of the same error over and over (and burn CPU, and perhaps flood logs). Exiting is a perfectly sensible way out and often really the only thing you can do.
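A minimal sketch of such a processing loop might look like the following. It assumes a blocking listening socket, and both accept_loop and the handle_connection callback are hypothetical names. It takes the position that anything it doesn't recognize as a per-connection error is a reason to stop:

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>

/* Accept connections until accept() fails with something that is
 * not a known per-connection error, then return that errno so the
 * caller can log it and exit. handle_connection is a hypothetical
 * callback that takes ownership of the new descriptor. */
int accept_loop(int listen_fd, void (*handle_connection)(int))
{
    for (;;) {
        int conn_fd = accept(listen_fd, NULL, NULL);
        if (conn_fd >= 0) {
            handle_connection(conn_fd);
            continue;
        }
        switch (errno) {
        case ECONNABORTED:
        case ECONNRESET:
        case EAGAIN:
            /* Second sort: the connection is gone or not ready yet.
             * Go around again. */
            continue;
        default:
            /* First sort, or an errno we do not recognize. */
            return errno;
        }
    }
}
```

With a non-blocking listening socket the EAGAIN case would spin, so a real daemon would wait in poll() or similar before calling accept() again.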

However, you can't reliably distinguish between these two types of errors unless you believe you can know all of the possible errnos for one or the other of them. Given the general habit of Unixes of adding more errno returns for system calls over time, the practical reality is that you can't. This unfortunately leaves authors of Unix network daemons sort of up in the air; they have to pick one way or the other, and either way might give the wrong answer in some circumstances.

(Perhaps accept() should never have returned the second sort of errors, leaving them all to be discovered on a subsequent use of the file descriptor it returned. But that ship sailed a very long time ago; accept() returning these sorts of errors is even in the Single UNIX Specification for accept().)

I suspect that accept() is not the only system call with this sort of split in types of errors (although I can't think of any others off the top of my head). But thankfully I don't think there are too many others, because accept()'s pattern of operation is an unusual one.

PS: The Linux accept() manpage actually has a warning about Linux's behavior here, in the RETURN VALUE section. Linux opts to immediately return a lot of errors detected on the new socket, while other Unixes generally postpone some of them. But note that any Unix can return ECONNABORTED.
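The manpage's advice is to treat these already-pending network errors like EAGAIN and retry, and for TCP/IP it lists ENETDOWN, EPROTO, ENOPROTOOPT, EHOSTDOWN, ENONET, EHOSTUNREACH, EOPNOTSUPP, and ENETUNREACH. A sketch of a check for that list could look like this; the function name is hypothetical, and the #ifdef guards are my assumption about which of these names may be missing on other systems:

```c
#include <errno.h>

/* Return 1 if err is one of the network errors the Linux accept()
 * manpage says to treat like EAGAIN for TCP/IP, 0 otherwise. */
int is_pending_network_error(int err)
{
    if (err == ENETDOWN || err == EPROTO || err == ENOPROTOOPT ||
        err == EHOSTUNREACH || err == EOPNOTSUPP || err == ENETUNREACH)
        return 1;
#ifdef EHOSTDOWN
    if (err == EHOSTDOWN)
        return 1;
#endif
#ifdef ENONET
    /* ENONET is not defined everywhere. */
    if (err == ENONET)
        return 1;
#endif
    return 0;
}
```

An if chain is used instead of a switch so that the code still compiles on a system where two of these names happen to share a value.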

Written on 06 January 2019.

Last modified: Sun Jan 6 23:11:46 2019