The danger of specific errno values

One of the things I've learned the hard way in writing Unix programs is that you should almost never expect to know all of the errno values that a system call can return. In theory they're documented in the manpages, but in practice a manpage's list is never comprehensive. The safest thing to do is to assume that any system call can fail with almost any errno value.

The example I remember most vividly is that our SMTP server daemon used to carefully check the result of accept(). If there was an error and it was anything besides EAGAIN, the daemon decided that something was undoubtedly broken and therefore it should die. Unfortunately, it turns out that on some systems accept() can also fail with ECONNRESET, because people can initiate a TCP connection and then terminate it before the SMTP daemon gets a chance to accept() the new connection.

Until we caught this and fixed it, the result was that every so often (usually during periods of high load), the daemon would die mysteriously. Whoops.
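To make the alternative concrete, here is a minimal sketch of an accept() wrapper that never treats an errno value it doesn't recognize as a reason to die. This is not our daemon's actual code; the names classify_accept_errno() and accept_next(), the three-way split, and the one-second pause are all assumptions made for illustration. It also treats ECONNABORTED the same way as ECONNRESET, on the assumption that some systems report an aborted pending connection that way.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical names, for illustration only. */
enum accept_action {
    ACCEPT_WAIT,             /* nothing pending; go back to waiting */
    ACCEPT_RETRY,            /* this attempt failed; just call accept() again */
    ACCEPT_LOG_AND_CONTINUE  /* unknown or resource error; log it, but don't die */
};

/* Decide how to react to a failed accept(). No errno value is fatal. */
enum accept_action classify_accept_errno(int err)
{
    switch (err) {
    case EAGAIN:
#if defined(EWOULDBLOCK) && EWOULDBLOCK != EAGAIN
    case EWOULDBLOCK:
#endif
        return ACCEPT_WAIT;
    case EINTR:         /* interrupted by a signal */
    case ECONNABORTED:  /* peer went away before we accepted */
    case ECONNRESET:    /* same thing, as reported on some systems */
        return ACCEPT_RETRY;
    default:
        /* Includes every errno value we never thought of. */
        return ACCEPT_LOG_AND_CONTINUE;
    }
}

/* Returns a connected descriptor, or -1 if the caller should go back
 * to waiting for the listening socket to become readable. */
int accept_next(int listen_fd)
{
    for (;;) {
        int fd = accept(listen_fd, NULL, NULL);
        if (fd >= 0)
            return fd;

        int err = errno;  /* save it before anything can overwrite it */
        switch (classify_accept_errno(err)) {
        case ACCEPT_WAIT:
            return -1;
        case ACCEPT_RETRY:
            continue;
        case ACCEPT_LOG_AND_CONTINUE:
            fprintf(stderr, "accept: unexpected error: %s\n", strerror(err));
            sleep(1);  /* arbitrary pause so a persistent error can't spin the CPU */
            return -1;
        }
    }
}
```

The important design choice is the default case: an errno value nobody anticipated gets logged and the daemon keeps running, instead of being taken as proof that something is hopelessly broken.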

This also points out another danger with checking specific errno values: the set of values a system call can produce changes over time. Even code that was perfectly correct when it was written can drift into having problems as systems change and as it's moved to new ones, unless you recheck and revise it all the time.

(Our SMTP server daemon's code was probably perfectly correct for BSD systems of the early 1990s, for example.)

This is part of CSpace, and is written by ChrisSiebenmann.

Written on 16 January 2006.
Last modified: Mon Jan 16 00:03:45 2006