Understanding the close() and EINTR situation

December 18, 2011

Colin Percival's 'POSIX close(2) is broken' points out this bit in the POSIX specification for close():

If close() is interrupted by a signal that is to be caught, it shall return -1 with errno set to [EINTR] and the state of fildes is unspecified.

(The same thing is true for an IO error during close, so this is really 'if something goes wrong, the state of fildes is unspecified'.)

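Colin rightly considers this crazy, because it means that a conformant threaded POSIX program has no way of doing anything sensible in reaction to an EINTR (or EIO) during close(). If it retries the close() and fildes was in fact already closed, another thread may have been handed the same descriptor number by open() in the meantime, and the retry will close that thread's file instead; if it doesn't retry, it may leak the descriptor. (A non-threaded program can retry the close() and accept an EBADF.)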
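As a minimal sketch of the non-threaded case, here is roughly what 'retry and accept EBADF' might look like in C. The helper name close_single_threaded() is hypothetical, and the sketch assumes a single-threaded program whose signal handlers never open files, since otherwise the descriptor number could be reused between the two calls. Note that getting EBADF on the retry also means that whatever the interrupted close() ran into may have been lost.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper for single-threaded programs only: retry close()
 * after EINTR, and treat EBADF on a retry as "the interrupted close()
 * already closed it". Returns 0 once fd is known to be closed, or -1
 * with errno set on any other error (fd's state is then unspecified). */
static int close_single_threaded(int fd)
{
    int retried = 0;

    for (;;) {
        if (close(fd) == 0)
            return 0;
        if (errno == EINTR) {
            /* fd may or may not still be open; nothing else can have
             * reused the number in a single-threaded program, so retry. */
            retried = 1;
            continue;
        }
        if (errno == EBADF && retried)
            return 0;   /* closed by the interrupted call; any error it hit is lost */
        return -1;      /* EBADF on the first try, EIO, etc. */
    }
}
```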

Although it invented some things, POSIX is primarily a documentation standard, one that wrote down a common subset of existing practice for Unix systems. Any time you run into something crazy and unusable like this in a documentation standard, you should immediately assume that what really happened is not that the standard authors were morons but that they found two different systems with incompatible behavior, neither of which was willing to change. This is likely the case here; part of the Hacker News discussion found the clashing examples of HP-UX (where an interrupted close() does not close fildes) and AIX (where it does).

(POSIX can't force vendors to change their systems, so even if it had wanted to pick a winner here it wouldn't have had any real effect. Documenting that in this situation you don't know whether or not fildes is still open is the more honest and useful approach for portable programs.)

A more interesting question is why leaving fildes open after EINTR is ever sensible, and there are two levels to the answer. The obvious level is that by closing the file descriptor, you lose any chance of telling the user-level program about any actual errors that happen during the close(), errors that come from trying to write out data that the kernel has been holding in memory after previous write() calls. A great story, except that it's not convincing for most systems.

(Most systems don't make you wait for file data to commit to disk before returning from close(), so they already let errors happen without being able to report them. If you want to catch all write errors, you need to use fsync() first.)
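A minimal sketch of the 'fsync() first' approach in C follows. The helper name fsync_and_close() is hypothetical; it reports the first error from either call and still closes the descriptor if fsync() fails. It assumes fd refers to something fsync() supports, such as a regular file; on a pipe or socket, fsync() fails with EINVAL.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: flush fd's data to stable storage, then close it,
 * returning -1 with errno from whichever call failed first. fsync() is
 * where deferred write errors (EIO, ENOSPC, EDQUOT) can be reported;
 * close() may still report some on remote filesystems. */
static int fsync_and_close(int fd)
{
    int rc = 0, saved_errno = 0;

    if (fsync(fd) != 0) {
        rc = -1;
        saved_errno = errno;
    }
    /* Close even if fsync() failed, so we don't leak the descriptor. */
    if (close(fd) != 0 && rc == 0) {
        rc = -1;
        saved_errno = errno;
    }
    if (rc != 0)
        errno = saved_errno;
    return rc;
}
```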

Which brings us to our old friend NFS (or actually remote filesystems in general) and things like disk space quotas. Suppose that you are very close to your disk space quota on a remote fileserver and you run a program that writes enough data to run out of quota and then close()s the file descriptor. Because NFS has no 'reserve some space for me' operation and your local machine buffered the data until close() was called, you can only get an 'out of disk space' error on the close(); the remote fileserver first heard of your new data when your local machine started sending it writes as you closed the file. Now suppose that this close() takes long enough that it gets interrupted with an EINTR. If the file descriptor is now invalid, your program has no way to find out that the data it thought it had written has in fact been rejected.
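In code, this scenario comes down to actually checking what close() returns on a file you have been writing to. Here is a minimal sketch; close_after_writing() is a hypothetical helper, and EDQUOT is guarded with #ifdef as a precaution. POSIX itself doesn't list ENOSPC or EDQUOT among close()'s errors, so seeing them here is implementation behavior rather than something the standard promises.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: close a descriptor we have been writing to and
 * complain if the data may not have made it. Returns 0 on success, or
 * -1 with errno preserved from close(). */
static int close_after_writing(int fd, const char *name)
{
    int saved_errno;

    if (close(fd) == 0)
        return 0;
    saved_errno = errno;    /* fprintf() below may change errno */

    switch (saved_errno) {
    case EINTR:
        /* fd's state is unspecified and we can't learn whether the
         * server accepted the data. */
        fprintf(stderr, "%s: close interrupted, data may be lost\n", name);
        break;
#ifdef EDQUOT
    case EDQUOT:
#endif
    case ENOSPC:
        /* e.g. a remote server rejecting data buffered until close() */
        fprintf(stderr, "%s: out of quota or space, data rejected\n", name);
        break;
    default:
        fprintf(stderr, "%s: close failed: %s\n", name, strerror(saved_errno));
        break;
    }
    errno = saved_errno;
    return -1;
}
```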

(This issue doesn't come up with local filesystems; with a local filesystem, the kernel could have at least told the filesystem to reserve space when you did your write()s so the filesystem could have immediately reported errors when you ran out of quota space.)

Unlike write errors on close() on local filesystems, which almost never happen, quota errors on close() on remote filesystems are at least reasonably possible. There are some environments where you can even expect them to be reasonably frequent (it shows that I used to run an undergraduate computing environment). Thus it's at least sensible for Unix systems to worry about this potential case and decide that close() should not close the file descriptor in the EINTR case.

(With that said, HP-UX is the odd man out here and I don't know where it got its behavior from.)

Written on 18 December 2011.
Last modified: Sun Dec 18 00:37:47 2011