2011-12-18

Understanding the close() and EINTR situation
Colin Percival's "POSIX close(2) is broken" points out this bit in the POSIX specification for close():
If close() is interrupted by a signal that is to be caught, it shall return -1 with errno set to [EINTR] and the state of fildes is unspecified.
(The same thing is true for an IO error during close, so this is really 'if something goes wrong, the state of fildes is unspecified'.)
Colin rightfully considers this crazy, because it means that a conformant threaded POSIX program has no way of doing anything sensible in reaction to an EINTR (or EIO) during close(). (A non-threaded program can retry the close() and accept an EBADF.)
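As a concrete illustration, here is a minimal sketch in C of the retry loop a single-threaded program might use (close_retry is my own name, not a standard function). Treating EBADF as success is exactly what makes this unsafe with threads, where another thread may have already reused the descriptor number:

    #include <errno.h>
    #include <unistd.h>

    /* Retry close() on EINTR; only safe in a single-threaded program.
       If the interrupted close() actually closed the descriptor, the
       retry fails with EBADF, which we treat as success here. */
    int close_retry(int fd)
    {
        for (;;) {
            if (close(fd) == 0)
                return 0;
            if (errno == EINTR)
                continue;   /* state of fd is unspecified; try again */
            if (errno == EBADF)
                return 0;   /* the earlier close() already took effect */
            return -1;      /* a real error, such as EIO */
        }
    }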
Although it invented some things, POSIX is primarily a documentation standard, one that wrote down a common subset of existing practice for Unix systems. Any time you run into something crazy and unusable like this in a documentation standard, you should immediately assume that what really happened is not that the standard authors were morons but that they found two different systems with incompatible behavior, neither of which was willing to change. This is likely the case here; part of the Hacker News discussion found the clashing examples of HP-UX (where this does not close fildes) and AIX (where this does close fildes).
(POSIX is not a forced standard, so even if it had wanted to pick a winner here it wouldn't have had any real effect. Documenting that in this situation you don't know whether or not fildes is still open is the more honest and useful approach for portable programs.)
A more interesting question is why leaving fildes open after EINTR is ever sensible behavior, and there are two levels to the answer. The obvious level is that by closing the file descriptor, you lose any chance of telling the user level program about any actual errors that happen during the close(), errors that come from trying to write out data that the kernel has been holding in memory after previous write() calls. A great story, except that it's not convincing for most systems.
(Most systems don't make you wait for file data to commit to disk before returning from close(), so they already let errors happen without being able to report them. If you want to catch all write errors, you need to use fsync() first.)
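To illustrate, here is a minimal sketch of the careful pattern, assuming fd is a file descriptor you have written data to (flush_and_close is my own name):

    #include <unistd.h>

    /* Force buffered data out with fsync() before close(), so that a
       deferred write error is reported while we can still see it. */
    int flush_and_close(int fd)
    {
        int ret = 0;
        if (fsync(fd) == -1)
            ret = -1;       /* a write error surfaces here, not later */
        if (close(fd) == -1)
            ret = -1;
        return ret;
    }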
Which brings us to our old friend NFS (or actually remote filesystems in general) and things like disk space quotas. Suppose that you are very close to your disk space quota on a remote fileserver and you run a program that writes enough data to run out of quota and then close()'s the file descriptor. Because NFS has no 'reserve some space for me' operation and your local machine buffered the data until close() was called, you can only get an 'out of disk space' error on the close(); the first the remote fileserver heard of your new data is when your local machine started sending it writes as you closed the file. Now suppose that this close() takes long enough that it gets interrupted with an EINTR. If the file descriptor is now invalid, your program has no way to find out that the data it thought it had written has in fact been rejected.
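To make the failure mode concrete, here is a hedged sketch of that sequence (write_then_close is a hypothetical helper); the write() can appear to succeed because the data is only buffered locally, and the fileserver's rejection only shows up at close():

    #include <stdio.h>
    #include <unistd.h>

    /* On an NFS filesystem near quota: write() succeeds against the
       local buffer cache, and the 'out of space' error arrives when
       close() pushes the data to the fileserver. If that close() is
       interrupted and the descriptor is gone, the error is lost. */
    void write_then_close(int fd, const char *buf, size_t len)
    {
        if (write(fd, buf, len) == -1)
            perror("write");    /* usually succeeds anyway */
        if (close(fd) == -1)
            perror("close");    /* EDQUOT/ENOSPC can appear here */
    }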
(This issue doesn't come up with local filesystems; with a local filesystem, the kernel could have at least told the filesystem to reserve space when you did your write()s, so the filesystem could have immediately reported errors when you ran out of quota space.)
Unlike write errors on close() on local filesystems, which almost never happen, quota errors on close() on remote filesystems are at least reasonably possible. There are some environments where you can even expect them to be reasonably frequent (it shows that I used to run an undergraduate computing environment). Thus it's at least sensible for Unix systems to worry about this potential case and decide that close() should not close the file descriptor in the EINTR case.
(With that said, HP-UX is the odd man out here and I don't know where it got its behavior from.)