The many return values of read()
(plus some other cases)
In the Hacker News discussion on my entry on
finding a bug in GNU Tar, there
grew a sub-thread about the many different cases of read()'s
return value. This
is a good example of the potential complexity of the Unix API in
practice, and to illustrate it I'm going to run down as many of the
cases as I can remember. In all cases, we'll start with
'n = read(fd, buf, bufsize)'.
The simplest and most boring case is a full read, where n is
bufsize. This is the usual case when reading from files, except
at the end of file. However, you can get a full read in various
other cases if there is enough buffered input waiting for you. If
you get a full read from a TTY while in line buffered mode, the
final line in your input buffer may not be newline terminated. In
some cases this may even be the only line in your input buffer (if
you have a relatively small input buffer and someone stuffed some
giant input into it).
A partial read is where n is larger than zero but less than
bufsize. There are many causes of a partial read; you may have
hit end of file, you may be reading from a TTY in either regular
line buffered mode or raw mode, you may
be reading from the network and that's all of the network input
that's currently available, or the read() may have been interrupted by
a signal after it transferred some data into your buffer. There
are probably other cases, especially since it's not necessarily
standardized what conditions do and don't produce partial reads
instead of complete failures.
(The standard says
that read()
definitely will return a partial read if it's interrupted
by a signal after it's already read some data, but who knows if all
actual Unixes behave that way for all types of file descriptors.)
If you're reading from a TTY in line buffered mode, a partial read
doesn't mean that you have a full line (someone could have typed
EOF on a partial line), or that you have only
one line in your input buffer, not several (if a lot of input has
built up since you last read()
from standard input).
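If what you actually want is a fixed number of bytes, the usual answer to partial reads is to loop. Here is a minimal sketch of such a loop; read_full() is a hypothetical helper name of mine, not a standard function, and the choice to report already-collected data instead of the error is just one reasonable policy.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: keep calling read() until `count` bytes have
 * arrived, a zero read is seen, or an error other than EINTR occurs.
 * Returns the number of bytes stored (less than `count` only on a
 * zero read or an error after some data), or -1 with errno set if an
 * error happens before any data has been read. */
ssize_t read_full(int fd, void *buf, size_t count)
{
    char *p = buf;
    size_t got = 0;

    while (got < count) {
        ssize_t n = read(fd, p + got, count - got);
        if (n > 0) {
            got += (size_t)n;   /* full or partial read: keep going */
        } else if (n == 0) {
            break;              /* zero read: treat it as end of file */
        } else if (errno == EINTR) {
            continue;           /* interrupted before any data: retry */
        } else {
            /* Assumption: hand back what we have; a persistent error
             * will show up again on the caller's next attempt. */
            return got > 0 ? (ssize_t)got : -1;
        }
    }
    return (ssize_t)got;
}
```

Note that this is the wrong tool for a TTY or an interactive network protocol, where a partial read is often all the input you are going to get for a while.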
A zero read is where n
is zero. The most common case is that
you are at the end of file; alternately, you may have deliberately
performed a zero-sized read()
to see if you get any errors but
not gotten any. However, implementations are not obliged to return
available errors on a zero sized read; the standard merely says
that they 'may' do so.
On both TTYs and regular files, end of file is not necessarily a
persistent condition; one read()
may return 0 bytes read and then
a subsequent read()
can return a non-zero result. End of file is
guaranteed to be persistent on some other types of file descriptors,
such as TCP sockets and pipes.
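To make the non-persistent end of file on regular files concrete, here is a small sketch that reads from an empty file, appends to it through a second file descriptor, and reads again. The function name and the scratch file path are made up for illustration, and it assumes /tmp exists and is writable.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical demo: returns 1 if a zero read was followed by a
 * non-zero read on the same descriptor, 0 if not, -1 on setup failure. */
int eof_is_transient(void)
{
    char path[64];
    char buf[16];

    /* Made-up scratch file name; assumes /tmp is writable. */
    snprintf(path, sizeof path, "/tmp/eof-demo-%ld", (long)getpid());

    int wfd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0600);
    if (wfd < 0)
        return -1;
    int rfd = open(path, O_RDONLY);
    unlink(path);               /* the open descriptors keep the file alive */
    if (rfd < 0) {
        close(wfd);
        return -1;
    }

    ssize_t first = read(rfd, buf, sizeof buf);   /* empty file: zero read */
    ssize_t wrote = write(wfd, "more", 4);        /* the file grows */
    ssize_t second = read(rfd, buf, sizeof buf);  /* same descriptor, new data */

    close(rfd);
    close(wfd);
    if (wrote != 4)
        return -1;
    return first == 0 && second == 4;
}
```

This is essentially what a 'tail -f' style program relies on: a zero read only tells you where the end of the file was at that moment.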
What I'll call a signalling error read is where n is -1 and
errno is used to signal one of a number of temporary conditions. These
include that the read() was interrupted by a signal (including
SIGALRM), where you get EINTR, and that read() was used on a
file descriptor marked non-blocking and would have blocked (EAGAIN
and EWOULDBLOCK). It's very possible to get signalling error reads
in situations where you don't expect them, for instance if someone
passed you a file descriptor that is in non-blocking mode (this can
happen with TTYs and is often good for both comedy and an immediate
exit by many shells). To a program that isn't prepared for them, they
are effectively ordinary error reads.
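As a rough sketch of how code might cope with signalling error reads, the following retries on EINTR and reports "would block" separately from real errors. The names read_signalling(), READ_WOULD_BLOCK and fd_is_nonblocking() are hypothetical ones of mine; fcntl() with F_GETFL and O_NONBLOCK is the standard way to ask whether a descriptor is non-blocking.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical return code meaning "no data right now, try later". */
#define READ_WOULD_BLOCK (-2)

/* Returns 1 if fd is in non-blocking mode, 0 if not, -1 on error. */
int fd_is_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;
    return (flags & O_NONBLOCK) != 0;
}

/* Like read(), but retries on EINTR and maps EAGAIN/EWOULDBLOCK to
 * READ_WOULD_BLOCK. Any other failure is returned as -1 with errno set. */
ssize_t read_signalling(int fd, void *buf, size_t bufsize)
{
    for (;;) {
        ssize_t n = read(fd, buf, bufsize);
        if (n >= 0)
            return n;           /* full, partial, or zero read */
        if (errno == EINTR)
            continue;           /* interrupted before any data: retry */
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return READ_WOULD_BLOCK;
        return -1;              /* an ordinary error read */
    }
}
```

Blindly retrying on EINTR is itself a design choice; a program that uses SIGALRM as a timeout probably wants to notice the interruption instead of looping.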
An ordinary error read is where n
is -1 and errno
is telling
you about various other error conditions. Some of these error
conditions may vanish if you read()
again, either at the same
offset (in a seekable file) or at a different offset, and some are
effectively permanent. It is possible to get ordinary error reads
on many sorts of file descriptors, including on TTYs (where you may
get EIO
under some circumstances). In practice, there is no
limit to what errno values may be returned by read()
under various circumstances; any attempt to be exhaustive
is futile, especially if you want to do so in portable code.
Official documentation on possible errno values is no more than
a hint.
(Because people need to specifically recognize signalling error
read errno
values, they are much better documented and much more
adhered to. You can be pretty confident of what EAGAIN means if
you get it as an errno value on a read(), although whether or
not you expected it is another matter.)
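The five kinds of result described so far can be summarized as a small classifier. This is only an illustrative sketch; the enum and the function name are invented here, and the set of errno values treated as "signalling" is just the three mentioned above.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical labels for the cases discussed in this entry. */
enum read_kind {
    READ_FULL,        /* n == bufsize */
    READ_PARTIAL,     /* 0 < n < bufsize */
    READ_ZERO,        /* n == 0 */
    READ_SIGNALLING,  /* n == -1 with EINTR, EAGAIN or EWOULDBLOCK */
    READ_ERROR        /* n == -1 with anything else */
};

/* Classify the result of 'n = read(fd, buf, bufsize)'. `err` is the
 * value errno had immediately after the call; it only matters if n < 0. */
enum read_kind classify_read(ssize_t n, size_t bufsize, int err)
{
    if (n < 0) {
        if (err == EINTR || err == EAGAIN || err == EWOULDBLOCK)
            return READ_SIGNALLING;
        return READ_ERROR;
    }
    if (n == 0)
        return READ_ZERO;
    if ((size_t)n < bufsize)
        return READ_PARTIAL;
    return READ_FULL;
}
```

The classification says nothing about why you got a given result. A READ_ZERO may or may not be a lasting end of file, and a READ_PARTIAL may or may not end on a line boundary.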
In addition to actual return values from read(), at least two
additional things can happen when you perform a read(). The first
is that your read() can stall for an arbitrary amount of time,
up to and including days, even in situations where you expect it
to complete rapidly and it normally does (for example, reading from
files instead of the network). Second, in some circumstances trying
to read()
from a TTY will result in your program being abruptly
suspended entirely (in fact, your entire process group). This happens
if you're in an environment with shell job control and you aren't
the foreground process group.
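A program can make a rough guess in advance about whether a read() from a TTY would suspend it, by comparing the terminal's foreground process group with its own. The sketch below uses the standard isatty(), tcgetpgrp() and getpgrp() calls; the function name is hypothetical, and the check is inherently racy because the foreground process group can change between the check and the read().

```c
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical check: returns 1 if fd is our controlling terminal and
 * we are not in its foreground process group (so a read() would
 * normally get us stopped with SIGTTIN), 0 otherwise. */
int tty_read_may_stop(int fd)
{
    if (!isatty(fd))
        return 0;               /* not a terminal at all */

    pid_t fg = tcgetpgrp(fd);
    if (fg == (pid_t)-1)
        return 0;               /* e.g. not our controlling terminal */

    return fg != getpgrp();
}
```

If the process is ignoring or blocking SIGTTIN, POSIX has the read() fail with EIO instead of stopping the process group, which is one of the ways to get EIO from a TTY.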
All of this adds up to a relatively complex API, and a significant amount of it is implicit. There are some issues that are pretty well known, such as partial reads on network input, but there are others that people may not run into outside of unusual situations.