A little gotcha when implementing shell read
Reading a line from standard input, as the shell's read builtin does,
certainly seems like it should be easy to implement or reimplement (if
one's shell is sufficiently primitive). However, it turns out that there
is a subtle problem that forces a pretty inefficient implementation, and
makes it basically impossible to duplicate read with any common Unix
program.
I can neatly illustrate the problem with a little script:
(echo a; echo b) | (echo `sed 1q`; echo `sed 1q`)
This attempts to duplicate the effects of read with 'sed 1q' (one
could substitute 'head -1' if desired). What you'd expect and like
this to produce as output is two lines, one with 'a' and one with
'b'. However, if you try it you'll discover it produces something
else: it produces 'a' and then a blank line.
What is going on is that sed, like basically all conventional Unix
programs, is reading from standard input in buffered mode. Thus, when
the first sed does its read it reads both lines (and then prints one
and exits), leaving nothing for the second sed to read and print.
In order for read to behave correctly, it cannot over-read; it can
never read more than one line, because it can't guarantee that it can
put back the excess. Unfortunately, on Unix the only way to be sure that
you read exactly one line is to read() character by character until
you read the terminating newline, which is inefficient (since you are
making a system call for each character).
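
For contrast, here is a minimal sketch of the same pipeline with the shell's own read builtin in place of 'sed 1q'. It assumes a POSIX-style shell (bash, dash, ksh and so on) whose read consumes pipe input without over-reading; the function name demo_read_builtin and the variable names are made up for illustration. Run this way, it should print 'a' and then 'b'.

```shell
# Hypothetical helper name; it only wraps the pipeline so it can be rerun.
demo_read_builtin() {
    # Same two-line producer as in the sed version, but each line is
    # consumed by the shell's read builtin. Because read stops at the
    # first newline, the second line stays in the pipe for the second read.
    (echo a; echo b) | (read -r first; echo "$first"; read -r second; echo "$second")
}

demo_read_builtin
```

The -r option only stops read from treating backslashes specially; it does not affect how much input is consumed.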
(Note that the sed-based version would work if you didn't feed it from
a pipe and instead ran it interactively, because the kernel line-buffers
tty input for you and sed will quit immediately after reading and
printing the first line. This can make the problem hard to see or to
debug.)