A little gotcha when implementing shell read
Reading a line from standard input, as the shell's read builtin does,
certainly seems like it should be easy to implement or reimplement (if
one's shell is sufficiently primitive). However, it turns out that there
is a subtle problem that forces a pretty inefficient implementation, and
makes it basically impossible to duplicate read with any common Unix
program.
I can neatly illustrate the problem with a little script:
(echo a; echo b) | (echo `sed 1q`; echo `sed 1q`)
This attempts to duplicate the effects of read with 'sed 1q' (one
could substitute 'head -1' if desired). What you'd expect and like
this to produce as output is two lines, one with 'a' and one with
'b'. However, if you try it you'll discover it produces something
else: it produces 'a' and then a blank line.
What is going on is that sed, like basically all conventional Unix
programs, is reading from standard input in buffered mode. Thus, when
sed does its read it reads both lines (and then prints one
and exits), leaving nothing for the second sed to read and print.
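The over-read can be seen in isolation with a small sketch. This is
Python rather than shell, it assumes a Unix-like system, and the
function name demo_overread is made up for illustration. A buffered
reader stands in for sed's buffered input; it is not how sed itself is
written.

```python
import os

def demo_overread():
    """Show a buffered reader draining a pipe while returning one line."""
    r, w = os.pipe()
    os.write(w, b"a\nb\n")   # both lines are already sitting in the pipe
    os.close(w)              # so a later read sees EOF instead of blocking

    # A buffered reader, standing in for a conventional buffered program.
    # closefd=False leaves the descriptor open so it can be inspected below.
    buffered = os.fdopen(r, "rb", closefd=False)
    first = buffered.readline()   # hands back one line...

    # ...but the underlying read() already consumed everything, so a second
    # reader sharing the same pipe finds nothing left.
    leftover = os.read(r, 100)
    os.close(r)
    return first, leftover

if __name__ == "__main__":
    print(demo_overread())
```

The second line is not lost inside the reader; it is sitting in that
process's private buffer, where no other process can get at it.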
In order for read to behave correctly, it cannot over-read; it can
never read more than one line, because it can't guarantee that it can
put back the excess. Unfortunately, on Unix the only way to be sure that
you read exactly one line is to read() character by character until
you read the terminating newline, which is inefficient (since you are
making a system call for each character).
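A minimal sketch of the character-by-character approach, in Python for
brevity (os.read is a thin wrapper over the read() system call). The
name read_line_unbuffered is hypothetical, and this is an illustration
of the idea rather than how any particular shell implements read.

```python
import os

def read_line_unbuffered(fd):
    """Read exactly one line from fd, using one read() call per byte."""
    chunks = []
    while True:
        ch = os.read(fd, 1)    # never asks for more than a single byte
        if not ch:             # EOF arrived before a newline
            break
        chunks.append(ch)
        if ch == b"\n":        # stop at the terminating newline
            break
    return b"".join(chunks)

if __name__ == "__main__":
    r, w = os.pipe()
    os.write(w, b"a\nb\n")
    os.close(w)
    print(read_line_unbuffered(r))   # first line only
    print(read_line_unbuffered(r))   # second line is still in the pipe
    os.close(r)
```

Because nothing past the newline is ever consumed, a second reader on
the same descriptor picks up exactly where the first one stopped, at the
cost of one system call per byte.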
(Note that the sed-based version would work if you didn't feed it from
a pipe and instead ran it interactively, because the kernel line-buffers
tty input for you and sed will quit immediately after reading and
printing the first line. This can make the problem hard to see or to