Wandering Thoughts archives

2008-12-05

A little gotcha when implementing shell read

Reading a line from standard input, as the shell's read builtin does, certainly seems like it should be easy to implement or reimplement (if one's shell is sufficiently primitive). However, it turns out that there is a subtle problem that forces a pretty inefficient implementation, and makes it basically impossible to duplicate read with any common Unix program.

I can neatly illustrate the problem with a little script:

(echo a; echo b) |
   (echo `sed 1q`; echo `sed 1q`)

This attempts to duplicate the effect of read with 'sed 1q' (one could substitute 'head -n 1' if desired). You would expect, and want, this to print two lines, one with 'a' and one with 'b'. However, if you try it you'll discover that it prints something else: 'a' and then a blank line.

What is going on is that sed, like basically all conventional Unix programs, reads standard input through a buffer, asking the kernel for a large block at a time rather than a single line. Thus, when the first sed does its read it typically gets both lines (and then prints one and exits), leaving nothing for the second sed to read and print.
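For contrast, here is a small sketch of the same pipeline using the shell's own read builtin in place of sed. The function name and variable names are made up for illustration. Because read takes only one line from the pipe, the second read should still find 'b' waiting:

```shell
# Hypothetical demo: two builtin reads sharing a single pipe.
builtin_read_demo() {
    (echo a; echo b) |
        (read -r first; echo "$first"; read -r second; echo "$second")
}

builtin_read_demo
```

In a POSIX shell this prints 'a' and then 'b', which is the behaviour the sed version fails to reproduce.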

In order for read to behave correctly, it cannot over-read; it can never read more than one line, because it can't guarantee that it can put back the excess (a pipe, unlike a regular file, cannot be seeked backwards). Unfortunately, on Unix the only way to be sure that you read exactly one line from any kind of input is to read() character by character until you read the terminating newline, which is inefficient (since you are making a system call for each character).
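The character-by-character approach can be sketched in shell itself. This is only an illustration, not how any real shell implements read: the function name read_line_bytewise is invented here, and it starts dd, od, and tr for every byte, so it is far slower than even a one-byte read() loop in C. It assumes the input contains no NUL bytes. What it does show is that a reader which never asks for more than one byte at a time leaves the rest of the pipe untouched:

```shell
# Hypothetical helper: read exactly one line from stdin, one byte per read().
# Uses the global variables 'line' and 'oct' as scratch space.
read_line_bytewise() {
    line=
    while :; do
        # dd with bs=1 count=1 issues a single one-byte read(); od renders the
        # byte in octal so that a newline and end-of-file can be told apart.
        oct=$(dd bs=1 count=1 2>/dev/null | od -An -to1 | tr -d ' \n')
        case $oct in
            '')  # end of file: give up unless a partial line was collected
                 if [ -z "$line" ]; then return 1; fi
                 break ;;
            012) # newline: the line is complete
                 break ;;
            *)   # any other byte: turn the octal code back into a character
                 line=$line$(printf '%b' "\\0$oct") ;;
        esac
    done
    printf '%s\n' "$line"
}

(echo a; echo b) | (read_line_bytewise; read_line_bytewise)
```

Run as written, the last line prints 'a' and then 'b', because the first call stops reading right after the first newline.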

(Note that the sed-based version would work if you didn't feed it from a pipe and instead ran it interactively, because the kernel line-buffers tty input for you and sed will quit immediately after reading and printing the first line. This can make the problem hard to see or to debug.)

unix/ReadBufferingIssue written at 00:00:28


