One complexity of buffered IO on Unix

October 15, 2009

It is surprisingly challenging to get buffered IO completely correct on Unix, and one area that trips people up is correct handling of EOF. You see, there's an important difference between EOF on files and EOF on terminals, as a consequence of how terminals generate and signal EOF: EOF on files is persistent, but EOF on terminals is not.

If you read repeatedly from a file that has hit EOF, you will almost always just get another EOF. But EOF on terminals is a transient thing, so if you read again from a terminal, your code will just sit there and the user will have to type another EOF to get you to pay attention.

(The exception for file EOF is if someone else adds more data to the end of the file that you're reading.)
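
(As a rough illustration of the file case, here is a small Python sketch. The function name `eof_then_append_demo` and the temporary file are made up for the example; it uses unbuffered `os.read` so that nothing sits between the code and the file descriptor.)

```python
import os
import tempfile


def eof_then_append_demo():
    """Read a file to EOF, read again, then read after someone appends."""
    # Create a small scratch file to read from (hypothetical test data).
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"hello\n")
        path = tmp.name

    fd = os.open(path, os.O_RDONLY)
    try:
        data = os.read(fd, 4096)   # the file's contents
        eof1 = os.read(fd, 4096)   # b"" signals EOF
        eof2 = os.read(fd, 4096)   # reading again just gives EOF again

        # The exception: someone else appends to the file.
        with open(path, "ab") as writer:
            writer.write(b"more\n")
        after = os.read(fd, 4096)  # the new data is now readable

        return data, eof1, eof2, after
    finally:
        os.close(fd)
        os.unlink(path)
```

(The two reads at EOF both return an empty result without blocking, which is why an extra read on a file goes unnoticed. The same extra read on a terminal would block until the user typed something more.)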

This means that buffering code on Unix must be careful to remember that it has seen an EOF, and not re-read from the underlying file descriptor or IO stream. You cannot use a simpler, stateless implementation; if you do, people at a terminal will have to type EOF more than once before your program pays attention, which is irritating.

(You can provide an explicit operation to clear the EOF state if you want to. It probably won't be used very often.)
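
To make the idea concrete, here is a minimal sketch in Python of a buffering layer with a sticky EOF flag. Everything in it is invented for illustration: the `StickyEOFReader` class, its `clear_eof` method, and the `make_fake_terminal` helper that stands in for a terminal by returning EOF and then more data. It assumes the underlying read callable never returns more than it is asked for, and it is not how any particular library does this.

```python
class StickyEOFReader:
    """Buffered reader that remembers EOF instead of re-reading the source."""

    def __init__(self, raw_read, bufsize=4096):
        # raw_read is a callable(n) -> bytes, where b"" means EOF.
        self._raw_read = raw_read
        self._bufsize = bufsize
        self._buf = b""
        self._eof = False

    def read(self, n):
        """Return up to n bytes, or b"" once EOF has been seen."""
        # Only touch the underlying source if we have nothing buffered
        # AND we have not already seen EOF. The second condition is the
        # state that a simpler implementation would leave out.
        if not self._buf and not self._eof:
            chunk = self._raw_read(self._bufsize)
            if chunk:
                self._buf = chunk
            else:
                self._eof = True
        data, self._buf = self._buf[:n], self._buf[n:]
        return data

    def clear_eof(self):
        """Explicitly forget the EOF so the next read tries the source again."""
        self._eof = False


def make_fake_terminal(chunks):
    """Hypothetical stand-in for a terminal: hands out chunks in order.

    Returns the read callable plus a list recording each underlying read.
    """
    pending = list(chunks)
    calls = []

    def raw_read(n):
        calls.append(n)
        return pending.pop(0) if pending else b""

    return raw_read, calls
```

With a fake terminal built from `[b"line\n", b"", b"again\n"]`, the first `read` returns the line, the second sees EOF, and any further `read` returns EOF without going back to the source; only after `clear_eof()` does the reader pick up `b"again\n"`. For a real file descriptor you could pass something like `lambda n: os.read(0, n)` as `raw_read`.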

Unfortunately this is an easy (and common) mistake to make, because it's so hard to notice. Since extra reads on files, pipes, network connections and so on are harmless, everything works fine until your code or program is used to read from a terminal, and that may not happen for quite a while.

Written on 15 October 2009.