Why it's sensible for large writes to pipes to block

October 6, 2014

Back in this entry I said that large writes to pipes blocking instead of immediately returning with a short write was a sensible API decision. Today let's talk about that, by way of talking about how deciding the other way would be a bad API.

Let's start with a question: in a typical Unix pipeline program like grep, what would be a sensible reaction to a large write returning a short write count? This is clearly not an error that should cause the program to abort (or even to print a warning); instead it's a perfectly normal thing if you're producing output faster than the other side of the pipe can consume it. For most programs, that means the only thing you can really do is pause until you can write more to the pipe. The conclusion is pretty straightforward: in a hypothetical world where such too-large pipe writes returned short write counts instead of blocking, almost all programs would either wrap their writes in code that paused and retried them or arrange to set a special flag on the file descriptor to say 'block me until everything is written'. Either or both would probably wind up being part of stdio.
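The retry wrapper that almost every program would need might look like this sketch (write_all is an illustrative name, not a real libc function; on today's blocking pipes you'd still want something like it to handle EINTR and short writes to other kinds of file descriptors):

```c
#include <errno.h>
#include <unistd.h>

/* Keep calling write() until all of buf has been written or a real
 * error occurs. Returns len on success, -1 on error. */
ssize_t write_all(int fd, const char *buf, size_t len)
{
    size_t done = 0;
    while (done < len) {
        ssize_t n = write(fd, buf + done, len - done);
        if (n < 0) {
            if (errno == EINTR)
                continue;   /* interrupted by a signal; just retry */
            return -1;      /* real error */
        }
        done += (size_t)n;  /* short write: advance past what went out */
    }
    return (ssize_t)done;
}
```

(In the hypothetical nonblocking world, a short write with nothing consumed on the other end would make this loop spin; you'd also need some way to sleep until the pipe could accept more data, which is where the sidebar below comes in.)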

If everything is going to have code to work around or deal with something, this suggests that you are picking the wrong default. Thus large writes to pipes blocking by default is the right API decision because it means everyone can write simpler and less error-prone code at the user level.

(There are a number of reasons this is less error-prone, including both programs that don't usually expect to write to pipes (but you tell them to write to /dev/stdout) and programs that usually do short writes that don't block and so don't handle short writes, resulting in silently not writing some amount of their output some of the time.)

There's actually a reason why this is not merely a sensible API but a good one, but that's going to require an additional entry rather than wedging it in here.

Sidebar: This story does not represent actual history

The description I've written above more or less requires that there is some way to wait for a file descriptor to become ready for IO, so that when your write is short you can find out when you can usefully write more. However there was no such mechanism in early Unixes; select() only appeared in UCB BSD (and poll() and friends are even later). This means that having nonblocking pipe writes in V7 Unix would have required an entire set of mechanisms that only appeared later, instead of just a 'little' behavior change.

(However I do suspect that the Bell Labs Unix people actively felt that pipe writes should block just like file writes blocked until complete, barring some error. Had they felt otherwise, the Unix API would likely have been set up somewhat differently and V7 might have had some equivalent of select().)

If you're wondering how V7 could possibly not have something like select(), note that V7 didn't have any networking (partly because networks were extremely new and experimental at the time). Without networking and the problems it brings, there's much less need (or use) for a select().
