Why blocking writes are a good Unix API (on pipes and elsewhere)

October 7, 2014

One of the principles of good practical programming is that when your program can't make forward progress, it should do nothing rather than, say, continue to burn CPU while it waits for something to do. You want your program to do what work it can and then generally go to sleep, and thus you want APIs that encourage this to happen by default.

Now consider a chain of programs (or processes or services), each one feeding the next. In a multi-process environment like this you usually want something that gets called 'backpressure', where if any one component gets overloaded or can't make further progress it pushes back on the things feeding it so that they stop in turn (and so on back up the chain until everything quietly comes to a stop instead of burning CPU).

(You also want an equivalent for downstream services, where they process any input they get (if they can) but then stop doing anything if they stop getting any input at all.)

I don't think it's a coincidence that this describes classic Unix blocking IO to both pipes and files. Unix's blocking writes do backpressure pretty much exactly the way you want it to happen; if any stage in a pipeline stalls for some reason, pretty soon all processes involved in it will block and sleep in write()s to their output pipe. Things like disk IO speed limits or slow processing or whatever will naturally do just what you want. And the Unix 'return what's available' behavior on reads does the same thing for the downstream of a stalled process; if the process wrote some output you can process it, but then you'll quietly go to sleep as you block for input.
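
As a concrete illustration, here's a small C sketch of this (my own, not anything from real pipeline programs): a parent writes into a pipe as fast as it can while its child reads slowly, and once the kernel's pipe buffer fills (typically 64 KiB on Linux) the parent simply blocks in write() and sleeps until the child catches up.

    /*
     * Sketch: a fast writer feeding a deliberately slow reader over a pipe.
     * Once the kernel pipe buffer fills, the parent's write() blocks and the
     * process sleeps -- backpressure with no polling and no burning CPU.
     */
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/wait.h>

    int main(void) {
        int fds[2];
        char chunk[8192];
        memset(chunk, 'x', sizeof(chunk));

        if (pipe(fds) == -1) {
            perror("pipe");
            return 1;
        }

        pid_t pid = fork();
        if (pid == -1) {
            perror("fork");
            return 1;
        }

        if (pid == 0) {                 /* child: the slow reader */
            close(fds[1]);
            char buf[8192];
            while (read(fds[0], buf, sizeof(buf)) > 0)
                sleep(1);               /* pretend processing is slow */
            _exit(0);
        }

        /* parent: the fast writer */
        close(fds[0]);
        for (int i = 0; i < 20; i++) {
            /* Blocks here once the pipe buffer is full, until the
             * reader drains some of it. */
            if (write(fds[1], chunk, sizeof(chunk)) == -1) {
                perror("write");
                break;
            }
            fprintf(stderr, "wrote chunk %d\n", i);
        }
        close(fds[1]);
        wait(NULL);
        return 0;
    }

Watching the 'wrote chunk' messages, you can see the writer race ahead for the first several chunks and then slow down to the reader's pace, without doing anything special itself.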

And this is why I think that Unix having blocking pipe writes by default is not just a sensible API decision but a good one. This decision makes pipes just work right.

(Having short reads also makes the implementation of pipes simpler, because you don't need complex handling for the situation where eg process B is doing a read() of 128 megabytes while process A is trying to write() 64 megabytes to it. The kernel can make this work right, but it needs to go out of its way to do so.)
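
Here's a similarly small sketch of a short read: only five bytes have been written into the pipe, the reader asks read() for a full megabyte, and read() just returns the five bytes that are available instead of waiting for the rest.

    /*
     * Sketch: a short read on a pipe. read() returns what is currently
     * available rather than waiting for the full requested amount.
     */
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main(void) {
        int fds[2];
        if (pipe(fds) == -1) {
            perror("pipe");
            return 1;
        }

        /* Put only five bytes into the pipe. */
        if (write(fds[1], "hello", 5) == -1) {
            perror("write");
            return 1;
        }

        /* Ask for a megabyte; read() hands back the five bytes it has. */
        char *buf = malloc(1 << 20);
        if (buf == NULL)
            return 1;
        ssize_t n = read(fds[0], buf, 1 << 20);
        printf("asked for %d bytes, got %zd\n", 1 << 20, n);

        free(buf);
        return 0;
    }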


Comments on this page:

By Cafe Hunk at 2014-10-07 03:00:19:

However, the behaviour of "head", which closes its input pipe once it has read enough input, causes other standard programs to complain when a write to the pipe triggers an error. I suppose all the other programs can be fixed up to silently stop writing when the output pipe is found to have been closed, but it would have been less trouble if "head" just silently gobbled up the remaining data, or reconnected the pipe to /dev/null or something.

By Fiend at 2014-10-07 11:09:24:

    it would have been less trouble if "head" just silently gobbled up the remaining data, or reconnected the pipe to /dev/null or something

Actually, that would result in head running forever when processing an infinite source. Reconnecting to /dev/null would similarly lead to infinite I/O for no good reason.

It's critical for programs in pipelines to be able to tell their input sources to stop sending data, and in turn properly handle such signals from their output sinks. The default Unix way of handling this seems like a good idea to me.
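
For what it's worth, the default Unix way here is SIGPIPE (or EPIPE, for writers that ignore the signal). A minimal sketch of what a writer sees once its reader has closed the pipe, the way head does:

    #include <errno.h>
    #include <signal.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void) {
        int fds[2];
        if (pipe(fds) == -1) {
            perror("pipe");
            return 1;
        }

        /* By default the writer would simply be killed by SIGPIPE; ignoring
         * it turns the condition into an EPIPE error return instead. */
        signal(SIGPIPE, SIG_IGN);

        close(fds[0]);                  /* the reader goes away, as head does */

        if (write(fds[1], "more data\n", 10) == -1 && errno == EPIPE)
            fprintf(stderr, "reader is gone (EPIPE), stop producing\n");

        close(fds[1]);
        return 0;
    }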

