Why exposing only blocking APIs is ultimately a bad idea

March 3, 2017

I recently read Marek's Socket API thoughts, which mulls over a number of issues and ends with the remark:

But nonetheless, I very much like the idea of only blocking API's being exposed to the user.

This is definitely an attractive idea. The various attempts at select()-style APIs have generally not gone well, high-level callbacks give you 'callback hell', and it would be conceptually nice to combine cheap concurrency with purely blocking APIs to have our cake and eat it too. It's no wonder this idea comes up repeatedly and I feel the tug of it myself.

Unfortunately, I've wound up feeling that it's fundamentally a mistake. While superficially attractive, attempting to do this in the real world is going to wind up as an increasingly ugly mess. For the moment let's set aside the issue that cheap concurrency is fundamentally an illusion and assume that we can make the illusion work well enough here. This still leaves us with the select() problem: sooner or later the result of one IO will make you want to stop doing another waiting IO. Or more generally, sooner or later you'll want to stop doing some bit of blocking IO as the result of other events and processing inside your program.

When all IO is blocking, separate IO must be handled by separate threads and thus you need to support external (cross-thread) cancellation of in-flight blocked IO out from underneath a thread. The moment you have this sort of unsynchronized and forced cross-thread interaction, you have a whole collection of thorny concurrency issues that we have historically not been very good at dealing with. It's basically guaranteed that people will write IO handling code with subtle race conditions and unhandled (or mishandled) error conditions, because (as usual) they didn't realize that something was possible or that their code could be trying to do thing X right as thing Y was happening.
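To make the cross-thread cancellation problem concrete, here is a minimal sketch in Go. It is an illustration under stated assumptions, not anyone's real API: net.Pipe() stands in for a real socket, and readOnce and cancelBlockedRead are names I made up for this example. One goroutine sits in a blocking Read and a second one yanks the connection out from under it.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"net"
)

// readOnce blocks in Read until data arrives or the connection is torn
// down, then reports how the blocking call ended. (Hypothetical helper.)
func readOnce(c net.Conn) string {
	buf := make([]byte, 16)
	_, err := c.Read(buf)
	switch {
	case err == nil:
		return "data"
	case errors.Is(err, io.ErrClosedPipe):
		// Our own end was closed by some other goroutine.
		return "cancelled"
	case errors.Is(err, io.EOF):
		// The other side went away; this is not a cancellation.
		return "peer closed"
	default:
		return "error: " + err.Error()
	}
}

// cancelBlockedRead starts a reader and then cancels it from outside by
// closing its connection. (Hypothetical helper.)
func cancelBlockedRead() string {
	local, remote := net.Pipe() // in-memory stand-in for a socket
	defer remote.Close()

	result := make(chan string, 1)
	go func() { result <- readOnce(local) }()

	// The canceller cannot tell whether the reader is already blocked in
	// Read, has not reached it yet, or has just finished. Here every case
	// yields "cancelled" only because nothing is ever written.
	local.Close()
	return <-result
}

func main() {
	fmt.Println(cancelBlockedRead())
}
```

Even in this toy version the reader has to tell 'I was cancelled' apart from 'the peer closed' and from real errors, and the canceller acts without knowing what state the reader is in. Once data can genuinely arrive at the same moment as the Close(), the outcome depends on timing, and that is exactly where the subtle bugs come from.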

(I'm sure that there are API design mistakes that can and will be made here, too, just as there have been a series of API design mistakes around select() and its successors. Even APIs are hard to get completely right in the face of concurrency issues.)

There is no fix for this that I can see for purely blocking APIs. Either you allow external cancellation of blocked IO, which creates the cross-thread problems, or you disallow it and significantly limit your IO model, creating real complications as well as limiting what kind of systems your APIs can support.

(For the people who are about to say 'but Go makes it work', I'm afraid that Go doesn't. It chooses to limit what sort of systems you can build, and I'm not just talking about the memory issues.)

PS: I think it's possible to sort of square the circle here, but the solution must be deeply embedded into the language and its runtime. The basic idea is to create a CSP-like environment where waiting for IO to complete is a channel receive or send operation, and may be mixed with other channel operations in a select. Once you have this, you have a relatively clean way to cancel a blocked IO; the thread performing the IO simply uses a multi-select, where one channel is the IO operation and another is the 'abort the operation' channel. This doesn't guarantee that everyone will get it right, but it does at least reduce your problem down to the existing problem of properly handling channel operation ordering and so on. But this is not really an 'only blocking API' as we normally think of it and, as mentioned, it requires very deep support in the language and runtime (since under the hood this has to actually be asynchronous IO and possibly involve multiple threads).
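As a rough sketch of the shape of this, here is an emulation in Go. This is only an approximation under assumptions: Go's runtime does not expose IO as channel operations, so the hypothetical startRead helper fakes it with a goroutine, and readOrAbort, errAborted and the two demo functions are likewise invented for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"strings"
)

// errAborted is a made-up sentinel for "the abort channel fired first".
var errAborted = errors.New("read aborted")

type readResult struct {
	data []byte
	err  error
}

// startRead presents a blocking Read as a channel receive. A runtime with
// real support would do this natively; here a goroutine fakes it.
func startRead(r io.Reader) <-chan readResult {
	ch := make(chan readResult, 1) // buffered so the send never blocks
	go func() {
		buf := make([]byte, 64)
		n, err := r.Read(buf)
		ch <- readResult{buf[:n], err}
	}()
	return ch
}

// readOrAbort is the multi-select: one channel is the IO operation and
// the other is the 'abort the operation' channel.
func readOrAbort(r io.Reader, abort <-chan struct{}) ([]byte, error) {
	select {
	case res := <-startRead(r):
		return res.data, res.err
	case <-abort:
		return nil, errAborted
	}
}

// demoCompleted: the IO finishes and the abort channel never fires.
func demoCompleted() string {
	abort := make(chan struct{})
	data, err := readOrAbort(strings.NewReader("hello"), abort)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(data)
}

// demoAborted: nothing is ever written, so only the abort can win.
func demoAborted() string {
	pr, pw := io.Pipe()
	defer pw.Close() // releases the reader goroutine left blocked in Read
	abort := make(chan struct{})
	close(abort)
	_, err := readOrAbort(pr, abort)
	if errors.Is(err, errAborted) {
		return "aborted"
	}
	return "unexpected"
}

func main() {
	fmt.Println(demoCompleted())
	fmt.Println(demoAborted())
}
```

Notice that in the aborted case the underlying Read is still blocked in its goroutine until something else unblocks it; the select only lets the caller walk away. Real support would have to actually cancel the IO underneath, which is why this has to live in the language and runtime instead of in a helper like this one.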

This is also going to sometimes be somewhat of a lie, because on many systems there is a certain amount of IO that is genuinely synchronous and can't be interrupted at all, despite you putting it in a multi-channel select statement. Many Unixes don't really support asynchronous reads and writes from files on disk, for example.

Written on 03 March 2017.