Why you need select() even with communication channels

December 30, 2010

Go has re-popularized the idea of handling all of your blocking waiting-for-things operations by using CSP-like communication channels instead of select() (in Go, using goroutines and channels). However, it's my firm belief that this isn't good enough; despite what some people think, you cannot replace select() in most common CSP-like implementations.

The crucial ability that select() gives you is the ability to stop waiting for something in response to some external event or change in program state. In a select() based environment you stop trying to read or write a file descriptor simply by omitting it from the set of file descriptors you give to select(), and you get a chance to do this every time IO happens (and you can make IO happen in response to other events, using long standing tricks).

In a CSP-like environment, the traditional way to handle outside blocking operations is to perform them in a separate goroutine (in Go's terminology), forwarding the results to the rest of the program over a channel. The goroutine alternates between doing the blocking operation and talking to the channel (sending results to it, getting new requests from it, or both); the rest of the program can then wait for all of its IO, or continue processing, or whatever it wants.

It's relatively easy to interrupt such a goroutine if it's currently trying to talk to the channel; you send it a 'poison pill' message that tells it to shut down. However, sending a poison pill message does nothing until the goroutine can pick it up; if the goroutine is blocked in an outside operation such as read() or write(), it's not looking for messages over its channel. Unless you can either forcefully kill the goroutine or interrupt the blocking operation somehow, you're out of luck. Most of the time you can't interrupt the blocking operation itself (at least not without additional consequences that you don't want) and most CSP-like implementations don't give you a way of killing goroutines (because not allowing that simplifies the runtime environment).

Even without an explicit need to interrupt blocking operations, the result can be more complex simply because you need to communicate decisions about what to do back and forth between multiple pieces, some of which sometimes block and don't generate status messages when you'd like them to. For instance, consider the buffering logic for a network copying program, where you want to have a maximum size internal buffer that can be fed and drained asynchronously, with the reader side stopping reading from the network when the buffer is too full. I think that you wind up with an extra 'buffer' goroutine in the middle just to keep track of the buffer space remaining; you can't delegate the work to the write-out side, because the write-out side might be blocked when the reader needs to know if it should keep reading or stall.

(Disclaimer: I could be missing some well-known way around this here since I don't have much experience with CSP-like environments.)

Sidebar: the two uses of select() here

There are two uses of select() in this situation: waiting for multiple IO sources at once, and allowing you to efficiently and accurately report how much data is still waiting to be written (which only requires waiting on a single IO source). What I'm writing about is the first use. In the network copying example, I'm sort of handwaving the second case by assuming that there is some way of doing it, possibly with support from the runtime.


Comments on this page:

By nothings at 2010-12-30 11:06:05:

most CSP-like implementations don't give you a way of killing goroutines (because not allowing that simplifies the runtime environment).

I bet it's more than just simplifying the runtime environment. Is Go 100% message-passing? I assume its authors care enough about efficiency that there is still support for shared state, and then if you have that you now really really can't kill goroutines at all, because you now have the normal thread-killing problem.

(The thread-killing problem: thread libraries often have a "kill thread" function, but you are seriously, seriously enjoined not to use it by anyone with anyone familiar with threaded programming. It is basically almost never safe to kill a thread, because the thread is operating asynchronously and may be in the middle of performing some operation. The thread uses synchronization primitives to make guarantees about how shared data structures appear to outsiders. Killing a thread while it's in the middle of some operation can cause those synchronization primitives to be left in states that can't be recovered from (because only the killed thread is supposed to update them), can cause shared data structures to be left in invalid states, and can generally cause program invariants to be left in a non-invariant state. The only time it's safe to kill a thread is if the thread has been carefully designed for the ability to be killed.)

If it were 100% CSP, as long as you have garbage collection to kill off stuff stranded by the goroutine, it's maybe more plausible, but I bet there are still issues--there still may be unsafe times when the runtime implementation is manipulating shared state which would require a slowdown in the runtime to add don't-kill-me-right-now synchronization, and there still may be program-wide invariants that would become non-invariant, although I haven't thought in this sort of programming paradigm enough to really know.

From 94.194.126.16 at 2011-01-03 16:35:13:

You're quite correct. There are a few ways that CSP-like environments have found to deal with this:

  • Make the scheduler aware of file descriptors, so you can block on a file descriptor becoming ready to read in the same way that you can block upon a channel end. Since the scheduler probably needs to handle timeouts and signals anyway, it's often not that much of an extension. GHC's runtime system does this.
  • One way to implement the above is to have a single, dedicated thread that loops on select() and communicates with the scheduler. You can instead write this as a server process in the language itself: processes that want to block on a file descriptor block on a communication with the server, which they can back out of. My selector module for occam-pi does this.
  • Provide a mechanism to explicitly interrupt a blocking call (by cancelling or signalling the corresponding thread for example). KRoC's runtime system supports this.

I think the first option's the nicest to program against in practice. Besides which, the more information the runtime system's got about what you want to do, the more opportunities it's got to be smart about how it does it...

-- Adam Sampson

Written on 30 December 2010.
« Spam as a tax on public participation in open source projects
One group that can sensibly use non-GPL'd Linux kernel modules »

Page tools: View Source, View Normal, Add Comment.
Search:
Login: Password:
Atom Syndication: Recent Comments.

Last modified: Thu Dec 30 01:29:08 2010
This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.