2010-12-30
Why you need select() even with communication channels
Go has re-popularized the idea of handling all of your blocking
waiting-for-things operations by using CSP-like communication channels
instead of select() (in Go, using goroutines and channels). However,
it's my firm belief that this isn't good enough; despite what some
people think, you cannot replace select() in most common CSP-like
implementations.
The crucial ability that select() gives you is the ability to stop
waiting for something in response to some external event or change in
program state. In a select()-based environment you stop trying to read
or write a file descriptor simply by omitting it from the set of file
descriptors you give to select(), and you get a chance to do this
every time IO happens (and you can make IO happen in response to other
events, using long-standing tricks).
In a CSP-like environment, the traditional way to handle outside blocking operations is to perform them in a separate goroutine (in Go's terminology), forwarding the results to the rest of the program over a channel. The goroutine alternates between doing the blocking operation and talking to the channel (sending results to it, getting new requests from it, or both); the rest of the program can then wait for all of its IO, or continue processing, or whatever it wants.
It's relatively easy to interrupt such a goroutine if it's currently
trying to talk to the channel; you send it a 'poison pill' message that
tells it to shut down. However, sending a poison pill message does
nothing until the goroutine can pick it up; if the goroutine is blocked
in an outside operation such as read() or write(), it's not looking
for messages over its channel. Unless you can either forcefully kill the
goroutine or interrupt the blocking operation somehow, you're out of
luck. Most of the time you can't interrupt the blocking operation itself
(at least not without additional consequences that you don't want) and
most CSP-like implementations don't give you a way of killing goroutines
(because not allowing that simplifies the runtime environment).
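The problem can be seen in a small Go experiment. This is only a sketch under assumptions: the worker, tryPoison, and demo names are hypothetical, and an io.Pipe with nothing written to it stands in for a network connection where no data is arriving. The poison pill cannot be delivered while the worker is inside Read(), and goes through only once the read returns and the worker gets back to its channels:

```go
package main

import (
	"fmt"
	"io"
	"time"
)

// worker alternates between a blocking Read and talking to its channels.
// It can only notice the poison pill in the second half of that cycle.
func worker(r io.Reader, results chan<- []byte, quit <-chan struct{}, done chan<- struct{}) {
	defer close(done)
	for {
		buf := make([]byte, 64)
		n, err := r.Read(buf) // while blocked here, quit is not looked at
		if err != nil {
			return
		}
		select {
		case results <- buf[:n]:
		case <-quit: // the poison pill is only picked up here
			return
		}
	}
}

// tryPoison reports whether the worker picked up the poison pill within d.
func tryPoison(quit chan<- struct{}, d time.Duration) bool {
	select {
	case quit <- struct{}{}:
		return true
	case <-time.After(d):
		return false
	}
}

// demo tries to stop the worker twice: once while it is stuck in Read,
// and once after the read has been unblocked.
func demo() (whileBlocked, afterRead bool) {
	pr, pw := io.Pipe() // no data is written yet, so Read blocks
	results := make(chan []byte)
	quit := make(chan struct{})
	done := make(chan struct{})
	go worker(pr, results, quit, done)

	whileBlocked = tryPoison(quit, 100*time.Millisecond)

	pw.Write([]byte("x")) // let the blocked Read return
	afterRead = tryPoison(quit, time.Second)
	<-done // the worker has really exited
	return whileBlocked, afterRead
}

func main() {
	whileBlocked, afterRead := demo()
	fmt.Println("delivered while blocked in Read:", whileBlocked)
	fmt.Println("delivered after the Read returned:", afterRead)
}
```

In this sketch the only reason the second attempt works is that the test program itself makes the read complete; with a real peer that has gone quiet, nothing in the channel machinery would do that for you.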
Even without an explicit need to interrupt blocking operations, the resulting program can be more complex simply because you need to communicate decisions about what to do back and forth between multiple pieces, some of which sometimes block and don't generate status messages when you'd like them to. For instance, consider the buffering logic for a network copying program, where you want to have a maximum-size internal buffer that can be fed and drained asynchronously, with the reader side stopping reading from the network when the buffer is too full. I think that you wind up with an extra 'buffer' goroutine in the middle just to keep track of the buffer space remaining; you can't delegate the work to the write-out side, because the write-out side might be blocked when the reader needs to know if it should keep reading or stall.
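Here is a rough sketch of what I think such a middle goroutine could look like in Go. Everything in it is an assumption made for illustration: the bufferManager and run names are hypothetical, the reader and writer sides are faked with plain goroutines instead of real network IO, and the stalling is done with Go's channel-level select statement (not the select() system call), by not receiving from the reader's channel while the buffer is full:

```go
package main

import "fmt"

// bufferManager sits between the reader side (which sends on in) and the
// write-out side (which receives from out). It alone tracks how much data
// is buffered, and stalls the reader by not receiving when it is full.
func bufferManager(in <-chan []byte, out chan<- []byte, maxBytes int) {
	defer close(out)
	var queue [][]byte
	used := 0
	for in != nil || len(queue) > 0 {
		recv := in
		if len(queue) > 0 && used >= maxBytes {
			recv = nil // buffer full: a nil channel is never ready
		}
		var send chan<- []byte
		var head []byte
		if len(queue) > 0 {
			send = out // only offer data when there is some
			head = queue[0]
		}
		select {
		case b, ok := <-recv:
			if !ok {
				in = nil // reader side is finished; drain the queue
				continue
			}
			queue = append(queue, b)
			used += len(b)
		case send <- head:
			queue = queue[1:]
			used -= len(head)
		}
	}
}

// run pushes chunks through a bufferManager and returns what comes out.
func run(chunks []string, maxBytes int) string {
	in := make(chan []byte)
	out := make(chan []byte)
	go bufferManager(in, out, maxBytes)
	go func() { // stand-in for the goroutine reading from the network
		defer close(in)
		for _, c := range chunks {
			in <- []byte(c)
		}
	}()
	var result []byte
	for b := range out { // stand-in for the write-out side
		result = append(result, b...)
	}
	return string(result)
}

func main() {
	fmt.Println(run([]string{"ab", "cd", "ef", "gh"}, 3))
}
```

The reader goroutine here learns that it should stall only because its send on the channel blocks; it still cannot be told to stop if it is already inside a blocking network read.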
(Disclaimer: I could be missing some well-known way around this here since I don't have much experience with CSP-like environments.)
Sidebar: the two uses of select() here
There are two uses of select() in this situation: waiting for multiple
IO sources at once, and allowing you to efficiently and accurately
report how much data is still waiting to be written (which only requires
waiting on a single IO source). What I'm writing about is the first
use. In the network copying example, I'm sort of handwaving the second
case by assuming that there is some way of doing it, possibly with support
from the runtime.