2013-08-07
Understanding how generators help asynchronous programming
I've been reading for a while about how generators let us do callback-free asynchronous programming instead of being trapped in callback hell (lately all of the buzz has been about generators in JavaScript; this is typical of what I've read). But I have to confess that I never really got how the whole thing actually worked; all of the example code that people wrote seemed to have a great big 'and then magic happens' surrounding it. Recently I finally had a sudden burst of enlightenment about how it all works (after banging my head on yet another article about JavaScript generators (via)). This is my attempt to explain that enlightenment, if only to make it stick in my head.
(I'm going to use a pseudo-Python for example code rather than try to pretend that I can write valid JavaScript without careful testing.)
Let's start with some sort of routine that we want to write in a straight-line way while actually having it be asynchronous:
    def process(request):
        user = yield db.getuser(request.user)
        group = yield db.getgroup(request.group)
        ....
The key trick here is that yield (and generators in general) allow two-way communication. process() both returns values to the outside world (when it uses yield) and can have values injected into it (as the value of those yield expressions). These two sorts of values are not necessarily the same thing, even though they look like it. What process() yields to its caller is not necessarily what its caller gives it back.
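As a minimal sketch of that two-way traffic (in real Python this time; the strings and the argument-free process() are made up purely for illustration), the caller below receives one thing from each yield and hands back something quite different with send():

```python
def process():
    # What we yield travels out to the caller...
    user = yield "request: look up user"
    # ...and whatever the caller passes to send() comes back in
    # as the value of the yield expression.
    group = yield "request: look up group"
    return (user, group)

gen = process()
out1 = next(gen)           # run up to the first yield
out2 = gen.send("alice")   # "alice" becomes the value of the first yield
try:
    gen.send("staff")      # "staff" becomes the value of the second yield
except StopIteration as stop:
    result = stop.value    # the generator's return value

print(out1, "|", out2, "|", result)
```

The values going out ("request: ..." strings) and the values coming in ("alice", "staff") have nothing to do with each other except what the caller decides.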
This leads to how the magic happens. The db.get* functions don't return actual results; instead they return some sort of object which will let us register a callback to be invoked when their operation completes. The main loop takes these objects (returned through the yields) and registers a callback on each one that will add process() (and the value to inject back into it) to a scheduling queue. When the callback fires the main loop will wind up resuming process() with the actual result of the database lookup, which emerges as the yield's value inside process(). Effectively the main loop's job is to convert delayed, asynchronous results into actual results and then give them back to process() (and other similar routines).
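Here is a rough, runnable sketch of that arrangement. Everything in it apart from process() and the db.get* names is my own invention for illustration: Pending stands in for 'some sort of object which will let us register a callback', FakeDB completes lookups only when told to, and main_loop() is about the simplest scheduler that could work. A real library would look different.

```python
from collections import deque

class Pending:
    """Hypothetical delayed result: lets us register completion callbacks."""
    def __init__(self):
        self._callbacks = []
    def on_done(self, callback):
        self._callbacks.append(callback)
    def complete(self, value):
        for callback in self._callbacks:
            callback(value)

class FakeDB:
    """Hypothetical database whose lookups finish only when we say so."""
    def __init__(self):
        self.in_flight = []          # (pending, result) pairs not yet completed
    def _start(self, result):
        pending = Pending()
        self.in_flight.append((pending, result))
        return pending
    def getuser(self, name):
        return self._start({"user": name})
    def getgroup(self, name):
        return self._start({"group": name})
    def finish_one(self):
        pending, result = self.in_flight.pop(0)
        pending.complete(result)     # fires the registered callbacks

db = FakeDB()
log = []

def process(request):
    user = yield db.getuser(request["user"])
    group = yield db.getgroup(request["group"])
    log.append((user, group))

def main_loop(routines):
    # Scheduling queue of (generator, value to inject) pairs.
    ready = deque((gen, None) for gen in routines)
    while ready or db.in_flight:
        if not ready:
            # Nothing is runnable, so let one outstanding lookup complete.
            db.finish_one()
            continue
        gen, value = ready.popleft()
        try:
            pending = gen.send(value)    # resume until the next yield
        except StopIteration:
            continue                     # this routine has finished
        # When the result arrives, queue the generator along with it.
        pending.on_done(lambda result, gen=gen: ready.append((gen, result)))

main_loop([process({"user": "alice", "group": "staff"})])
print(log)
```

Note that process() never sees a Pending object come back; by the time it is resumed, the main loop has swapped the delayed result for the real one.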
(You could re-invoke process() directly from the callback but it's possible that the code structure will get more involved that way.)
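For comparison, a sketch of the direct approach, with no scheduling queue at all. Again the names are hypothetical: Pending is a stand-in for the delayed-result object, getuser() is a toy lookup, and drive() is my name for the helper that resumes the generator from inside the callback itself.

```python
class Pending:
    """Hypothetical delayed result: lets us register completion callbacks."""
    def __init__(self):
        self._callbacks = []
    def on_done(self, callback):
        self._callbacks.append(callback)
    def complete(self, value):
        for callback in self._callbacks:
            callback(value)

def drive(gen, value=None):
    """Resume gen with value; re-arm ourselves on whatever it yields next."""
    try:
        pending = gen.send(value)
    except StopIteration:
        return
    # The callback resumes the generator directly, with no queue in between.
    pending.on_done(lambda result: drive(gen, result))

lookups = []    # Pending objects handed out, in order
seen = []

def getuser(name):
    pending = Pending()
    lookups.append(pending)
    return pending

def process(name):
    user = yield getuser(name)
    seen.append(user)

drive(process("alice"))                  # runs up to the first yield
lookups[0].complete("alice's record")    # callback resumes process() right here
print(seen)
```

One consequence is that process() now runs on the stack of whatever code completed the lookup, rather than from a single well-defined place in the main loop.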
The main loop and the db.get* functions may be complex, but those only get written once (they're library routines). Everyone writes lots of versions of process() and those get to be simple (or at least simpler).
PS: the main loop needs some additional magic, of course, because the outside world has to inject traffic into this whole thing somewhere. I wave my hands about that part.