Wandering Thoughts archives


Understanding how generators help asynchronous programming

I've been reading for a while about how generators let us do callback-free asynchronous programming instead of trapping us in callback hell (lately all of the buzz has been about generators in JavaScript). But I have to confess that I never really got how the whole thing actually worked; all of the example code that people wrote seemed to have a great big 'and then magic happens' surrounding it. Recently I finally had a sudden burst of enlightenment about how it all works, after banging my head on yet another article about JavaScript generators. This is my attempt to explain that enlightenment, if only to fix it in my head.

(I'm going to use a pseudo-Python for example code rather than try to pretend that I can write valid JavaScript without careful testing.)

Let's start with some sort of routine that we want to write in a straight-line way while actually having it be asynchronous:

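def process(request):
    # Each yield hands something out to whoever is running us, and later
    # evaluates to whatever that caller chooses to send back in.
    user = yield db.getuser(request.user)
    group = yield db.getgroup(request.group)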

The key trick here is that yield (and generators in general) allow two-way communication. process() both returns values to the outside world (when it uses yield) and can have values injected into it (as the value of those yield expressions). These two sorts of values are not necessarily the same thing, even though they look like it. What process() yields to its caller is not necessarily what its caller gives it back.
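To make the two-way communication concrete, here is a small runnable sketch in real Python. The generator and its strings are made up for illustration; the only thing it is meant to show is that what comes out of a yield and what goes back in through send() are separate values.

```python
def conversation():
    # What we yield goes out to the caller; what the caller passes to
    # send() comes back in as the value of the yield expression.
    answer = yield "what is your name?"
    quest = yield "hello, %s; what is your quest?" % answer
    yield "so: %s, %s" % (answer, quest)

gen = conversation()
q1 = next(gen)              # runs up to the first yield
q2 = gen.send("Arthur")     # "Arthur" becomes the value of the first yield
q3 = gen.send("the Grail")  # "the Grail" becomes the value of the second

print(q1)  # what is your name?
print(q2)  # hello, Arthur; what is your quest?
print(q3)  # so: Arthur, the Grail
```

The caller here plays the role the main loop will play: it decides what each yield evaluates to, and nothing forces that to be related to what was yielded.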

This leads to how the magic happens. The db.get* functions don't return actual results; instead they return some sort of object which will let us register a callback to be invoked when their operation completes. The main loop takes these objects (handed out through the yields) and registers a callback on each one that will add process() (and the value to inject back into it) to a scheduling queue. When the callback fires, the main loop will wind up resuming process() where it left off with the actual result of the database lookup, which emerges as yield's value inside process(). Effectively the main loop's job is to convert delayed, asynchronous results into actual results and then give them back to process() (and other similar routines).
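Here is a minimal sketch of that arrangement in runnable Python. Everything in it is an assumption made up for illustration: Deferred, FakeDB, Request, and main_loop are hypothetical names, the 'database' just hands back strings, and operations 'complete' only when the loop pokes them. A real event loop would wait on sockets or timers instead.

```python
from collections import deque, namedtuple

class Deferred:
    """Hypothetical placeholder for a result that is not available yet."""
    def __init__(self):
        self.callbacks = []
    def add_callback(self, fn):
        self.callbacks.append(fn)
    def fire(self, value):
        for fn in self.callbacks:
            fn(value)

class FakeDB:
    """Hypothetical database: lookups return a Deferred, not a result."""
    def __init__(self):
        self.pending = deque()   # (deferred, eventual result) pairs
    def _start(self, result):
        d = Deferred()
        self.pending.append((d, result))
        return d
    def getuser(self, name):
        return self._start("user:" + name)
    def getgroup(self, name):
        return self._start("group:" + name)
    def complete_one(self):
        # Pretend the oldest outstanding operation just finished.
        d, result = self.pending.popleft()
        d.fire(result)

Request = namedtuple("Request", ["user", "group"])

def process(request, db, results):
    user = yield db.getuser(request.user)
    group = yield db.getgroup(request.group)
    results.append((user, group))

def main_loop(gens, db):
    # The scheduling queue holds (generator, value to inject) pairs.
    runq = deque((g, None) for g in gens)
    while runq or db.pending:
        while runq:
            gen, value = runq.popleft()
            try:
                deferred = gen.send(value)   # resume until the next yield
            except StopIteration:
                continue                     # this routine has finished
            # When the operation completes, queue the generator to be
            # resumed with the real result.
            deferred.add_callback(
                lambda result, gen=gen: runq.append((gen, result)))
        if db.pending:
            db.complete_one()

db = FakeDB()
results = []
main_loop([process(Request("alice", "staff"), db, results),
           process(Request("bob", "wheel"), db, results)], db)
print(results)
```

Note that process() itself never mentions callbacks or queues; all of that lives in the main loop and the db object, and the two process() instances are interleaved while each one reads as straight-line code.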

(You could resume process() directly from the callback, but the code structure may well get more involved that way.)
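For comparison, a sketch of the direct approach might look like the following. Again the Deferred class and drive() are hypothetical names invented for this example, and the Deferreds are fired by hand to stand in for completed operations.

```python
class Deferred:
    """Hypothetical placeholder that holds a single callback."""
    def __init__(self):
        self.callback = None
    def add_callback(self, fn):
        self.callback = fn
    def fire(self, value):
        self.callback(value)

def drive(gen, value=None):
    # Resume the generator, then arrange for the callback on whatever
    # it yields to call drive() again with the real result.
    try:
        deferred = gen.send(value)
    except StopIteration:
        return
    deferred.add_callback(lambda result: drive(gen, result))

log = []
d1, d2 = Deferred(), Deferred()

def process():
    user = yield d1
    group = yield d2
    log.append((user, group))

drive(process())         # runs process() up to its first yield
d1.fire("user:alice")    # the callback resumes process() directly
d2.fire("group:staff")
print(log)               # [('user:alice', 'group:staff')]
```

There is no queue here, so process() runs inside whatever code fires the callback. One way this can get more involved: if a Deferred could fire immediately when the callback is registered, each step would nest inside the previous call to drive().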

The main loop and the db.get* functions may be complex, but those only get written once (they're library routines). Everyone writes lots of versions of process() and those get to be simple (or at least simpler).

PS: the main loop needs some additional magic, of course, because the outside world has to inject traffic into this whole thing somewhere. I wave my hands about that part.

programming/GeneratorsAndAsync written at 00:15:05

