Wandering Thoughts archives

2017-03-18

Part of why Python 3.5's await and async have some odd usage restrictions

Python 3.5 added a new system for coroutines and asynchronous programming, based around new async and await keywords (which have the technical details written up at length in PEP 492). Roughly speaking, in terms of coroutines implemented with yield from, await replaces 'yield from' (and is more powerful). So what's async for? Well, it marks a function that can use await. If you use await outside an async function, you'll get a syntax error. Functions marked async have some odd restrictions, too, such as that you can't use yield or yield from in them.

When I described doing coroutines with yield from here, I noted that it was potentially error prone because in order to make everything work you had to have an unbroken chain of yield from from top to bottom. Break the chain or use yield instead of yield from, and things wouldn't work. And because both yield from and yield are used for regular generators as well as coroutines, it's possible to slip up in various ways. Well, when you introduce new syntax you can fix issues like that, and that's part of why async and await have their odd rules.

A function marked async is a (native) coroutine. await can only be applied to coroutines, which means that you can't accidentally treat a generator like a coroutine the way you can with yield from. Simplifying slightly, coroutines can only be invoked through await; you can't call one or use them as a generator, for example as 'for something in coroutine(...):'. As part of not being generators, coroutines can't use 'yield' or 'yield from'.

(And there's only await, so you avoid the whole 'yield' verus 'yield from' confusion.)

In other words, coroutines can only be invoked from coroutines and they must be invoked using the exact mechanism that makes coroutines work (and that mechanism isn't and can't be used for or by anything else). The entire system is designed so that you're more or less forced to create that unbroken chain of awaits that makes it all go. Although Python itself won't error out on import time if you try to call a async function without await (it just won't work at runtime), there's probably Python static checkers that look for this. And in general it's an easy rule to keep track of; if it's async, you have to await it, and this status is marked right there in the function definition.

(Unfortunately it's not in the type of the function, which means that you can't tell by just importing the module interactively and then doing 'type(mod.func)'.)

Sidebar: The other reason you can only use await in async functions

Before Python 3.5, the following was completely valid code:

def somefunc(a1, b2):
   ...
   await = interval(a1, 10)
   otherfunc(b2, await)
   ...

In other words, await was not a reserved keyword and so could be legally used as the name of a local variable, or for that matter a function argument or a global.

Had Python 3.5 made await a keyword in all contexts, all such code would immediately have broken. That's not acceptable for a minor release, so Python needed some sort of workaround. So it's not that you can't use await outside of functions marked async; it's that it's not a keyword outside of async functions. Since it's not a keyword, writing something like 'await func(arg)' is a syntax error, just as 'abcdef func(arg)' would be.

The same is true of async, by the way:

def somefunc(a, b, async = False):
  if b == 10:
     async = True
  ....

Thus why it's a syntax error to use 'async for' or 'async with' outside of an async function; outside of such functions async isn't even a keyword so 'async for' is treated the same as 'abcdef for'.

(I'm sure this makes Python's parser that much more fun.)

AsyncAwaitRestrictionsWhy written at 01:11:54; Add Comment

2017-03-16

How we can use yield from to implement coroutines

Give my new understanding of generator functions and yield from, we can now see how to use yield from to implement coroutines and an event loop. Consider a three level stack of functions, where on the top layer you have an event loop, in the middle you have the processing code you write, and on the bottom are event functions like wait_read() or sleep().

Let's start with an example processing function or two:

def countdown(n):
   while n:
      print("T-minus", n)
      n -= 1
      yield from sleep(1)

def launch(what, nsecs):
   print("counting down for", what)
   yield from countdown(nsecs)
   print("launching", what)

To start a launch, we call something like 'coro.start(launch("fred", 10))', which looks a bit peculiar since it sort of seems like coro.start() should get control only after the launch. However, we already know that calling a generator function doesn't do exactly what it looks like. What coro.start() gets when we do this is an unstarted generator object (which handily encapsulates those arguments to launch(), so we don't have to do it by hand).

When the coroutine scheduler starts the launch() generator object, we wind up with a chain of yield froms that bottoms out at sleep(). What sleep() yields is passed back up to the coroutine scheduler and the entire call chain is suspended; this is no different that what I did by calling .send() by hand yesterday. What sleep() returns to the scheduler is an object (call it an event object) that tells the coroutine scheduler under what conditions this coroutine should be resumed. When the scheduler reaches the point that the coroutine should be run again, the scheduler will once again call .send(), which will resume execution in sleep(), which will then return back to countdown(), and so on. The scheduler may use this .send() to pass information back to sleep(), such as how long it took before the coroutine was restarted.

Here yield and yield from are being used for two things. First, they create a communication channel between the coroutine scheduler and the low-level event functions like sleep(). Our launch() and countdown() functions are oblivious to this since they don't touch either the value sleep() yields up to the scheduler or the value that the scheduler injects to sleep() with .send(). Second, the chain of yield from and the final yield neatly suspend the entire call stack.

In order for this to work reliably, there are two rules that our user-written processing functions have to follow. First, they must never accidentally attempt to do anything with the sleep() generator function. It is okay but unclear for a non-generator function to call sleep() and return the result:

def sleep_minutes(n):
   return sleep(n * 60)

def long_countdown(n):
   while n:
      print("T-minus", n, "minutes")
      yield from sleep_minutes(1)
      n -= 1

This is ultimately because 'yield from func()' is equivalent to 't = func(); yield from t'. We don't care just how the generator object got to us so we can yield from it, we just care that it did.

However, at no stage in our processing functions can we attempt to look at the results of iterating sleep()'s generator object, either directly or indirectly by writing, say, 'for i in countdown(10):'. This rules out certain patterns for writing processing functions, for instance this one:

def label_each_sec(label, n):
   for _ in tick_once_per_sec(n):
      print(label)

This leads to the second rule, which is that we must have an unbroken chain of yield froms from the top to the bottom of our processing functions, right down to where you use an event function such as sleep(). Each function must 'call' the next using the 'yield from func()' idiom. In effect we don't have calls from one processing function to another; instead we're passing control from one function to the next. In my example, launch() passes control to countdown() until the countdown expires (and countdown() passes control to sleep()). If we actually call a processing function normally or accidentally use 'yield' instead of 'yield from', the entire collection explodes into various sorts of errors without getting off the launch pad and you will not go to space today.

As you might imagine, this is a little bit open to errors. Under normal circumstances you'll catch the errors fairly fast (when your main code doesn't work). However, since errors can only be caught at runtime when a non-yield from code path is reached, you may have mistakes that lurk in rarely executed code paths. Perhaps you have a rarely invoked last moment launch abort:

def launch(what, nsecs):
   print("counting down for", what)
   yield from countdown(nsecs)
   if launch_abort:
      print("Aborting launch! Clear the launch pad for", what)
      yield sleep(1)
      print("Flooding fire suppression ...")
   else:
      print("launching", what)

It might be a while before you discovered that mistake (I'm doing a certain amount of hand-waving about early aborts in countdown()).

(See also my somewhat related attempt at understanding this sort of thing in a Javascript context in Understanding how generators help asynchronous programming. Note that you can't use my particular approach from that entry in Python with 'yield from' for reasons beyond the scope of this entry.)

CoroutinesWithYieldFrom written at 00:32:50; Add Comment

2017-03-15

Sorting out Python generator functions and yield from in my head

Through a chain of reading, I wound up at How the heck does async/await work in Python 3.5? (via the Trio tutorial). As has happened before when I started reading about Python 3's new async and await stuff, my head started hurting when I hit the somewhat breezy discussion of yield from and I felt the need to slow down and try to solidly understand this, which I haven't really before.

Generator functions are functions that contain a yield statement:

def fred(a):
   r = yield 10
   print("fred got:", r)
   yield a

A straightforward generator function is there to produce a whole series of values without having to ever materialize all of them at once in a list or the like.

Calling a generator function does not return its result. Instead, it returns a generator object, which is a form of iterator:

>>> fred(10)
<generator object fred at 0x7f52a75ea1a8>

This generator object is in part a closure that captures the argument fred() was called with (and in general will preserve fred()'s state while it is repeatedly iterated). Note that fred()'s code doesn't start executing until you try to get the first value from the iterator.

One common pattern with a stack of generator functions (including needing to modify or filter part of a generator's results) was that you would have one generator function that wanted to call another one for a while. In the beginning this was done with explicit for loops and the like, but then Python added yield from. yield from takes a generator or iterator and exhausts it for you, repeatedly yield'ing the result.

def barney(a):
   yield from fred(a)

(You can intermix yield and yield from and use both of them more than once in a function.)

Because generator functions actually return generators, not any sort of result, 'yield from func()' is essentially syntactic sugar for calling the function, getting a generator object back, and then calling yield from on the generator object. There is no special magic involved in that:

def barney(a):
   gen = fred(a)
   yield from gen

Because generator objects are ordinary objects, they can be returned through functions that are not generators themselves, provided that intermediate functions don't really attempt to manipulate them and simply return them as-is:

def jim(a):
   return fred(a)

def bob(a):
   yield from jim(a)

(If jim() actually iterated through fred()'s results, things would quietly go sideways in ways that might or might not be visible.)

When yield started out, it was a statement; however, that got revised so that it was an expression and could thus have a value, as we see in fred() where the value of one yield is assigned to r and then used later. You (the holder of the generator object) inject that value by calling .send() on the generator:

>>> g = fred(2)
>>> _ = g.send(None); _ = g.send("a")
fred got: a

(The first .send() starts the generator running and must be made with a None argument.)

As part of adding yield from, Python arranged it so that if you had a stack of yield from invocations and you called .send() on the outer generator object, the value you sent did not go to the outer generator object; instead it goes all the way down to the eventual generator object that is doing a yield instead of a yield from.

def level3(a):
   # three levels of yield from
   # and we pass through a normal
   # functions too
   yield from bob(a)

>>> g = level3(10)
>>> _ = g.send(None); _ = g.send("down there")
fred got: down there

This means that if you have a stack of functions that all relay things back up using 'yield from', you have a direct path from your top level code (here that's our interactive code where we called level3()) all the way down to the core generator function at the bottom of the call stack (here, the fred() function). You and it can communicate with each other through the values it yields and the values you send() to it without any function in the middle having to understand anything about this; it's entirely transparent to them.

(Don't accidentally write 'yield' instead of 'yield from', though. The good news about that mistake is that you'll catch it fast.)

Hopefully writing this has anchored yield from's full behavior and the logic behind it sufficiently solidly in my head that it will actually stick this time around.

Sidebar: yield from versus yield of a generator

Suppose that we have a little mistake:

def barney2(a):
   yield fred(a)

What happens? Basically what you'd expect:

>>> list(barney(20))
fred got: None
[10, 20]
>>> list(barney2(20))
[<generator object fred at 0x7f0fdf4b9258>]

When we used yield instead of yield from, we returned a value instead of iterating through the generator. The value here is what we get as the result of calling fred(), which is a generator object.

By the way, a corollary to strings being iterable is that accidentally calling 'yield from' instead of 'yield' on a string won't fail the way that eg 'yield from 10' does but will instead give you a sequence of single characters. You'll probably notice that error fairly fast, though.

This behavior of yield from is pretty much a feature, because it means that you can yield from another function without having to care about whether it's an actual generator function or it merely returns an iterable object of some sort; either will work.

YieldFromAndGeneratorFunctions written at 01:05:27; Add Comment


Page tools: See As Normal.
Search:
Login: Password:
Atom Syndication: Recent Pages, Recent Comments.

This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.