Understanding a tricky bit of Python generators
From this Python quiz's third question, consider the following code:
    units = [1, 2]
    tens = [10, 20]
    nums = (a + b for a in units for b in tens)
    units = [3, 4]
    tens = [30, 40]
    print nums.next()
This is a far more interesting quiz question than you might think, because what's going on is actually quite deep.
One of the things that gets people in Python is that it is in general
what I've called a 'late binding' language; when you write an expression
whose execution is deferred, the values of the variables it uses are not
immediately captured. Instead they will be looked up when the code is
actually executed. This shows up in the quiz's first question, for
example. A straightforward interpretation of late binding might expect
the code here to print 33. Instead it prints 31; tens is late binding,
referring to the current value, but units has been bound immediately.
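As a minimal sketch of what I mean by late binding (written for Python 3, with names invented purely for illustration), a deferred expression looks up its variables when it finally runs, not when it is written:

```python
# Hypothetical names; this only illustrates late binding in general.
x = 1
deferred = lambda: x + 1   # x is not captured here
x = 10                     # rebinding x before the lambda is called
result = deferred()        # looks up x now, so this is 10 + 1
print(result)
```

If Python captured values when the lambda was written, this would give 2 instead.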
You may now be going 'say what?' So let me make your day a little bit more surreal:
    units = [1, 2]
    tens = [10, 20]
    nums = (a + b for a in units for b in tens)
    units.pop(0)
    tens = [30, 40]
    print nums.next()
This prints 32.
To explain this, let me quote from the language specification:
Variables used in the generator expression are evaluated lazily when the __next__() method is called for generator object (in the same fashion as normal generators). However, the leftmost for clause is immediately evaluated, [...]
However, 'evaluates' here does not mean what you might think. When
Python 'evaluates' units in the 'for a in units' clause, it doesn't
make a private copy of the list's value; instead, it creates an iterator
object from the list. This iterator object is what the for loop actually
loops over, and it internally has a reference to the original list that
units was bound to.
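The same thing can be seen without any generator at all. Here is a small sketch (Python 3, using the built-in iter() and next(); the variable names are mine) of an iterator holding on to the original list through a rebinding, and seeing a mutation of it:

```python
units = [1, 2]
it = iter(units)       # roughly what the leftmost for clause does up front
units = [3, 4]         # rebinding: the iterator still refers to the old list
first = next(it)       # comes from the original [1, 2]

units2 = [1, 2]
it2 = iter(units2)
units2.pop(0)          # mutation: this is the same list the iterator refers to
second = next(it2)     # the list is now [2], so its first element is 2

print(first, second)
```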
The first version of this question rebinds units but leaves the
original list (now accessible only through the iterator) unaltered.
nums.next() thus uses the first element of the original list, 1, as
the value of a.
The second version mutates the original list by deleting the first
element, so nums.next() winds up using 2 as the value of a.
(In both cases the second for loop is un-evaluated until the
generator begins operation, so it picks up the new binding for tens.)
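For anyone following along on Python 3, here is a sketch of both versions with next(nums) in place of nums.next() and the results kept in variables (first_version and second_version are names I've made up for this):

```python
# First version: rebind both names before the generator first runs.
units = [1, 2]
tens = [10, 20]
nums = (a + b for a in units for b in tens)
units = [3, 4]               # the generator already holds an iterator over [1, 2]
tens = [30, 40]              # looked up only when the generator starts
first_version = next(nums)   # 1 + 30

# Second version: mutate the original list instead of rebinding units.
units = [1, 2]
tens = [10, 20]
nums = (a + b for a in units for b in tens)
units.pop(0)                 # the iterator sees this; the list is now [2]
tens = [30, 40]
second_version = next(nums)  # 2 + 30

print(first_version, second_version)
```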
I am in admiration of how deep this rabbit hole turned out to be once I actually started looking down it.
Sidebar: seeing this in the bytecode
To confirm what's happening, let's disassemble nums.gi_frame.f_code,
the actual bytecode of the generator (I've somewhat simplified the
bytecode disassembly syntax):
     0 SETUP_LOOP   (to 42)
     3 LOAD_FAST    '.0'
     6 FOR_ITER     (to 41)
     9 STORE_FAST   'a'
    12 SETUP_LOOP   (to 38)
    15 LOAD_GLOBAL  'tens'
    18 GET_ITER
    19 FOR_ITER     (to 37)
    [...]
As we can see here, there is no reference to units; instead
we do a load of a local variable called .0. If we check
nums.gi_frame.f_locals['.0'], we'll see that this local variable is
a 'listiterator' object. By contrast, tens is loaded explicitly as a
global and then immediately turned into an iterator so that it can be
looped over.
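The exact opcodes differ between Python versions, so the listing above won't match what a modern interpreter prints. As a rough sketch for Python 3, and assuming the generator's code object is reachable as nums.gi_code, the same facts can be checked from the code object's name tables instead of from the disassembly (first_arg and looked_up_by_name are my own helper names):

```python
units = [1, 2]
tens = [10, 20]
nums = (a + b for a in units for b in tens)

code = nums.gi_code
# The pre-built iterator is passed in as a hidden first argument.
first_arg = code.co_varnames[0]
# Names the generator looks up when it runs: globals, or closure
# variables if this code is itself placed inside a function.
looked_up_by_name = code.co_names + code.co_freevars

print(first_arg, looked_up_by_name)
```

Calling dis.dis(nums.gi_code) will print the full bytecode for whatever version you are running.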
Note that you can get truly odd behavior by rebinding tens after
you've called nums.next() once (or otherwise after you've invoked the
generator once). This is because every time we go through the outer
loop, the current binding of tens is re-captured into an iterator for
the inner loop. Further rebinding of tens has a delayed effect; it
takes effect on the next pass around the outer loop.
(Mutation of the current binding has an immediate effect, but is then
lost if you've rebound tens as well.)
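A sketch of the delayed effect, again in Python 3 (the results list is only there to collect the values):

```python
units = [1, 2]
tens = [10, 20]
nums = (a + b for a in units for b in tens)

results = [next(nums)]       # a = 1, inner loop starts over [10, 20]: 1 + 10
tens = [30, 40]              # rebinding while the inner loop is mid-flight
results.append(next(nums))   # inner iterator still walks the old list: 1 + 20
results.append(next(nums))   # next outer pass re-reads tens: 2 + 30
results.append(next(nums))   # 2 + 40

print(results)
```

The second value still comes from the old list; the new binding only shows up once a moves on to 2.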
PS: you can make similar crazy things happen in conventional for
loops, because the same object to iterator transformation is happening.
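As a small illustration of this with an ordinary for loop (Python 3; items, seen_rebind and seen_mutate are names I've picked for the sketch), rebinding the loop's source does nothing while mutating it changes what the loop sees:

```python
# Rebinding inside the loop: the loop keeps iterating over the original list.
items = [1, 2, 3]
seen_rebind = []
for x in items:
    seen_rebind.append(x)
    items = []               # new binding; the loop's iterator is unaffected

# Mutating inside the loop: the iterator refers to the very list being changed.
items = [1, 2, 3, 4]
seen_mutate = []
for x in items:
    seen_mutate.append(x)
    if x == 1:
        items.pop(0)         # everything shifts left, so 2 is skipped

print(seen_rebind, seen_mutate)
```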