2009-06-15
try:/finally: and generators
Suppose that you have code that generates an abstract 'list' of some
sort and returns it inside a try:/finally: block. There are two common
ways to code this; you can return a real list, or you can use yield to
be a generator. You might even code it one way and change it to the other
later, which is generally a transparent change.
Not this time, though. If you are using finally:, the two options can
have quite different behavior. Constructing an example is tedious, but
explaining the issue is simple:
Using yield postpones the execution of your finally: blocks from when your results are generated and returned to when your results are used, which may be some time later.
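Here is a minimal sketch of the ordering difference. The function names and the `events` list are made up for illustration; each function just records when its body and its finally: block run.

```python
# Record the order in which things happen.
events = []

def as_list():
    try:
        events.append("list: building results")
        return [1, 2, 3]
    finally:
        events.append("list: finally ran")

def as_generator():
    try:
        events.append("gen: building results")
        yield 1
        yield 2
        yield 3
    finally:
        events.append("gen: finally ran")

r = as_list()
events.append("list: function returned")
total = sum(r)
events.append("list: results used")

g = as_generator()
events.append("gen: function returned")
total = sum(g)
events.append("gen: results used")

for e in events:
    print(e)
```

With the list version, the finally: block has run before the caller even sees the result. With the generator version, nothing in the body runs until sum() starts pulling values, so both "gen: building results" and "gen: finally ran" show up after "gen: function returned".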
In simple situations you won't notice this, because the results are used
immediately after they're returned. In more complex situations you'll
probably get mysterious ordering issues about when your finally:
statements run, as they'll appear to run well after they 'should'
(and well after they ran back when your function wasn't using yield).
(For example, consider a finally: that closes down a database
connection. If the result of your database lookup function is just put
in a data structure and only looked at later, you could build up a lot
more database connections than you expect.)
In fact this is one part of a general issue: when you use yield,
all of the objects and resources held alive by your function are only
released when your results are used. Effectively yield turns your
ordinary function into a closure, with the resulting consequences for
potential resource leaks.
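One way to see this is with a weak reference, which lets you check whether an object is still alive without keeping it alive yourself. This is a sketch assuming CPython, which frees an object as soon as its last reference goes away; `Buffer` and `chunks()` are hypothetical names.

```python
import weakref

class Buffer:
    """Hypothetical stand-in for an expensive object the function holds on to."""

# Weak references let us check whether a Buffer is still alive
# without keeping it alive ourselves.
refs = []

def chunks():
    buf = Buffer()
    refs.append(weakref.ref(buf))
    for i in range(3):
        yield i

g = chunks()
next(g)                                        # run the body up to the first yield
alive_while_suspended = refs[0]() is not None  # buf is held by the suspended frame
list(g)                                        # exhaust the generator
alive_after_exhausted = refs[0]() is not None  # the finished frame has released buf

print(alive_while_suspended, alive_after_exhausted)
```

In CPython this prints True False: buf lives exactly as long as the generator is partway through, no matter how long that takes.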
(Credit where credit is due department: I was exposed to this issue by "Using yield Statements in WSGI Middleware can be Very Harmful".)
2009-06-08
Another way that generators are not lists: modifying them
A long time ago, I wrote some stuff on how generators are not lists (okay, technically it was about iterators), and one of the things that I mentioned is that generators do not have list methods. Well, there's a consequence of that that only struck me recently: you need completely different code to modify a returned generator than to modify a returned list.
Suppose you have a function that returns something that is conceptually a list of items. Further suppose that you have another function that modifies what the first function returns; perhaps you want to add something on the end. If you know you're dealing with a list, you write:
def append(func, extra):
    r = func()
    r.extend(extra)
    return r
If func() is a generator, this code blows up. You have two choices;
first, you can forcefully turn the result of func() into a list, and
second, you can rewrite append() as a generator (which will work
regardless of what func() returns, but may have consequences that
make it undesirable):
def append(func, extra):
    for it in (func(), extra):
        for e in it:
            yield e
(Yes, yes, one can write this using
itertools.chain(). Then people would have to look it up.)
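To make both choices concrete, here is a small sketch. The names `gen_items`, `list_items`, `append_list_only`, `append_forced` and `append_chain` are hypothetical; `append_list_only` is just the list-only append() from above, renamed so the versions can sit side by side.

```python
import itertools

def gen_items():
    """A func() that happens to be written as a generator."""
    yield 1
    yield 2

def list_items():
    """A func() that returns a real list."""
    return [1, 2]

def append_list_only(func, extra):
    """The list-only append(), renamed."""
    r = func()
    r.extend(extra)
    return r

def append_forced(func, extra):
    """Choice one: force whatever func() returns into a real list."""
    r = list(func())
    r.extend(extra)
    return r

def append_chain(func, extra):
    """Choice two via itertools.chain(); like the hand-written generator, always an iterator."""
    return itertools.chain(func(), extra)

try:
    append_list_only(gen_items, [3])
    blew_up = False
except AttributeError:
    blew_up = True  # generators have no extend() method

forced_from_gen = append_forced(gen_items, [3])
forced_from_list = append_forced(list_items, [3])
chained = list(append_chain(gen_items, [3]))

print(blew_up, forced_from_gen, forced_from_list, chained)
```

This prints True [1, 2, 3] [1, 2, 3] [1, 2, 3]. Note that append_forced() always hands back a list and append_chain() always hands back an iterator, whatever func() returned.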
In either case, you have to actively make a decision about what your
function will do. You cannot passively modify whatever you get handed
and pass it up to your caller without changing its nature; you must
decide that no matter what func() returns, you're either returning a
list or an iterator.
(Technically you can, since you can see if you got handed something that follows the iterator protocol or whether it looks sequence-like. But that way lies madness.)
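For the curious, a sketch of what the type-sniffing version might look like. It assumes that anything returning itself from iter() is an iterator (true for generators) and that everything else is list-like with an extend() method; `append_sniffing` and `numbers_gen` are hypothetical names. Even this tiny version already trips over tuples, which are perfectly good sequences without an extend().

```python
import itertools

def append_sniffing(func, extra):
    """Hypothetical 'keep whatever kind of thing func() returned' version of append()."""
    r = func()
    if iter(r) is r:
        # Iterators (including generators) return themselves from iter(),
        # so answer with an iterator too.
        return itertools.chain(r, extra)
    # Otherwise assume a list-like object with an extend() method.
    r.extend(extra)
    return r

def numbers_gen():
    yield 1
    yield 2

as_list = append_sniffing(lambda: [1, 2], [3])
iter_items = list(append_sniffing(numbers_gen, [3]))
try:
    append_sniffing(lambda: (1, 2), [3])  # a tuple is a sequence but has no extend()
    tuple_ok = True
except AttributeError:
    tuple_ok = False

print(as_list, iter_items, tuple_ok)
```

Patching the tuple case means adding another branch, and then another for the next sequence-like type, which is roughly where the madness sets in.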