Iterator & Generator Gotchas

June 16, 2005

Python iterators are objects (or functions, using some magic) that produce values one at a time until they are exhausted. Python introduced this general feature to efficiently support things like:

for line in fp:
    ... do something with each line ...

Before iterators, the usual way to write this was with fp.readlines(), which reads the entire file into memory, splits it up into lines, and returns a huge list. Iterating over the file object directly means this code only has one line in memory at any given time, even if the file is tens or hundreds of megabytes.
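A small runnable sketch of the difference. It uses io.StringIO as a stand-in for a real open file (an assumption made so the example needs nothing on disk), and the three-line contents are made up:

```python
import io

# io.StringIO stands in for an open text file; the contents are illustrative.
fp = io.StringIO("first\nsecond\nthird\n")

# Old style: readlines() builds the complete list of lines in memory first.
all_lines = fp.readlines()

# Iterator style: the file object hands back one line per loop iteration.
fp.seek(0)
lengths = []
for line in fp:
    lengths.append(len(line.rstrip("\n")))

print(all_lines)  # ['first\n', 'second\n', 'third\n']
print(lengths)    # [5, 6, 5]
```

File objects are their own iterators, which is why the plain for loop works without any method call.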

Generators are functions that use yield; calling one creates an iterator instead of just returning a value (ignoring some technicalities). Generators are the most common gateway to iterators, and so their name is often used for the whole area.
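A minimal illustration of a generator; count_up_to() is a hypothetical example function, not anything from the standard library:

```python
def count_up_to(n):
    # Because this body contains 'yield', calling count_up_to() does not run
    # it; the call returns a generator object, which is an iterator.
    i = 1
    while i <= n:
        yield i
        i += 1

gen = count_up_to(3)
print(next(gen))   # 1
print(list(gen))   # [2, 3]  (the remaining values)
print(list(gen))   # []      (the generator is now exhausted)
```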

When iterators were introduced, a number of standard things that had previously returned lists started returning iterators, and using a generator instead of just returning a list became a common Python idiom.

In many cases it can be tempting, and temptingly easy, to replace things that return lists with generators; it looks like it should just work, and it mostly does. It can be similarly tempting to just ignore the difference when standard Python modules return iterators instead of lists.

But there are some gotchas when you write code like this, and I have the stubbed toes to prove it. At one point or another, I've made all of these iterator-confusion mistakes in my code.

Iterators are always true

t = generate_list(some, inputs)
if not t:
   return
print("Header Line:")
for item in t:
   .....

If generate_list returns an iterator instead of a list, this code doesn't work right. Unless someone got quite fancy (by giving the iterator's class a __len__ or __nonzero__ method, __bool__ in Python 3), iterator objects are always true, unlike lists, which are only true if they contain something.
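A small sketch of the trap and the simplest fix. Here generate_list() and report() are hypothetical stand-ins for the code above, with generate_list() written as a generator:

```python
def generate_list(items):
    # Hypothetical stand-in for generate_list(); as a generator it returns
    # an iterator rather than a list.
    for item in items:
        yield item

empty_list = []
empty_iter = generate_list([])

print(bool(empty_list))  # False: an empty list is false
print(bool(empty_iter))  # True: a generator object is always true

def report(items):
    # Hypothetical fixed version of the snippet above: turn the iterator
    # into a list before testing whether it has anything in it.
    t = list(generate_list(items))
    if not t:
        return []
    return ["Header Line:"] + [str(item) for item in t]

print(report([]))      # []
print(report([1, 2]))  # ['Header Line:', '1', '2']
```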

There's really no way to see if an iterator contains anything except to try to get a value from it. And there's no 'push value back onto iterator' operation.
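You can fake both operations, though: pull one value out with next() and glue it back onto the front with itertools.chain(). A minimal sketch; peek() is a hypothetical helper, not a standard library function:

```python
import itertools

_EMPTY = object()  # private sentinel meaning "the iterator had nothing in it"

def peek(iterable):
    """Try to take one value from iterable and 'push it back'.

    Returns (is_empty, iterator). The returned iterator yields the same
    values the original would have, including the one we looked at.
    """
    it = iter(iterable)
    first = next(it, _EMPTY)
    if first is _EMPTY:
        return True, iter(())
    # chain() puts the consumed value back in front of the rest.
    return False, itertools.chain([first], it)

is_empty, items = peek(x * x for x in range(4))
print(is_empty)       # False
print(list(items))    # [0, 1, 4, 9]
print(peek([])[0])    # True
```

The sentinel object is used instead of None so that an iterator whose first value really is None is not mistaken for an empty one.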

Iterators can't be saved

cache = {}

def cached_lookup(what):
  if what not in cache:
    cache[what] = real_lookup(what)
  return cache[what]

If real_lookup returns iterators, this code doesn't work. When an iterator's exhausted, it's exhausted; if you try to use it again (such as if cached_lookup found it as a cached result), it generates nothing.
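A runnable sketch of both the problem and the usual fix, which is to cache a list instead of the iterator. real_lookup() here is a hypothetical generator, and the calls list exists only to show when the real lookup runs:

```python
cache = {}
calls = []  # records real lookups, just to show the cache being used

def real_lookup(what):
    # Hypothetical expensive lookup that returns a generator.
    calls.append(what)
    for i in range(3):
        yield "%s-%d" % (what, i)

# The problem: the same iterator used twice produces nothing the second time.
g = real_lookup("demo")
print(list(g))  # ['demo-0', 'demo-1', 'demo-2']
print(list(g))  # []

def cached_lookup(what):
    # Store a list, not the iterator; a list can be walked any number of times.
    if what not in cache:
        cache[what] = list(real_lookup(what))
    return cache[what]

print(cached_lookup("a"))  # ['a-0', 'a-1', 'a-2']
print(cached_lookup("a"))  # ['a-0', 'a-1', 'a-2'] again, from the cache
print(calls)               # ['demo', 'a']
```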

(Technically there is a semi-magical way to copy iterators: itertools.tee(), in Python 2.4 and later, which buffers values for the copies behind the scenes. I suspect one is best off avoiding it unless you really have to save an iterator copy.)
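For reference, a minimal itertools.tee() sketch; numbers() is a hypothetical generator standing in for any one-shot iterator:

```python
import itertools

def numbers():
    # Hypothetical generator standing in for any one-shot iterator.
    for n in range(5):
        yield n

# tee() returns independent iterators over the same values. The original
# iterator should not be used directly after this point.
a, b = itertools.tee(numbers(), 2)

evens = [n for n in a if n % 2 == 0]
total = sum(b)

print(evens)  # [0, 2, 4]
print(total)  # 10
```

Because `a` is used up completely before `b` is touched, tee() has to buffer every value for `b`, which costs as much memory as just calling list() once.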

I can't use list methods on iterators

t = generate_list(some, inputs)
t.sort()
t = t[:firstN]
# ... admire the pretty explosions

Of course, iterators don't support general list operations like .sort(), len(), or slicing. If you want to use those, you have to write:

t = list(generate_list(some, inputs))
t.sort()
t = t[:firstN]

Fortunately, list() will expand the iterator for you and is harmless (apart from making a copy) when applied to a real list, so you can use it without having to care if the generate_list routine changes what it returns.
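Two other standard tools accept either a list or an iterator: sorted(), which always returns a new list, and itertools.islice(), which takes the first N items without building a full list. A sketch, with a hypothetical one-argument generate_list():

```python
import itertools

def generate_list(values):
    # Hypothetical stand-in for generate_list(): yields its input in order.
    for v in values:
        yield v

first_n = 3
data = [5, 3, 9, 1, 7]

# Sort everything, then keep the first N (works on lists and iterators).
top = sorted(generate_list(data))[:first_n]
print(top)   # [1, 3, 5]

# Take the first N in their original order without expanding the rest.
head = list(itertools.islice(generate_list(data), first_n))
print(head)  # [5, 3, 9]
```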

Writing recursive generators

Sometimes the most natural structure for a generator is a recursive one. This works, but you have to bear in mind a twist: you cannot simply yield the results of the recursive calls. This is because the recursive results are themselves iterators, and if you yield them directly your callers get iterators that produce a stream of iterators that produce a stream of iterators that someday, at some level, produce actual results. (But by that time the caller has given up in despair.)

Instead, each time you recurse you have to expand the resulting iterator and yield each result, like so:

def treewalk(node):
  if not node:
    return
  yield node.value
  for val in treewalk(node.left):
    yield val
  for val in treewalk(node.right):
    yield val
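A runnable sketch of both versions on a tiny tree. The Node class is a hypothetical minimal node with the value, left, and right attributes treewalk() expects; broken_treewalk() shows the mistake of yielding the recursive calls themselves:

```python
class Node(object):
    # Hypothetical minimal binary tree node with the attributes used above.
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right

def treewalk(node):
    # Pre-order walk: this node, then the left subtree, then the right.
    if not node:
        return
    yield node.value
    for val in treewalk(node.left):
        yield val
    for val in treewalk(node.right):
        yield val

def broken_treewalk(node):
    # The mistake: yielding the recursive call hands back generator objects.
    if not node:
        return
    yield node.value
    yield broken_treewalk(node.left)
    yield broken_treewalk(node.right)

tree = Node(1, Node(2, Node(4)), Node(3))
print(list(treewalk(tree)))   # [1, 2, 4, 3]

out = list(broken_treewalk(tree))
print(out[0])                 # 1
print(type(out[1]).__name__)  # generator
```

In Python 3.3 and later, each of the inner for loops can be shortened to `yield from treewalk(node.left)` and so on, which does the same expansion.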

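This recursive structure means that deeply recursive generators can be quite inefficient, as they will spend a great deal of time trickling results up through all the levels involved: a value produced at depth d is passed up through d levels of generators, so walking a degenerate, list-shaped tree of n nodes takes time proportional to n squared.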
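One way around this is to drop the recursion and keep an explicit stack inside a single generator, so every value is yielded directly to the caller. A sketch under the same assumption of a hypothetical Node class with value, left, and right attributes:

```python
class Node(object):
    # Hypothetical minimal binary tree node (same shape treewalk() expects).
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right

def treewalk_iter(node):
    # Non-recursive pre-order walk: one generator plus an explicit stack, so
    # each value is yielded once instead of passing up through every level.
    stack = [node]
    while stack:
        n = stack.pop()
        if not n:
            continue
        yield n.value
        # Push right first so the left subtree is visited first (LIFO order).
        stack.append(n.right)
        stack.append(n.left)

tree = Node(1, Node(2, Node(4)), Node(3))
print(list(treewalk_iter(tree)))  # [1, 2, 4, 3]
```

It produces the same pre-order sequence as the recursive treewalk() and does a constant amount of work per node.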


Last modified: Thu Jun 16 02:26:39 2005
This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.