== Iterator & Generator Gotchas

Python iterators are objects (or functions, using some magic) that repeatedly produce values, one at a time, until they get exhausted. Python introduced this general feature to efficiently support things like:

> for line in fp:
>     ... do something with each line ...

Without iterators, you would have to call _.readlines()_, which reads the entire file into memory, splits it up into lines, and returns a huge list; with the iterator version, this code only has one line in memory at any given time, even if the file is tens or hundreds of megabytes.

Generators are functions that magically create iterators instead of just returning values (ignoring some technicalities). Generators are the most common gateway to iterators, and so 'generator' is the more commonly used term for the whole area.

When iterators were introduced, a number of standard things that had previously returned lists started returning iterators, and using a generator instead of just returning a list became part of common Python programming idiom.

In many cases it can be tempting, and temptingly easy, to replace things that return lists with generators; it *looks* like it should just work, and it mostly does. It can be similarly tempting to just ignore the difference in the standard Python modules. But there are some gotchas when you write code like this, and I have the stubbed toes to prove it. At one point or another, I've made all of these iterator-confusion mistakes in my code.

=== Iterators are always true

> t = generate_list(some, inputs)
> if not t:
>     return
> print("Header Line:")
> for item in t:
>     .....

If ((generate_list)) returns an iterator instead of a list, this code doesn't work right. Unless someone got quite fancy, iterator objects are always true, unlike lists, which are only true if they contain something. There's really no way to see if an iterator contains anything except to try to get a value from it. And there's no 'push value back onto iterator' operation.

=== Iterators can't be saved

> def cached_lookup(what):
>     if what not in cache:
>         cache[what] = real_lookup(what)
>     return cache[what]

If ((real_lookup)) returns iterators, this code doesn't work. When an iterator's exhausted, it's exhausted; if you try to use it again (such as when ((cached_lookup)) later finds it in the cache and hands it back), it generates nothing. (Technically I believe there are semi-magical ways to copy iterators, _itertools.tee()_ for example. I suspect one is best off avoiding them unless you *really* have to save an iterator copy.)

=== I can't use list methods on iterators

> t = generate_list(some, inputs)
> t.sort()
> t = t[:firstN]
> # ... admire the pretty explosions

Of course, iterators don't have general list operations like _.sort()_ (or _len()_, or slicing, and so on). If you want to use them, you have to write:

> t = list(generate_list(some, inputs))
> t.sort()
> t = t[:firstN]

Fortunately, _list()_ will expand the iterator for you *and* is harmless to apply to real lists, so you can use it without having to care whether the ((generate_list)) routine changes what it returns.
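To see all three of these gotchas in one place, here is a small self-contained sketch you can paste into an interpreter; ((generate_squares)) is just a hypothetical stand-in for any routine that has switched from returning a list to returning a generator.

> def generate_squares(n):
>     # hypothetical stand-in for a routine that now returns an iterator
>     for i in range(n):
>         yield i * i
>
> t = generate_squares(0)
> print(bool(t))     # True, even though it will produce nothing
>
> t = generate_squares(4)
> print(list(t))     # [0, 1, 4, 9]
> print(list(t))     # []; the iterator is exhausted and stays that way
>
> t = generate_squares(4)
> # t.sort()         # AttributeError: generators have no list methods
> t = sorted(t, reverse=True)[:2]
> print(t)           # [9, 4]; list() or sorted() gets you a real list back

Note that _sorted()_, like _list()_, is happy to be handed either a real list or an iterator, which makes it another convenient place to stop caring about which one you actually got.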
=== Writing recursive generators

Sometimes the most natural structure for a generator is a recursive one. This works, but you have to bear in mind a twist: you cannot simply yield the results of the recursive calls directly. This is because the recursive results are themselves iterators, and if you pass them on straight your callers get iterators that produce a stream of iterators that produce a stream of iterators that someday, at some level, produce actual *results*. (But by that time the caller has given up in despair.) Instead, each time you recurse you have to expand the resulting iterator and yield each of its results yourself, like so:

> def treewalk(node):
>     if not node:
>         return
>     yield node.value
>     for val in treewalk(node.left):
>         yield val
>     for val in treewalk(node.right):
>         yield val

This implies that significantly recursive generators can be quite inefficient, as they will spend a great deal of time trickling results up through all the levels involved.
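For concreteness, here is a minimal sketch of the sort of tree ((treewalk)) assumes; the ((Node)) class is purely illustrative, and all that matters is that each node has the ((value)), ((left)), and ((right)) attributes the code above expects.

> class Node(object):
>     def __init__(self, value, left=None, right=None):
>         self.value = value
>         self.left = left
>         self.right = right
>
> # a small tree: 2 at the root, 1 on the left, 3 (with a left child of 4) on the right
> root = Node(2, Node(1), Node(3, Node(4)))
> print(list(treewalk(root)))   # [2, 1, 3, 4]

Each of those four values is yielded once by the ((treewalk)) call that found it and then re-yielded by every ((treewalk)) call above it on the way out, which is where the inefficiency comes from.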