2005-06-27
Dangerously over-broad error catching
One of the stock pieces of advice about writing Python code is that basic error checks should be handled by catching the exception that trying to do the operation generates, instead of through explicit checks beforehand. For example, if you are getting the value for a key from a dictionary (what some other languages call a hash), you don't bother checking to see if the key is in the dictionary beforehand; you just try to get the value.
This code might be written like this (especially in examples):
try:
    result = some_func(dict[key])
except KeyError:
    # key isn't in dict.
    result = None
Unfortunately, this code has a serious, somewhat subtle problem;
the try: ... except clause is too broad.
You probably wanted result set to None only if dict[key] didn't
exist. But with this code, it will also happen if some_func or
anything it calls raises an otherwise uncaught KeyError
exception. Such uncaught KeyError exceptions are probably real bugs
that you want to fix, but this code is hiding them.
You should be doing as little as possible in try/except blocks that catch general errors, because that way you minimize the risk that you're accidentally sweeping unrelated errors under the carpet.
It follows that the worst version of this is to write a try/except
block using 'except:' so that it catches all exceptions. Please
don't be that lazy.
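Going back to the dictionary example, one way to narrow the try block is to keep only the lookup inside it and call the function afterwards. This is just a sketch: some_func here is a hypothetical stand-in that deliberately has a KeyError bug of its own, and lookup_narrow and table are names made up for the illustration.

```python
def some_func(value):
    # Hypothetical stand-in with a bug: it looks up a key that isn't there.
    settings = {}
    return settings["missing"] + value


def lookup_narrow(table, key):
    # Only the lookup we expect to fail is inside the try block.
    try:
        value = table[key]
    except KeyError:
        # key isn't in table.
        return None
    # A KeyError raised in here is a real bug, and it now propagates.
    return some_func(value)
```

With this structure a missing key still gives None, but the KeyError from the buggy some_func surfaces instead of being silently turned into None.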
2005-06-23
A real use for staticmethod
In Python, all method functions normally get called with the object as their first argument. New-style classes can specify that some methods should instead be 'static' methods: they don't receive the object as one of their arguments.
On the face of it this is a bit peculiar. Given that Python has ordinary (module-level) functions, why would you ever want to design a staticmethod instead (apart from misplaced object-oriented purity)?
What I use static methods for is as a way of having subclasses specify what piece of data from a data source a 'generic over a class of classes' method function will work on. The pattern looks like:
class Abstract(object):
    def frobnicate(self, datasource):
        work_on = self._pullfrom(datasource)
        # frobnicate work_on

    def fiddle(self, datasource):
        work_on = self._pullfrom(datasource)
        # fiddle with work_on

    # and so on

class RealA(Abstract):
    @staticmethod
    def _pullfrom(datasource):
        return datasource.thing_a()

class RealB(Abstract):
    @staticmethod
    def _pullfrom(datasource):
        return datasource.thing_b()
There are alternative patterns for this same effect, but none of them are as simple once the situation itself becomes more complex, such as needing more than one piece of information from the datasource.
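To make the pattern above concrete, here is a small runnable sketch. The DataSource class and the sorting that stands in for 'frobnicating' are made up for the example; only the shape of Abstract, RealA and RealB comes from the pattern itself.

```python
class DataSource(object):
    # Hypothetical data source, invented for this sketch.
    def thing_a(self):
        return [3, 1, 2]

    def thing_b(self):
        return ["b", "a"]


class Abstract(object):
    def frobnicate(self, datasource):
        # _pullfrom is a staticmethod, so it receives only datasource,
        # not self, even though it is looked up through self.
        work_on = self._pullfrom(datasource)
        # Sorting is a placeholder for the real work.
        return sorted(work_on)


class RealA(Abstract):
    @staticmethod
    def _pullfrom(datasource):
        return datasource.thing_a()


class RealB(Abstract):
    @staticmethod
    def _pullfrom(datasource):
        return datasource.thing_b()
```

Calling RealA().frobnicate(DataSource()) works on thing_a's data and RealB().frobnicate(DataSource()) works on thing_b's, without frobnicate knowing which is which.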
2005-06-20
Small details can matter (or: a little nifty Python bit)
It's common for Python routines that search for the position of something in a bigger thing (such as characters in a string or list entries in a list) to return -1 when they don't find anything.
I've written Python code for years without really thinking about this. Clearly the routines have to do something when they fail, whether it's raise an exception or return a marker value, and -1 is both simple and not a valid index into an object.
Only recently (while writing some DWiki code) did it occur to me that if what you want is 'everything after where this thing is, or the entire object if it's not there', a return value of -1 is perfect because you can just write:
pos = thing.searchFor(whatever)
tail = thing[pos+1:]
If whatever is found in the thing, this clearly works. If it's not
found, pos is -1 and the whole expression becomes 'thing[0:]',
which in Python is a synonym for all of thing (since Python objects
are indexed from zero).
There are certainly other plausible 'not found' marker values, for
example None or False. But then this little bit wouldn't work
and you'd have to write more code.
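As a quick illustration with a real routine, here is the same idea using the string .find() method in place of the generic searchFor. The tail_after name is made up, and the sketch assumes a single-character separator (a longer one would need pos plus its length).

```python
def tail_after(thing, whatever):
    # .find() returns -1 when whatever isn't in thing, so pos + 1 is 0
    # and the slice below is the entire string.
    pos = thing.find(whatever)
    return thing[pos + 1:]
```

So tail_after("key=value", "=") gives "value", and tail_after("value", "=") gives back "value" unchanged.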
There's another clever little property -1 has as an error return: it's considered true in Python, not false. This is useful, because 0 is considered false; that the 'not found' value is not false saves you from writing superficially simple code like:
pos = someString.find(somechar)
if not pos:
    # do something on failure
If the 'not found' marker was considered false, this code would work
until the day somechar was found at position 0, which might be quite
a while. Since it isn't, this code fails immediately and thus gets
fixed immediately.
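For contrast, a sketch of the check that does work compares against -1 explicitly. The describe_position function and its return strings are invented for the example.

```python
def describe_position(some_string, somechar):
    pos = some_string.find(somechar)
    # Compare with -1 directly; a truth test can't tell 'not found'
    # from anything useful, since -1 is true and position 0 is false.
    if pos == -1:
        return "not found"
    return "found at %d" % pos
```

This handles a match at position 0 correctly, which is exactly the case the 'if not pos' version gets wrong.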
I usually think of the choice of what to return as the 'not found' value as something pretty simple and arbitrary; as long as it's an invalid value and easily checked, it's not something that matters much. But as I've just come to understand, sometimes the small details really can matter, because if they're done right they make your life simpler in little quiet ways.
2005-06-16
Iterator & Generator Gotchas
Python iterators are objects (or functions, using some magic) that repeatedly produce values, one at a time, until they get exhausted. Python introduced this general feature to efficiently support things like:
for line in fp:
    ... do something with each line ...
Without iterators, you would have to call .readlines(), which reads the
entire file into memory, splits it up into lines, and returns a huge
list; iterating over the file directly, this code only has roughly one
line in memory at any given time, even if the file is tens or hundreds
of megabytes.
Generators are functions that magically create iterators instead of just returning values (ignoring some technicalities). Generators are the most common gateway to iterators, and are thus the more commonly used term for the whole area.
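A minimal sketch of a generator, with a made-up count_up_to function: calling it runs none of the body, it just hands back an iterator, and each value is produced only when something asks for it.

```python
def count_up_to(limit):
    # The yield makes this a generator function; calling it returns
    # an iterator instead of running the loop right away.
    n = 0
    while n < limit:
        yield n
        n += 1


numbers = count_up_to(3)
first = next(numbers)      # runs the body up to the first yield
rest = list(numbers)       # drains whatever is left
again = list(numbers)      # the iterator is now exhausted
```

Here first is 0, rest is [1, 2], and again is an empty list because the iterator has nothing more to give.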
When iterators were introduced, a number of standard things that had previously returned lists started returning iterators, and using a generator instead of just returning a list became part of the common Python programming idioms.
In many cases it can be tempting, and temptingly easy, to replace things that return lists with generators; it looks like it should just work, and it mostly does. It can be similarly tempting to just ignore the difference in the standard Python modules.
But there are some gotchas when you write code like this, and I have the stubbed toes to prove it. At one point or another, I've made all of these iterator-confusion mistakes in my code.
Iterators are always true
t = generate_list(some, inputs)
if not t:
    return
print("Header Line:")
for item in t:
    .....
If generate_list returns an iterator instead of a list, this code
doesn't work right. Unless someone got quite fancy, iterator objects
are always true, unlike lists, which are only true if they contain
something.
There's really no way to see if an iterator contains anything except to try to get a value from it. And there's no 'push value back onto iterator' operation.
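One workaround, sketched here under the assumption that you are willing to wrap the iterator, is to pull the first value and then glue it back on the front with itertools.chain. The nonempty_or_none name and the None-for-empty convention are made up for the example.

```python
import itertools

_MISSING = object()


def nonempty_or_none(iterable):
    it = iter(iterable)
    # Try to take one value; _MISSING comes back if there is none.
    first = next(it, _MISSING)
    if first is _MISSING:
        return None
    # Nothing can be pushed back, so build a new iterator that
    # produces the value we took and then the rest of the original.
    return itertools.chain([first], it)
```

The result can be tested with 'if t is None' before printing the header, and then looped over as usual.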
Iterators can't be saved
def cached_lookup(what):
    if what not in cache:
        cache[what] = real_lookup(what)
    return cache[what]
If real_lookup returns iterators, this code doesn't work.
When an iterator's exhausted, it's exhausted; if you try to use it
again (such as if cached_lookup found it as a cached result), it
generates nothing.
(Technically I believe there are semi-magical ways to copy iterators. I suspect one is best off avoiding them unless you really have to save an iterator copy.)
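The simple fix, if the results are small enough to hold in memory, is to expand the iterator into a list before caching it. In this sketch real_lookup is a hypothetical generator invented for the example.

```python
cache = {}


def real_lookup(what):
    # Hypothetical lookup that happens to be a generator.
    for i in range(3):
        yield "%s-%d" % (what, i)


def cached_lookup(what):
    if what not in cache:
        # Store a list, not the iterator, so the cached value can be
        # walked over as many times as callers like.
        cache[what] = list(real_lookup(what))
    return cache[what]
```

Every call to cached_lookup("a") now returns the same full list, where caching the bare iterator would have produced nothing from the second use onward.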
I can't use list methods on iterators
t = generate_list(some, inputs)
t.sort()
t = t[:firstN]
# ... admire the pretty explosions
Of course, iterators don't have list methods like .sort(), and they
don't support len(), slicing, and so on. If you want to use those, you
have to write:
t = list(generate_list(some, inputs))
t.sort(); t = t[:firstN]
Fortunately, list() will expand the iterator for you and is
harmless to apply to real lists, so you can use it without having to
care if the generate_list routine changes what it returns.
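A small sketch of that, with a made-up generate_list generator and a first_n_sorted helper: the same code accepts either an iterator or a real list, and because list() makes a copy, a list passed in is not sorted in place.

```python
def generate_list(n):
    # Hypothetical generator standing in for the real routine.
    for i in range(n, 0, -1):
        yield i


def first_n_sorted(items, first_n):
    # list() expands an iterator and copies a real list.
    t = list(items)
    t.sort()
    return t[:first_n]
```

Both first_n_sorted(generate_list(5), 2) and first_n_sorted([3, 1, 2], 2) give [1, 2].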
Writing recursive generators
Sometimes the most natural structure for a generator is a recursive one. This works, but you have to bear in mind a twist: you cannot simply return the results of the recursive calls. This is because the recursive results are themselves iterators, and if you return them straight your callers get iterators that produce a stream of iterators that produce a stream of iterators that someday, at some level, produce actual results. (But by that time the caller has given up in despair.)
Instead each time you recurse, you have to expand the resulting iterator and return each result, like so:
def treewalk(node):
    if not node:
        return
    yield node.value
    for val in treewalk(node.left):
        yield val
    for val in treewalk(node.right):
        yield val
This implies that significantly recursive generators can be quite inefficient, as they will spend a great deal of time trickling results up through all the levels involved.
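Here is a runnable sketch of the tree walk, assuming a simple Node class invented for the example. It also assumes Python 3.3 or later, where 'yield from' is a shorter spelling of the explicit 'for val in ...: yield val' loops; the results still pass up through every level of the recursion.

```python
class Node(object):
    # Hypothetical binary tree node, made up for this sketch.
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right


def treewalk(node):
    if node is None:
        return
    yield node.value
    # Expand each recursive iterator and re-yield its values one by one.
    yield from treewalk(node.left)
    yield from treewalk(node.right)


tree = Node(1, Node(2, Node(3)), Node(4))
values = list(treewalk(tree))
```

The walk visits each node before its children, so values comes out as [1, 2, 3, 4].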
2005-06-14
Putting a pleasant Python surprise to use
Although I've been programming in Python for a few years now, it keeps surprising me with little bits and pieces. Here's a neat Python language feature that I recently used for the first time (discovered originally through Bram Cohen's LiveJournal).
A common programming pattern is 'search for a something to work on, but stop if you don't find anything'. In Python one might write it something like this (taken more or less from DWiki's source):
found = False
for dir in utils.walk_to_root(curdir):
    page = dir.child("__readme")
    if page.exists():
        found = True
        break
if not found:
    return ''
# Go on to use the __readme file we found in some directory.
Python allows you to put 'else' conditions on loops (both for and
while loops); the else condition is executed if the loop completed
instead of being break'd from. This lets us simplify this pattern down
to:
for dir in utils.walk_to_root(curdir):
    page = dir.child("__readme")
    if page.exists():
        break
else:
    return ''
If there's no __readme file to be found from the current directory
up to the root, we just return nothing; otherwise, we'll process it.
This DWiki code is the first occasion I've had to use this feature since
I discovered it, and I'm pleased to finally have been able to.
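For anyone who wants to try the for/else form outside DWiki, here is a self-contained sketch of the same shape. The first_existing function and its exists argument are made up for the example and stand in for the directory walk and page.exists().

```python
def first_existing(candidates, exists):
    for name in candidates:
        if exists(name):
            break
    else:
        # Runs only if the loop finished without hitting break,
        # including when candidates was empty.
        return ''
    # We got here through break, so name is the one we found.
    return name
```

For instance, first_existing(["a", "b", "c"], lambda n: n == "b") returns "b", while a check that never succeeds returns the empty string.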
(As you can now see, not all the entries in this blog are going to be long and meandering.)
2005-06-12
Making a Python mountain out of a molehill
DWiki is the software that runs CSpace, including this blog. It's wound up much bigger than I expected and wanted it to be. This is sort of the story of how (or why) that happened.