Wandering Thoughts archives

2005-06-27

Dangerously over-broad error catching

One of the stock pieces of advice about writing Python code is that basic error checks should be handled by catching the exception that trying to do the operation generates, instead of through explicit checks beforehand. For example, if you are getting the value for a key from a dictionary (what some other languages call a hash), you don't bother checking to see if the key is in the dictionary beforehand; you just try to get the value.

This code might be written like this (especially in examples):

try:
    result = some_func(dict[key])
except KeyError:
    # key isn't in dict.
    result = None

Unfortunately, this code has a serious, somewhat subtle problem; the try: ... except clause is too broad.

You probably wanted result set to None only if dict[key] didn't exist. But with this code, it will also happen if some_func or anything it calls raises an otherwise uncaught KeyError exception. Such uncaught KeyError exceptions are probably real bugs that you want to fix, but this code is hiding them.

You should be doing as little as possible in try/except blocks that catch general errors, because that way you minimize the risk that you're accidentally sweeping unrelated errors under the carpet.
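Here is a minimal sketch of the narrower version. It uses try/except/else so that only the dictionary lookup is guarded. The names `lookup`, `d` and `some_func` are illustrative stand-ins, and this `some_func` deliberately raises its own KeyError so you can see the difference.

```python
def some_func(value):
    # Hypothetical helper with a bug: it raises a KeyError of its own.
    return {"known": 1}[value]

def lookup(d, key):
    try:
        value = d[key]            # only the lookup is inside the try
    except KeyError:
        return None               # key isn't in d
    else:
        return some_func(value)   # a KeyError raised in here is NOT caught

d = {"a": "known", "b": "unknown"}
print(lookup(d, "a"))    # 1
print(lookup(d, "zzz"))  # None
```

Calling `lookup(d, "b")` now raises the KeyError from some_func instead of quietly returning None. For plain lookups like this, `d.get(key)` also returns None for a missing key with no try at all. However, it can't tell a missing key apart from one whose stored value is None.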

It follows that the worst version of this is to write a try/except block using a bare 'except:'. That catches all exceptions, including ones such as KeyboardInterrupt and SystemExit that you almost never want to swallow. Please don't be that lazy.

BroadTrys written at 00:18:11

2005-06-23

A real use for staticmethod

In Python, all method functions normally get called with the object as their first argument. New-style classes can specify that some methods should instead be 'static' methods: they don't receive the object as one of their arguments.

On the face of it this is a bit peculiar. Given that Python has ordinary (module-level) functions, why would you ever want to use a staticmethod instead (apart from misplaced object-oriented purity)?

I use static methods as a way of letting subclasses specify which piece of data from a data source a 'generic over a class of classes' method function will work on. The pattern looks like:

class Abstract(object):
   def frobnicate(self, datasource):
       work_on = self._pullfrom(datasource)
       # frobnicate work_on
   def fiddle(self, datasource):
       work_on = self._pullfrom(datasource)
       # fiddle with work_on
   # and so on

class RealA(Abstract):
   @staticmethod
   def _pullfrom(datasource):
       return datasource.thing_a()
class RealB(Abstract):
   @staticmethod
   def _pullfrom(datasource):
       return datasource.thing_b()

There are alternative patterns for this same effect, but none of them are as simple once the situation itself becomes more complex, such as needing more than one piece of information from the datasource.
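Here is a runnable sketch of the pattern in use. The bodies of frobnicate and fiddle are filled in with placeholder string formatting, and the `DataSource` class is hypothetical. Both are assumptions made only so the example does something visible.

```python
class Abstract(object):
    def frobnicate(self, datasource):
        work_on = self._pullfrom(datasource)
        return "frobnicated %s" % work_on
    def fiddle(self, datasource):
        work_on = self._pullfrom(datasource)
        return "fiddled with %s" % work_on

class RealA(Abstract):
    @staticmethod
    def _pullfrom(datasource):
        return datasource.thing_a()

class RealB(Abstract):
    @staticmethod
    def _pullfrom(datasource):
        return datasource.thing_b()

class DataSource(object):
    # Hypothetical data source offering two different pieces of data.
    def thing_a(self):
        return "A-data"
    def thing_b(self):
        return "B-data"

src = DataSource()
print(RealA().frobnicate(src))  # frobnicated A-data
print(RealB().fiddle(src))      # fiddled with B-data
```

Because _pullfrom is a staticmethod, `self._pullfrom(datasource)` passes only datasource. Without the decorator, Python would also pass self, and the call would fail with a TypeError.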

StaticMethodUse written at 02:40:34

2005-06-20

Small details can matter (or: a little nifty Python bit)

Python's string search routines, such as str.find, return -1 when they don't find what they're looking for. (Not every search routine does this; list.index, for example, raises ValueError instead.)

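I've written Python code for years without really thinking about this. Clearly the routines have to do something when they fail, whether that's raising an exception or returning a marker value, and -1 is simple and can never be a real position. (It is, however, a valid Python index. s[-1] is the last element, so using an unchecked -1 as an index quietly gives you the wrong answer instead of an error.)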

Only recently (while writing some DWiki code) did it occur to me that if what you want is 'everything after where this thing is, or the entire object if it's not there', a return value of -1 is perfect because you can just write:

	pos = thing.find(whatever)	# -1 if whatever isn't in thing
	tail = thing[pos+1:]

If whatever is found in the thing, this clearly works. If it's not found, pos is -1 and the whole expression becomes 'thing[0:]', which in Python means all of thing (since Python sequences are indexed from zero).

There are certainly other plausible 'not found' marker values, for example None or False. But then this little bit wouldn't work and you'd have to write more code.
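Here is a small, self-contained sketch of the trick using str.find. `after_char` is a hypothetical helper name. Note that the trick relies on searching for a single character. For a longer substring you would want pos+len(sub), and then a -1 result no longer lands neatly on 0.

```python
def after_char(s, ch):
    """Return everything after the first ch in s, or all of s if ch is absent."""
    pos = s.find(ch)      # -1 when ch does not occur in s
    return s[pos + 1:]    # becomes s[0:], i.e. all of s, when pos == -1

print(after_char("user@example.com", "@"))  # example.com
print(after_char("localhost", "@"))         # localhost
```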

There's another clever little property -1 has as an error return: it's considered true in Python, not false. This matters because 0, a perfectly good position, is considered false. Since the 'not found' value is true, you can't get away with superficially simple code like:

	pos = someString.find(somechar)
	if not pos:
		pass	# do something on failure

If the 'not found' marker were considered false, this code would seem to work until the day somechar was found at position 0, which might not happen for quite a while. Since -1 is true, this code misbehaves right away in the common not-found case, and so gets fixed right away.
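For contrast, here is a minimal sketch of the check done properly, comparing against -1 explicitly. `describe` is a hypothetical helper used only for illustration.

```python
def describe(s, ch):
    pos = s.find(ch)
    # Compare against -1 explicitly. 'if not pos' would treat position 0
    # as a failure and -1 (not found) as a success.
    if pos == -1:
        return "not found"
    return "found at %d" % pos

print(describe("abc", "a"))  # found at 0
print(describe("abc", "z"))  # not found
```

If all you need is whether somechar occurs at all, `somechar in someString` is simpler still.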

I usually think of the choice of what to return as the 'not found' value as something pretty simple and arbitrary; as long as it's an invalid value and easily checked, it's not something that matters much. But as I've just come to understand, sometimes the small details really can matter, because if they're done right they make your life simpler in little quiet ways.

SmallDetailsMatter written at 00:33:46

2005-06-16

Iterator & Generator Gotchas

Python iterators are objects (often produced by functions, using some magic) that repeatedly produce values, one at a time, until they are exhausted. Python introduced this general feature to efficiently support things like:

for line in fp:
    handle(line)    # ... do something with each line ...

Without iterators, you would have to write this with .readlines(), which reads the entire file into memory, splits it up into lines, and returns a huge list. With iterators, this code only has one line (plus some read buffering) in memory at any given time, even if the file is tens or hundreds of megabytes.

Generators are functions that magically create iterators instead of just returning values (ignoring some technicalities). Generators are the most common gateway to iterators, and are thus the more commonly used term for the whole area.
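Here is a minimal generator for illustration; `count_up_to` is a hypothetical name. Calling it runs none of its body and just returns an iterator. Each value is produced on demand, and once the values run out the iterator stays exhausted.

```python
def count_up_to(n):
    # Calling count_up_to() only creates an iterator; the body runs lazily.
    i = 0
    while i < n:
        yield i    # hand back one value, then pause here until asked again
        i += 1

it = count_up_to(3)
print(next(it))   # 0
print(list(it))   # [1, 2]  (the remaining values)
print(list(it))   # []      (already exhausted)
```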

When iterators were introduced, a number of standard things that had previously returned lists started returning iterators, and using a generator instead of just returning a list became part of the common Python programming idioms.

In many cases it can be tempting, and temptingly easy, to replace things that return lists with generators; it looks like it should just work, and it mostly does. It can be similarly tempting to ignore the difference when standard Python modules return iterators instead of lists.

But there are some gotchas when you write code like this, and I have the stubbed toes to prove it. At one point or another, I've made all of these iterator-confusion mistakes in my code.

Iterators are always true

def print_items(some, inputs):
    t = generate_list(some, inputs)
    if not t:
        return
    print "Header Line:"
    for item in t:
        print item

If generate_list returns an iterator instead of a list, this code doesn't work right. Unless someone got quite fancy, iterator objects are always true, unlike lists, which are only true if they contain something.

There's really no way to see if an iterator contains anything except to try to get a value from it. And there's no 'push value back onto iterator' operation.
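You can get the effect of a push-back, though. Here is a sketch of one common workaround: pull the first value with next() and a sentinel default, then put it back in front with itertools.chain. `peek_nonempty` is a hypothetical helper name.

```python
import itertools

def peek_nonempty(iterable):
    """Return (nonempty, iterator), where iterator still yields every item."""
    it = iter(iterable)
    sentinel = object()
    first = next(it, sentinel)         # try to get one value
    if first is sentinel:
        return False, iter(())         # nothing there at all
    # 'Push back' the first value by chaining it in front of the rest.
    return True, itertools.chain([first], it)

nonempty, items = peek_nonempty(x * 2 for x in [1, 2, 3])
if nonempty:
    print("Header Line:")
    for item in items:
        print(item)
```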

Iterators can't be saved

def cached_lookup(what):
  if what not in cache:
    cache[what] = real_lookup(what)
  return cache[what]

If real_lookup returns iterators, this code doesn't work. When an iterator's exhausted, it's exhausted; if you try to use it again (such as if cached_lookup found it as a cached result), it generates nothing.

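(Technically there are semi-magical ways to copy iterators, such as itertools.tee, which splits one iterator into several independent ones by buffering values internally. I suspect one is best off avoiding them unless you really have to save an iterator copy.)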
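The straightforward fix for the caching case is to store a list rather than the iterator itself. Here is a sketch, where `real_lookup` is a hypothetical stand-in that returns a generator.

```python
def real_lookup(what):
    # Hypothetical expensive lookup that returns a generator.
    for i in range(3):
        yield "%s-%d" % (what, i)

cache = {}

def cached_lookup(what):
    if what not in cache:
        # Materialize the iterator: a list can be read any number of times.
        cache[what] = list(real_lookup(what))
    return cache[what]

print(cached_lookup("x"))  # ['x-0', 'x-1', 'x-2']
print(cached_lookup("x"))  # ['x-0', 'x-1', 'x-2'] again, from the cache
```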

I can't use list methods on iterators

t = generate_list(some, inputs)
t.sort()
t = t[:firstN]
# ... admire the pretty explosions

Of course, iterators don't have list methods like .sort(), and you can't use len() or slicing on them either. If you want to do those things, you have to write:

t = list(generate_list(some, inputs))
t.sort(); t = t[:firstN]

Fortunately, list() will expand the iterator for you and is harmless to apply to real lists, so you can use it without having to care if the generate_list routine changes what it returns.
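A related option is sorted(), which also accepts any iterable and always returns a new list. heapq.nsmallest can pick out the first N items without sorting everything. In this sketch, `generate_list` is a hypothetical stand-in with a simpler signature than the original's.

```python
import heapq

def generate_list(values):
    # Hypothetical generator standing in for generate_list(some, inputs).
    for v in values:
        yield v

firstN = 3
# sorted() works the same whether it is given an iterator or a real list.
t = sorted(generate_list([5, 3, 9, 1, 7]))[:firstN]
print(t)  # [1, 3, 5]

# heapq.nsmallest returns the same N items, in sorted order.
print(heapq.nsmallest(firstN, generate_list([5, 3, 9, 1, 7])))  # [1, 3, 5]
```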

Writing recursive generators

Sometimes the most natural structure for a generator is a recursive one. This works, but you have to bear in mind a twist: you cannot simply yield the results of the recursive calls. This is because the recursive results are themselves iterators, and if you yield them directly your callers get iterators that produce a stream of iterators that produce a stream of iterators that someday, at some level, produce actual results. (But by that time the caller has given up in despair.)

Instead, each time you recurse, you have to expand the resulting iterator and yield each result, like so:

def treewalk(node):
  if not node:
    return
  yield node.value
  for val in treewalk(node.left):
    yield val
  for val in treewalk(node.right):
    yield val

This implies that deeply recursive generators can be quite inefficient: every value has to be handed up through each enclosing generator, one level at a time, before it reaches the caller.
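Here is a sketch of two later alternatives. The `Node` class is a hypothetical minimal binary tree. `yield from` (Python 3.3 and later) replaces the inner for loops but still passes values up level by level. An explicit stack avoids the recursion entirely and yields each value directly.

```python
class Node(object):
    # Hypothetical binary tree node; None marks an empty subtree.
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right

def treewalk(node):
    # Python 3.3+: 'yield from' delegates to the recursive generator.
    if not node:
        return
    yield node.value
    yield from treewalk(node.left)
    yield from treewalk(node.right)

def treewalk_iter(node):
    # Same pre-order walk, using an explicit stack instead of recursion.
    stack = [node]
    while stack:
        n = stack.pop()
        if not n:
            continue
        yield n.value
        stack.append(n.right)  # pushed first so the left subtree comes out first
        stack.append(n.left)

tree = Node(1, Node(2, Node(4)), Node(3))
print(list(treewalk(tree)))       # [1, 2, 4, 3]
print(list(treewalk_iter(tree)))  # [1, 2, 4, 3]
```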

GeneratorGotchas written at 02:26:39

2005-06-14

Putting a pleasant Python surprise to use

Although I've been programming in Python for a few years now, it keeps surprising me with little bits and pieces. Here's a neat Python language feature that I recently used for the first time (discovered originally through Bram Cohen's LiveJournal).

A common programming pattern is 'search for something to work on, but stop if you don't find anything'. In Python one might write it something like this (taken more or less from DWiki's source):

found = False
for dir in utils.walk_to_root(curdir):
	page = dir.child("__readme")
	if page.exists():
		found = True
		break
if not found:
	return ''
# Go on to use the __readme file we found in some directory.

Python allows you to put an else clause on loops (both for and while loops). The else clause runs if the loop ran to completion instead of being exited with break. This lets us simplify this pattern down to:

for dir in utils.walk_to_root(curdir):
	page = dir.child("__readme")
	if page.exists():
		break
else:
	return ''

If there's no __readme file to be found from the current directory up to the root, we just return nothing; otherwise, we'll process it. This DWiki code is the first occasion I've had to use this feature since I discovered it, and I'm pleased to finally have been able to.
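Here is a self-contained sketch of the same for/else shape. The `walk_to_root` function is a hypothetical stand-in for DWiki's utils.walk_to_root. The `exists` callable is passed in so the example doesn't touch the real filesystem; in real code something like os.path.exists could fill that role.

```python
import posixpath

def walk_to_root(path):
    # Hypothetical stand-in: yields path, then each parent up to "/".
    while True:
        yield path
        parent = posixpath.dirname(path)
        if parent == path:
            return
        path = parent

def find_readme(curdir, exists):
    for d in walk_to_root(curdir):
        page = posixpath.join(d, "__readme")
        if exists(page):
            break
    else:
        # Only reached if the loop finished without a break.
        return None
    return page

existing = {"/home/user/__readme"}
print(find_readme("/home/user/blog/2005", existing.__contains__))  # /home/user/__readme
print(find_readme("/tmp/x", existing.__contains__))                # None
```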

(As you can now see, not all the entries in this blog are going to be long and meandering.)

LoopElse written at 17:16:09

2005-06-12

Making a Python mountain out of a molehill

DWiki is the software that runs CSpace, including this blog. It's wound up much bigger than I expected and wanted it to be. This is sort of the story of how (or why) that happened.


DWikiGrowth written at 03:22:32


This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.