2008-05-26
Shimming modules for testing (and fun)
Suppose that you have a chunk of Python code that maps IP addresses to hostnames, and you want to test it (ideally with unit tests) to make sure that it actually works. To do this you need to make various IP address and hostname lookups fail in various ways, and to do so on command.
The easy way to do this is to exploit the fact that Python lets you rebind a module's attributes at runtime, and shim (well, replace) the gethostbyname()
and gethostbyaddr() functions in the socket module with
completely fake versions, for example simple functions that just consult
an internal table for the results they should return for various
lookups. You then test your IP to hostname mapping code against these
known fake IP addresses and make sure it returns correct results (since
you already know what each IP address should result in; you specified
it).
(Make sure that your test framework saves the original functions and puts them back into place after the test finishes; otherwise later tests, and anything else in the same process, will keep getting answers from your fake lookups.)
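Here is a minimal sketch of what such a shim might look like. The lookup tables, the fake_* function names and the with_fake_dns() helper are all made up for illustration, and the error codes passed to the exceptions are arbitrary; only the socket functions and exception classes are real.

```python
import socket

# Hypothetical lookup tables; the addresses and names are invented for the test.
FAKE_FORWARD = {"good.example.com": "192.0.2.10"}
FAKE_REVERSE = {"192.0.2.10": ("good.example.com", [], ["192.0.2.10"])}

def fake_gethostbyname(name):
    # Answer from the table, or fail the way a real lookup can fail.
    try:
        return FAKE_FORWARD[name]
    except KeyError:
        raise socket.gaierror(socket.EAI_NONAME, "fake: name not found")

def fake_gethostbyaddr(addr):
    # Same shape of result as the real function: (name, aliases, addresses).
    try:
        return FAKE_REVERSE[addr]
    except KeyError:
        raise socket.herror(1, "fake: unknown host")

def with_fake_dns(func):
    """Call func() with the socket lookups shimmed, then put the originals back."""
    saved = (socket.gethostbyname, socket.gethostbyaddr)
    socket.gethostbyname = fake_gethostbyname
    socket.gethostbyaddr = fake_gethostbyaddr
    try:
        return func()
    finally:
        # Restore even if the test raised an exception.
        socket.gethostbyname, socket.gethostbyaddr = saved
```

One thing to watch for: this only affects code that calls socket.gethostbyname() through the module. Code that did 'from socket import gethostbyname' has its own reference to the original function and will not see the shim.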
Shimming ordinary module functions is usually relatively
simple (another useful module to shim is the time module, if you are testing
time-dependent things). With more work you can shim entire classes, such
as socket.socket, so that code that creates its own sockets and does
things to them can be tested under completely controlled conditions.
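As a rough illustration of the time module case, here is one way it could go. The is_expired() function stands in for your real time-dependent code, and FakeClock and run_with_fake_time() are hypothetical helpers invented for this sketch.

```python
import time

def is_expired(start, timeout):
    # Hypothetical code under test: it calls time.time() through the module.
    return time.time() - start > timeout

class FakeClock:
    """A clock that only moves when the test tells it to."""
    def __init__(self, now=1000.0):
        self.now = now

    def time(self):
        return self.now

    def advance(self, seconds):
        self.now += seconds

def run_with_fake_time(clock, func):
    """Call func() with time.time() shimmed, then restore the real one."""
    saved = time.time
    time.time = clock.time
    try:
        return func()
    finally:
        time.time = saved

clock = FakeClock()
print(run_with_fake_time(clock, lambda: is_expired(1000.0, 30)))
clock.advance(31)
print(run_with_fake_time(clock, lambda: is_expired(1000.0, 30)))
```

The test never has to sleep; it simply moves the fake clock forward past the timeout and checks the result.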
(Watch out, though; it's easy for your shims to get overly complex and clever. I was probably there by the end of my unit testing fun. Also, remember to document what all of this clever testing code does, or you may have more excitement than you want in a year or so.)
Disclaimer: this is unlikely to be the officially TDD-approved way of unit testing this sort of stuff. But it has the two great virtues of not contorting your actual code and being relatively simple.
2008-05-01
What the co_names attribute on Python code objects is
As a trap for the unwary, Python code objects have both a co_names
and a co_varnames attribute. Since I just confused myself about
which was what the other day, here is what the
co_names one is.
Put simply, co_names is a tuple of names of globals and attributes
that are used by the function's code. For example, if you have 'a =
self.bar()' in the function, the 'bar' will show up in co_names,
as will the 'foo' from 'a = foo()'.
(Perhaps I should call these 'identifiers' instead of 'names'. In Ruby and Lisp and probably elsewhere these are called symbols.)
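You can see this for yourself by looking at a function's code object. This is just a quick sketch; example() is a throwaway function, and it is never called, so it doesn't matter that foo isn't defined anywhere.

```python
def example(self):
    a = self.bar()   # 'bar' is an attribute name
    b = foo()        # 'foo' is a global name
    return a, b

code = example.__code__
# Globals and attributes used by the code: contains 'bar' and 'foo'.
print(code.co_names)
# Arguments and local variables, by contrast, live in co_varnames.
print(code.co_varnames)
```

The local variables (self, a and b here) show up only in co_varnames, not in co_names.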
Ultimately this is part of how the CPython bytecode interpreter is
implemented. When the bytecode refers to a global or an attribute
(as opposed to a local variable), the interpreter has to do a lookup
by name to get the actual object involved. Rather than put the
name that's being looked up directly in the bytecode instructions,
CPython puts all the names into a table and has the instructions refer
to table slots, so the LOAD_GLOBAL instruction says 'look up name 3'
instead of 'look up "somevar"'. And co_names is that table (or at
least a representation of that table).
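The dis module will show you which instructions refer to names. The sketch below uses the argval that dis resolves for each instruction rather than the raw operand number, because exactly how the slot number is encoded in the operand has varied between CPython versions. The named_operands() helper and the show() function are made up for this example.

```python
import dis

def named_operands(func):
    """Return (opname, name) pairs for instructions that refer to co_names."""
    return [
        (ins.opname, ins.argval)
        for ins in dis.get_instructions(func)
        if ins.opcode in dis.hasname   # opcodes whose operand names a co_names slot
    ]

def show():
    # 'somevar' is a hypothetical global; show() is never called.
    return somevar.attr

for opname, name in named_operands(show):
    print(opname, name)
print(show.__code__.co_names)
```

Every name that dis reports for these instructions is one that comes out of the function's co_names.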
Each name only appears once in co_names, no matter how many times
it's used in your function and no matter if it's used in different
contexts; if you have both 'obj.foo' and 'foo()' in your code, there
will only be one "foo" in co_names, even though one use of the name
is for an object attribute and one is for a global. As far as I can tell
from reading CPython source, names are always interned strings and so
are globally unique.
(The co_names table slot numbers are of course a per-function thing;
slot 0 in different functions will refer to completely different names.)
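A small sketch of the 'only once' behaviour; both() is a throwaway function that is never called. The interning check at the end is CPython-specific and reflects my reading of the implementation, not anything the language promises.

```python
import sys

def both(obj):
    x = obj.foo   # 'foo' used as an attribute name
    y = foo()     # 'foo' used as a global name
    z = obj.foo   # and used a second time as an attribute
    return x, y, z

names = both.__code__.co_names
# How many times 'foo' appears in the table, despite three uses.
print(names.count("foo"))
# CPython-specific: is the stored name the same object as the interned string?
print(names[0] is sys.intern("foo"))
```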