2012-10-18
A danger of default values for function arguments (in illustrated form)
Due to recent events, I've been working on a program to measure our disk IO latencies. Since it only needs timing accuracy in the millisecond range, I've been writing it in Python (which is more than fast enough to not add distortions to the IO timings). In the process of developing this code, I made a classic absent-minded mistake that shows a danger of default arguments.
The code needs to know the size of the range it will be doing IO on. In the beginning, it worked only on files and got the size from the size of the file, and the code looked something like:
    def process(fname):
        size = sizeof(fname)
        [.....]

    def main(args):
        for a in args:
            process(a)
Then I discovered that I needed to make the code work on raw disks
too. Getting the size of a raw disk is much more complicated than
getting the size of a file, and anyway raw disks are huge and I didn't
want to test across all of one. I decided that clearly the thing to do
was give process() an optional argument to specify the size to work on
and revised the code to look like this:
    def process(fname, size = None):
        if size is None:
            size = sizeof(fname)
        [....]

    def main(args):
        size = None
        [set 'size' from a -s switch]
        for a in args:
            process(a)
I then spent an embarrassing amount of time trying to figure out what was wrong with my code such that the IO offsets weren't being computed right (there were complicating factors, evidently including insufficient coffee).
The problem (as you could probably see immediately) is that I'd
forgotten to update the code in main() to actually pass the new size
argument to process(); I'd only added the code to optionally get it
from the command line. If I had not cleverly made size an argument
with a default, I would have discovered this mistake immediately because
Python would have reported an argument count mismatch. But using
default argument values hides argument count errors so when I left out
an argument that was in practice not optional I didn't get an error,
just an oddly broken program.
(I was lucky that things broke in a clear, directly observable way.)
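To make the difference concrete, here is a minimal sketch that contrasts the two signatures. The sizeof() stand-in, the two function names, and the "disk0" name are hypothetical and exist only for illustration; they are not the real program's code.

```python
def sizeof(fname):
    # Hypothetical stand-in for the real sizeof(); always reports 4096 bytes.
    return 4096


def process_optional(fname, size=None):
    # The revised version from the entry: a forgotten argument
    # silently falls back to the size of the file.
    if size is None:
        size = sizeof(fname)
    return size


def process_required(fname, size):
    # Same logic, but every caller must say something about size,
    # even if that something is an explicit None.
    if size is None:
        size = sizeof(fname)
    return size


# Forgetting the argument goes unnoticed when there is a default...
silent = process_optional("disk0")

# ...but fails immediately once the parameter is required.
try:
    process_required("disk0")
except TypeError as exc:
    error = str(exc)
```

With the required form, callers that really do want the file's own size have to write process_required(fname, None), which is slightly more typing but makes the choice visible at the call site.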
This is not the first time I have revised function arguments this way. When I add a new argument to a function I always have a temptation to give it a default and make it optional; there's a little voice in the back of my head that says 'this is the right way to keep existing code working'. Very often this is clearly wrong for the same reason as it was here, namely that I'm going to immediately revise all of the callers to explicitly pass in this new 'optional' argument. If I'm always passing an argument, giving it a default value too is simply inviting argument count errors that Python will never report. I really should know better by now.
(There are cases where default values for arguments are really useful, but this is not one of them.)
2012-10-06
Python can execute zip files
One of my long-running little bits of unhappiness is that Python strongly encourages modular programming but makes it awkward to write little programs in a modular way. Modules have to be separate files, and once you have multiple files you have two problems: the main program has to be able to find those modules to load them, and you have to distribute multiple files and install them somehow instead of just giving people a self-contained file and telling them 'run this'. I recently found that there is a (hacky) way around this, although it's probably not news to people who are more plugged into Python distribution issues than I am.
The first trick is that Python can 'run' directories. If you have
a directory with a file called __main__.py and you do 'python
<directory>', Python will run __main__.py. Note that it does so
directly, without importing the module; this has various awkward
consequences. It will also do something similar to this with 'python -m
<module>', but there the module must be on your Python search path and
it will be imported before <module>/__main__.py is executed.
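As a small sketch of the directory trick, the following builds a throwaway directory and runs it. The directory name 'demoapp', the helper module, and its greet() function are all made up for the example.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_directory_demo():
    # Build a temporary directory holding __main__.py plus one
    # (hypothetical) support module, then run 'python <directory>'.
    with tempfile.TemporaryDirectory() as tmp:
        appdir = Path(tmp) / "demoapp"
        appdir.mkdir()
        (appdir / "helper.py").write_text(
            "def greet():\n    return 'hello from helper'\n"
        )
        (appdir / "__main__.py").write_text(
            "import helper\nprint(helper.greet())\n"
        )
        result = subprocess.run(
            [sys.executable, str(appdir)],
            capture_output=True, text=True, check=True,
        )
        return result.stdout.strip()
```

The import of helper works because Python puts the directory itself at the front of the module search path when it runs a directory this way.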
The second trick is that Python will import things (ie load code) from
zipfiles, basically treating them as (encoded) directories; the exact
specifics of this are beyond the scope of this entry (see eg the
documentation for the standard zipimport module). As an extension of the
first trick, Python will 'run' zipfiles as if they were directories; if
you do 'python foo.zip' and foo.zip contains __main__.py, it gets
run.
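The zipfile version can be sketched the same way. Here foo.zip is built with the standard zipfile module; the helper module and its contents are again hypothetical.

```python
import subprocess
import sys
import tempfile
import zipfile
from pathlib import Path


def run_zip_demo():
    # Pack __main__.py and a support module into foo.zip,
    # then run 'python foo.zip'.
    with tempfile.TemporaryDirectory() as tmp:
        archive = Path(tmp) / "foo.zip"
        with zipfile.ZipFile(archive, "w") as zf:
            zf.writestr(
                "helper.py",
                "def greet():\n    return 'hello from the zip'\n",
            )
            zf.writestr("__main__.py", "import helper\nprint(helper.greet())\n")
        result = subprocess.run(
            [sys.executable, str(archive)],
            capture_output=True, text=True, check=True,
        )
        return result.stdout.strip()
```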
The third trick is that Python is smart enough to do this even when
the 'zipfile' has a '#! ....' line at the start. In fact Python is
willing to accept quite a lot of things before the actual zipfile;
experimentally, it will skip lines that start with '#', blank lines,
and lines that only have whitespace. In other words, you can take a
zipfile that's got your __main__.py plus associated support modules
and put a #!... line on the front to make it a standalone script (at
least on Unix).
Since Python supports it, I strongly suggest also adding a second line
with a '#' comment explaining what this peculiar thing is. That way
people who try to look at your Python program won't get completely
confused. Additional information is optional but possibly useful.
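Putting the three tricks together might look something like the sketch below. The specific '#!' line, the wording of the comment line, and the helper module are my assumptions for illustration; use whatever interpreter path is right for your systems.

```python
import io
import stat
import subprocess
import sys
import tempfile
import zipfile
from pathlib import Path

# Assumed header: the interpreter path and the comment text are examples only.
HEADER = (
    b"#!/usr/bin/env python\n"
    b"# This file is a zip archive of Python modules; Python runs its __main__.py.\n"
)


def build_script(path):
    # Build the zip archive in memory, then write header + archive to disk.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(
            "helper.py",
            "def greet():\n    return 'hello from the script'\n",
        )
        zf.writestr("__main__.py", "import helper\nprint(helper.greet())\n")
    path.write_bytes(HEADER + buf.getvalue())
    # Mark it executable for the owner so the '#!' line can take effect on Unix.
    path.chmod(path.stat().st_mode | stat.S_IXUSR)


def run_script_demo():
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "demo"
        build_script(script)
        # Run it through the current interpreter rather than the '#!' line,
        # so the demo does not depend on what 'python' is on this machine.
        result = subprocess.run(
            [sys.executable, str(script)],
            capture_output=True, text=True, check=True,
        )
        first_line = script.read_bytes().split(b"\n", 1)[0]
        return result.stdout.strip(), first_line
```

Note that the file does not need a .zip extension; Python looks at the contents, not the name.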
(I believe that all of this has been in Python for some time. I've just been slow to discover it, although I vaguely knew that Python could import code from zipfiles.)
Sidebar: zipfiles and byte-compilation
First off, as always (C)Python will only load
.pyc precompiled bytecode files when (and if) you import modules. Your
__main__.py will not have any bytecode version loaded so you want
to make it as small as possible. Second, Python
doesn't modify a zipfile when it imports code from it, which means that
if you don't include .pyc files in your zipfile CPython will compile all
your code to bytecode every time your program is run.
The solution is straightforward: run your program from its directory once (with some do-nothing arguments) before packing everything into a zipfile.
Note that this makes zipfiles somewhat less generic than you might like. CPython bytecode is specific to (roughly) the Python version, so eg Python 2.7 will not load bytecode generated by Python 2.6 and vice versa. Your zipfile program may run unchanged on both, but one may have a startup delay.
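As a hedged sketch of the pre-compilation step, the following swaps in the standard py_compile module instead of running the program once. I do this because Python 3 normally writes bytecode into a __pycache__ directory, which as far as I know zipfile imports do not look in, so the .pyc has to be written next to its source explicitly. The file names are hypothetical, and whether a given Python actually uses the packed .pyc depends on its version and timestamp checks; if it doesn't, it simply falls back to compiling helper.py.

```python
import py_compile
import subprocess
import sys
import tempfile
import zipfile
from pathlib import Path


def precompiled_zip_demo():
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "src"
        src.mkdir()
        helper = src / "helper.py"
        helper.write_text("def greet():\n    return 'hello, precompiled'\n")
        (src / "__main__.py").write_text("import helper\nprint(helper.greet())\n")

        # Write helper.pyc beside helper.py. __main__.py is left alone,
        # in line with the entry's point about keeping it small instead.
        py_compile.compile(
            str(helper), cfile=str(src / "helper.pyc"), doraise=True
        )

        # Pack sources and bytecode together at the top level of the archive.
        archive = Path(tmp) / "app.zip"
        with zipfile.ZipFile(archive, "w") as zf:
            for path in sorted(src.iterdir()):
                zf.write(path, arcname=path.name)

        with zipfile.ZipFile(archive) as zf:
            names = sorted(zf.namelist())
        result = subprocess.run(
            [sys.executable, str(archive)],
            capture_output=True, text=True, check=True,
        )
        return names, result.stdout.strip()
```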