Wandering Thoughts archives

2012-10-18

A danger of default values for function arguments (in illustrated form)

Due to recent events, I've been working on a program to measure our disk IO latencies. Since it only needs timing accuracy in the millisecond range, I've been writing it in Python (which is more than fast enough to not add distortions to the IO timings). In the process of developing this code, I made a classic absent-minded mistake that shows a danger of default arguments.

The code needs to know the size of the range it will be doing IO on. In the beginning, it worked only on files and got the size from the size of the file, and the code looked something like:

def process(fname):
  size = sizeof(fname)
  # [.....]

def main(args):
  for a in args:
    process(a)

Then I discovered that I needed to make the code work on raw disks too. Getting the size of a raw disk is much more complicated than getting the size of a file, and in any case raw disks are huge and I didn't want to run my tests across a whole one. I decided that clearly the thing to do was to give process() an optional argument to specify the size to work on, and revised the code to look like this:

def process(fname, size = None):
  if size is None:
    size = sizeof(fname)
  # [....]

def main(args):
  size = None
  # [set 'size' from a -s switch]
  for a in args:
    process(a)

I then spent an embarrassing amount of time trying to figure out what was wrong with my code such that the IO offsets weren't being computed right (there were complicating factors, evidently including insufficient coffee).

The problem (as you could probably see immediately) is that I'd forgotten to update the code in main() to actually pass the new size argument to process(); I'd only added the code to optionally get it from the command line. If I had not cleverly made size an argument with a default, I would have discovered this mistake immediately, because Python would have raised a TypeError about the missing argument. But default argument values hide argument count errors, so when I left out an argument that was in practice not optional I didn't get an error, just an oddly broken program.

(I was lucky that things broke in a clear, directly observable way.)
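To make the contrast concrete, here is a minimal sketch of the same mistake against both styles of signature. The names sizeof(), process_optional(), process_required() and main_buggy() are stand-ins invented for this illustration, not the real program's code.

```python
def sizeof(fname):
    # Hypothetical stand-in for the real size lookup.
    return 4096

def process_optional(fname, size=None):
    # 'size' has a default, so callers may silently omit it.
    if size is None:
        size = sizeof(fname)
    return size

def process_required(fname, size):
    # 'size' is required, so omitting it is an immediate error.
    return size

def main_buggy(args, process, size):
    # Mirrors the mistake: 'size' is available here but never passed on.
    return [process(a) for a in args]

# With the default, the forgotten argument goes unnoticed and the
# fallback size is quietly used instead of the requested 512.
silent = main_buggy(["disk0"], process_optional, size=512)

# Without the default, the very first call fails loudly.
try:
    main_buggy(["disk0"], process_required, size=512)
    error = None
except TypeError as e:
    error = str(e)
```

In the first case nothing complains and the wrong size is used; in the second, Python refuses to make the call at all.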

This is not the first time I have revised function arguments this way. When I add a new argument to a function I always have a temptation to give it a default and make it optional; there's a little voice in the back of my head that says 'this is the right way to keep existing code working'. Very often this is clearly wrong for the same reason as it was here, namely that I'm going to immediately revise all of the callers to explicitly pass in this new 'optional' argument. If I'm always passing an argument, giving it a default value too is simply inviting undetected argument count errors. I really should know better by now.

(There are cases where default values for arguments are really useful, but this is not one of them.)

DefaultArgumentDanger written at 01:33:29

2012-10-06

Python can execute zip files

One of my long-running little bits of unhappiness is that Python strongly encourages modular programming but makes it awkward to write little programs in a modular way. Modules have to be separate files, and once you have multiple files you have two problems: the main program has to be able to find those modules to load them, and you have to distribute multiple files and install them somehow instead of just giving people a self-contained file and telling them 'run this'. I recently found that there is a (hacky) way around this, although it's probably not news to people who are more plugged into Python distribution issues than I am.

The first trick is that Python can 'run' directories. If you have a directory with a file called __main__.py and you do python <directory>, Python will run __main__.py. Note that it does so directly, without importing the module; this has various awkward consequences. It will also do something similar to this with 'python -m <module>', but there the module must be on your Python search path and it will be imported before <module>/__main__.py is executed.
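As a rough sketch of the directory case, the following builds a throwaway directory with a __main__.py plus one support module and runs it with the current interpreter. The names myprog, helper.py and greet() are made up for the example, and it assumes a reasonably modern Python 3.

```python
import os
import subprocess
import sys
import tempfile

def run_directory_demo():
    with tempfile.TemporaryDirectory() as tmp:
        appdir = os.path.join(tmp, "myprog")  # hypothetical program directory
        os.mkdir(appdir)
        # A support module living next to __main__.py.
        with open(os.path.join(appdir, "helper.py"), "w") as f:
            f.write("def greet():\n    return 'hello from helper'\n")
        # The entry point; it is run as a script, so __name__ is '__main__'.
        with open(os.path.join(appdir, "__main__.py"), "w") as f:
            f.write("import helper\nprint(__name__, helper.greet())\n")
        # Equivalent to typing: python myprog
        result = subprocess.run([sys.executable, appdir],
                                capture_output=True, text=True, check=True)
        return result.stdout.strip()

dir_output = run_directory_demo()
```

The import of helper works because the directory itself is put at the front of the module search path when it is run this way.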

The second trick is that Python will import things (ie load code) from zipfiles, basically treating them as (encoded) directories; the exact specifics of this are beyond the scope of this entry. As an extension of the first trick, Python will 'run' zipfiles as if they were directories; if you do 'python foo.zip' and foo.zip contains __main__.py, it gets run.
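The zipfile version can be sketched the same way. This example packs a __main__.py and a support module into foo.zip using the standard zipfile module and then runs the archive; helper.py and greet() are again invented names, and a modern Python 3 is assumed.

```python
import os
import subprocess
import sys
import tempfile
import zipfile

def run_zip_demo():
    with tempfile.TemporaryDirectory() as tmp:
        zpath = os.path.join(tmp, "foo.zip")
        with zipfile.ZipFile(zpath, "w") as zf:
            # Support module, imported from inside the archive.
            zf.writestr("helper.py",
                        "def greet():\n    return 'hello from inside the zip'\n")
            # Entry point that Python looks for at the top level of the zip.
            zf.writestr("__main__.py",
                        "import helper\nprint(helper.greet())\n")
        # Equivalent to typing: python foo.zip
        result = subprocess.run([sys.executable, zpath],
                                capture_output=True, text=True, check=True)
        return result.stdout.strip()

zip_output = run_zip_demo()
```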

The third trick is that Python is smart enough to do this even when the 'zipfile' has a '#! ....' line at the start. In fact Python is willing to accept quite a lot of things before the actual zipfile; experimentally, it will skip lines that start with '#', blank lines, and lines that only have whitespace. In other words, you can take a zipfile that's got your __main__.py plus associated support modules and put a #!... line on the front to make it a standalone script (at least on Unix).

Since Python supports it, I strongly suggest also adding a second line with a '#' comment explaining what this peculiar thing is. That way people who try to look at your Python program won't get completely confused. Additional information is optional but possibly useful.
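Putting the '#!' line and the explanatory comment line together, a minimal sketch of building such a standalone file might look like this. The file name myprog, the interpreter path in the '#!' line and the wording of the comment are all assumptions for illustration. The sketch runs the result through the current interpreter instead of relying on the '#!' line, so that it does not depend on what is installed where.

```python
import io
import os
import stat
import subprocess
import sys
import tempfile
import zipfile

def build_script(path):
    # Build the zip archive in memory first.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("__main__.py",
                    "print('running from a zip with a #! line')\n")
    with open(path, "wb") as f:
        # Assumed interpreter line; adjust for your system.
        f.write(b"#!/usr/bin/env python3\n")
        # The second line: a comment for puzzled human readers.
        f.write(b"# This file is a Python program packed as a zip archive.\n")
        f.write(buf.getvalue())
    # Mark it executable (meaningful on Unix).
    os.chmod(path, os.stat(path).st_mode | stat.S_IXUSR)

def run_script_demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "myprog")  # hypothetical name, no .zip needed
        build_script(path)
        result = subprocess.run([sys.executable, path],
                                capture_output=True, text=True, check=True)
        return result.stdout.strip()

shebang_output = run_script_demo()
```

Newer versions of Python also ship a standard zipapp module that builds this kind of file for you, if you would rather not assemble it by hand.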

(I believe that all of this has been in Python for some time. I've just been slow to discover it, although I vaguely knew that Python could import code from zipfiles.)

Sidebar: zipfiles and byte-compilation

First off, as always (C)Python will only load .pyc precompiled bytecode files when (and if) you import modules. Your __main__.py will not have any bytecode version loaded so you want to make it as small as possible. Second, Python doesn't modify a zipfile when it imports code from it, which means that if you don't include .pyc files in your zipfile CPython will compile all your code to bytecode every time your program is run.

The solution is straightforward: run your program from its directory once (with some do-nothing arguments) before packing everything into a zipfile.
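On current Python 3 versions, running the program puts bytecode in a __pycache__ directory instead of next to the source, and as far as I know the zip importer only looks for .pyc files sitting beside the .py files. So a sketch of the same idea there is to byte-compile explicitly in the old layout before packing. This is an assumption-laden adaptation, not the procedure described above; helper.py and answer() are invented names, and whether the interpreter actually uses the bundled .pyc depends on the Python version and on the timestamps matching.

```python
import compileall
import os
import subprocess
import sys
import tempfile
import zipfile

def build_zip_with_bytecode():
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "src")
        os.mkdir(src)
        helper = os.path.join(src, "helper.py")
        with open(helper, "w") as f:
            f.write("def answer():\n    return 42\n")
        with open(os.path.join(src, "__main__.py"), "w") as f:
            f.write("import helper\nprint(helper.answer())\n")
        # legacy=True writes helper.pyc next to helper.py rather than
        # into __pycache__; only the support module is compiled.
        compileall.compile_file(helper, quiet=1, legacy=True)
        zpath = os.path.join(tmp, "prog.zip")
        with zipfile.ZipFile(zpath, "w") as zf:
            for name in sorted(os.listdir(src)):
                zf.write(os.path.join(src, name), arcname=name)
            names = zf.namelist()
        result = subprocess.run([sys.executable, zpath],
                                capture_output=True, text=True, check=True)
        return names, result.stdout.strip()

bytecode_names, bytecode_output = build_zip_with_bytecode()
```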

Note that this makes zipfiles somewhat less generic than you might like. CPython bytecode is specific to (roughly) the Python version, so eg Python 2.7 will not load bytecode generated by Python 2.6 and vice versa. Your zipfile program may run unchanged on both, but one may have a startup delay.

RunningZipfiles written at 23:44:08

