2008-08-28
Thinking about the importance of cross-implementation portability
Here is a Python question I have been mulling over since a comment on a previous entry: does code portability across different Python implementations matter very much at the moment?
My suspicion is that the answer is 'not really in practice', because I don't see much motivation for moving code between the different implementations:
- the three leading implementations ('CPython' (the normal Python),
Jython, and IronPython) exist in significantly different environments;
you cannot really drop one in as a replacement for another, so for
code to move between implementations, it has to be code that can
usefully move between those different environments.
(There is a class of code that can do this: network servers. But a lot of code will not move so easily, with the worst case probably being GUI-based programs.)
- there is no obvious large Python program that people want to run
across different environments; the one possible exception I can
think of is Django, but I think that Django is not as influential
and widely used as something like Ruby on Rails (which is driving
interest in alternative Ruby implementations, Ruby on the JVM, and
so on).
- people might like to move important modules (well, to have them available), but my impression is that the really interesting modules are not pure Python (because of performance issues) and thus already need explicit porting.
(Also, right now if you are going to promise that your Python code is portable to other implementations, you probably need to explicitly test it in them. This may not be easy to do, although I am biased by working on Unix.)
All of this makes me suspect that not very much code moving is really happening and thus portability is really not an issue right now for almost all Python programmers.
2008-08-18
The problem with using tuples and lists to hold structures
If you need to hold several bits of data about something in Python, it's awfully tempting to just put everything into a tuple or a list and be done with it; it's certainly the easiest way, and so crops up often.
(I've read that the Pythonic way to decide between a list and a tuple is whether or not the data is all the same type, in which case you use a tuple, or of different types, in which case you use a list. I don't bother paying attention to this; I generally use a tuple if the data elements don't get changed and a list if they do.)
The problem with using lists and tuples instead of actual structures for this is that you are stuffing structured data into unstructured objects. The result is that the structure of the data only exists implicitly in your code, instead of explicitly in the objects. This is both harder to read and more prone to errors (especially since the structure of the data is going to be in more than one place, all of which had better agree).
All of this makes me think that I should be using some sort of structures in my Python code much more than I am now. Even if the idiom itself takes some explaining, I think that the overall code will be simpler and clearer.
(And the basic version is not all that much code, either.)
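As an illustration, a generic version of such a structure class really is only a few lines. (The class name and the field names here are my own invention for the example, not anything standard.)

```python
# A minimal generic structure class; fields become attributes.
class Struct(object):
    def __init__(self, **fields):
        self.__dict__.update(fields)

# Compare holding the same data in a bare tuple like
# ('eth0', 1500, True), where nothing tells the reader
# which element is which.
iface = Struct(name='eth0', mtu=1500, up=True)
print(iface.mtu)  # 1500
```

The structure of the data is now explicit at the point where the object is created, instead of living only in the code that unpacks it.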
Update: I was wrong about the Python use of lists and tuples; it is lists that are for data that is all the same type, and tuples for data of different types. See the comments for details.
2008-08-17
Thinking about the best way to handle command registration
Suppose that you have a Python utility program that does a number of
related things; it's invoked as 'program command ....', and decides
what to do based on what the command is. The obvious way to implement
this in Python is to have a function for each command and a top-level
dispatcher that looks up the function to call for each command and
passes it the arguments to process. Clearly you don't want to hard-code
this in the dispatcher function in a big if/elif block; instead
you want to somehow register all the commands in a simpler, easier to
maintain way.
Since I have been writing such a utility program, I have found myself wondering what the best way of doing this is (where 'best' means 'most natural, obvious, and easy to follow'). So far I've come up with a number of possible ways:
- have a dictionary where the key is the command name and the value
is a tuple of information about the command (including the function
to call, the minimum and maximum number of command line arguments
it takes, and some help text).
(The obvious and simple approach, but it starts getting more and more awkward as you add more per-command data to keep track of. You can use a structure instead of a tuple, but then you are complicating the simple scheme.)
- call a registration function to define a command, giving it all
of the details:
	def cmdfunc(args, ...):
		<whatever>

	register('cmd', cmdfunc, ...)

  (The registration function can have defaults for various things.)
- use a decorator function to do the registration:
	@register('cmd', ...)
	def cmdfunc(args, ...):
		<whatever>

- attach all of the information to the function as function attributes, and have a simple scheme to relate the command to the function name. The top level dispatcher then uses introspection to find if the command exists and to pull out anything else it needs.
Naturally you can combine aspects of all of these together to create hybrid systems (for example, using the function's docstring for some of the user help text).
Of these, I suspect that the decorator function approach is the most Pythonic but is also the second hardest for people who are not experienced Python programmers to follow (introspection would probably be the hardest). Since that last issue is a consideration, I am leaning towards using the plain registration function approach.
(My current code mostly uses the simple dictionary approach, but it's not the clearest thing to follow.)
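To make the comparison concrete, here is a minimal sketch of the decorator approach combined with a simple dispatcher. All of the names here (register, commands, dispatch, and the example command) are hypothetical, made up for the example rather than taken from any real program:

```python
import sys

# Maps command names to (function, min args, max args, help text).
commands = {}

def register(name, minargs=0, maxargs=None, help=""):
    # Returns a decorator that records the command and its details.
    def deco(func):
        commands[name] = (func, minargs, maxargs, help)
        return func
    return deco

@register('greet', minargs=1, maxargs=1, help="greet NAME")
def cmd_greet(args):
    print("hello, " + args[0])

def dispatch(argv):
    # argv is the command plus its arguments, e.g. sys.argv[1:].
    if not argv or argv[0] not in commands:
        sys.exit("usage: program command ...")
    func, mn, mx, _ = commands[argv[0]]
    args = argv[1:]
    if len(args) < mn or (mx is not None and len(args) > mx):
        sys.exit("wrong number of arguments to " + argv[0])
    func(args)

dispatch(['greet', 'world'])  # prints "hello, world"
```

The registration-function variant is the same thing minus the decorator sugar: you define cmdfunc normally and then call register('greet', cmd_greet, ...) yourself.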
2008-08-08
A workaround for the Python module search path issue on Unix
One of the little challenges with writing Unix programs in Python is
the search path problem. The natural structure of Python programs is to
split functionality up into a bunch of modules and then import them
all in the main program, but the natural structure of a Unix program is
to put its binary into one directory (such as /usr/bin) but all of
its helper bits into a second, completely different directory. So the
problem is: how is a Unix Python program supposed to find its modules?
(The normal Python search path for import is the directory that the
program is being run from, such as /usr/bin or $HOME/bin, and the
system Python package areas.)
The obvious solution is to have your program start off by adding its
module directory to sys.path. The problem here is that this is an
installation-dependent location, which means that you're customizing
your main program each time it gets installed. I dislike this and find
it ugly, plus I maintain that overriding sys.path gets in the way of a
number of things and can cause subtle problems.
It turns out that there is a simple workaround, hinted at by the aside there: put the Python main program into its library directory along with the rest of its modules, and turn the command that gets installed into the binary directory into a tiny shell script that is just:
#! /bin/sh
exec /where/ever/prog.py "$@"
This results in the library directory being added to the search path, because it is the directory that the actual Python program is being run from. (And you can be confident that Python will know what that is, since the program is being started by absolute path.)
It is easy to create this tiny shell script as part of your installation
process (when you're sure to know where the library directory is, since
you are about to put things in it). As a bonus, your main program can
still have a .py extension so that you can easily do things like check
it with pychecker or import it
into an interpreter to poke at something.
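A sketch of what that install step might look like in Python; the program name and library directory here are made up for illustration, and a temporary directory stands in for the real /usr/bin:

```python
import os
import tempfile

# Made-up locations for illustration; a real install would use
# its configured bindir and libdir instead.
libdir = "/usr/lib/myprog"
bindir = tempfile.mkdtemp()  # stand-in for /usr/bin

# Write the two-line wrapper script and make it executable.
wrapper = os.path.join(bindir, "myprog")
with open(wrapper, "w") as f:
    f.write('#! /bin/sh\nexec %s/prog.py "$@"\n' % libdir)
os.chmod(wrapper, 0o755)
```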
(Credit where credit is due: I didn't invent this trick. I believe I first saw it being used by a Python program on Fedora or one of the pre-Fedora Red Hat versions.)