Wandering Thoughts archives

2008-08-28

Thinking about the importance of cross-implementation portability

Here is a Python question I have been mulling over since a comment on a previous entry: does code portability across different Python implementations matter very much at the moment?

My suspicion is that the answer is 'not really in practice', because I don't see much motivation for moving code between the different implementations:

  • the three leading implementations ('CPython' (the normal Python), Jython, and IronPython) exist in significantly different environments; you cannot really drop in one as a replacement for another, so you need code that can usefully move between different environments.

    (There is a class of code that can do this: network servers. But a lot of code will not move so easily, with the worst case probably being GUI-based programs.)

  • there is no obvious large Python program that people want to run across different environments; the one possible exception I can think of is Django, but I think that Django is not as influential and widely used as something like Ruby on Rails (which is driving interest in alternative Ruby implementations, Ruby on the JVM, and so on).

  • people might like to move important modules (well, to have them available), but my impression is that the really interesting modules are not pure Python (because of performance issues) and thus already need explicit porting.

(Also, right now if you are going to promise that your Python code is portable to other implementations, you probably need to explicitly test it in them. This may not be easy to do, although I am biased by working on Unix.)

All of this makes me suspect that not much code is actually moving between implementations, and thus that portability is not really an issue right now for almost all Python programmers.

CrossImplementationImportance written at 01:06:25

2008-08-18

The problem with using tuples and lists to hold structures

If you need to hold several bits of data about something in Python, it's awfully tempting to just put everything into a tuple or a list and be done with it; it's certainly the easiest way, and so crops up often.

(I've read that the Pythonic way to decide between a list and a tuple is whether or not the data is all the same type, in which case you use a tuple, or different types, in which case you use a list. I don't bother paying attention to this; I generally use a tuple if the data elements don't get changed and a list if they do.)

The problem with using lists and tuples instead of actual structures for this is that you are stuffing structured data into unstructured objects. The result is that the structure of the data only exists implicitly in your code, instead of explicitly in the objects. This is both harder to read and more prone to errors (especially since the structure of the data is going to be in more than one place, all of which had better agree).
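As a small illustration of the problem (the host record and its field order are made up for this example), every piece of code that touches the tuple has to repeat the same knowledge of which position means what:

```python
# Hypothetical record: (hostname, IP address, port). The field order is
# written down nowhere except in the code that builds and unpacks it.
host = ("mail.example.org", "192.0.2.10", 25)

def describe(h):
    # This function has to know that index 0 is the name and index 2 is
    # the port; nothing in the tuple itself says so.
    return "%s:%d" % (h[0], h[2])

def address(h):
    # A second place that encodes the layout. If a field is ever added
    # or reordered, both functions must change in step.
    return h[1]
```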

All of this makes me think that I should be using some sort of structures in my Python code much more than I am now. Even if the idiom itself takes some explaining, I think that the overall code will be simpler and clearer.

(And the basic version is not all that much code, either.)
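For illustration, a basic structure might look something like the following sketch. The `Struct` and `Host` classes and their field names are hypothetical, invented for this example; this is one possible shape for the idiom, not the definitive one.

```python
class Struct(object):
    """A bare-bones structure: named fields set from keyword arguments."""
    # Subclasses list their allowed field names here.
    fields = ()

    def __init__(self, **kwargs):
        # Reject misspelled or unexpected field names early.
        for name in kwargs:
            if name not in self.fields:
                raise TypeError("unknown field: %r" % (name,))
        # Fields that are not given default to None.
        for name in self.fields:
            setattr(self, name, kwargs.get(name))

    def __repr__(self):
        args = ", ".join("%s=%r" % (n, getattr(self, n)) for n in self.fields)
        return "%s(%s)" % (self.__class__.__name__, args)


class Host(Struct):
    # Hypothetical fields, purely for the example.
    fields = ("name", "ip", "port")


h = Host(name="mail.example.org", ip="192.0.2.10", port=25)
```

Code that uses `h.port` instead of `h[2]` carries the structure of the data with it, and a misspelled field name in the constructor fails immediately instead of silently.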

Update: I was wrong about the Python use of lists and tuples; it is lists that are for data that is all the same type, and tuples that are for data of different types. See the comments for details.

TupleListStructureProblem written at 22:32:23

2008-08-17

Thinking about the best way to handle command registration

Suppose that you have a Python utility program that does a number of related things; it's invoked as 'program command ....', and decides what to do based on what the command is. The obvious way to implement this in Python is to have a function for each command and a top-level dispatcher that looks up the function to call for each command and passes it the arguments to process. Clearly you don't want to hard-code this in the dispatcher function in a big if/elif block; instead you want to somehow register all the commands in a simpler, easier to maintain way.

Since I have been writing such a utility program, I have found myself wondering what the best way of doing this is (where 'best' means 'most natural, obvious, and easy to follow'). So far I've come up with a number of possible ways:

  • have a dictionary where the key is the command name and the value is a tuple of information about the command (including the function to call, the minimum and maximum number of command line arguments it takes, and some help text).

    (The obvious and simple approach, but it starts getting more and more awkward as you add more per-command data to keep track of. You can use a structure instead of a tuple, but then you are complicating the simple scheme.)

  • call a registration function to define a command, giving it all of the details:
    def cmdfunc(args, ...):
      <whatever>
    register('cmd', cmdfunc, ...)

    (The registration function can have defaults for various things.)

  • use a decorator function to do the registration:
    @register('cmd', ...)
    def cmdfunc(args, ...):
      <whatever>

  • attach all of the information to the function as function attributes, and have a simple scheme to relate the command to the function name. The top level dispatcher then uses introspection to find if the command exists and to pull out anything else it needs.
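A rough sketch of how the decorator option might look, together with a matching dispatcher. All of the names here (`COMMANDS`, `register`, `dispatch`, `cmd_greet`) and the particular per-command details are assumptions made up for the example:

```python
# Hypothetical registry: command name -> (function, minargs, maxargs, usage).
COMMANDS = {}

def register(name, minargs=0, maxargs=None, usage=""):
    """Return a decorator that records a function under a command name."""
    def decorator(func):
        COMMANDS[name] = (func, minargs, maxargs, usage)
        # Return the function unchanged so it can still be called directly.
        return func
    return decorator

@register("greet", minargs=1, maxargs=1, usage="greet NAME")
def cmd_greet(args):
    return "hello, %s" % args[0]

def dispatch(argv):
    """Treat argv[0] as the command and pass it the remaining arguments."""
    if not argv or argv[0] not in COMMANDS:
        raise SystemExit("unknown command")
    func, minargs, maxargs, usage = COMMANDS[argv[0]]
    args = argv[1:]
    # maxargs of None means 'no upper limit'.
    if len(args) < minargs or (maxargs is not None and len(args) > maxargs):
        raise SystemExit("usage: %s" % usage)
    return func(args)
```

In a real program the dispatcher would be called with something like `dispatch(sys.argv[1:])`.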

Naturally you can combine aspects of all of these together to create hybrid systems (for example, using the function's docstring for some of the user help text).

Of these, I suspect that the decorator function approach is the most Pythonic but is also the second hardest for people who are not experienced Python programmers to follow (introspection would probably be the hardest). Since that last issue is a consideration, I am leaning towards using the plain registration function approach.
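For comparison, here is a minimal sketch of the plain registration function approach, with the function's docstring used as the default help text. The `Command` class, `register`, `dispatch`, and `cmd_add` are hypothetical names chosen for the example:

```python
class Command(object):
    """Everything the dispatcher needs to know about one command."""
    def __init__(self, func, minargs, maxargs, usage):
        self.func = func
        self.minargs = minargs
        self.maxargs = maxargs
        self.usage = usage

# Hypothetical registry: command name -> Command.
COMMANDS = {}

def register(name, func, minargs=0, maxargs=None, usage=None):
    """Record func as the implementation of the command called name."""
    if usage is None:
        # Fall back to the docstring for the help text.
        usage = func.__doc__ or ""
    COMMANDS[name] = Command(func, minargs, maxargs, usage)

def cmd_add(args):
    """add X Y: return the sum of two integers"""
    return int(args[0]) + int(args[1])
register("add", cmd_add, minargs=2, maxargs=2)

def dispatch(argv):
    """Treat argv[0] as the command and pass it the remaining arguments."""
    if not argv or argv[0] not in COMMANDS:
        raise SystemExit("unknown command")
    cmd = COMMANDS[argv[0]]
    args = argv[1:]
    # maxargs of None means 'no upper limit'.
    if len(args) < cmd.minargs or \
       (cmd.maxargs is not None and len(args) > cmd.maxargs):
        raise SystemExit("usage: %s" % cmd.usage)
    return cmd.func(args)
```

The `register()` call sits directly under the function it registers, so the connection is visible without the reader needing to know how decorators work.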

(My current code mostly uses the simple dictionary approach, but it's not the clearest thing to follow.)

ConsideringCommandRegistration written at 23:38:03

2008-08-08

A workaround for the Python module search path issue on Unix

One of the little challenges with writing Unix programs in Python is the search path problem. The natural structure of Python programs is to split functionality up into a bunch of modules and then import them all in the main program, but the natural structure of a Unix program is to put its binary into one directory (such as /usr/bin) but all of its helper bits into a second, completely different directory. So the problem is: how is a Unix Python program supposed to find its modules?

(The normal Python search path for import is the directory that the program itself is in, such as /usr/bin or $HOME/bin, and the system Python package areas.)

The obvious solution is to have your program start off by adding its module directory to sys.path. The problem here is that this is an installation dependent location, which means that you're customizing your main program each time it gets installed. I dislike this and find it ugly, plus I maintain that overriding sys.path gets in the way of a number of things and can cause subtle problems.
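A sketch of what this obvious solution looks like at the top of the main program. The `/usr/lib/prog` location is a made-up example path, and it is exactly the part that would have to be edited for every installation:

```python
import sys

# Hypothetical install location; this line must be rewritten whenever
# the program's modules are installed somewhere else.
LIBDIR = "/usr/lib/prog"

if LIBDIR not in sys.path:
    # Put our directory first so that our modules are found ahead of
    # any same-named modules elsewhere on the search path.
    sys.path.insert(0, LIBDIR)

# The program's own modules can only be imported after this point.
```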

It turns out that there is a simple workaround, hinted at by the parenthetical aside above: put the Python main program into its library directory along with the rest of its modules, and turn the command that gets installed into the binary directory into a tiny shell script that is just:

#! /bin/sh
exec /where/ever/prog.py "$@"

This results in the library directory being added to the search path, because it is the directory that the actual Python program is being run from. (And you can be confident that Python will know what that is, since the program is being started by absolute path.)
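One way to check this behaviour is a small experiment along these lines. It is only a sketch: the `run_from_libdir` function and the `prog.py` and `helper.py` files are invented for the test, and the throwaway directory stands in for a real library directory.

```python
import os
import shutil
import subprocess
import sys
import tempfile

def run_from_libdir():
    """Build a throwaway 'library directory' and run its main program."""
    libdir = tempfile.mkdtemp()
    try:
        # A helper module that lives next to the main program.
        with open(os.path.join(libdir, "helper.py"), "w") as f:
            f.write("NAME = 'helper'\n")
        # The main program imports the helper without touching sys.path.
        prog = os.path.join(libdir, "prog.py")
        with open(prog, "w") as f:
            f.write("import sys, helper\n"
                    "print(sys.path[0])\n"
                    "print(helper.NAME)\n")
        # Run it by absolute path, much as the wrapper's exec line would.
        out = subprocess.check_output([sys.executable, prog])
        return os.path.realpath(libdir), out.decode().splitlines()
    finally:
        shutil.rmtree(libdir)
```

The first line of output is the program's idea of `sys.path[0]`, which should be the library directory (possibly with symbolic links resolved), and the second line shows that the import of the neighbouring module worked.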

It is easy to create this tiny shell script as part of your installation process (when you're sure to know where the library directory is, since you are about to put things in it). As a bonus, your main program can still have a .py extension so that you can easily do things like check it with pychecker or import it into an interpreter to poke at something.
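As a sketch of how an install step might generate the wrapper, assuming a hypothetical `install_wrapper` helper that is told the binary directory, the library directory, and the program name:

```python
import os
import stat

def install_wrapper(bindir, libdir, name):
    """Write bindir/name as a shell script that execs libdir/name.py."""
    # Use an absolute path so that Python knows the program's directory.
    target = os.path.join(os.path.abspath(libdir), name + ".py")
    wrapper = os.path.join(bindir, name)
    with open(wrapper, "w") as f:
        # Note: the target path is not quoted, so this simple version
        # assumes it has no spaces or shell metacharacters in it.
        f.write('#! /bin/sh\nexec %s "$@"\n' % target)
    # Add execute permission for user, group, and other.
    mode = os.stat(wrapper).st_mode
    os.chmod(wrapper, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return wrapper
```

The main program in the library directory needs to be executable and have its own `#!` line for the `exec` to work.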

(Credit where credit is due: I didn't invent this trick. I believe I first saw it being used by a Python program on Fedora or one of the pre-Fedora Red Hat versions.)

SearchPathWorkaround written at 00:23:47

