2012-09-19
Two views of the argparse module
A commentator on my earlier entry about the options problem asked what I thought about the argparse module. I previously hadn't looked at it because of Python version issues, but their question prompted me to go read the documentation and form some opinions.
There are two different views of how command line parsing libraries
and modules should behave. In the first view, part of the job of these
libraries is to be opinionated and to only support what (Unix) people
have decided are good practices for handling command line arguments.
The venerable getopt
module is opinionated in this way. If you use getopt you get quite a
number of good-practice behaviors and you have no choice about it. One
of the advantages of this is uniformity; if a program uses getopt,
you know that it will accept switches in any order, that you can write
either '-fARG' or '-f ARG', and so on. If you want to break the
rules and parse your arguments in a significantly different way, an
opinionated module will not help you. One result of this is that there
are perfectly good traditional Unix commands that can't parse their
command lines with getopt because they have requirements that it
doesn't support.
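As a rough illustration of that uniformity, here is a minimal sketch using Python's getopt module. The '-v' and '-f' switches and the parse() helper are made up for the example; they are not from any real program.

```python
import getopt

def parse(argv):
    """Parse a hypothetical command line: -v is a flag, -f takes an argument."""
    # "vf:" means -v takes no argument and -f requires one.
    opts, args = getopt.getopt(argv, "vf:")
    return dict(opts), args

# '-fARG' and '-f ARG' are treated the same, and the order of the
# switches does not change the result.
first = parse(["-v", "-fout.txt", "input"])
second = parse(["-f", "out.txt", "-v", "input"])
print(first == second)
```

An unknown switch such as '-x' makes getopt.getopt() raise getopt.GetoptError; the program has no say in the matter, which is the opinionated part.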
The second view of command line parsing libraries is that their job is merely helping you parse command line arguments in whatever way you want to do so. While they may make it easier to follow conventions they will not require that you do and will support as much wackiness as they can fit in a sensible API. Order-dependent switches? Optional arguments that take more than one parameter? Sure, whatever, the library will support you doing that. In their flexibility these libraries can be used to enable various levels of bad command line argument handling, argument handling that is completely at odds with how people expect things to behave. On the other hand they can be used to parse command lines that have various sorts of non-traditional behavior (depending on how clever and complex the library API is), stuff that an opinionated library will not deign to touch, instead of forcing you to do it all by hand (and likely doing a worse, less flexible job).
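To make the 'more than one parameter' case concrete, here is a small sketch with argparse, which is a library of this second sort. The '--range' option, its LOW and HIGH parameters, and the build_parser() helper are hypothetical names chosen for the illustration.

```python
import argparse

def build_parser():
    """Build a hypothetical parser with a switch that takes two parameters."""
    parser = argparse.ArgumentParser(prog="demo")
    # nargs=2 makes this single switch consume the next two arguments.
    parser.add_argument("--range", nargs=2, type=int, metavar=("LOW", "HIGH"))
    # Anything left over is collected as file names.
    parser.add_argument("files", nargs="*")
    return parser

ns = build_parser().parse_args(["--range", "1", "10", "a.txt"])
print(ns.range, ns.files)
```

Whether a two-parameter switch is a good idea is a separate question; the point is only that the library will not stop you.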
Argparse is the second sort of command line parsing module. It can be used to assemble command line parsers that behave in ways that I do not like at all and have violent reactions to. This strongly biases me against argparse's mere presence in the Python standard library and leaves me feeling that while argparse is more flexible than optparse, the flexibility is generally not an improvement.
(Note that optparse's documentation states up front that it is an opinionated module.)
As a result of this and the Python version issue, I don't expect to use argparse any time soon. People who have a different view on flexible argument parsing versus standardized argument parsing are likely to have a different view on argparse versus optparse.
2012-09-18
My Python versions
A commentator on my previous entry about the options problem asked what I thought about the argparse module. One of the answers to that is that I don't have any opinions on it because so far it has been pointless to look at it since it only exists on Python 2.7+. Some readers may be tempted to roll their eyes at me now; after all, Python 2.7 was released in 2010 and is now just over two years old. Surely it is everywhere by now, right?
The short answer is 'not so much', so let me tell you about my Python versions. Right now, I work with Python on all three currently supported Ubuntu LTS versions, on Fedora Linux, on Red Hat Enterprise 5, on Solaris 10, and on some machines running some not exactly current but very stable versions of FreeBSD. Debian Linux is also worth considering. I would like to write Python programs that have some decent chance of running on all of these and some of my programs absolutely have to.
The best environment is Fedora; both currently supported versions of Fedora have Python 2.7.3. If I only cared about this I could use all of the modern Python 2 features that I wanted to, but unfortunately Fedora is generally the least of my concerns.
Currently the supported Ubuntu LTS versions are 8.04, 10.04, and the recently released 12.04; these have Python 2.5.2, Python 2.6.5, and Python 2.7.3 respectively. Our own migration to 12.04 has only started very recently and many of the current 10.04 machines are likely to stay that way until the next LTS release comes out and forces the issue. So I already can't use 2.7+ features in our main environment and I can barely use 2.6+ ones.
Solaris and Red Hat Enterprise 5 are pretty hopeless on this front; Solaris 10 has Python 2.4.6 and RHEL 5 has 2.4.3. RHEL 5 is actually pretty important for one of my programs since it's what we use on our iSCSI backends. It looks like there's a version of Python 2.6 in EPEL for RHEL 5 and in OpenCSW for Solaris, but I don't see a version of 2.7 for either. Besides, using an unofficial Python just for something like argparse is a very hard sell.
(On the good side I'm unlikely to write any substantial Python code for RHEL 5 or especially for Solaris 10.)
The FreeBSD machine that currently hosts this blog is running FreeBSD 6.4 with Python 2.5.4; DWiki had clearly better keep running on such an old Python version. (Another FreeBSD machine is running FreeBSD 8.2 with Python 2.6.2.)
The Debian stable 'python' package is Python 2.6.6. There doesn't seem to be a version of Python 2.7 packaged for stable; you have to go up to Debian testing for that. I'm actually a little bit surprised that there is no 'python2.7' package for Debian stable; stable already has Python 2.5 as well (so it has the infrastructure to deal with multiple Python versions) and this Debian stable was first released about six months after Python 2.7.
The overall thing to take away from this is that old versions of Python live on for quite a while out here in the general world. Most machines are not on current operating systems and current operating systems only barely got Python 2.7, if they have it at all. Really you're doing well to have Python 2.6+ available and you don't always.
2012-09-17
The options problem in Python
Suppose that you have a Python program that takes command line arguments, including switches (a 'verbose' switch is common, for example). These options change the program's behavior and logic in relatively low-level places, possibly pervasively (again a 'verbose' switch is a good example, as is a 'dryrun' switch).
So, how do you pass information about these command-line arguments down to low-level code? I can think of at least four ways, most of which I've used in my code from time to time, and I have no idea which is considered the best and most Pythonic. The four that come to mind are:
- global variables for each option or setting. This involves a profusion
of global variables, which doesn't make me happy.
- a single global 'options' object, which holds all of the options and
settings (probably in expanded form).
- passing individual variables for each option down the call chain to
routines that need them. The problem with this is that you wind
up with large argument lists that are passed to a lot of high
level functions purely so that the functions can pass them along
to low level routines as needed.
- passing a single 'options' object down the call chain to routines that need any option. This at least adds fewer parameters to calls, but you pretty soon wind up in a situation where your options object is effectively a global variable that's being passed as part of the function arguments.
(Let's assume for the moment that you can't restructure your program in any natural way to localize knowledge about a particular option in a single small place; you really do have pervasive options.)
I don't really like any of these, although in some situations some of them are less annoying than others. Since I continue to not like global variables, right now I usually do some variant of the third or the fourth without any real enthusiasm for it.
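For concreteness, here is a minimal sketch of the fourth approach, a single options object handed down the call chain. Everything in it (the Options class, remove_file(), clean_directory(), and the list used as a log) is hypothetical and only there to show the shape of the code, not taken from any of my programs.

```python
class Options(object):
    """Hypothetical container for the expanded command line settings."""
    def __init__(self, verbose=False, dryrun=False):
        self.verbose = verbose
        self.dryrun = dryrun

def remove_file(path, opts, log):
    # Low-level routine: the only place that actually looks at the options.
    if opts.verbose:
        log.append("removing %s" % path)
    if opts.dryrun:
        return False
    # The real work (for example os.remove(path)) would go here.
    return True

def clean_directory(paths, opts, log):
    # High-level routine: it takes opts purely to hand it further down.
    return [remove_file(p, opts, log) for p in paths]

log = []
opts = Options(verbose=True, dryrun=True)
results = clean_directory(["a.tmp", "b.tmp"], opts, log)
print(results, log)
```

Note how clean_directory() never looks inside opts itself; this is the 'global variable passed as a function argument' effect in miniature.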
I feel like there should be a better way, if only I was clever enough to see it.
(All of this is on my mind right now because I'm confronting one of my old programs with what is now bad argument handling and thinking about overhauling all of it.)
2012-09-09
When I've interned Python strings
One day, I read the following on reddit's r/python:
Is manually interning a string ever a good idea? I'm having trouble thinking of a use case where the cost is justified outside the compilation cycle.
and a reply:
I've yet to run into a situation where it was, but I guess it could be in a very restricted number of situations, such as a small set of non-literal strings generated over and over again (e.g. some sort of parser), interning could reduce memory pressure.
This is exactly the case that I ran into at one point, with one of my Python daemons which was being used in a fairly demanding situation where I wanted to minimize the memory usage and memory churn over time. My program had three important features for having this make sense: it had big files to parse, there was a lot of repeated text in the files (text that had to be saved), and the files were changed a bit and reloaded on a relatively frequent basis.
Interning repeated text is an obvious win for memory usage, if you have a decent amount of it (measuring helps to know this). In my situation it also helped avoid memory churn during reloads of the files. When you reload a configuration file, the usual case is that almost all of the text is the same as it was the last time; this creates a lot of text and string duplication as you re-parse the file and get the same results as last time for most of it. Interning strings here ensures that you do not create a boatload of new strings every time you reload the configuration file (and discard a boatload of old ones); instead you're likely to create only a few new ones and discard a few old ones.
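A minimal sketch of the idea, assuming a made-up 'key value' line format and a hypothetical parse_config() function (my actual daemon's parser is not shown here). On Python 2 intern() is a builtin; on Python 3 it lives in the sys module, which is what the import at the top papers over.

```python
try:
    from sys import intern  # Python 3
except ImportError:
    pass  # Python 2: intern() is a builtin

def parse_config(text):
    """Parse hypothetical 'key value' lines, interning every string we keep."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        entries.append((intern(fields[0]), intern(fields[1])))
    return entries

old = parse_config("host alpha\nhost beta\n")
new = parse_config("host alpha\nhost gamma\n")
# Identity checks: does the unchanged text from the 'reload' reuse the
# string objects that the first parse is still holding?
print(old[0][0] is new[0][0], old[0][1] is new[0][1])
```

Only the changed value ('gamma' here) needs a string that was not already being held, which is the reduction in churn described above.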
Of course all of this care and interning may be a micro-optimization that doesn't make any difference in your actual circumstances. Interning strings is a performance optimization, so like any other optimization you should measure it to see if it gives you any benefits.
(For my program in our specific situation, it was one of a number of things that did make a visible difference.)