2010-08-30
My avoidance of Python global variables
I spent part of today writing a quick one-off data conversion program. The core of it was a function that ran items from a list through a number of checks in order to sort them into the right category. Once the dust settled on all of the sorting needed, the function had quite a lot of stock arguments, things that didn't vary from call to call in my program; in fact, an unwieldy number of them.
There are at least three vaguely Pythonic options for how to deal with this (plus what I actually did), but what interests me in retrospect is the one answer that I didn't even think about. Namely, global variables.
There are all sorts of reasons to avoid global variables in general, but this was a one-off program and if I'm being honest, that's what all of those stock parameters really were. I was making them local variables in the calling function and then passing them into the classifying function not so much because it was a good idea but because that's what I do in Python. I just don't use global variables very much even when they'd arguably make sense, and when I do use them I feel irritated.
As best I can tell, what does it is the pesky global keyword. Having
to declare variables global any time I want to rebind them adds just
enough extra friction to using global variables in practice that I would
rather not bother and instead pass lots of things around as parameters.
I generally resort to global variables only when passing the same
information as parameters would add arguments to too many layers of
function calls.
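To make the friction concrete: reading a module-level name from inside a function needs no declaration at all, but rebinding it does. A trivial sketch with a made-up name:

counter = 0

def bump():
    # without the 'global' declaration, 'counter += 1' would be
    # treated as a local variable and fail with UnboundLocalError
    global counter
    counter += 1

bump()
print(counter)   # 1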
(This is the situation where you have four or five layers of function calls and some of the stuff down at the leaves wants to gather some expensive piece of information only once. The nominally logical thing to do is to call the 'gather information' function once at the start of your program and then pass the parameter all the way down to the leaves, but that means you have to pass the information object through all of the intermediate layers, where all it does is clutter up parameter lists. Really, you want to put it in a global variable, especially if you have several different clusters of these functions that want different chunks of information; passing the information they need down as parameters doesn't scale.)
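A compressed sketch of the shape I mean, with entirely invented names; every intermediate layer carries the parameter along just so the leaf can see it:

def toplevel(info):
    middle(info)

def middle(info):
    # middle() never uses info itself; it is pure clutter here
    leaf(info)

def leaf(info):
    print(info)

toplevel('some expensively gathered information')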
Part of the friction is the annoyance of the extra line in any function
that will rebind the global variable. But another part is just having to
think about it at all, partly because I sort of consider global to be
a wart (especially because I know what the bytecode is doing behind
the scenes).
(Global's not really a wart, but that's another entry.)
Sidebar: the three options that I am thinking of
The three Python options that immediately come to mind are:
- embed the classifying function into its caller as a closure, giving
it direct access to all of what used to be the stock parameters.
This feels like a hack to me, and I don't like the extra level
of indentation.
- make the classifying function a method on a class which otherwise
had all of the stock parameters as instance variables. It's
probably the classical solution, but it feels completely artificial
to me (there's a minimal sketch of this just after the list).
- make a structure to hold all of the stock parameters, then just pass the structure instead of all of the parameters separately.
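For concreteness, here's that minimal sketch of the class-based option, with invented names and a toy classification rule standing in for the real ones:

class Classifier(object):
    def __init__(self, pats, cutoff, default):
        # the stock parameters become instance variables
        self.pats = pats
        self.cutoff = cutoff
        self.default = default

    def classify(self, item):
        # a toy rule: match substrings against a pattern list
        if len(item) >= self.cutoff:
            for pat, cat in self.pats:
                if pat in item:
                    return cat
        return self.default

cl = Classifier([('foo', 'a'), ('bar', 'b')], 3, 'other')
print(cl.classify('foobar'))   # 'a'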
Since this was a quick hack, I was lazy and did the poor man's structure: I made a tuple with all of the stock parameters and just passed in the tuple (and then unpacked it in the classifying function). This is less aesthetically pleasing than a structure, but also less code, and it is the obvious next step when one's parameter list spirals out of control and most of it is the same from call to call.
(My eventual code had two arguments that varied from call to call and six that were the same, packed into a tuple. I'm sure that this is a code smell, but it was a quick hack.)
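In outline the tuple version looked something like this, although with invented names and fewer stock parameters than the real program had:

def classify(item, hint, stock):
    # unpack the tuple of stock parameters back into locals
    pats, cutoff, default = stock
    if len(item) >= cutoff:
        for pat, cat in pats:
            if pat in item or pat == hint:
                return cat
    return default

# the stock parameters get packed up once, outside the loop:
stock = ([('foo', 'a'), ('bar', 'b')], 3, 'other')
for item in ('foofle', 'barbell', 'quux'):
    print(classify(item, '', stock))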
2010-08-05
Doing unsigned 32-bit integer math in Python
Modern versions of Python helpfully have infinite-precision integers. This is often useful (and Python handles them transparently), but there are periodic situations where I really do want to operate with fixed-size integers, rollover and all. My most recent need was computing a hash that was specified in terms of unsigned 32-bit operations (well, it was specified in C code that was doing such operations).
(It's common for hashes, checksums, and so on to be specified with this sort of arithmetic. If you need to duplicate the hash or checksum in pure Python, you're going to be faced with this.)
Embarrassingly, every time I run into this I go through the same exercise of working out how to do it, digging up my old code, and convincing myself that it works. So I'm going to write this down once and hope that it sticks (and if not, I can look it up here later).
For unsigned 32-bit arithmetic in pure Python, it suffices to mask numbers with 0xffffffffL after every potentially overflowing operation. In my recent case, I wound up defining some convenience functions:
M32 = 0xffffffffL

def m32(n):
    # mask off everything above the low 32 bits
    return n & M32

def madd(a, b):
    return m32(a + b)

def msub(a, b):
    return m32(a - b)

def mls(a, b):
    # masked left shift; bits pushed past bit 31 fall off
    return m32(a << b)
(There is no need for an equivalent function for >>, as it can't
overflow.)
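A quick self-check that the wraparound behaves the way the C code would:

assert madd(0xffffffff, 1) == 0       # addition wraps to zero
assert msub(0, 1) == 0xffffffff       # -1 comes back as all ones
assert mls(1, 32) == 0                # the bit is shifted clean out
assert mls(1, 31) == 0x80000000       # ... but bit 31 is still in range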
I don't know if there's an equivalent version for signed 32-bit integer math, rollover and all; so far I haven't needed one.
If you have NumPy available, the simpler version is just:
from numpy import uint32
from numpy import int32
NumPy also provides signed and unsigned 8-bit, 16-bit, and 64-bit integers under the obvious names.
Note that you don't want to mix these types with regular Python integers
if you want things to come out just right; you need to make sure that
all numbers involved in your code are turned into the appropriate NumPy
type. But if you do a bunch of computation with only a few original
numbers, this may well be the easiest, most convenient approach. If
you're copying code from another language, it will also likely keep your
Python code looking as much like the original as possible instead of
littering it with madd() and msub() calls.
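For illustration, here's the NumPy version of the same sanity check, with every operand kept as a uint32 throughout per the caveat above. (Note that current NumPy versions may emit a RuntimeWarning when a scalar operation overflows, although the result still wraps.)

from numpy import uint32

print(uint32(0xffffffff) + uint32(1))   # 0: wraps just like C
print(uint32(0) - uint32(1))            # 4294967295: underflow wraps too
print(uint32(1) << uint32(31))          # 2147483648, still 32 bits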