2010-08-30
My avoidance of Python global variables
I spent part of today writing a quick one-off data conversion program. The core of it was a function that filtered items from a list through a number of things in order to sort them into the right category. Once the dust settled on all of the sorting needed, the function had quite a lot of stock arguments, things that didn't vary from call to call in my program. In fact, an unwieldy number of them.
There are at least three vaguely Pythonic options for how to deal with this (plus what I actually did), but what interests me in retrospect is the one answer that I didn't even think about. Namely, global variables.
There are all sorts of reasons to avoid global variables in general, but this was a one-off program and if I'm being honest, global variables are what all of those stock parameters really were. I was making them local variables in the calling function and then passing them in to the classifying function not so much because it was a good idea but because that's what I do in Python. I just don't use global variables very much even when they'd arguably make sense, and when I do use them I feel irritated.
As best I can tell, what does it is the pesky global keyword. Having to declare variables global any time I want to rebind them adds just enough extra friction to using global variables in practice that I would rather not bother and instead pass lots of things around as parameters. I generally resort to global variables only when passing the same information as parameters would add arguments to too many layers of function calls.
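To make the friction concrete, here is a minimal sketch (all of the names are made up for illustration). Reading a module-level variable from inside a function needs no declaration, but rebinding it does, and forgetting the declaration fails at call time rather than quietly doing what you meant.

```python
# Hypothetical names, purely for illustration.
counter = 0

def read_counter():
    # Merely reading a global needs no declaration.
    return counter

def bump_counter():
    # Rebinding needs the extra line.
    global counter
    counter += 1

def broken_bump():
    # Without 'global', the assignment makes 'counter' a local
    # variable, so this raises UnboundLocalError when called.
    counter += 1
```

That one extra line per rebinding function is the whole cost, which is part of why my reaction to it is out of proportion.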
(This is the situation where you have four or five layers of function calls and some of the stuff down at the leaves wants to gather some expensive piece of information only once. The nominally logical thing to do is to call the 'gather information' function once at the start of your program and then pass the parameter all the way down to the leaves, but that means you have to pass the information object through all of the intermediate layers, where all it does is clutter up parameter lists. Really, you want to put it in a global variable, especially if you have several different clusters of these functions that want different chunks of information; passing the information they need down as parameters doesn't scale.)
Part of the friction is the annoyance of the extra line in any function that will rebind the global variable. But another part is just having to think about it at all, partly because I sort of consider global to be a wart (especially because I know what the bytecode is doing behind the scenes).
(Global's not really a wart, but that's another entry.)
Sidebar: the three options that I am thinking of
The three Python options that immediately come to mind are:
- embed the classifying function into its caller as a closure, giving it direct access to all of what used to be the stock parameters. This feels like a hack to me, and I don't like the extra level of indentation.
- make the classifying function a method on a class that holds all of the stock parameters as instance variables. This is probably the classical solution, but it feels completely artificial to me.
- make a structure to hold all of the stock parameters, then just pass the structure instead of all of the parameters separately.
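For illustration, here is roughly what the three options look like side by side. This is a sketch with invented names and a toy classifier that has only two stock parameters (small_limit and big_limit), not the code from my actual program.

```python
from collections import namedtuple

# Option 1: a closure. classify() sees the caller's local variables.
def convert_closure(items):
    small_limit, big_limit = 10, 100
    def classify(item):
        if item < small_limit:
            return "small"
        if item < big_limit:
            return "medium"
        return "big"
    return [classify(i) for i in items]

# Option 2: a class with the stock parameters as instance variables.
class Classifier(object):
    def __init__(self, small_limit, big_limit):
        self.small_limit = small_limit
        self.big_limit = big_limit

    def classify(self, item):
        if item < self.small_limit:
            return "small"
        if item < self.big_limit:
            return "medium"
        return "big"

# Option 3: a structure that is passed as a single argument.
Params = namedtuple("Params", ["small_limit", "big_limit"])

def classify_with(item, params):
    if item < params.small_limit:
        return "small"
    if item < params.big_limit:
        return "medium"
    return "big"
```

All three give the same answers; the difference is only in where the stock parameters live and how much scaffolding you write to put them there.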
Since this was a quick hack, I was lazy and did the poor man's structure: I made a tuple with all of the stock parameters and just passed in the tuple (and then unpacked it in the classifying function). This is less aesthetically pleasing than a structure, but also less code, and it is the obvious next step when one's parameter list spirals out of control and most of it is the same from call to call.
(My eventual code had two arguments that varied from call to call and six that were the same, packed into a tuple. I'm sure that this is a code smell, but it was a quick hack.)
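The shape of that, in a sketch with entirely hypothetical parameter names and logic (my real code sorted different things), is something like this:

```python
def classify(item, source, stock):
    # Unpack the six stock parameters; the order here has to match
    # the order the caller used when building the tuple.
    known, ignored, aliases, default, minlen, sep = stock
    name = aliases.get(item, item)
    if len(name) < minlen or name in ignored:
        return None
    if name in known:
        return source + sep + known[name]
    return source + sep + default

# Built once in the caller, then passed to every classify() call.
stock = ({"apple": "fruit"},   # known
         {"junk"},             # ignored
         {"apl": "apple"},     # aliases
         "misc",               # default
         2,                    # minlen
         ":")                  # sep
```

The fragile part is that the tuple is positional, so the packing and the unpacking must be kept in sync by hand. That is the price of not writing a real structure.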