2010-04-29
How you access an object can be important in Python
One of the less than obvious things about Python is that it can matter
a lot just how you access an object, and not just for performance
reasons. Accessing the same object through two different names can have
drastically different results under some circumstances, as yesterday's
entry about frame.f_locals shows.
(This is especially the case when resolving one of the names involves a C module (instead of just a Python one). C modules generally need 'getter' functions to translate their C-level data structures into the Python representation of them, so there's a greater temptation to make the getter functions do something clever since they already exist.)
The fundamental thing happening is that when you look up attributes on
an object, you give that object an opportunity to hijack your lookup and
do something clever (or at least something that it thinks is clever).
This happens every time you perform an attribute lookup, because Python
does not cache them (and cannot in the presence of any of the various
attribute lookup interception functions). Thus, every time you access
thing.attribute you can get a different result, or you can get the
same result but something is done to it behind your back (as was the
case with frame.f_locals).
(I checked and the frame.f_locals getter does always return the same
dictionary object, it just mangles it every time.)
Obviously, this can lead to very counterintuitive or outright nonsensical results, ones that generally take reading the source code of the object involved to understand. (As handily illustrated last entry.)
Note that this is different from returning objects that are themselves
magic. Again, frame.f_locals is an excellent example; what it
returns is a standard dictionary, so everything you do with the actual
dictionary is perfectly normal. It is just that every time you do an
attribute lookup on frame.f_locals to get that dictionary object,
something special happens.
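As a concrete sketch of this kind of hijacking (all of the names here are made up for illustration), here is a pure Python object whose attribute getter behaves loosely like the frame.f_locals one: every lookup returns the same ordinary dictionary, but re-populates it behind your back.

```python
class Tricky(object):
    def __init__(self):
        self._d = {}
        self.accesses = 0

    @property
    def data(self):
        # Hijack the attribute lookup: hand back the same dictionary
        # object every time, but re-populate it on every access,
        # discarding any outside modifications.
        self.accesses += 1
        self._d.clear()
        self._d['count'] = self.accesses
        return self._d

t = Tricky()
d1 = t.data
d1['mine'] = 'hello'   # mutating the returned dict is perfectly normal
d2 = t.data            # but this lookup just mangled it again
```

After this runs, d1 and d2 are the very same dictionary object, yet the 'mine' key has vanished; only the attribute lookup itself is magic, not the dictionary.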
As far as I know, the rules on what can be intercepted this way are:
- local and global name lookups can't be intercepted at all.
- module name lookups (i.e., accessing 'module.thing') for Python modules can be intercepted if you try hard, but this is vanishingly rare.
- object and class attribute lookups can easily be intercepted.
- all lookups involving C-level objects and modules can easily be intercepted and often are, because there generally has to be some C-to-Python translation done anyways.
(Technically you might be able to do something excessively clever with lookups in your own module's global namespace, but if so hopefully you know about it. Because of how they are implemented, local variable lookups are truly un-interceptable.)
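For illustration, one 'try hard' way to intercept module attribute lookups in pure Python is to put an instance of a ModuleType subclass into sys.modules yourself; everything here ('fakemod', 'counter') is a made-up name for the sketch.

```python
import sys
import types

class MagicModule(types.ModuleType):
    # A property on the module's class intercepts attribute
    # lookups on the module itself.
    @property
    def counter(self):
        self._n = getattr(self, '_n', 0) + 1
        return self._n

# Build a module object and register it under an invented name;
# a later 'import fakemod' finds it in sys.modules.
mod = MagicModule('fakemod')
sys.modules['fakemod'] = mod

import fakemod
a = fakemod.counter   # each lookup of fakemod.counter runs the getter
b = fakemod.counter
```

Every access to fakemod.counter gives a different answer, which is exactly the sort of counterintuitive result discussed above.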
My intuition is that C modules are more likely to do special magic on attribute access than Python modules, because in C it is easier to mutate a standard Python object just before you return it than it is to return an entirely custom object with custom behavior. In Python code it's more even, so you're more likely to see such magic done through objects with customized behavior.
2010-04-28
Altering a Python function's local variables with a trace function
Some time ago I wrote that there was no way to change a function's local variables from outside it (well, specifically their name bindings). As it turns out, I'm wrong; there is one way to do it by going in the back door, although it's not a useful way.
There is specific code in CPython that allows a trace function to completely
alter a function's local variables and even function arguments; this (C)
code specifically reloads the internal interpreter version of local
variables from the frame.f_locals dictionary
when your trace function returns into the CPython interpreter. This
has to be done from a trace function, and it has to be done while the
function you want to modify is running; you cannot reach up into the
function that called your function and modify its variables.
(You may be able to give it a local tracing function by getting
its frame object and then assigning a suitable function to
frame.f_trace, but I haven't tested this.)
What you can do to local variables and function arguments is relatively unrestricted. You can change values, you can unbind variables (so that access to them will get UnboundLocalError), and you can bind variables that are currently unbound. However, as expected, you cannot add new local variables; attempts to do so are ignored.
Unfortunately, there is a further complication: you have to access
frame.f_locals in a very special way in order to have your
modifications work. The problem is that every time you access the
f_locals element of a frame object, some C code gets called; this
C code re-populates the dictionary from the current internal locals
and then returns it to you. This means that if you repeatedly access
f_locals directly, you discard your previous changes on each access
(as the real version of the locals is only updated from the dictionary
when your trace function returns).
For example, suppose that you write:
def localtrace(frame, evt, arg):
    frame.f_locals['a'] = 10
    frame.f_locals['b'] = 20
    return None
If you arrange to hook this up, you will discover that only your change
to b has taken; your change to a has disappeared. (It gets worse
if you then desperately attempt to debug this by inserting a 'print
frame.f_locals' statement in your local trace function; all your
changes will disappear.)
The way to get around this is to dereference f_locals exactly
once, by immediately binding your own local name for it:
def localtrace(frame, evt, arg):
    d = frame.f_locals
    d['a'] = 10
    d['b'] = 20
    return None
This will work, changing both a and b as you expect.
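Putting the pieces together, here is a sketch of how you might actually hook such a local trace function up with sys.settrace (the function and variable names are mine; note that CPython calls trace functions as tracefunc(frame, event, arg)).

```python
import sys

def localtrace(frame, event, arg):
    # Bind f_locals to a local name exactly once and do all
    # modifications through it, as described above.
    d = frame.f_locals
    d['a'] = 10
    d['b'] = 20
    return localtrace

def globaltrace(frame, event, arg):
    # Only install the local trace function for the one
    # function we actually want to meddle with.
    if frame.f_code.co_name == 'target':
        return localtrace
    return None

def target():
    a = 1
    b = 2
    return a + b

sys.settrace(globaltrace)
result = target()
sys.settrace(None)
```

Because the local trace function fires on every line event, the last modifications (made just before 'return a + b' executes) win, and target() returns 30 instead of 3.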
(I suspect that at least some Python debuggers are currently falling victim to this dark corner.)
As a trivia note, the same C code that is used to update the real
function locals for tracing functions is also invoked if you do 'from
module import ...' in a function, since the import has the same need to
update the function's local variables.
As an extra special trivia note, profile functions can also do this since they (currently) go through the same low-level C code as trace functions. But really, you don't want to go there.
Sidebar: How to actually turn off local tracing
According to the fine documentation, returning None from a local
tracing function will turn off tracing for the remainder of the
function. According to my actual testing, this is not the case (and
reading trace_trampoline in Python/sysmodule.c confirms it); while
returning a new local trace function works, returning None is just
ignored.
Until this bug is fixed, you will need to 'turn off' tracing by
returning a do-nothing trace function from your real local tracing
function. A suitable one is 'lambda x, y, z: None'.
My firm impression from both of these issues is that tracing functions and their access to function local variables are a very, very dark corner of the CPython interpreter, and you meddle in it at your peril. It seems clear that not very many people have ever tried to do anything with it, and there may be other problems too.
2010-04-20
Standard format Unix errors in Python programs
Here are some things on printing standard format Unix errors in Python programs specifically, based on things that I've both seen and done myself.
First, I tend to wind up putting standard warn() and die()
functions into my programs, but they only handle very low level details
(putting the program name in and flushing things as necessary). Beyond that:
- you should always catch EnvironmentError instead of anything more specific. Functions are inconsistent about whether they raise IOError or OSError, so you shouldn't try to guess; just deal with the general case.
- don't try to do anything more complex with the EnvironmentError exception object than turning it into a string (with str() or just by printing it as one). This will print the errno number as well as the text error message, but I don't consider this a bad thing.
- for things that cause the program to stop, if you handle each potentially failing statement with its own try/except block you will rapidly go mad and repeat yourself a lot. Try to wrap as much as possible in a single generic block. The ultimate version of this is to have a single try/except in your main function and have the rest of your program ignore the issue.

(This is not as applicable to warning messages, because you pretty much have to handle them fairly locally.)
However, the more you aggregate exception handling, the more you need some way of keeping track of what the program was doing at the point where it blew up. I tend to solve this with what I call phase tracking.
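A minimal version of such phase tracking is just a variable that the main code updates as it goes, so the single aggregated exception handler can report what the program was doing; the names and file paths here are my own sketch, not a fixed recipe.

```python
import sys

def die(msg):
    sys.stderr.write("%s: %s\n" % (sys.argv[0], msg))
    sys.exit(1)

def process():
    phase = "starting up"
    try:
        phase = "reading the config file"
        # A deliberately failing open(); any IOError/OSError it
        # raises is caught as EnvironmentError below.
        cfg = open("/no/such/config/file").read()
        phase = "writing results"
        # ... more work would go here ...
    except EnvironmentError as e:
        die("error while %s: %s" % (phase, str(e)))
```

The single except clause stays generic, but the error message still tells you which step of the program blew up.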
In general, you have a design decision to make about how to handle fatal
errors. One way to handle them is to call die() immediately where you
hit the error; another way is to let the exception bubble up (either in
its original form or re-wrapped into an internal exception class). The
drawback of the former is what I covered in ImportableMain; the more
the low-level bits of your program wind up calling sys.exit(), the
less you can conveniently test them by themselves because a failure
immediately shuts down everything. The drawback of the latter is that
you are forced into some sort of phase tracking system.
Sidebar: my versions of warn() and die()
Generally my versions of these functions look like this:
import sys
def warn(msg):
sys.stdout.flush()
sys.stderr.write("%s: %s\n" % \
(sys.argv[0], msg))
sys.stderr.flush()
def die(msg):
warn(msg)
sys.exit(1)
(See here for why I put in the
calls to .flush().)
More complicated games are possible with the name of the program (so that it gets shortened by taking out the full path), but usually I don't bother.