Wandering Thoughts archives

2010-04-29

How you access an object can be important in Python

One of the less than obvious things about Python is that it can matter a lot just how you access an object, and not just for performance reasons. Accessing the same object through two different names can have drastically different results under some circumstances, as yesterday's entry about frame.f_locals shows.

(This is especially the case when resolving one of the names involves a C module (instead of just a Python one). C modules generally need 'getter' functions to translate their C-level data structures into the Python representation of them, so there's a greater temptation to make the getter functions do something clever since they already exist.)

The fundamental thing happening is that when you look up attributes on an object, you give that object an opportunity to hijack your lookup and do something clever (or at least something that it thinks is clever). This happens every time you perform an attribute lookup, because Python does not cache them (and cannot in the presence of any of the various attribute lookup interception functions). Thus, every time you access thing.attribute you can get a different result, or you can get the same result but something is done to it behind your back (as was the case with frame.f_locals).

(I checked and the frame.f_locals getter does always return the same dictionary object, it just mangles it every time.)

Obviously, this can lead to very counterintuitive or outright nonsensical results, ones that generally take reading the source code of the object involved to understand. (As handily illustrated in the last entry.)

Note that this is different from returning objects that are themselves magic. Again, frame.f_locals is an excellent example; what it returns is a standard dictionary, so everything you do with the actual dictionary is perfectly normal. It is just that every time you do an attribute lookup on frame.f_locals to get that dictionary object, something special happens.
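The distinction can be imitated in pure Python. The following is only an analogy, not how CPython implements frame objects: FakeFrame is a hypothetical class whose f_locals property always returns the same ordinary dictionary but refills it from a separate "real" copy on every lookup.

```python
class FakeFrame(object):
    """Hypothetical stand-in that imitates the f_locals getter."""

    def __init__(self):
        self._real = {"a": 1, "b": 2}   # plays the role of the real locals
        self._view = {}                 # the one dictionary always handed out

    @property
    def f_locals(self):
        # Runs on every attribute lookup: discard the dictionary's
        # current contents and refill it from the "real" values.
        self._view.clear()
        self._view.update(self._real)
        return self._view


frame = FakeFrame()
same_object = frame.f_locals is frame.f_locals   # same dict both times

frame.f_locals["a"] = 10      # change made through one lookup...
lost = frame.f_locals["a"]    # ...is wiped out by the next lookup

d = frame.f_locals            # look the attribute up exactly once
d["a"] = 10
kept = d["a"]                 # plain dict behaviour from here on
```

The dictionary itself is entirely normal; it is the lookup that has side effects, so the change only survives when the lookup happens once.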

As far as I know, the rules on what can be intercepted this way are:

  • local and global name lookups can't be intercepted at all.
  • module name lookups (ie, accessing 'module.thing') for Python modules can be intercepted if you try hard but this is vanishingly rare.
  • object and class attribute lookups can easily be intercepted.
  • all lookups involving C-level objects and modules can easily be intercepted and often are, because there generally has to be some C-to-Python translation done anyways.

(Technically you might be able to do something excessively clever with lookups in your own module's global namespace, but if so hopefully you know about it. Because of how they are implemented, local variable lookups are truly un-interceptable.)
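For the "if you try hard" module case, one possible approach (a sketch, not the only way) is to put an instance of a types.ModuleType subclass into sys.modules, so that lookups on the module go through ordinary class machinery such as properties. The names MagicModule, magic_demo and counter are invented for this example.

```python
import sys
import types


class MagicModule(types.ModuleType):
    """Hypothetical module type with an intercepted attribute."""

    @property
    def counter(self):
        # Store state in the module's own namespace dictionary.
        n = self.__dict__.get("_n", 0) + 1
        self.__dict__["_n"] = n
        return n


# Register the instance so that a normal import statement finds it.
sys.modules["magic_demo"] = MagicModule("magic_demo")

import magic_demo

first = magic_demo.counter
second = magic_demo.counter
```

To the importing code this looks like a plain 'module.thing' access, yet each access runs the property and yields a new value.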

My intuition is that C modules are more likely to do special magic on attribute access than Python modules, because in C it is easier to mutate a standard Python object just before you return it than it is to return an entirely custom object with custom behavior. In Python code it's more even, so you're more likely to see such magic done through objects with customized behavior.

AccessDetailsImportance written at 02:08:10

2010-04-28

Altering a Python function's local variables with a trace function

Some time ago I wrote that there was no way to change a function's local variables from outside it (well, specifically their name bindings). As it turns out, I was wrong; there is one way to do it by going in the back door, although it's not a useful way.

There is specific code in CPython that allows a trace function to completely alter a function's local variables and even function arguments; this (C) code specifically reloads the internal interpreter version of local variables from the frame.f_locals dictionary when your trace function returns into the CPython interpreter. This has to be done from a trace function, and it has to be done while the function you want to modify is running; you cannot reach up into the function that called your function and modify its variables.

(You may be able to give it a local tracing function by getting its frame object and then assigning a suitable function to frame.f_trace, but I haven't tested this.)

What you can do to local variables and function arguments is relatively unrestricted. You can change values, you can unbind variables (so that access to them will get UnboundLocalError), and you can bind variables that are currently unbound. However, as expected, you cannot add new local variables; attempts to do so are ignored.

Unfortunately, there is a further complication: you have to access frame.f_locals in a very special way in order to have your modifications work. The problem is that every time you access the f_locals element of a frame object, some C code gets called; this C code re-populates the dictionary from the current internal locals and then returns it to you. This means that if you repeatedly access f_locals directly, you discard your previous changes on each access (as the real version of the locals is only updated from the dictionary when your trace function returns).

For example, suppose that you write:

def localtrace(frame, evt, arg):
  # trace functions are called as (frame, event, arg)
  frame.f_locals['a'] = 10
  frame.f_locals['b'] = 20
  return None

If you arrange to hook this up, you will discover that only your change to b has taken effect; your change to a has disappeared. (It gets worse if you then desperately attempt to debug this by inserting a 'print frame.f_locals' statement in your local trace function; all your changes will disappear.)

The way to get around this is to dereference f_locals exactly once, by immediately binding your own local name for it:

def localtrace(frame, evt, arg):
  # trace functions are called as (frame, event, arg)
  d = frame.f_locals
  d['a'] = 10
  d['b'] = 20
  return None

This will work, changing both a and b as you expect.
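For completeness, here is one way the hookup might look, as a sketch using sys.settrace. The names target and globaltrace are made up for the example, and the local trace function returns itself explicitly so that it keeps being called for every line rather than relying on what a None return does.

```python
import sys


def localtrace(frame, event, arg):
    d = frame.f_locals      # dereference f_locals exactly once
    d['a'] = 10
    d['b'] = 20
    return localtrace       # keep tracing this frame


def globaltrace(frame, event, arg):
    # Only install the local trace function for the frame we care about.
    if event == 'call' and frame.f_code is target.__code__:
        return localtrace
    return None


def target():
    a = 1
    b = 2
    return (a, b)


sys.settrace(globaltrace)
try:
    result = target()
finally:
    sys.settrace(None)      # always remove the global trace function

print(result)
```

Because the local trace function runs just before each line of target(), including the return statement, the values that target() returns are the ones written by the trace function rather than the ones it assigned itself.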

(I suspect that at least some Python debuggers are currently falling victim to this dark corner.)

As a trivia note, the same C code that is used to update the real function locals for tracing functions is also invoked if you do 'from module import ...' in a function, since the import has the same need to update the function's local variables.

As an extra special trivia note, profile functions can also do this since they (currently) go through the same low-level C code as trace functions. But really, you don't want to go there.

Sidebar: How to actually turn off local tracing

According to the fine documentation, returning None from a local tracing function will turn off tracing for the remainder of the function. According to my actual testing, this is not the case (and reading trace_trampoline in Python/sysmodule.c confirms it); while returning a new local trace function works, returning None is just ignored.

Until this bug is fixed, you will need to 'turn off' tracing by returning a do-nothing trace function from your real local tracing function. A suitable one is 'lambda x, y, z: None'.
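A sketch of that workaround, with hypothetical names (target, globaltrace, calls): the real local trace function does its work once and then hands back a do-nothing function, so it is never invoked again for that frame.

```python
import sys

calls = []      # events seen by the real local trace function


def noop_trace(frame, event, arg):
    # Do-nothing replacement, equivalent to 'lambda x, y, z: None'.
    return None


def localtrace(frame, event, arg):
    calls.append(event)
    return noop_trace       # "turn off" tracing by swapping ourselves out


def globaltrace(frame, event, arg):
    if event == 'call' and frame.f_code is target.__code__:
        return localtrace
    return None


def target():
    x = 1
    y = 2
    return x + y


sys.settrace(globaltrace)
try:
    result = target()
finally:
    sys.settrace(None)
```

After this runs, calls holds a single entry for the first traced line of target(); every later event in that frame went to noop_trace instead.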

My firm impression from both of these issues is that tracing functions and their access to function local variables are a very, very dark corner of the CPython interpreter, and you meddle in it at your peril. It seems clear that not very many people have ever tried to do anything with it, and there may be other problems too.

FLocalsAndTraceFunctions written at 01:36:19

2010-04-20

Standard format Unix errors in Python programs

Here are some things on printing standard format Unix errors in Python programs specifically, based on things that I've both seen and done myself.

First, I tend to wind up putting standard warn() and die() functions into my programs, but they only handle very low-level details (putting the program name in and flushing things as necessary). Beyond that:

  • you should always catch EnvironmentError instead of anything more specific. Functions are inconsistent about whether they raise IOError or OSError, so you shouldn't try to guess, just deal with the general case.
  • don't try to do anything more complex with the EnvironmentError exception object than turning it into a string (with str() or just by printing it as one). This will print the errno number as well as the text error message, but I don't consider this a bad thing.

  • for things that cause the program to stop, if you handle each potentially failing statement with its own try/except block you will rapidly go mad and repeat yourself a lot. Try to wrap as much as possible in a single generic block. The ultimate version of this is to have a single try/except in your main function and have the rest of your program ignore the issue.

    (This is not as applicable to warning messages, because you pretty much have to handle them fairly locally.)

    However, the more you aggregate exception handling, the more you need some way of keeping track of what the program was doing at the point where it blew up. I tend to solve this with what I call phase tracking.
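To make the points above concrete, here is a rough sketch of a program with a single try/except in its main function. The phase tracking shown is only a guess at one simple form it could take (a module-level string updated through a hypothetical set_phase() helper); the names current_phase, set_phase and count_lines are all invented for this example.

```python
import sys

# Hypothetical phase tracker: a note of what the program is doing right now.
current_phase = "starting up"


def set_phase(what):
    global current_phase
    current_phase = what


def count_lines(fname):
    # No error handling here; failures bubble up to main().
    set_phase("reading %s" % fname)
    with open(fname) as fp:
        return sum(1 for _ in fp)


def main(args):
    try:
        for fname in args:
            print("%s: %d" % (fname, count_lines(fname)))
    except EnvironmentError as e:
        # Catch the general case and just turn the exception into a string.
        sys.stderr.write("%s: error while %s: %s\n" %
                         (sys.argv[0], current_phase, e))
        return 1
    return 0
```

A real program would finish with something like sys.exit(main(sys.argv[1:])); returning the status instead of exiting inside main() keeps the function easy to call from tests.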

In general, you have a design decision to make about how to handle fatal errors. One way to handle them is to call die() immediately where you hit the error; another way is to let the exception bubble up (either in its original form or re-wrapped into an internal exception class). The drawback of the former is what I covered in ImportableMain; the more the low-level bits of your program wind up calling sys.exit(), the less you can conveniently test them by themselves because a failure immediately shuts down everything. The drawback of the latter is that you are forced into some sort of phase tracking system.

Sidebar: my versions of warn() and die()

Generally my versions of these functions look like this:

import sys

def warn(msg):
  sys.stdout.flush()
  sys.stderr.write("%s: %s\n" %
                   (sys.argv[0], msg))
  sys.stderr.flush()

def die(msg):
  warn(msg)
  sys.exit(1)

(See here for why I put in the calls to .flush().)

More complicated games are possible with the name of the program (so that it gets shortened by taking out the full path), but usually I don't bother.
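One such game, sketched under the assumption that only the final path component is wanted, is to run the program name through os.path.basename(). The helper name progname() and its optional argument are made up for this example.

```python
import os.path
import sys


def progname(argv0=None):
    """Return the program name with any leading directories removed."""
    if argv0 is None:
        argv0 = sys.argv[0]
    return os.path.basename(argv0)


def warn(msg):
    sys.stdout.flush()
    sys.stderr.write("%s: %s\n" % (progname(), msg))
    sys.stderr.flush()
```

With this, a program run as /usr/local/bin/frob reports its errors as coming from 'frob' instead of the full path.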

PythonStandardErrors written at 01:16:23



This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.