Wandering Thoughts archives

2009-08-26

A frame object's f_locals isn't always the locals

Python frame objects have a tempting member called f_locals, which is described as the 'local namespace seen by this frame' (to quote the inspect module). This is slightly misleading, because it is not always what Python programmers normally think of as 'locals'.

Specifically, if the frame comes from code that is running at the module level, f_locals is the 'local namespace' of that code, that is, it is the module's namespace. In other words, it's the same as f_globals, and in fact both of them are live references to the module's name dictionary.

This is a gotcha because it means that f_locals has significantly different behavior between module level code and function level code. For module level code, you can modify f_locals and it actually works; for function level code, modifying f_locals doesn't do anything.
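As a minimal sketch of the module level half of this, the following runs a small snippet as module level code and writes a new name through f_locals. Running the snippet with exec() and a single namespace dictionary is an assumption made for the demonstration, and the names source, namespace, injected, and seen are made up. What the same write does inside a function has depended on the Python version, so the sketch does not try to show that side.

```python
# Code to run as module level code; it writes a name through its own f_locals.
source = """
import sys
sys._getframe().f_locals["injected"] = 42   # modify the frame's f_locals
seen = injected                              # then read it as an ordinary name
"""

# With a single dictionary, exec() uses it as both globals and locals,
# which is the same situation as real module level code.
namespace = {}
exec(compile(source, "<demo>", "exec"), namespace)

print(namespace["seen"])
```

The write sticks because f_locals here is the namespace dictionary itself, not a copy of it.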

(Perhaps it would be better if f_locals was always read-only. There are plausible ways of making this inexpensive for module level frames.)

You might now wonder how you tell if a frame belongs to module level code or function level code. One answer is that you can look at f_code.co_name, which will be "<module>" for module level code. You can also see if f_locals and f_globals are the same thing.
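Both checks can be sketched in a few lines. The helper describe_caller() and the other names here are hypothetical, and the module level case is simulated by exec()ing a one-line snippet in a fresh namespace instead of importing a real module.

```python
import sys

def describe_caller():
    """Report what both checks say about the frame that called us."""
    frame = sys._getframe(1)  # the caller's frame
    return {
        "co_name": frame.f_code.co_name,
        "locals_is_globals": frame.f_locals is frame.f_globals,
    }

def from_function():
    # Called from function level code.
    return describe_caller()

# Called from (simulated) module level code.
namespace = {"describe_caller": describe_caller}
exec(compile("result = describe_caller()", "<demo>", "exec"), namespace)

module_info = namespace["result"]
function_info = from_function()
print(module_info)
print(function_info)
```

The identity test (is) matters for the second check; comparing with == could be fooled by a function whose locals happen to have the same contents as the globals.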

(I suspect that you can play extensive games with eval() that will fool some or all of these checks. So don't do that.)

On a side note, the other odd looking frame object member is f_builtins. Under most circumstances, this is the same as __builtins__.__dict__.
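A quick way to check this, as a sketch that assumes Python 3 (where the module is imported as builtins) and an ordinary, unrestricted environment; the function name is made up:

```python
import builtins
import sys

def frame_builtins_match():
    """Check whether this frame's f_builtins is the builtins module's dict."""
    frame = sys._getframe()
    return frame.f_builtins is vars(builtins)

matches = frame_builtins_match()
print(matches)
```

Code run with a customized __builtins__ in its globals is one of the circumstances where this would not hold.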

(My attention was drawn to this confusing situation by Multi-Line Lambdas in Python Using the With Statement by Bill Mill.)

MisleadingFLocals written at 00:29:25

2009-08-22

A gotcha with Python's new signal.siginterrupt()

Python 2.6 added a siginterrupt() function to the signal module, so that you can deal with the EINTR problem. Unfortunately it doesn't necessarily do what you want, because of CPython signal handler semantics.

(Ob-attribution: signal.siginterrupt() was brought to my attention by a commentator on the EINTR entry.)

What you probably want when you combine siginterrupt() and a signal handler, and what a lot of people probably think that they are getting, is that when your program is sitting in a system call and gets a signal, your signal handler function gets called and does its thing while the system call just keeps going, none the wiser. (This is what happens in C, more or less.)

What you get instead is that processing the signal is deferred until the system call completes. Your program is sitting in a system call, gets a signal, and (as far as your Python code is concerned) absolutely nothing happens until the system call completes, whenever that is. Only then does your signal handler function wake up and do its thing.

This happens because CPython only runs Python signal handler functions when control returns to the bytecode interpreter. System calls failing with EINTR is the brute force mechanism that normally causes this to happen relatively soon after your program gets a signal; the program gets a signal, the system call fails with EINTR, the C code making the system call notices this and raises an exception, which returns control to the bytecode interpreter. When system calls don't get EINTR any more, the whole chain of events stops (well, never gets started) and there you are, waiting for the system call to finish normally.

One might hope that Python's siginterrupt() does something clever to get around this, but it doesn't; it is strictly a wrapper for the C library siginterrupt(3) function. (It probably has to be, since I think that doing otherwise would require C module API changes.)
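For reference, here is a minimal sketch of the call itself on a Unix system. The handler on_usr1 and the choice of SIGUSR1 are just for illustration. The one real subtlety is ordering: per the signal module documentation, installing a handler with signal.signal() resets the signal to interrupting system calls, so siginterrupt() has to be called afterwards.

```python
import signal

def on_usr1(signum, frame):
    # Hypothetical handler; it only runs once control is back in the
    # bytecode interpreter, as discussed above.
    pass

# Install the handler first, because signal.signal() implicitly turns
# system call interruption back on for this signal.
signal.signal(signal.SIGUSR1, on_usr1)

# A false flag asks for system calls to be restarted instead of
# failing with EINTR when SIGUSR1 arrives.
signal.siginterrupt(signal.SIGUSR1, False)
```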

There are probably situations and signal handlers where this is acceptable, or at least the least worst choice, but I do think that it makes siginterrupt() a lot less useful than it first appears.

(I also think that this should be explicitly mentioned in the siginterrupt() description, although you can work it out from the current documentation if you put all of the pieces together.)

SiginterruptGotcha written at 00:29:03

2009-08-20

Python modules should not reinvent OSError

In light of yesterday's entry, here is one of my new rules for Python modules, especially extension modules written in C:

If you're going to raise an exception because a system call failed, don't make up a new exception for it. Raise either IOError or OSError themselves.

Every deviation from this causes annoyances for Python programmers. This is especially visible in the situation with signals and EINTR, where I really do not want to be rewriting the same code over and over again with different exception classes (or worse, writing different code because you decided to hide errno somewhere new in your exception).

A corollary to this is that if for some reason you absolutely have to raise a new exception, you should make it duck typing compatible with OSError (see here for what that requires). But really, you don't have to, especially because Python code can raise real OSError exceptions.

(Please do not get creative with the error message for OSError instances that you create yourself in Python code. If it is not exactly the strerror() of the errno, you are doing it wrong.)
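In Python code this can be as short as the following sketch. The helper name raise_oserror and the example path are made up; the important part is that the message comes straight from os.strerror().

```python
import errno
import os

def raise_oserror(err, filename=None):
    """Raise a real OSError for errno value err with the standard message."""
    if filename is None:
        raise OSError(err, os.strerror(err))
    raise OSError(err, os.strerror(err), filename)

try:
    raise_oserror(errno.ENOENT, "/no/such/file")
except OSError as exc:
    caught = exc

print(caught.errno == errno.ENOENT, caught.strerror)
```

Callers can then handle this exactly like an error from open() or os.stat(), including looking at errno, strerror, and filename.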

Extension modules written in C have no excuse, because the Python C API makes it very easy to do the right thing (and reusing OSError saves you from having to create your own C level exception object). Alas, the standard Python extension modules are just overflowing with bad examples, where people make up a new exception instead of reusing OSError or IOError.

(This is sort of the reverse corollary of not using IOError for things that aren't system call errors.)

Sidebar: to wrap or not to wrap

I'm aware that this looks inconsistent with my views on not wrapping exceptions. The difference to me is that if you are essentially wrapping a system call and directly exposing errno, you should be using the standard way of doing that, which is an EnvironmentError exception (or should be). If the system call's failure is just an internal implementation detail, you should wrap the problem up in your own exception to expose the high level issue.

(This is the difference between 'cannot resolve hostname, nameserver not available' and 'sendto() failed'. A DNS module that returned the latter would be technically correct but not useful.)
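A sketch of the second case, written for Python 3. Everything here is hypothetical: ResolveError, resolve(), and the stand-in nameserver function exist only to show the shape of it, with the low level OSError kept as the cause for anyone who wants to dig.

```python
import errno
import os

class ResolveError(Exception):
    """Hypothetical high level error for a DNS-style module."""

def resolve(hostname, query_nameserver):
    """Look up hostname, hiding system call failures behind ResolveError."""
    try:
        return query_nameserver(hostname)
    except OSError as exc:
        # The failed system call is an implementation detail; report the
        # high level problem and chain the original error to it.
        raise ResolveError(
            "cannot resolve %s, nameserver not available" % hostname
        ) from exc

def unreachable_nameserver(hostname):
    # Stand-in for a sendto() that fails at the system call level.
    raise OSError(errno.ENETUNREACH, os.strerror(errno.ENETUNREACH))

try:
    resolve("example.org", unreachable_nameserver)
except ResolveError as exc:
    error = exc

print(error)
```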

ErrnoException written at 00:04:01

2009-08-19

Python, signal handlers, and EINTR

One of the interesting effects of setting a signal handler in your Python program is, well, let me quote the signal module:

  • When a signal arrives during an I/O operation, it is possible that the I/O operation raises an exception after the signal handler returns. This is dependent on the underlying Unix system's semantics regarding interrupted system calls.

By 'an I/O operation' the manual means more than you might think; for example, select.select() is affected by this. (This is where the whole socket error boondoggle becomes very irritating.)

In general, on Unixes that behave this way, signals that are handled during 'slow' operations (especially IO-related ones) normally cause the system call to fail with an EINTR error. Which system calls this affects and under what circumstances is system dependent, but you generally can count on it affecting at least socket operations, talking to the user's terminal, and things like wait().

(And some system calls don't fail with EINTR but instead return partial results if, for example, they have already transmitted part of your write() on a socket.)

When CPython sees that a system call failed, it raises an exception. It pretty much doesn't do anything special when EINTR is the 'failure' reason; you still get a Python level exception, and it is up to you to notice that this failure is not really a failure and you should retry the operation, assuming that you can and want to.

(There are cases where you cannot; for example, I believe that socket.sendall() can be hit with an EINTR despite having sent some of the buffer, at which point you get an exception instead of a partial result.)

It is my personal feeling that given Python's rich exception handling, your low-level Python code should always retry operations when you get an EINTR-based exception. If a signal handler actually wants to abort or redirect the program, it can easily do this by raising an appropriate exception; in the meantime, your low-level code is in the best position to retry the aborted system call.
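A minimal sketch of such a retry loop follows. It is written for Python 3, where these failures all show up as OSError; in the Python 2 of this entry's era you would also have to catch IOError, select.error, and socket.error. The helper retry_on_eintr() and the FlakyRead stand-in are made up for illustration.

```python
import errno
import os

def retry_on_eintr(func, *args, **kwargs):
    """Call func, retrying for as long as it fails with EINTR."""
    while True:
        try:
            return func(*args, **kwargs)
        except OSError as exc:
            if exc.errno != errno.EINTR:
                raise  # a real failure, not an interrupted call

class FlakyRead:
    """Stand-in for a system call wrapper: EINTR twice, then success."""

    def __init__(self):
        self.calls = 0

    def __call__(self):
        self.calls += 1
        if self.calls <= 2:
            raise OSError(errno.EINTR, os.strerror(errno.EINTR))
        return b"data"

flaky = FlakyRead()
result = retry_on_eintr(flaky)
print(result, flaky.calls)
```

Because the loop only swallows EINTR, an exception raised by a signal handler that wants to abort or redirect the program passes straight through it.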

Sidebar: EINTR and SA_RESTART

On Unixes that have EINTR, you can usually tell the kernel that you actually don't want your system calls interrupted just because a particular signal handler got called by setting the SA_RESTART flag on the signal handler. Unfortunately, Python does not expose this, even on systems that support it.

Somewhat to my surprise, the GNU libc texinfo documentation has a decent discussion of signals, EINTR, and SA_RESTART.

PythonEINTR written at 00:47:28



This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.