2009-08-26
A frame object's f_locals isn't always the locals
Python frame objects have a tempting member called f_locals, which
is described as the 'local namespace seen by this frame' (to quote the
inspect module). This is
slightly misleading, because it is not always what Python programmers
normally think of as 'locals'.
Specifically, if the frame comes from code that is running at the
module level, f_locals is the 'local namespace' of that code,
that is, it is the module's namespace. In other words, it's the same
as f_globals, and in fact both of them are live references to the
module's name dictionary.
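Here is a minimal sketch of the difference, written for modern Python 3 rather than the Python 2 of this entry. It uses exec() with a single namespace dict to stand in for a real module's top-level code; the helper names are made up for illustration.

```python
import sys

def module_frame_namespaces():
    """Run a snippet as module-level code and inspect its own frame."""
    namespace = {"sys": sys}
    # With only one dict, exec() uses it as both globals and locals,
    # the same arrangement a real module's top-level code runs under.
    exec(
        "frame = sys._getframe()\n"
        "same_dict = frame.f_locals is frame.f_globals\n"
        "seen_globals = frame.f_globals\n"
        "del frame\n",
        namespace,
    )
    # Both checks: f_locals is f_globals, and both are the live namespace.
    return namespace["same_dict"], namespace["seen_globals"] is namespace

def function_frame_same_dict():
    """The same identity check for an ordinary function's frame."""
    frame = sys._getframe()
    return frame.f_locals is frame.f_globals

print(module_frame_namespaces())   # (True, True)
print(function_frame_same_dict())  # False
```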
This is a gotcha because it means that f_locals has significantly
different behavior between module level code and function level code.
For module level code, you can modify f_locals and it actually
works; for function level code, modifying f_locals doesn't do
anything.
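A sketch of the module-level half of this in modern Python 3, again using exec() with one namespace dict as a stand-in for a module (the function name is made up). Note that the function-level half is version dependent: in CPython before 3.13, writes to a function frame's f_locals generally did not reach the real local variables, but since 3.13 (PEP 667) f_locals on a function frame is a write-through proxy.

```python
import sys

def plant_name_at_module_level():
    """Write into a module-level frame's f_locals, then read the name back."""
    namespace = {"sys": sys}
    exec(
        # At module level f_locals is the namespace dict itself,
        # so this is a real assignment.
        "sys._getframe().f_locals['planted'] = 'hello'\n"
        # An ordinary name lookup in the same code now finds it.
        "seen = planted\n",
        namespace,
    )
    return namespace["seen"], namespace["planted"]

print(plant_name_at_module_level())  # ('hello', 'hello')
```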
(Perhaps it would be better if f_locals were always read-only.
There are plausible ways of making this inexpensive for module-level
frames.)
You might now wonder how you tell if a frame belongs to module level
code or function level code. One answer is that you can look at
f_code.co_name, which will be "<module>" for module level code.
You can also see if f_locals and f_globals are the same thing.
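A sketch of both checks as one helper in modern Python 3; module_level_checks() and probe() are made-up names. The last case shows how an exec() game with separate globals and locals dicts makes the two checks disagree: the code is still named "<module>", but the identity test fails.

```python
import sys
from types import FrameType
from typing import Tuple

def module_level_checks(frame: FrameType) -> Tuple[bool, bool]:
    """Return (by_name, by_identity) guesses that frame is module-level."""
    # Check 1: module-level (and exec()'d) code objects are named "<module>".
    by_name = frame.f_code.co_name == "<module>"
    # Check 2: module-level code uses one dict for both locals and globals.
    by_identity = frame.f_locals is frame.f_globals
    return by_name, by_identity

def probe():
    return module_level_checks(sys._getframe())

print(probe())  # (False, False)

# Module-level code, simulated with exec() and a single namespace dict.
shared = {"sys": sys, "checks": module_level_checks}
exec("result = checks(sys._getframe())", shared)
print(shared["result"])  # (True, True)

# An exec() game: separate globals and locals dicts make the checks disagree.
separate_locals = {}
exec("result = checks(sys._getframe())", dict(shared), separate_locals)
print(separate_locals["result"])  # (True, False)
```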
(I suspect that you can play extensive games with eval() that will
fool some or all of these checks. So don't do that.)
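On a side note, the other odd-looking frame object member is
f_builtins. Under most circumstances, this is the dictionary of the
__builtin__ module, which is what __builtins__.__dict__ gives you when
__builtins__ is the module itself rather than its dictionary.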
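A small sketch in modern Python 3, where the module is called builtins rather than __builtin__. The first function shows the usual case; the second shows one way to get something else, since exec() takes its builtins from the '__builtins__' key of the globals dict. Both function names are made up.

```python
import builtins
import sys

def default_f_builtins_is_module_dict():
    # In ordinary code, the frame's builtins are the builtins module's dict.
    return sys._getframe().f_builtins is builtins.__dict__

def f_builtins_under_exec(custom_builtins):
    # exec() reads builtins from globals['__builtins__'], so code run this
    # way sees custom_builtins as its f_builtins.
    namespace = {"__builtins__": custom_builtins, "sys": sys}
    exec("seen = sys._getframe().f_builtins", namespace)
    return namespace["seen"]

print(default_f_builtins_is_module_dict())      # True
custom = {"len": len}
print(f_builtins_under_exec(custom) is custom)  # True
```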
(My attention was drawn to this confusing situation by Multi-Line Lambdas in Python Using the With Statement by Bill Mill.)
2009-08-22
A gotcha with Python's new signal.siginterrupt()
Python 2.6 added a siginterrupt() function to the signal module, so that you can deal
with the EINTR problem. Unfortunately it doesn't
necessarily do what you want, because of CPython signal handler
semantics.
(Ob-attribution: signal.siginterrupt() was brought to my attention by
a commentator on the EINTR entry.)
What you probably want when you combine siginterrupt() and a signal
handler, and what a lot of people probably think that they are getting,
is that when your program is sitting in a system call and gets a signal,
your signal handler function gets called and does its thing while the
system call just keeps going, none the wiser. (This is what happens in
C, more or less.)
What you get instead is that processing the signal is deferred until the system call completes. Your program is sitting in a system call, gets a signal, and (as far as your Python code is concerned) absolutely nothing happens until the system call completes, whenever that is. Only then does your signal handler function wake up and do its thing.
This happens because CPython only runs Python signal handler functions
when control returns to the bytecode interpreter. System calls failing
with EINTR is the brute force mechanism that normally causes this to
happen relatively soon after your program gets a signal; the program
gets a signal, the system call fails with EINTR, the C code making the
system call notices this and raises an exception, which returns control
to the bytecode interpreter. When system calls don't get EINTR any
more, the whole chain of events stops (well, never gets started) and
there you are, waiting for the system call to finish normally.
One might hope that Python's siginterrupt() does something clever
to get around this, but it doesn't; it is strictly a wrapper for the
C library siginterrupt(3) function. (It probably has to be, since
I think that doing otherwise would require C module API changes.)
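For reference, here is a minimal sketch of the combination being discussed, in modern Python 3 and Unix-only. The handler and helper names are made up, and sending the signal to ourselves with signal.raise_signal() (Python 3.8+) only shows the wiring; it does not reproduce a process blocked in a system call. Also note that since Python 3.5 (PEP 475) most standard library calls retry on EINTR themselves after running signal handlers, which changes the trade-off described here.

```python
import signal

received = []

def note_signal(signum, frame):
    # A Python-level handler: CPython records the signal in its C-level
    # handler and runs this function later, from the bytecode interpreter.
    received.append(signum)

def install_restarting_handler(signum, handler):
    """Install handler for signum and ask that system calls interrupted
    by it be restarted rather than fail with EINTR (Unix only)."""
    signal.signal(signum, handler)
    # flag=False: restart system calls interrupted by this signal.
    signal.siginterrupt(signum, False)

install_restarting_handler(signal.SIGUSR1, note_signal)
signal.raise_signal(signal.SIGUSR1)  # send the signal to ourselves
for _ in range(10000):  # give the interpreter a chance to run the handler
    if received:
        break
print(received == [signal.SIGUSR1])
signal.signal(signal.SIGUSR1, signal.SIG_DFL)  # restore the default action
```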
There are probably situations and signal handlers where this is
acceptable, or at least the least worst choice, but I do think
that it makes siginterrupt() a lot less useful than it first
appears.
(I also think that this should be explicitly mentioned in the
siginterrupt() description, although you can work it out from the
current documentation if you put all of the pieces together.)
2009-08-20
Python modules should not reinvent OSError
In light of yesterday's entry, here is one of my new rules for Python modules, especially extension modules written in C:
If you're going to raise an exception because a system call failed, don't make up a new exception for it. Raise either IOError or OSError themselves.
Every deviation from this causes annoyances
for Python programmers. This is especially visible in the situation
with signals and EINTR, where I really do not want to
be rewriting the same code over and over again with different exception
classes (or worse, writing different code because you decided to hide
errno somewhere new in your exception).
A corollary to this is that if for some reason you absolutely have
to raise a new exception, you should make it duck typing compatible
with OSError (see here
for what that requires). But really, you don't have to, especially
because Python code can raise real OSError exceptions.
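If you do end up with your own exception class anyway, here is a minimal sketch of what duck-typing compatibility might look like in modern Python 3. It assumes that callers only look at the errno, strerror, and filename attributes and at the "[Errno N] message" string form; HypotheticalDeviceError and is_eintr() are made-up names.

```python
import errno
import os

class HypotheticalDeviceError(Exception):
    """A made-up exception that looks like OSError to errno-checking code."""

    def __init__(self, err, filename=None):
        strerror = os.strerror(err)
        # Same args shape as OSError(errno, strerror).
        super().__init__(err, strerror)
        self.errno = err
        self.strerror = strerror
        self.filename = filename

    def __str__(self):
        # Mimic OSError's "[Errno N] message" formatting.
        if self.filename is None:
            return "[Errno %d] %s" % (self.errno, self.strerror)
        return "[Errno %d] %s: %r" % (self.errno, self.strerror, self.filename)

def is_eintr(exc):
    """Generic check that works on OSError and on duck-typed lookalikes."""
    return getattr(exc, "errno", None) == errno.EINTR

print(is_eintr(HypotheticalDeviceError(errno.EINTR)))  # True
```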
(Please do not get creative with the error message for OSError
instances that you create yourself in Python code. If it is not
exactly the strerror() of the errno, you are doing it wrong.)
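Raising a real OSError from Python code with the right message takes very little; here is a sketch in modern Python 3 (raise_os_error() is a made-up helper name). Python 3 also picks the matching OSError subclass from the errno automatically.

```python
import errno
import os

def raise_os_error(err, filename=None):
    """Raise a real OSError whose message is exactly os.strerror(err)."""
    message = os.strerror(err)
    if filename is None:
        raise OSError(err, message)
    raise OSError(err, message, filename)

try:
    raise_os_error(errno.ENOENT, "missing.txt")
except OSError as exc:
    # prints: FileNotFoundError True missing.txt
    print(type(exc).__name__, exc.errno == errno.ENOENT, exc.filename)
```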
Extension modules written in C have no excuse,
because the Python C API makes it very easy to do
the right thing (and reusing OSError saves you from having to create
your own C level exception object). Alas, the standard Python extension
modules are just overflowing with bad examples, where people make up a
new exception instead of reusing OSError or IOError.
(This is sort of the reverse corollary of not using IOError
for things that aren't system call errors.)
Sidebar: to wrap or not to wrap
I'm aware that this looks inconsistent with my views on not wrapping
exceptions. The difference to me is that if you are
essentially wrapping a system call and directly exposing errno,
you should be using the standard way of doing that, which is an
EnvironmentError exception (or should be). If the system call's failure
is just an internal implementation detail, you should wrap the problem
up in your own exception to expose the high level issue.
(This is the difference between 'cannot resolve hostname, nameserver
not available' and 'sendto() failed'. A DNS module that returned the
latter would be technically correct but not useful.)
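To make the distinction concrete, here is a fragment of a hypothetical DNS client in modern Python 3. The low-level sendto() failure is wrapped in a high-level exception, while raise ... from keeps the original OSError (and its errno) reachable as __cause__. HypotheticalResolverError, send_query(), and FakeSocket are all made up for illustration.

```python
import errno
import os

class HypotheticalResolverError(Exception):
    """High-level failure reported by a made-up DNS client."""

def send_query(sock, packet, server):
    """Send a DNS query; report failures in terms the caller cares about."""
    try:
        return sock.sendto(packet, server)
    except OSError as exc:
        # The failing sendto() is an implementation detail, so wrap it,
        # but chain the original OSError so its errno is still available.
        raise HypotheticalResolverError(
            "cannot resolve hostname, nameserver %s:%d not available" % server
        ) from exc

class FakeSocket:
    """Stand-in socket whose sendto() always fails, for illustration."""

    def sendto(self, packet, address):
        raise OSError(errno.ECONNREFUSED, os.strerror(errno.ECONNREFUSED))

try:
    send_query(FakeSocket(), b"query", ("192.0.2.53", 53))
except HypotheticalResolverError as exc:
    print(exc)
```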
2009-08-19
Python, signal handlers, and EINTR
One of the interesting effects of setting a signal handler in your Python program is, well, let me quote the signal module:
- When a signal arrives during an I/O operation, it is possible that the I/O operation raises an exception after the signal handler returns. This is dependent on the underlying Unix system's semantics regarding interrupted system calls.
By 'an I/O operation' the manual means more than you might think; for
example, select.select() is affected by this. (This is where the
whole socket error boondoggle becomes very
irritating.)
In general, on Unixes that behave this way, signals that are handled
during 'slow' operations (especially IO-related ones) normally cause
the system call to fail with an EINTR error. Which system calls this
affects and under what circumstances is system dependent, but you
generally can count on it affecting at least socket operations, talking
to the user's terminal, and things like wait().
(And some system calls don't fail with EINTR but instead return
partial results if, for example, they have already transmitted part of
your write() on a socket.)
When CPython sees that a system call failed, it raises an exception. It
pretty much doesn't do anything special when EINTR is the 'failure'
reason; you still get a Python level exception, and it is up to you to
notice that this failure is not really a failure and you should retry
the operation, assuming that you can and want to.
(There are cases where you cannot; for example, I believe that
socket.sendall() can be hit with an EINTR despite having sent some
of the buffer, at which point you get an exception instead of a partial
result.)
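One way around the sendall() problem is to do the sending by hand so that progress survives an interruption. This is a sketch in modern Python 3 (send_all_tracking() is a made-up name): it tracks how many bytes have gone out and retries only the remainder. It assumes Python 3.3 or later, where an EINTR failure surfaces as InterruptedError, a subclass of OSError; on 3.5 and later socket.send() normally retries EINTR itself, so that branch is mostly defensive.

```python
import socket

def send_all_tracking(sock, data):
    """Send all of data, retrying only the unsent remainder after EINTR.

    Returns the number of bytes sent.
    """
    view = memoryview(data)
    sent = 0
    while sent < len(view):
        try:
            n = sock.send(view[sent:])
        except InterruptedError:
            # EINTR before anything went out on this attempt: just retry.
            continue
        sent += n  # a partial send is fine; the loop sends the rest
    return sent

a, b = socket.socketpair()
try:
    payload = b"hello, world\n" * 100
    print(send_all_tracking(a, payload) == len(payload))  # True
    received = b""
    while len(received) < len(payload):
        received += b.recv(4096)
    print(received == payload)  # True
finally:
    a.close()
    b.close()
```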
It is my personal feeling that given Python's rich exception handling,
your low-level Python code should always retry operations when you get
an EINTR-based exception. If a signal handler actually wants to abort
or redirect the program, it can easily do this by raising an appropriate
exception; in the meantime, your low-level code is in the best position
to retry the aborted system call.
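Following that advice, a minimal retry wrapper might look like this in modern Python 3 (retry_on_eintr() and flaky_read() are illustrative names). It retries only when the exception's errno is EINTR and lets every other error propagate. Since Python 3.5 (PEP 475), most standard library calls already retry internally, so a wrapper like this matters mostly for older Pythons.

```python
import errno
import os

def retry_on_eintr(func, *args, **kwargs):
    """Call func, retrying for as long as it fails with EINTR."""
    while True:
        try:
            return func(*args, **kwargs)
        except OSError as exc:
            if exc.errno != errno.EINTR:
                raise
            # Interrupted by a signal: treat it as 'try again' and leave
            # aborting to signal handlers that raise their own exception.

attempts = []

def flaky_read():
    # Hypothetical operation that is interrupted twice, then succeeds.
    attempts.append(1)
    if len(attempts) < 3:
        raise InterruptedError(errno.EINTR, os.strerror(errno.EINTR))
    return b"data"

print(retry_on_eintr(flaky_read), len(attempts))  # b'data' 3
```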
Sidebar: EINTR and SA_RESTART
On Unixes that have EINTR, you can usually tell the kernel that
you actually don't want your system calls interrupted just because a
particular signal handler got called by setting the SA_RESTART
flag on the signal handler. Unfortunately, Python does not expose
this, even on systems that support it.
Somewhat to my surprise, the GNU libc texinfo documentation has a decent
discussion of signals, EINTR, and SA_RESTART.