2009-05-23
The drawback of using a language with a good interface to the OS
Python generally has a great interface to Unix; through various modules
(not just os, but also things like socket, select, and so on) it
exposes almost all of the generally useful APIs and does so in a way
that is simultaneously Unixy and Pythonic. The best way of putting it
is that everything just works, and just works the way you'd expect it
to if you know the C-level API and Python.
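As a small illustration of "Unixy and Pythonic", here is a sketch using `os.pipe()`, `os.write()` and `os.read()`. The calls and their arguments mirror pipe(2), write(2) and read(2), but results come back as Python tuples and `bytes`, and failures come back as `OSError` exceptions carrying the C-level `errno`. The variable names are illustrative only.

```python
import errno
import os

# pipe(2) fills in an int[2]; os.pipe() returns the same two descriptors
# as a (read_end, write_end) tuple.
r, w = os.pipe()
try:
    # Like write(2), os.write() returns the number of bytes written.
    n = os.write(w, b"hello")
    # Like read(2), os.read() takes a descriptor and a maximum size,
    # but hands back a bytes object instead of filling in a buffer.
    data = os.read(r, 100)
finally:
    os.close(r)
    os.close(w)

# Where the C call would return -1 and set errno, Python raises OSError
# with the same errno value attached (-1 is never a valid descriptor).
try:
    os.read(-1, 1)
except OSError as e:
    err = e.errno
```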
(This is one reason that I have never been terribly energized about the ctypes module; the interface it gives you to whatever C library function you want is nowhere near as nice as Python's normal, native interfaces. It's hard to feel enthused about slogging through mud when I'm used to walking over grass, even if slogging through mud beats the heck out of not being able to get there at all.)
But I've discovered that all of this excellence has a little drawback for me. Python's interfaces are so good and so obvious that they have tempted me into not bothering to thoroughly read the documentation; I just look them up to make sure that they're there, maybe poke them a bit to see how they fit in with the rest of Python, and then assume that I'm done. I don't go on to the crucial next step: checking how they're different from the native Unix API.
Usually I can get away with this, because the Python interface is just the same as the C-level one. But sometimes there are important differences which I wind up missing, with unfortunate consequences. While I'm not sure that I'd have caught the differences if I'd paid better attention to the signal module's documentation, I'd at least feel better about the situation if I'd read the documentation carefully at the start and still missed them.
(While I don't think that this is a documentation bug as such, I do think that the Python documentation would be more usable if it highlighted any significant differences from the C-level Unix semantics. The signal module does do some highlighting, but it's not complete.)
2009-05-22
How CPython handles (and delays) Unix signals
CPython handles Unix signals somewhat oddly, or at least not in what you might think of as a standard Unix way. I've covered part of this before in SignalProblem and made side mentions in other entries, but I want to write all of this down in one place (even if only as an index).
First, none of this applies if you set a signal's handler to SIG_IGN
or SIG_DFL. Those get standard Unix semantics, because CPython just
sets the signal to either of those at the C level.
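A minimal sketch of the SIG_IGN case, using SIGUSR1 purely as an example signal: the process sends the signal to itself and carries on, because the kernel discards it without any Python involvement. `signal.signal()` returns the previous disposition, so the original setting can be put back. (SIGUSR1's default action is to terminate the process, so the same self-kill under SIG_DFL would end the program, which is why that half isn't shown.)

```python
import os
import signal

# SIG_IGN is handed straight to the kernel, so the signal is discarded
# there and no Python code runs at all.
old = signal.signal(signal.SIGUSR1, signal.SIG_IGN)
os.kill(os.getpid(), signal.SIGUSR1)  # harmless while SIGUSR1 is ignored

current = signal.getsignal(signal.SIGUSR1)  # reports SIG_IGN

# signal.signal() returned the previous disposition, so we can put it back.
signal.signal(signal.SIGUSR1, old)
```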
Otherwise, the important thing to know is that Python signal handler functions are not real signal handlers; they are just functions that the bytecode interpreter calls (at some point) in response to CPython receiving a signal. Real signal handlers are called immediately when the process gets a signal and they generally won't get re-entered if the process gets the same signal again while they're running. CPython 'signal handlers' have no reentrancy protection and their execution can be delayed (sometimes for quite a while) from when the signal was sent.
(Technically a Python signal handler can be any callable object. I'm (mis)using 'function' for shortness.)
Mechanically, the only thing CPython does in the real signal handler is note down some information about the signal and set a flag for the bytecode interpreter (the actual code has more levels of indirection than this description). The interpreter checks the flag after most bytecode instructions (there are a few special 'fast' instructions that go directly to the next instruction, bypassing various checks); if the flag is set and this is the main thread, the interpreter then invokes the Python-level signal handler (including the 'raise an exception' SIGINT handler).
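The "this is the main thread" condition can be observed directly. In this sketch, assuming Python 3, a worker thread uses `signal.pthread_kill()` to direct SIGUSR1 at itself, so the C-level handler runs in the worker; the Python handler is nonetheless called later in the main thread, once that thread is running bytecode again. `handler`, `send_to_self` and `ran_in_main` are illustrative names.

```python
import signal
import threading

ran_in_main = []  # one entry per handler call: True if it ran in the main thread


def handler(signum, frame):
    ran_in_main.append(threading.current_thread() is threading.main_thread())


def send_to_self(signum):
    # pthread_kill() targets one specific thread, so the C-level handler
    # runs in this worker thread, not in the main thread.
    signal.pthread_kill(threading.get_ident(), signum)


signal.signal(signal.SIGUSR1, handler)
worker = threading.Thread(target=send_to_self, args=(signal.SIGUSR1,))
worker.start()
worker.join()

# Give the main thread some bytecode to run so the pending handler fires.
for _ in range(1000):
    if ran_in_main:
        break
print(ran_in_main)
```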
So, at least two things will delay your signal handlers, possibly for quite a while:
- your code is in a C-level module waiting for a result (in a way
  that won't get interrupted by a signal). The one that I frequently
  run into is socket.gethostbyname(), as it waits for DNS timeouts
  (okay, technically it's waiting for DNS answers, but it's not going
  to get them).
- your main thread is waiting to be poked by another thread. Judging
  from the code, the thread locking and synchronization primitives
  aren't aborted by signals, although some of the higher level
  operations are implemented in Python and so are semi-interruptible.
(Until I started looking at this, I had not really noticed the thread case.)
While making Python signal handlers not run from inside the actual signal handlers sounds limiting, it is basically the only choice that CPython has; there is very little that you can safely do in a signal handler besides carefully set some flag variables, because not even the C library itself is arbitrarily reentrant.
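CPython's own C-level handler sticks to this discipline. Besides setting its flags, about the only other thing it does is write the signal number as a single byte to a file descriptor you register with `signal.set_wakeup_fd()` (write(2) is async-signal-safe), which is handy for waking up a select() loop. A sketch, assuming a recent Python 3 (older versions wrote a plain zero byte, and `os.set_blocking()` needs 3.5+):

```python
import os
import signal

r, w = os.pipe()
os.set_blocking(w, False)      # set_wakeup_fd() insists on a non-blocking fd
old_fd = signal.set_wakeup_fd(w)

# The wakeup fd is written by CPython's own C-level handler, so a Python
# handler must be installed for the signal; this one does nothing.
signal.signal(signal.SIGUSR1, lambda signum, frame: None)

os.kill(os.getpid(), signal.SIGUSR1)
wakeup_byte = os.read(r, 1)    # the C handler wrote the signal number here

signal.set_wakeup_fd(old_fd)   # restore the previous setting
os.close(r)
os.close(w)
```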
Pretty much all of this is mentioned, although not in large flaming letters, in the start of the signal module documentation.
2009-05-21
Solving the Python SIGCHLD problem
In brief, the Python SIGCHLD problem is that while C-level SIGCHLD
signal handlers are protected against reentrancy, your Python level
SIGCHLD handler function is not, because it is not actually executed
as a C-level signal handler. This winds up causing heartburn and
potential explosions if you do anything sophisticated when child
processes exit. (See SIGCHLDVsPython for a longer discussion.)
In thinking about the problem, I've realized that signal.signal()
is itself a peculiar but useful atomic operation. This is important,
because in order to fix this problem in Python code we need to build a
'test and set' (or vice versa) primitive that is not thread related,
so that we can guard the signal handler function with it.
So the sketch of a solution to the problem is this: the first thing the
Python signal handler does is immediately set SIGCHLD to SIG_DFL
and examine the old handler value it gets back. If the old handler is
already SIG_DFL, the signal handler has been re-entered and it must
immediately return; otherwise it reaps children as normal, and at the
end resets the SIGCHLD handler to its old value.
(Usefully, Unix semantics ensure that we will never miss dead children;
if you (re-)enable a SIGCHLD handler with pending dead children, the
kernel immediately sends you a SIGCHLD.)
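A minimal sketch of that scheme in Python 3, with hypothetical names (`reaped`, `reap_children()`, `sigchld_handler()`) and a plain list standing in for whatever real bookkeeping you do when children exit. The swap to SIG_DFL doubles as the "test and set": because Python handlers only run between bytecodes, the single `signal.signal()` call cannot be split by another invocation of the handler. It depends on the re-enable behavior described just above; on a platform that didn't re-send SIGCHLD, children that exit while the handler is disabled would sit as zombies until the next SIGCHLD, when the reaping loop would collect them.

```python
import os
import signal

reaped = []  # (pid, status) pairs; stands in for real bookkeeping


def reap_children():
    """Collect every exited child without blocking (illustrative helper)."""
    while True:
        try:
            pid, status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            return  # no children left at all
        if pid == 0:
            return  # children exist, but none have exited yet
        reaped.append((pid, status))


def sigchld_handler(signum, frame):
    # signal.signal() both installs SIG_DFL and hands back the previous
    # handler in one step, which is the 'test and set' we need.
    old = signal.signal(signal.SIGCHLD, signal.SIG_DFL)
    if old == signal.SIG_DFL:
        # Already SIG_DFL: an outer invocation of this handler is still
        # running and will restore the handler itself, so just return.
        return
    try:
        reap_children()
    finally:
        # Re-enable the handler we displaced at the start.
        signal.signal(signal.SIGCHLD, old)


signal.signal(signal.SIGCHLD, sigchld_handler)
```

Note that SIG_DFL for SIGCHLD (unlike SIG_IGN) leaves exited children as zombies, so nothing is lost while the handler is swapped out; they are simply waiting for `waitpid()`.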
Disclaimer: I believe that this should work, but beware, I haven't actually tested it.
(I came up with this idea some time ago, but didn't want to write it up until I'd actually implemented it and knew for sure that it worked. But I haven't gotten around to doing that and I probably won't any time soon, for various reasons, and so it's time to stop sitting on this.)