2009-05-21
Solving the Python SIGCHLD
problem
In brief, the Python SIGCHLD
problem is that while C-level SIGCHLD
signal handlers are protected against reentrancy, your Python level
SIGCHLD
handler function is not, because it is not actually executed
as a C-level signal handler. This winds up causing heartburn and
potential explosions if you do anything sophisticated when child
processes exit. (See SIGCHLDVsPython for a longer discussion.)
In thinking about the problem, I've realized that signal.signal()
is itself a peculiar but useful atomic operation. This is important,
because in order to fix this problem in Python code we need to build a
'test and set' (or vice versa) primitive that is not thread related,
so that we can guard the signal handler function with it.
So the sketch of a solution to the problem is this: the first thing the
Python signal handler does is immediately set SIGCHLD
to SIG_DFL
and examine the old handler value it gets back. If the old handler is
already SIG_DFL
, the signal handler has been re-entered and it must
immediately return; otherwise it reaps children as normal, and at the
end resets the SIGCHLD
handler to its old value.
(Usefully, Unix semantics insure that we will never miss dead children;
if you (re-)enable a SIGCHLD
handler with pending dead children, the
kernel immediately sends you a SIGCHLD
.)
Disclaimer: I believe that this should work, but beware, I haven't actually tested it.
(I came up with this idea some time ago, but didn't want to write it up until I'd actually implemented it and knew for sure that it worked. But I haven't gotten around to doing that and I probably won't any time soon, for various reasons, and so it's time to stop sitting on this.)