SIGCHLD versus Python: a problem of semantics
In the process of looking at my program's code again to write the last entry, I think I may have solved the mystery of how my impossible exception gets generated.
My program does a lot of forking and thus cleanups of now-dead children. The code that it generally dies on is:
def _delip(pid, ip):
    del ipmap[ip][pid]
    if len(ipmap[ip]) == 0:
        del ipmap[ip]
It takes a KeyError on the len(ipmap[ip]) operation and goes down.
(Because of previous fun, the main thread forks all the children and waits for them, so this kills the whole program.)
Clearly there is some concurrency problem, but my problem with the exception was that I could never see where it could come from. The main thread is the only thread that adds or removes things from the dictionary, and the SIGCHLD handler that reaps children is only active when the thread is idling in select() (partly to avoid just this sort of concurrency issue).
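The program's own code for this is not shown here. As a rough sketch of the general idea in present-day Python, assuming a POSIX system and signal.pthread_sigmask() (Python 3.3 and later), SIGCHLD can be kept blocked during normal work and unblocked only around select(). The names on_sigchld, idle_wait and events are made up for illustration.

```python
import os
import select
import signal

events = []  # hypothetical record of handler invocations


def on_sigchld(signum, frame):
    # A real handler would reap children with os.waitpid() here.
    events.append(signum)


signal.signal(signal.SIGCHLD, on_sigchld)
# Keep SIGCHLD blocked while the main thread is doing real work.
signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGCHLD})


def idle_wait(timeout):
    """Wait in select() with SIGCHLD unblocked, then block it again."""
    signal.pthread_sigmask(signal.SIG_UNBLOCK, {signal.SIGCHLD})
    try:
        return select.select([], [], [], timeout)
    finally:
        signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGCHLD})


# Usage: a SIGCHLD sent while blocked stays pending until we idle.
os.kill(os.getpid(), signal.SIGCHLD)
blocked_count = len(events)   # handler has not run yet
idle_wait(0.05)
idle_count = len(events)      # handler ran once we unblocked
```

This only controls when the handler is allowed to start. It says nothing about what can happen while the handler itself is running.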
To avoid various problems and just create sanity, Unix SIGCHLD handlers are not reentrant; even if more children die, you won't receive another SIGCHLD until you return from the signal handler. (This is
an interesting source of bugs if you bail out of the signal handler
without telling the kernel, and is one reason for the existence of
And in thinking about all of this I came to a horrible realization: those are Unix semantics, not Python semantics. Python does not run your Python-level SIGCHLD handler from the actual C-level signal handler; it runs it from the regular bytecode interpreter. All the C SIGCHLD handler does is set a flag telling the interpreter to run the Python-level SIGCHLD handler at the next bytecode, where it gets treated pretty much as an ordinary function call.
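A small experiment makes this visible. The sketch below assumes CPython on a POSIX system and uses SIGUSR1 instead of SIGCHLD so that no children are needed; handler, depth and max_depth are illustrative names. The handler sends the process the same signal again while it is still running, then gives the interpreter a chance to notice it.

```python
import os
import signal

depth = 0      # how many handler invocations are on the stack right now
max_depth = 0  # the deepest nesting observed


def handler(signum, frame):
    global depth, max_depth
    depth += 1
    max_depth = max(max_depth, depth)
    if depth == 1:
        # Send ourselves the same signal while still inside the handler.
        os.kill(os.getpid(), signal.SIGUSR1)
        # Run some bytecode so the interpreter can check for signals.
        for _ in range(10000):
            pass
    depth -= 1


signal.signal(signal.SIGUSR1, handler)
os.kill(os.getpid(), signal.SIGUSR1)
# Run some bytecode so the first signal is definitely handled.
for _ in range(10000):
    pass

print(max_depth)
```

With a Unix-style non-reentrant handler the depth could never exceed 1. With CPython's deferred handlers I would expect max_depth to come out as 2, because the second signal is noticed while the first Python-level call is still on the stack.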
This would neatly explain my mysterious exceptions. When there are two connections from an IP address and both of them die in quick succession, if we are extremely unlucky the SIGCHLD for the second will be handled between _delip's first and second lines and delete the ipmap[ip] dictionary entry out from underneath the first.
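Unrolling that interleaving by hand shows the failure without needing any signals at all. In this sketch the IP address, the pids and the True values stored in the inner dictionary are placeholders; only _delip and ipmap come from the real code.

```python
# Two connections from the same (placeholder) IP address.
ipmap = {"192.0.2.1": {101: True, 102: True}}


def _delip(pid, ip):
    del ipmap[ip][pid]
    if len(ipmap[ip]) == 0:
        del ipmap[ip]


ip = "192.0.2.1"

# Outer call _delip(101, ip), first line:
del ipmap[ip][101]

# The SIGCHLD for pid 102 is handled here. The nested call empties the
# inner dictionary and then removes ipmap[ip] entirely.
_delip(102, ip)

# Outer call, second line: ipmap[ip] no longer exists.
try:
    if len(ipmap[ip]) == 0:
        del ipmap[ip]
    error = None
except KeyError as exc:
    error = exc

print(repr(error))  # KeyError('192.0.2.1')
```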
I personally believe that this is a bug in the CPython interpreter, but even if I can persuade the Python people of this, I still need to come up with a Python-level workaround for the meantime (ideally one that doesn't involve too much code reorganization).
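One possible shape for such a workaround, offered as a sketch and not as what the program ended up doing, is to imitate the Unix behaviour at the Python level: wrap the handler so that a nested invocation only records that it happened, and the outer invocation runs the real handler again afterwards. The decorator name non_reentrant and the demo names below are hypothetical, and the demo again assumes CPython on POSIX with SIGUSR1 standing in for SIGCHLD.

```python
import os
import signal


def non_reentrant(func):
    """Wrap a signal handler so nested invocations are deferred."""
    running = False
    pending = False

    def wrapper(signum, frame):
        nonlocal running, pending
        if running:
            # Already inside func: note the signal and return at once.
            pending = True
            return
        while True:
            running = True
            pending = False
            try:
                func(signum, frame)
            finally:
                running = False
            if not pending:
                break

    return wrapper


calls = 0
depth = 0
max_depth = 0


@non_reentrant
def on_signal(signum, frame):
    global calls, depth, max_depth
    calls += 1
    depth += 1
    max_depth = max(max_depth, depth)
    if calls == 1:
        # Provoke a second signal while the first is being handled.
        os.kill(os.getpid(), signal.SIGUSR1)
        for _ in range(10000):
            pass
    depth -= 1


signal.signal(signal.SIGUSR1, on_signal)
os.kill(os.getpid(), signal.SIGUSR1)
for _ in range(10000):
    pass
```

The real handler still runs once per signal here (calls ends up at 2), but never nested inside itself (max_depth stays at 1). For SIGCHLD the wrapped function would need to reap every available child on each run, since several child deaths can collapse into a single deferred invocation.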