2017-09-24
Reading code and seeing what you're biased to see, illustrated
Recently I was reading some C code in systemd, one of the Linux init systems. This code is run in late-stage system shutdown and is responsible for terminating any remaining processes. A simplified version of the code looks like this:
void broadcast_signal(int sig, [...]) {
    [...]
    kill(-1, SIGSTOP);
    killall(sig, pids, send_sighup);
    kill(-1, SIGCONT);
    [...]
}
At this point it's important to note that the killall() function manually scans through all remaining processes. Also, this code is called to send either SIGTERM (plus SIGHUP) or SIGKILL to all or almost all processes.
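To make 'manually scans through all remaining processes' concrete, here is a minimal sketch of such a scan. It is not systemd's code. It assumes the scan walks /proc, which is the usual way to enumerate processes on Linux, and scan_pids() and its parameters are names I made up for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <dirent.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not from systemd): collect up to `max` PIDs by
 * reading /proc. If `skip_self` is non-zero, the caller's own PID is
 * left out. Returns the number of PIDs stored, or -1 if /proc cannot
 * be opened. */
static int scan_pids(pid_t *out, int max, int skip_self)
{
    assert(out != NULL && max >= 0);

    DIR *dir = opendir("/proc");
    if (dir == NULL)
        return -1;

    pid_t self = getpid();
    int n = 0;
    struct dirent *ent;

    while (n < max && (ent = readdir(dir)) != NULL) {
        /* Process directories are the entries whose names are all digits. */
        if (!isdigit((unsigned char)ent->d_name[0]))
            continue;

        char *end;
        long pid = strtol(ent->d_name, &end, 10);
        if (*end != '\0' || pid <= 0)
            continue;
        if (skip_self && (pid_t)pid == self)
            continue;

        out[n++] = (pid_t)pid;
    }

    closedir(dir);
    return n;
}
```

A caller would then loop over the returned PIDs and kill() each one. Note that nothing here stops a process from appearing or disappearing between readdir() calls, so what you get back is only a snapshot.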
The use of SIGSTOP and SIGCONT here is a bit unusual, since you don't need to SIGSTOP processes before you kill them (or send them signals in general). When I read this code, what I saw in their use was an ingenious way of avoiding any 'thundering herd' problems when processes started being signalled and dying, so I wrote it up in yesterday's entry. I saw this, I think, partly because I've had experience with thundering herd wakeups in response to processes dying and partly because in our situation, the remaining processes are stalled.
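The mechanism that this reading relies on can be tried out safely on a single child process. A stopped process does not act on a signal like SIGTERM until it gets SIGCONT (SIGKILL is the exception and takes effect even on a stopped process). The following is only a sketch to show that behaviour, not anything taken from systemd, and signal_while_stopped() is a made-up name.

```c
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo helper (not from systemd): stop a child, send it
 * `sig` while it is stopped, then continue it. Returns the signal that
 * terminated the child if it stayed alive until SIGCONT, or -1 if
 * anything went differently. */
static int signal_while_stopped(int sig)
{
    /* SIGKILL acts even on stopped processes, so it can't show the effect. */
    assert(sig != SIGKILL && sig != SIGSTOP && sig != SIGCONT);

    pid_t child = fork();
    if (child < 0)
        return -1;
    if (child == 0) {
        for (;;)
            pause();    /* the child only waits for signals */
    }

    int status;

    /* Freeze the child and wait until it is reported as stopped. */
    if (kill(child, SIGSTOP) < 0 ||
        waitpid(child, &status, WUNTRACED) != child || !WIFSTOPPED(status))
        goto fail;

    /* Send the signal; it stays pending while the child is stopped,
     * so a non-blocking wait should report no state change. */
    if (kill(child, sig) < 0 || waitpid(child, &status, WNOHANG) != 0)
        goto fail;

    /* Thaw the child; the pending signal is delivered now. */
    if (kill(child, SIGCONT) < 0 || waitpid(child, &status, 0) != child)
        goto fail;

    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;

fail:
    /* Clean up so the child is never left behind. */
    kill(child, SIGKILL);
    waitpid(child, &status, 0);
    return -1;
}
```

Calling signal_while_stopped(SIGTERM) should report that the child was terminated by SIGTERM, but only after the SIGCONT. With kill(-1, ...) the same thing happens to everything at once, so nothing starts dying until the whole scan is done.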
Then in comments on that entry, Davin noted that SIGSTOPing everything first also did something else:
So, I would think it's more likely that the STOP/CONT pair are designed to create a stable process tree which can then be walked to build up a list of processes which actually need to be killed. By STOPping all other processes you prevent them from forking or worse, dieing and the process ID being re-used.
If you're manually scanning the process list in order to kill almost everything there, you definitely don't want to miss some processes because they appeared during your scan. Freezing all of the remaining processes so they can't do inconvenient things like fork() thus makes a lot of sense. In fact, it's quite possible that this is the actual reason for the SIGSTOP and SIGCONT code, and that the systemd people consider avoiding any thundering herd problems to be just a side bonus.
When I read the code, I completely missed this use. I knew all of the pieces necessary to see it, but it just didn't occur to me. It took Davin's comment to shift my viewpoint, and I find that sort of fascinating; it's one thing to know intellectually that you can have a too-narrow viewpoint and miss things when reading code, but another thing to experience it.
(I've had the experience where I read code incorrectly, but in this case I was reading the code correctly but missed some of the consequences and their relevance.)