Wandering Thoughts archives

2017-09-24

Reading code and seeing what you're biased to see, illustrated

Recently I was reading some C code in systemd, one of the Linux init systems. This code is run in late-stage system shutdown and is responsible for terminating any remaining processes. A simplified version of the code looks like this:

void broadcast_signal(int sig, [...]) {
   [...]
   kill(-1, SIGSTOP);

   killall(sig, pids, send_sighup);

   kill(-1, SIGCONT);
   [...]
}

At this point it's important to note that the killall() function manually scans through all remaining processes. Also, this code is called to send either SIGTERM (plus SIGHUP) or SIGKILL to all or almost all processes.

The use of SIGSTOP and SIGCONT here are a bit unusual, since you don't need to SIGSTOP processes before you kill them (or send them signals in general). When I read this code, what I saw in their use was an ingenious way of avoiding any 'thundering herd' problems when processes started being signalled and dying, so I wrote it up in yesterday's entry. I saw this, I think, partly because I've had experience with thundering herd wakeups in response to processes dying and partly because in our situation, the remaining processes are stalled.

Then in comments on that entry, Davin noted that SIGSTOPing everything first did also did something else:

So, I would think it's more likely that the STOP/CONT pair are designed to create a stable process tree which can then be walked to build up a list of processes which actually need to be killed. By STOPping all other processes you prevent them from forking or worse, dieing and the process ID being re-used.

If you're manually scanning the process list in order to kill almost everything there, you definitely don't want to miss some processes because they appeared during your scan. Freezing all of the remaining processes so they can't do inconvenient things like fork() thus makes a lot of sense. In fact, it's quite possible that this is the actual reason for the SIGSTOP and SIGCONT code, and that the systemd people consider avoiding any thundering herd problems to be just a side bonus.

When I read the code, I completely missed this use. I knew all of the pieces necessary to see it, but it just didn't occur to me. It took Davin's comment to shift my viewpoint, and I find that sort of fascinating; it's one thing to know intellectually that you can have a too-narrow viewpoint and miss things when reading code, but another thing to experience it.

(I've had the experience where I read code incorrectly, but in this case I was reading the code correctly but missed some of the consequences and their relevance.)

programming/CodeReadingNarrowness written at 00:53:22; Add Comment


Page tools: See As Normal.
Search:
Login: Password:
Atom Syndication: Recent Pages, Recent Comments.

This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.