A clever way of killing groups of processes

September 23, 2017

While reading parts of the systemd source code that handle late stage shutdown, I ran across an oddity in the code that's used to kill all remaining processes. A simplified version of the code looks like this:

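void broadcast_signal(int sig, [...]) {
   /* Freeze every process we may signal (on Linux, kill(-1, ...) skips init and the caller) */
   kill(-1, SIGSTOP);

   /* Deliver the real signal (SIGTERM, SIGKILL, ...) while everyone is frozen */
   killall(sig, pids, send_sighup);

   /* Let any survivors run again */
   kill(-1, SIGCONT);
}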

(I've removed error checking and some other things; the original is in the systemd source.)
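To try the same freeze, signal, thaw sequence without freezing your whole machine, here is a minimal C sketch that aims it at one process group made of its own children instead of using kill(-1, ...). The helper names (spawn_idle_group, stop_signal_continue, kill_group_demo) are hypothetical and are not systemd's code; the children just sit in pause() as stand-ins for blocked processes, and the sketch assumes the signal sent is fatal to them.

```c
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork n idle children into a fresh process group
 * and return the group ID (or -1 if a fork fails). */
static pid_t spawn_idle_group(int n)
{
    pid_t pgid = 0;
    for (int i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid < 0) {
            if (pgid > 0)
                kill(-pgid, SIGKILL);  /* don't leave stragglers around */
            return -1;
        }
        if (pid == 0) {
            signal(SIGTERM, SIG_DFL);  /* make sure SIGTERM has its default, fatal action */
            setpgid(0, pgid);          /* pgid 0 means: become the group leader */
            for (;;)
                pause();               /* stand-in for a blocked process */
        }
        if (pgid == 0)
            pgid = pid;
        setpgid(pid, pgid);            /* also set it from the parent, avoiding a race */
    }
    return pgid;
}

/* The broadcast_signal() pattern, aimed at one process group instead of
 * kill(-1, ...). Assumes sig is fatal to every member (SIGTERM with no
 * handler, or SIGKILL); otherwise waitpid() would block forever.
 * Returns how many of the n members were terminated by sig. */
static int stop_signal_continue(pid_t pgid, int n, int sig)
{
    kill(-pgid, SIGSTOP);  /* freeze the whole group first */
    kill(-pgid, sig);      /* every member gets sig while none of them can run */
    kill(-pgid, SIGCONT);  /* thaw anything that survived */

    int killed = 0;
    for (int i = 0; i < n; i++) {
        int status;
        if (waitpid(-pgid, &status, 0) < 0)
            break;
        if (WIFSIGNALED(status) && WTERMSIG(status) == sig)
            killed++;
    }
    return killed;
}

/* Hypothetical demo driver: spawn n children, then kill them all with sig. */
static int kill_group_demo(int n, int sig)
{
    assert(n > 0);
    pid_t pgid = spawn_idle_group(n);
    if (pgid < 0)
        return -1;
    return stop_signal_continue(pgid, n, sig);
}
```

Calling kill_group_demo(4, SIGTERM) should report that all four children were terminated by SIGTERM, and the same holds with SIGKILL. A process group is a convenient unit for this, since passing a negative pid to kill() targets every member of the group at once.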

This is called to send signals like SIGTERM and SIGKILL to everything. At first the use of SIGSTOP and SIGCONT puzzled me, and I wondered if there was some special behavior in Linux if you SIGTERM'd a SIGSTOP'd process. Then the penny dropped; by SIGSTOPing processes first, we're avoiding any thundering herd problems when processes start dying.

Even if you use kill(-1, <signal>), the kernel doesn't necessarily guarantee that all processes will receive the signal at once before any of them are scheduled. So imagine you have a shell pipeline that's remained intact all the way into late-stage shutdown, and all of the processes involved in it are blocked:

proc1 | proc2 | proc3 | proc4 | proc5

It's perfectly valid for the kernel to deliver a SIGTERM to proc1 and immediately kill the process because it has no signal handler. As part of process termination, proc1's standard output pipe is closed, so the kernel wakes up proc2 because its standard input has now hit end-of-file, even though either you or the kernel will very soon send proc2 its own SIGTERM, which will kill it in turn. This and similar cases, such as a parent waiting for its children to exit, can easily lead to highly unproductive system thrashing as processes are woken up unnecessarily. And if a process has a SIGTERM signal handler, the kernel will of course schedule it to wake up and may start it running immediately, especially on a multi-core system.

Sending everyone a SIGSTOP before the real signal completely avoids this. With all processes suspended, all of them will get your signal before any of them can wake up from other causes. If they're going to die from the signal, they'll die on the spot; if they're not going to die (because you're starting with SIGTERM or SIGHUP and they block or handle it), they'll only get woken up at the end, after most of the dust has settled. It's a great solution to a subtle issue.
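The pipeline case can be reproduced with a minimal two-stage pipeline. This is an assumed setup, not systemd code, and every name in it is hypothetical. terminate_proc1_first() forces the bad ordering from the example above by signalling only proc1 (proc2 is deliberately never signalled, which keeps the demo deterministic), so the only thing that can end proc2's read() is end-of-file on its pipe. terminate_frozen() uses the broadcast_signal() order instead, so proc2 dies from its own SIGTERM without ever running its end-of-file handling.

```c
#include <assert.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

enum outcome { WOKE_ON_EOF, KILLED_BY_SIGTERM, OTHER };

/* Hypothetical helper: a "proc1 | proc2" pipeline where proc1 holds the
 * write end and sleeps, while proc2 blocks reading from it. */
static void spawn_pipeline(pid_t *proc1, pid_t *proc2)
{
    int fd[2];
    if (pipe(fd) != 0)
        abort();
    if ((*proc1 = fork()) < 0)
        abort();
    if (*proc1 == 0) {
        signal(SIGTERM, SIG_DFL);           /* default, fatal SIGTERM */
        close(fd[0]);
        for (;;)
            pause();                        /* blocked writer */
    }
    if ((*proc2 = fork()) < 0)
        abort();
    if (*proc2 == 0) {
        signal(SIGTERM, SIG_DFL);
        close(fd[1]);
        char c;
        ssize_t r = read(fd[0], &c, 1);     /* blocks until data or EOF */
        _exit(r == 0 ? 0 : 1);              /* exit status 0 means "woken by EOF" */
    }
    close(fd[0]);                           /* the parent keeps no pipe ends open */
    close(fd[1]);
}

/* Reap pid and report how it ended. */
static enum outcome reap(pid_t pid)
{
    int status;
    pid_t r = waitpid(pid, &status, 0);
    assert(r == pid);
    if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
        return WOKE_ON_EOF;
    if (WIFSIGNALED(status) && WTERMSIG(status) == SIGTERM)
        return KILLED_BY_SIGTERM;
    return OTHER;
}

/* proc1 dies first; proc2 is never signalled, so it wakes on EOF. */
static enum outcome terminate_proc1_first(void)
{
    pid_t p1, p2;
    spawn_pipeline(&p1, &p2);
    kill(p1, SIGTERM);
    reap(p1);
    return reap(p2);
}

/* broadcast_signal() order: freeze both, signal both, thaw both. */
static enum outcome terminate_frozen(void)
{
    pid_t p1, p2;
    spawn_pipeline(&p1, &p2);
    kill(p1, SIGSTOP);
    kill(p2, SIGSTOP);
    kill(p1, SIGTERM);
    kill(p2, SIGTERM);
    kill(p1, SIGCONT);
    kill(p2, SIGCONT);
    reap(p1);
    return reap(p2);
}
```

Here terminate_proc1_first() should return WOKE_ON_EOF, showing proc2 being woken purely as a side effect of proc1's death, while terminate_frozen() should return KILLED_BY_SIGTERM: a stopped proc2 cannot return to user space to act on the end-of-file before its own SIGTERM takes effect.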

(If you're sending SIGKILL to everyone, most or all of them will never wake up; they'll all be terminated unless something terrible has gone wrong. This means this SIGSTOP trick avoids ever having any of the processes run; you freeze them all and then they die quietly. This is exactly what you want to happen at the end of system shutdown.)

Written on 23 September 2017.
Last modified: Sat Sep 23 02:42:54 2017