== A clever way of killing groups of processes

While reading parts of the [[systemd source code
https://github.com/systemd/systemd/]] that handle [[late stage
shutdown ../linux/SystemdShutdownWatchdog]], I ran across an oddity
in the code that's used to kill all remaining processes. A simplified
version of the code looks like this:

.pn prewrap on

>  void broadcast_signal(int sig, [...]) {
>     [...]
>     kill(-1, SIGSTOP);
>
>     killall(sig, pids, send_sighup);
>
>     kill(-1, SIGCONT);
>     [...]
>  }

(I've removed error checking and some other things; you can see
the original [[here
https://github.com/systemd/systemd/blob/master/src/core/killall.c]].)

This is called to send signals like _SIGTERM_ and _SIGKILL_ to
everything. At first the use of _SIGSTOP_ and _SIGCONT_ puzzled me,
and I wondered if there was some special behavior in Linux if you
_SIGTERM_'d [[a _SIGSTOP_'d process SIGSTOPUsesAndCautions]]. Then
the penny dropped; ~~by _SIGSTOP_ing processes first, we're avoiding
any thundering herd problems when processes start dying~~.

Even if you use _kill(-1, <signal>)_, the kernel doesn't necessarily
guarantee that all processes will receive the signal at once before
any of them are scheduled. So imagine you have a shell pipeline
that's remained intact all the way into late-stage shutdown, and
all of the processes involved in it are blocked:

>  proc1 | proc2 | proc3 | proc4 | proc5

It's perfectly valid for the kernel to deliver a _SIGTERM_ to
_proc1_, immediately kill the process because it has no signal
handler, close _proc1_'s standard output pipe as part of process
termination, and then wake up _proc2_ because its standard input
has now hit end-of-file. This can happen even though either you or
the kernel will very soon send _proc2_ its own _SIGTERM_, which will
cause it to die in turn. This and similar cases, such as a parent waiting for
children to exit, can easily lead to [[highly unproductive system
thrashing ../sysadmin/KillOrderImportance]] as processes are woken
up unnecessarily. And if a process has a _SIGTERM_ signal handler,
the kernel will of course schedule it to wake up and may start it
running immediately, especially on a multi-core system.
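
(To make the end-of-file wakeup concrete, here is a minimal C sketch. It's not taken from systemd; the function name _read_after_killing_writer_ is made up for illustration. The child stands in for _proc1_ and the caller for _proc2_, and the sketch assumes _SIGTERM_ still has its default disposition in the child.)

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: kill the only writer on a pipe with SIGTERM and
 * report what read() then gives the reader. 0 means end-of-file;
 * -1 means the setup itself failed. */
ssize_t read_after_killing_writer(void)
{
    int fds[2];
    if (pipe(fds) == -1)
        return -1;

    pid_t writer = fork();
    if (writer == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (writer == 0) {
        /* Stand-in for proc1: holds the write end, never writes,
         * and has no SIGTERM handler. */
        close(fds[0]);
        for (;;)
            pause();
    }

    /* Stand-in for proc2: keep only the read end, so the child's
     * descriptor is the last open write end of the pipe. */
    close(fds[1]);

    kill(writer, SIGTERM);

    /* Blocks until the kernel tears down the child and closes its
     * write end, then returns with end-of-file. */
    char buf[1];
    ssize_t n = read(fds[0], buf, sizeof buf);

    close(fds[0]);
    waitpid(writer, NULL, 0);
    return n;
}
```

The _read()_ comes back with 0, which is exactly the wakeup that _proc2_ would get in the gap between _proc1_'s death and the arrival of its own _SIGTERM_.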

Sending everyone a _SIGSTOP_ before the real signal completely
avoids this. With all processes suspended, all of them will get
your signal before any of them can wake up from other causes. If
they're going to die from the signal, they'll die on the spot; if
they're not going to die (because you're starting with _SIGTERM_
or _SIGHUP_ and they block or handle it), they'll only get woken
up at the end, after most of the dust has settled. It's a great
solution to a subtle issue.
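
(If you want to try the freeze, signal, thaw sequence without taking down your whole machine, the same idea can be sketched against a single process group instead of _kill(-1, ...)_. This is my own illustration under that assumption, not systemd's code; _signal_group_frozen_ and _demo_count_killed_ are hypothetical names, and the demo assumes you pass a signal whose default action is to terminate the process.)

```c
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Freeze every process in group pgid, deliver sig, then thaw survivors.
 * Same shape as the simplified broadcast_signal(), scoped to one group. */
int signal_group_frozen(pid_t pgid, int sig)
{
    /* kill(-1, ...) means "everything" and kill(0, ...) means our own
     * group; refuse both so this sketch only hits a group we chose. */
    assert(pgid > 1);

    if (kill(-pgid, SIGSTOP) == -1)
        return -1;
    int rc = kill(-pgid, sig);
    kill(-pgid, SIGCONT);   /* always thaw, even if the middle kill() failed */
    return rc;
}

/* Hypothetical demo: put n idle children into one new process group,
 * signal the group, and count how many were terminated by sig. */
int demo_count_killed(int n, int sig)
{
    enum { MAX_KIDS = 16 };
    pid_t kids[MAX_KIDS];
    pid_t pgid = 0;
    int made = 0, killed = 0;

    if (n < 1 || n > MAX_KIDS)
        return -1;

    while (made < n) {
        pid_t pid = fork();
        if (pid == -1)
            break;
        if (pid == 0) {
            for (;;)
                pause();    /* idle child with default signal handling */
        }
        if (made == 0)
            pgid = pid;     /* the first child leads the new group */
        setpgid(pid, pgid);
        kids[made++] = pid;
    }
    if (made == 0)
        return -1;

    signal_group_frozen(pgid, sig);

    /* Reap every child and count the ones that died from sig. */
    for (int i = 0; i < made; i++) {
        int status;
        if (waitpid(kids[i], &status, 0) == kids[i] &&
            WIFSIGNALED(status) && WTERMSIG(status) == sig)
            killed++;
    }
    return killed;
}
```

Calling _demo_count_killed(3, SIGTERM)_ should report all three children as killed by _SIGTERM_. Whether a stopped process dies on the spot or only once the _SIGCONT_ arrives, none of the children gets to run any of its own code in between.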

(If you're sending _SIGKILL_ to everyone, most or all of them will
never wake up; they'll all be terminated unless something terrible
has gone wrong. This means this _SIGSTOP_ trick avoids ever having
any of the processes run; you freeze them all and then they die
quietly. This is exactly what you want to happen at the end of
system shutdown.)