== A clever way of killing groups of processes

While reading parts of the [[systemd source code https://github.com/systemd/systemd/]] that handle [[late stage shutdown ../linux/SystemdShutdownWatchdog]], I ran across an oddity in the code that's used to kill all remaining processes. A simplified version of the code looks like this:

.pn prewrap on

> void broadcast_signal(int sig, [...]) {
>    [...]
>    kill(-1, SIGSTOP);
>
>    killall(sig, pids, send_sighup);
>
>    kill(-1, SIGCONT);
>    [...]
> }

(I've removed error checking and some other things; you can see the original [[here https://github.com/systemd/systemd/blob/master/src/core/killall.c]].)

This is called to send signals like _SIGTERM_ and _SIGKILL_ to everything. At first the use of _SIGSTOP_ and _SIGCONT_ puzzled me, and I wondered if there was some special behavior in Linux if you _SIGTERM_'d [[a _SIGSTOP_'d process SIGSTOPUsesAndCautions]]. Then the penny dropped; ~~by _SIGSTOP_ing processes first, we're avoiding any thundering herd problems when processes start dying~~.

Even if you use _kill(-1, <signal>)_, the kernel doesn't necessarily guarantee that all processes will receive the signal at once before any of them are scheduled. So imagine you have a shell pipeline that's remained intact all the way into late-stage shutdown, and all of the processes involved in it are blocked:

> proc1 | proc2 | proc3 | proc4 | proc5

It's perfectly valid for the kernel to deliver a _SIGTERM_ to _proc1_, immediately kill the process because it has no signal handler, close _proc1_'s standard output pipe as part of process termination, and then wake up _proc2_ because now its standard input has hit end-of-file, even though either you or the kernel will very soon send _proc2_ its own _SIGTERM_ signal that will cause it to die in turn. This and similar cases, such as a parent waiting for children to exit, can easily lead to [[highly unproductive system thrashing ../sysadmin/KillOrderImportance]] as processes are woken up unnecessarily.
And if a process has a _SIGTERM_ signal handler, the kernel will of course schedule it to wake up and may start it running immediately, especially on a multi-core system.

Sending everyone a _SIGSTOP_ before the real signal completely avoids this. With all processes suspended, all of them will get your signal before any of them can wake up from other causes. If they're going to die from the signal, they'll die on the spot; if they're not going to die (because you're starting with _SIGTERM_ or _SIGHUP_ and they block or handle it), they'll only get woken up at the end, after most of the dust has settled. It's a great solution to a subtle issue.

(If you're sending _SIGKILL_ to everyone, most or all of them will never wake up; they'll all be terminated unless something terrible has gone wrong. This means this _SIGSTOP_ trick avoids ever having any of the processes run; you freeze them all and then they die quietly. This is exactly what you want to happen at the end of system shutdown.)
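
As a minimal sketch of the same freeze, signal, thaw pattern on a much smaller scale, here is a version that works on an explicit list of process IDs instead of _kill(-1, SIGSTOP)_. The function names (_stop_signal_continue()_, _spawn_idle_child()_ and _wait_for_termination_signal()_) are my own inventions for illustration, not anything from systemd, and the sketch assumes you're allowed to signal every process in the list. It is not a rendition of what systemd's _killall.c_ actually does.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Send 'sig' to each pid in turn. A process that has already gone
 * away (ESRCH) is not treated as an error. Returns 0 or -1. */
static int signal_each(const pid_t *pids, size_t n, int sig)
{
    for (size_t i = 0; i < n; i++) {
        if (kill(pids[i], sig) < 0 && errno != ESRCH)
            return -1;
    }
    return 0;
}

/* Hypothetical helper: freeze every listed process, deliver 'sig' to
 * all of them while none can run, then let the survivors continue.
 * Pids <= 0 are rejected up front so that this can never turn into
 * an accidental kill(-1, ...) or a process group signal. */
int stop_signal_continue(const pid_t *pids, size_t n, int sig)
{
    int rc = 0;

    assert(pids != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (pids[i] <= 0) {
            errno = EINVAL;
            return -1;
        }
    }

    if (signal_each(pids, n, SIGSTOP) < 0)
        rc = -1;
    if (rc == 0 && signal_each(pids, n, sig) < 0)
        rc = -1;
    /* Always try to thaw, even if an earlier step failed, so that
     * nothing is left stopped forever. */
    if (signal_each(pids, n, SIGCONT) < 0)
        rc = -1;
    return rc;
}

/* Hypothetical demo helper: fork a child that just sleeps until a
 * signal arrives. Returns the child's pid in the parent, or -1. */
pid_t spawn_idle_child(void)
{
    pid_t pid = fork();

    if (pid == 0) {
        for (;;)
            pause();
    }
    return pid;
}

/* Hypothetical demo helper: reap 'pid' and report which signal
 * terminated it (0 if it exited normally, -1 if waitpid() failed). */
int wait_for_termination_signal(pid_t pid)
{
    int status;

    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

To try it out, you would fork a few idle children with _spawn_idle_child()_, pass their PIDs to _stop_signal_continue()_ along with _SIGTERM_, and then reap each one with _wait_for_termination_signal()_. Unlike _kill(-1, SIGSTOP)_, looping over a list doesn't freeze everything in one system call, but the ordering is the same: nothing in the list gets to run between the _SIGSTOP_ pass and the pass that sends the real signal.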