Our workaround for Ubuntu 16.04 and 18.04 failing to reliably reboot some of our servers

September 25, 2019

A few years ago I wrote about how and why systemd on Ubuntu 16.04 couldn't reliably reboot some of our servers. At the time I finished off the entry by suggesting that we'd live with the intermittent failures that caused some of our systems to hang during reboot attempts, forcing us to go power cycle them. Shortly afterward, we changed our minds and decided to work around the situation by resorting to a bigger hammer. These days we use our bigger hammer on both Ubuntu 16.04 and Ubuntu 18.04; the latter may have improved some aspects of the shutdown situation, but our experience is that it hasn't fixed all of them.

The fundamental problem is that systemd can leave descendant processes running even when it has nominally terminated a systemd service, such as Apache, cron, or Exim. systemd doesn't kill these lingering processes (or even try to) until very late in shutdown, and in the meantime they can cause a variety of problems with NFS unmounts, turning off swap, and other parts of system shutdown. To deal with this, we use the big hammer of doing it ourselves; during shutdown, we run a script to kill lingering processes from various service units.

The script has a list of systemd services. For each service, it first looks in the systemd cgroup hierarchy to see if there are still processes associated with the service, by counting how many lines there are in /sys/fs/cgroup/systemd/system.slice/<what>.service/tasks. If there are processes still associated with the service, it kills them with SIGTERM and then SIGKILL (if necessary), using systemd itself to do the work with:

systemctl --kill-who=all --signal=SIG... kill <what>.service

(The actual implementation is slightly more complicated.)

The script also does a fair amount of logging: whether it had to do anything, what it did, and what the process tree looked like before and after each round of killing (as reported by systemd-cgls, because that shows us which systemd units the stray processes are under).
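As an illustration, here is a minimal sketch of what such a script could look like in POSIX shell. It assumes the cgroup v1 layout that Ubuntu 16.04 and 18.04 use (the tasks file path above). The script name 'kill-stray', the CGROOT, SERVICES and DRY_RUN variables, the roughly five-second grace period before SIGKILL, and the 'stop' argument are all hypothetical choices for this sketch, not our actual implementation. Logging goes to standard error, which systemd sends to the journal for a unit's ExecStop command.

```shell
#!/bin/sh
# Hypothetical sketch of a shutdown-time script that kills processes left
# behind by selected services. Assumes the cgroup v1 'systemd' hierarchy
# used by Ubuntu 16.04 and 18.04. Invoke as: kill-stray stop

# Where per-service cgroups live (overridable, e.g. for testing).
CGROOT="${CGROOT:-/sys/fs/cgroup/systemd/system.slice}"
# Services to check; hypothetical, should match the unit's Before= list.
SERVICES="${SERVICES:-cron apache2 exim4 atd slurmd}"
# If non-empty, print systemctl commands instead of running them.
DRY_RUN="${DRY_RUN:-}"

log() {
    # stderr from an ExecStop command ends up in the journal.
    echo "kill-stray: $*" >&2
}

count_tasks() {
    # Number of processes (lines in 'tasks') still in the service's cgroup.
    f="$CGROOT/$1.service/tasks"
    if [ -r "$f" ]; then
        wc -l < "$f" | tr -d ' '
    else
        echo 0
    fi
}

run() {
    if [ -n "$DRY_RUN" ]; then
        echo "$*"
    else
        "$@"
    fi
}

show_tree() {
    # Log the cgroup's process tree so we can see which unit owns what.
    if [ -z "$DRY_RUN" ] && command -v systemd-cgls >/dev/null 2>&1; then
        systemd-cgls --no-pager "/system.slice/$1.service" >&2
    fi
}

kill_service() {
    svc="$1"
    n=$(count_tasks "$svc")
    [ "$n" -gt 0 ] || return 0
    log "$svc.service has $n stray task(s); sending SIGTERM"
    show_tree "$svc"
    run systemctl --kill-who=all --signal=SIGTERM kill "$svc.service"
    # Wait up to about five seconds for the processes to exit.
    i=0
    while [ "$i" -lt 5 ] && [ "$(count_tasks "$svc")" -gt 0 ]; do
        if [ -n "$DRY_RUN" ]; then
            break
        fi
        sleep 1
        i=$((i + 1))
    done
    n=$(count_tasks "$svc")
    if [ "$n" -gt 0 ]; then
        log "$svc.service still has $n task(s); sending SIGKILL"
        run systemctl --kill-who=all --signal=SIGKILL kill "$svc.service"
    fi
    show_tree "$svc"
}

# Only act when explicitly asked to, e.g. from the unit's ExecStop=.
if [ "${1:-}" = stop ]; then
    for s in $SERVICES; do
        kill_service "$s"
    done
fi
```

Running it as 'DRY_RUN=1 kill-stray stop' prints the systemctl commands it would issue instead of running them, which is a cheap way to check the service list on a live machine.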

All of this is driven by a systemd .service unit with the following relevant bits:

Before=cron.service apache2.service exim4.service atd.service slurmd.service


During shutdown, systemd stops units in the reverse of their start order. So we set After so that our stop action runs before NFS unmounting starts, and Before so that it runs after the listed services have been shut down. Not all of those services exist and are enabled on every machine, but listing a Before service that isn't enabled is harmless. The Before list is basically 'what has caused us problems'; we add things to it as we run into problem services.

(Slurmd is a recent addition, for example.)
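For illustration, a complete unit along these lines might look like the one printed by the following shell function. This is a sketch under our own assumptions, since only the Before= line is shown above: the After=remote-fs.target ordering (NFS mounts are ordered before remote-fs.target, so at shutdown this unit's stop action runs before they're unmounted), the Type=oneshot plus RemainAfterExit=yes arrangement that keeps the unit 'active' so systemd runs its ExecStop at shutdown, the unit description, and the script path with its 'stop' argument are all hypothetical.

```shell
# Print a hypothetical unit that runs a kill script at shutdown.
# $1 is the path to the (hypothetical) script, which is called with 'stop'.
make_unit() {
    script="$1"
    cat <<EOF
[Unit]
Description=Kill stray processes from selected services at shutdown
# Assumed ordering: our stop action runs before remote filesystems unmount.
After=remote-fs.target
Before=cron.service apache2.service exim4.service atd.service slurmd.service

[Service]
Type=oneshot
# Stay 'active' after starting so that ExecStop= runs at shutdown.
RemainAfterExit=yes
ExecStart=/bin/true
ExecStop=$script stop

[Install]
WantedBy=multi-user.target
EOF
}
```

Installing it would then be something like 'make_unit /usr/local/sbin/kill-stray > /etc/systemd/system/kill-stray.service', followed by 'systemctl daemon-reload' and 'systemctl enable --now kill-stray.service'; both paths are placeholders.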

Right now the list of 'before' services is duplicated between the script and the systemd unit. It's tempting to eliminate that, but on the other hand I'm not sure I want to be introspecting systemd too much during shutdown. We could also try to be more general by sniffing around the cgroup hierarchy to find stray processes from any unit we don't whitelist (or at least any unit that's theoretically been shut down). However, that might not find much on modern systems. 'KillMode=control-group' is the default, so most units already have systemd kill everything in their cgroup when they stop, and the stragglers most likely come from units that set some other KillMode.
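The more general approach could start from something like this sketch, which only reports. It walks every service cgroup under system.slice (again assuming the cgroup v1 layout), skips a whitelist, skips any unit that systemd still considers active (when systemctl is available), and prints each remaining service that still has processes along with how many. The WHITELIST contents and the function name are hypothetical, and feeding the results into the actual killing is left out.

```shell
# Hypothetical sketch: report service cgroups that still contain processes.
CGROOT="${CGROOT:-/sys/fs/cgroup/systemd/system.slice}"
# Services we never report on (hypothetical defaults).
WHITELIST="${WHITELIST:-kill-stray systemd-journald}"

find_strays() {
    for dir in "$CGROOT"/*.service; do
        # An unmatched glob stays literal, so check that it's a real cgroup.
        [ -d "$dir" ] || continue
        [ -r "$dir/tasks" ] || continue
        svc=$(basename "$dir" .service)
        case " $WHITELIST " in
            *" $svc "*) continue ;;
        esac
        # A unit that is still active is allowed to have processes.
        if command -v systemctl >/dev/null 2>&1 &&
            systemctl is-active --quiet "$svc.service"; then
            continue
        fi
        n=$(wc -l < "$dir/tasks" | tr -d ' ')
        if [ "$n" -gt 0 ]; then
            echo "$svc $n"
        fi
    done
}
```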

The good news is that the script's logging suggests that it usually doesn't need to do anything during system shutdown on our 18.04 machines. But usually isn't always, which is what prompted the addition of slurmd.service.

Sidebar: A potential alternate approach

Basically, our script makes these units behave as if they were set to 'KillMode=control-group' during shutdown. You can change systemd unit properties on the fly, for the current boot only, with 'systemctl --runtime set-property' (which we use for our per-user CPU and memory limits). So perhaps it would work to switch the relevant service units to this KillMode early in the shutdown process, if systemd allows KillMode to be changed this way at runtime.

This option didn't even occur to me until I wrote this entry, and in general it seems more uncertain and chancy than just killing things (even if we're killing things indirectly through systemd). But it'd give you a much smaller and simpler script.

Written on 25 September 2019.
Last modified: Wed Sep 25 00:44:54 2019