Using a watchdog timer in system shutdown with systemd (on Ubuntu 16.04)
In Systemd, NFS mounts, and shutting down your system, I covered how
Mike Kazantsev pointed me at the ShutdownWatchdogSec setting in
system.conf as a way of dealing with our reboot hang issues. I also
alluded to
some issues with it. We've now tested and deployed a setup using
this, so I want to walk through how it works and what its limitations
are. As part of that I need to talk about how systemd actually shuts
down your system.
Under systemd, system shutdown happens in two stages. The first
stage is systemd stopping all of the system units that it can, in
whatever way or ways they're configured to stop. Some units may
fail to stop here and some processes may not be killed by their
unit's 'stop' action(s), for example processes run by cron. This
stage is the visible part of system shutdown, the bit that causes
systemd to print out all of its console messages. It ends when
systemd reaches shutdown.target, which is when you get console
messages like:
    [...]
    [ OK ] Stopped Remount Root and Kernel File Systems.
    [ OK ] Stopped Create Static Device Nodes in /dev.
    [ OK ] Reached target Shutdown.
(There are apparently a few more magic systemd targets and services that get invoked here without producing any console messages.)
The second stage starts when systemd transfers control (and the role
of PID 1) to the special systemd-shutdown program in order to do the
final cleanup and shutdown of the system (the manual page describes
why it exists, and you can read the actual core code in the systemd
source). Simplified, systemd-shutdown SIGTERMs and then SIGKILLs all
remaining processes and then enters a loop where it attempts to
unmount any remaining filesystems, deactivate any remaining swap
devices, and shut down remaining loop and DM devices. If all such
things are gone or systemd-shutdown makes no progress at all, it
goes on to do the actual reboot. Unless you turn on systemd debugging
(and direct it to the console), systemd-shutdown is completely
silent about all of this; it prints nothing when it starts and
nothing as it runs. Normally this doesn't matter because it finishes
immediately and without problems.
Based on the manpage, you might think that ShutdownWatchdogSec
limits the total amount of time a shutdown can take and thus covers
both of these stages. This is not the case; the only thing that
ShutdownWatchdogSec does is put a watchdog timer on systemd-shutdown's
end-of-things work in the second stage. Well, sort of. If you read
the manpage, you'd probably think that the time you configure here
is the time limit on the second stage as a whole, but actually it's
only the time limit on each of those 'try to clean up remaining
things' loops. systemd-shutdown resets the watchdog every time
it starts a trip through the loop, so as long as it thinks it's
making some progress, your shutdown can take much longer than you
expect in sufficiently perverse situations. Or rather I should say
your reboot. As the manual page specifically notes, the watchdog
shutdown timer only applies to reboots, not to powering the system
off.
(One consequence of what ShutdownWatchdogSec does and doesn't
cover is that for most systems it's safe to set it to a very low
timeout. If you get to the systemd-shutdown stage with any processes
left, so many things have already been shut down that those processes
are probably not going to manage an orderly shutdown in any case.
We currently use 30 seconds and that's probably far too generous.)
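For concreteness, the setting itself is a one-line change. This is a
minimal sketch assuming the stock /etc/systemd/system.conf location and
the 30-second value mentioned above; the comment lines are added here
for illustration and are not part of any shipped file.

```ini
# /etc/systemd/system.conf
[Manager]
# Watchdog timeout that systemd-shutdown arms during a reboot.
ShutdownWatchdogSec=30s
```

I'd expect a change here to be picked up on the next boot (or after a
'systemctl daemon-reexec'), since system.conf is read when the manager
starts.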
To use ShutdownWatchdogSec, you need a kernel watchdog timer; you
can tell if you have one by looking for /dev/watchdog and
/dev/watchdogN devices. Kernel watchdog timers are created by a
variety of modules that support a variety of hardware watchdogs,
such as iTCO_wdt for the Intel TCO WatchDog that you probably have
on your Intel-based server hardware. For our purposes here, the
simplest and easiest to use kernel watchdog module is softdog, a
software watchdog implemented at the kernel
level. Softdog has the limitation that it doesn't help if the kernel
itself hangs, which we don't really care about, but the advantage
that it works everywhere (including in VMs) and seems to be quite
reliable and predictable.
Some Linux distributions (such as Fedora) automatically load an
appropriate kernel watchdog module depending on what hardware is
available. Ubuntu 16.04 goes to the other extreme; it extensively
blacklists all kernel watchdog modules, softdog included, so you
can't even stick something in /etc/modules-load.d. To elide a
long discussion, our solution to this was a new cslab-softdog.service
systemd service that explicitly loaded the module using the following:
    [Service]
    Type=oneshot
    RemainAfterExit=True
    ExecStart=/sbin/modprobe softdog
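To be started at boot, a unit like this also needs the usual [Unit]
and [Install] sections around the [Service] section. The following is
a sketch of what a complete cslab-softdog.service could look like; the
file location, the Description text, and the choice of
multi-user.target are assumptions made for illustration and may differ
from the unit that was actually deployed.

```ini
# /etc/systemd/system/cslab-softdog.service (assumed location)
[Unit]
# Assumed description, purely illustrative.
Description=Load the softdog kernel watchdog module

[Service]
Type=oneshot
RemainAfterExit=True
ExecStart=/sbin/modprobe softdog

[Install]
# Assumed target; any target reached during a normal boot would do.
WantedBy=multi-user.target
```

With a file like this in place, 'systemctl enable cslab-softdog.service'
arranges for the module to be loaded on every boot.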
With softdog loaded and ShutdownWatchdogSec configured, systemd
appears to reliably reboot my test VMs and test hardware in situations
where systemd-shutdown previously hung. It takes somewhat longer
than my configured ShutdownWatchdogSec, apparently because softdog
gives you an extra margin of time just in case, probably 60 seconds
based on what modinfo says.
Sidebar: Limiting total shutdown time (perhaps)
As noted in comments on my first entry on our reboot problems,
reboot.target and poweroff.target both normally have a JobTimeoutSec
of 30 minutes. If my understanding of systemd is correct, setting
any JobTimeoutSec here is supposed to force a reboot or poweroff if
the first stage of shutdown takes that long (because rebooting is
done by attempting to activate reboot.target, which is a systemd
'job', which causes the job timeout to matter).
Although I haven't tested it yet, this suggests that combining a
suitably short JobTimeoutSec on reboot.target with
ShutdownWatchdogSec would limit the total time your system will
ever spend rebooting. Picking a good JobTimeoutSec value is not
obvious; you want it long enough that daemons have time to shut
down in an orderly way, but not so long that you go off to the
machine room. 30 minutes is clearly too long for us, but 30 seconds
would probably be too short for most servers.
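If you want to experiment with this, a drop-in file is probably the
least invasive way to override the stock value. This is an untested
sketch; the drop-in file name and the five-minute figure are
placeholders picked for illustration, and JobTimeoutAction is spelled
out only to make the intent explicit.

```ini
# /etc/systemd/system/reboot.target.d/job-timeout.conf (hypothetical name)
[Unit]
# Placeholder value, somewhere between 30 seconds and 30 minutes.
JobTimeoutSec=5min
# Force the reboot if the reboot.target job times out.
JobTimeoutAction=reboot-force
```

A 'systemctl daemon-reload' is needed for systemd to notice a new
drop-in like this.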