More on systemd on Ubuntu 16.04 failing to reliably reboot some of our servers
I wrote about how Ubuntu 16.04 can't reliably reboot some of our servers, then discovered that systemd can shut down the network with NFS mounts still present and speculated this was (and is) one of our problems. I've now been able to reliably produce such a reboot failure on a test VM and narrow down the specific component involved.
Systemd shuts down your system in two stages: the main stage that stops systemd units, and the final stage, done with systemd-shutdown, which kills the remaining processes, fiddles around with the remaining mounts, and theoretically eventually reboots the system. In the Ubuntu 16.04 version of systemd-shutdown, part of what it tries to do with NFS filesystems is to remount them read-only, and for us this sometimes hangs. With suitable logging enabled in systemd (and thus passed on to systemd-shutdown), we see:
    Sending SIGTERM to remaining processes...
    Sending SIGKILL to remaining processes...
    Sending SIGKILL to PID <nnn> (<command>)
    Unmounting file systems.
    Remounting '/var/mail' read-only with options '<many of them>'.
At this point things hang, although if you have a shutdown watchdog set up, it will force a reboot and recover the system. Based on comments on my second entry, systemd-shutdown doing this is (now) seen as a problem and it's been changed in the upstream version of systemd, although only very recently (eg this commit only landed at the end of August).
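As an aside, the 'suitable logging' above is just systemd's ordinary debug logging pointed at the kernel log buffer, which systemd-shutdown inherits from the main systemd. Something like the following kernel command line additions should do it (the log_buf_len size is merely an illustrative value):

    # kernel command line additions for debugging systemd shutdown
    systemd.log_level=debug systemd.log_target=kmsg log_buf_len=4M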
Unfortunately this doesn't seem to be the sole cause of our shutdown hangs. We appear to have had at least one reboot hang while systemd attempted to swapoff the server's swap space, before it entered late-stage reboot. This particular server has a lot of inactive user processes because it hosts our user-managed web servers, and (at the time) they weren't being killed early in system shutdown, so turning off swap space presumably had to page a lot of things back into RAM. This may not have actually hung as such, but if so it was sufficiently slow as to be unacceptable, and we force-rebooted the server in question after a minute or two.
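For what it's worth, you can get a picture of what systemd is doing at this point, since it manages each swap area as a .swap unit and deactivating one comes down to running swapoff on the area in question. Roughly (the device name here is just an example):

    # see the swap units systemd is managing
    systemctl list-units --type=swap
    # deactivating one is more or less equivalent to:
    swapoff /dev/sda2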
We're currently using multiple approaches to hopefully reduce the chances of hangs at reboot time. We've put all user cron jobs into systemd user slices so that systemd will kill them early, although this doesn't always work and we may need some clever way of dealing with the remaining processes. We've enabled a shutdown watchdog timer with a relatively short timeout, although this only helps if the system makes it to the second stage, where it runs systemd-shutdown; a 'hang' before then in swapoff won't be interrupted.
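For concreteness, here's a sketch of the two pieces involved; the timeout value is illustrative, not necessarily what we actually use. The shutdown watchdog is set in /etc/systemd/system.conf (and needs a real watchdog device to ping):

    [Manager]
    # if the final shutdown stage hangs for this long, the
    # hardware watchdog fires and forcibly resets the machine
    ShutdownWatchdogSec=3min

Putting user cron jobs into user slices comes down to having cron sessions go through pam_systemd, ie a line like this in cron's PAM stack:

    # registers cron sessions with logind, placing them in user slices
    session    optional    pam_systemd.so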
In the future we may enable a relatively short JobTimeoutSec on reboot.target, in the hopes that this does some good. I've considered changing Ubuntu's cron.service to KillMode=control-group and then holding the package to prevent surprise carnage during package upgrades, but this seems to be a little bit too much hassle and danger for an infrequent thing that is generally merely irritating.
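For the record, both of these would be small changes on the systemd side. A sketch of what I have in mind, with illustrative values and assuming the 16.04 systemd supports JobTimeoutAction:

    # eg a drop-in like /etc/systemd/system/reboot.target.d/timeout.conf
    [Unit]
    JobTimeoutSec=10min
    JobTimeoutAction=reboot-force

    # and in cron.service, in place of its stock KillMode:
    [Service]
    KillMode=control-group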
As a practical matter, this entry is probably the end of the saga. This is not a particularly important thing for us and I've already discovered that there are no simple, straightforward, bug-free fixes (and as usual the odds are basically zero that Ubuntu will fix bugs here). If we're lucky, Ubuntu 18.04 will include a version of systemd with the systemd-shutdown NFS mount fixes in it, and perhaps pam_systemd will be more reliable for @reboot cron jobs. If we're not lucky, well, we'll keep having to trek down to the machine room when we reboot servers. Fortunately it's not something we do very often.