More on systemd on Ubuntu 16.04 failing to reliably reboot some of our servers

September 29, 2017

I wrote about how Ubuntu 16.04 can't reliably reboot some of our servers, then discovered that systemd can shut down the network with NFS mounts still present and speculated this was (and is) one of our problems. I've now been able to reliably produce such a reboot failure on a test VM and narrow down the specific component involved.

Systemd shuts down your system in two stages: the main stage, which stops systemd units, and the final stage, done by systemd-shutdown, which kills the remaining processes, fiddles around with the remaining mounts, and theoretically eventually reboots the system. In the Ubuntu 16.04 version of systemd-shutdown, part of what it tries to do with NFS filesystems is to remount them read-only, and for us this sometimes hangs. With suitable debug logging enabled in systemd so that systemd-shutdown runs with it too, we see:

Sending SIGTERM to remaining processes...
Sending SIGKILL to remaining processes...
Sending SIGKILL to PID <nnn> (<command>)
Unmounting file systems.
Remounting '/var/mail' read-only with options '<many of them>'.

At this point things hang, although if you have one set up, a shutdown watchdog will force a reboot and recover the system. Based on comments on my second entry, systemd-shutdown doing this is (now) seen as a problem and it's been changed in the upstream version of systemd, although only very recently (eg this commit only landed at the end of August).
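(For illustration, my understanding is that what systemd-shutdown is doing at that point is roughly the equivalent of the following command; the mount point is the one from our logs, and the explanation of why it hangs is my interpretation rather than anything systemd reports.)

# roughly what 'Remounting /var/mail read-only' amounts to; if the
# network is already down, the NFS server can't be reached and the
# remount can block indefinitely in the kernel.
mount -o remount,ro /var/mail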

Unfortunately this doesn't seem to be the sole cause of our shutdown hangs. We appear to have had at least one reboot hang while systemd was turning off the server's swap space with swapoff, before it entered late-stage reboot. This particular server has a lot of inactive user processes because it hosts our user-managed web servers, and (at the time) they weren't being killed early in system shutdown, so turning off swap space presumably had to page a lot of their memory back into RAM. This may not have actually hung as such, but if so it was slow enough to be unacceptable, and we force-rebooted the server in question after a minute or two.
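(As an aside, if you want a rough idea of how exposed a particular server is to this before you reboot it, the ordinary commands below are what I'd reach for; they're not anything systemd-specific, just a sketch. Running swapoff by hand first at least means the slow part happens while you're watching it instead of partway through a reboot.)

# how much swap is currently in use
swapon --show
free -h
# optionally page everything back into RAM by hand before rebooting
swapoff -a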

We're currently using several measures to hopefully reduce the chances of hangs at reboot time. We've put all user cron jobs into systemd user slices so that systemd will kill them early, although this doesn't always work and we may need some clever way of dealing with the remaining processes. We've also enabled a shutdown watchdog timer with a relatively short timeout, although this only helps if the system makes it to the second stage, where it runs systemd-shutdown; a 'hang' before then in swapoff won't be interrupted.
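(For the record, here's a sketch of the two bits of configuration involved. The watchdog timeout value is illustrative rather than what we actually picked, and the shutdown watchdog only does anything if the machine has a hardware watchdog or something like the softdog kernel module. The pam_systemd line is one way of getting cron jobs into per-user slices, per my earlier entry.)

# /etc/systemd/system.conf: have systemd arm a watchdog for shutdown,
# so a hang in systemd-shutdown is turned into a forced reboot.
[Manager]
ShutdownWatchdogSec=5min

# /etc/pam.d/cron: run cron jobs through pam_systemd so they land in
# per-user slices and get killed early in shutdown.
session    optional    pam_systemd.so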

In the future we may set a relatively short JobTimeoutSec on reboot.target, in the hopes that this does some good. I've considered changing Ubuntu's cron.service to KillMode=control-group and then holding the package to prevent surprise carnage during package upgrades, but this seems to be a little bit too much hassle and danger for an infrequent problem that is generally merely irritating.
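(If we do go ahead with either, systemd drop-in overrides are the natural mechanism; here's a sketch with illustrative values, not a settled configuration. The cron.service drop-in is the KillMode change mentioned above, expressed as an override instead of editing the packaged unit file, and the JobTimeoutAction line is my addition so that hitting the timeout actually forces a reboot.)

# /etc/systemd/system/reboot.target.d/override.conf
[Unit]
JobTimeoutSec=5min
JobTimeoutAction=reboot-force

# /etc/systemd/system/cron.service.d/override.conf
[Service]
KillMode=control-group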

As a practical matter, this entry is probably the end of the saga. This is not a particularly important thing for us and I've already discovered that there are no simple, straightforward, bug-free fixes (and as usual the odds are basically zero that Ubuntu will fix bugs here). If we're lucky, Ubuntu 18.04 will include a version of systemd with the systemd-shutdown NFS mount fixes in it and perhaps pam_systemd will be more reliable for @reboot cron jobs. If we're not lucky, well, we'll keep having to trek down to the machine room when we reboot servers. Fortunately it's not something we do very often.
