Systemd on Ubuntu 16.04 can't (or won't) reliably reboot your server

September 6, 2017

We just went through a periodic exercise of rebooting all of our Ubuntu servers in order to get up to date on kernels and so on. By now almost all of our servers are running Ubuntu 16.04, which means that they're using systemd. Unfortunately this gives us a real problem, because on Ubuntu 16.04, systemd won't reliably reboot your system. On some servers, usually the busiest and most important ones, the system will just stop during the shutdown process and sit there. And sit there. And sit there. Perhaps it would eventually recover after tens of minutes, but as mentioned these are generally our busiest and most important servers, so we're not exactly going to let them sit there just to find out.

(There also probably isn't much point to finding out. It's unlikely that there's some miracle cure we can do ourselves, and making a bug report to Ubuntu is almost completely pointless since Ubuntu only fixes security issues and things that are actively on fire. My previous experience wasn't productive and produced no solutions from anyone.)

This goes well beyond my previous systemd reboot irritation. Reliably rebooting servers despite whatever users are doing to them is a fairly foundational thing, yet Ubuntu's systemd not only can't get this right but doesn't even tell us what's wrong (in the sense of 'what is keeping me from rebooting'). The net effect is to turn rebooting many of our servers into a minefield. Not only may a reboot require in-person intervention in our machine room, but because we can't count on a reboot just working, we have to actively watch the state of every machine we reboot; we can't just assume that machines will come back up on their own, the way we normally could unless something was fairly wrong. The whole experience angers me every time I have to go through it.

By now we've enabled persistent systemd journals on almost everything in the hope of capturing useful information so we can perhaps guess why this is happening. Unfortunately so far we've gotten nothing useful; systemd has yet to log or display on the screen anything like 'still waiting N seconds for job X'. I'm not even convinced that the systemd journal has captured all of the log messages that it should from an unsuccessful shutdown, as what 'journalctl -b -1' shows is much less than I'd expect and just stops abruptly.
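(For reference, getting persistent journals is just a matter of creating journald's storage directory, roughly like this; the systemd-tmpfiles step fixes up ownership and ACLs.)

    # Make journald keep logs across reboots by creating its
    # persistent storage directory:
    sudo mkdir -p /var/log/journal
    sudo systemd-tmpfiles --create --prefix /var/log/journal
    sudo systemctl restart systemd-journald

    # After a failed shutdown and forced reboot, look at the
    # previous boot's log:
    journalctl -b -1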

(Without an idea of how and why systemd is screwing up, I'm reluctant to change DefaultTimeoutStopSec from its Ubuntu default, as I once discussed here, or make other changes like forcing all user cron jobs to run under user slices.)
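(If we did change it, the change would look something like the following sketch in /etc/systemd/system.conf; 90 seconds is the usual compiled-in default, and the 30s value here is purely illustrative.)

    [Manager]
    # How long systemd waits for a stopping service before it
    # resorts to SIGKILL; the default is 90s. Lowering it trades
    # clean shutdowns of slow services for faster reboots.
    DefaultTimeoutStopSec=30s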

(This Ubuntu bug matches one set of symptoms we see, but not all of them. Note that our problem is definitely not the Linux kernel having problems rebooting the hardware; the same Dell servers were previously running Ubuntu 14.04 and rebooting fine, and Magic SysRq will force reboots without problems. There's also this Ubuntu bug and this report of problems with shutting down when you have NFS mounts, which certainly could be part of our problems.)


Comments on this page:

It might not be the only issue, but I was working on systemd recently and I remember seeing this PR come in:

The remount read-only and subsequent umount operations are currently not limited. As a result, the shutdown operation can stall endlessly due to inaccessible NFS mounts, or a number of similar factors. This results in a manual system reset being necessary.

With these changes, the remount is now limited to a maximum of 6 attempts (UMOUNT_MAX_RETRIES + 1). In addition, the remount operation has been moved to a separate child process that is limited in duration. Each remount operation is limited to 90 seconds (DEFAULT_TIMEOUT_USEC) before the child process exits with a SIGALRM and reports the failure.

https://github.com/systemd/systemd/pull/6598

There are also some other comments in the issue:

https://github.com/systemd/systemd/issues/6115

Looking at sysvinit on Debian, I can't immediately see any timeouts for unmounting. It has a nice sequence:

  • K03sendsigs
  • K04rsyslog
  • K05hwclock.sh
  • K05umountnfs.sh
  • K06networking
  • K07umountfs
  • K08lvm2
  • K09umountroot
  • K10halt

The main difference is that, as you've noticed, systemd doesn't have a `sendsigs` unit.

If you have any user processes that survive, including any SSH sessions (unlike other types of session, these aren't tied to getty units or the xdm unit, and they're deliberately supposed not to be stopped by the SSH unit), then those processes outlive service bringdown (shutdown.target). The mount units they pin will "fail" (not stop cleanly, I think, and with no timeout delay), and then you're reliant on the crude systemd-shutdown logic. But by that point the networking service has been stopped, so I guess NFS mounts get cranky.
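(One knob that bears on this, as a sketch and not a recommendation, since it also kills things people deliberately leave running like screen and tmux sessions: logind can be told to clean up a session's processes when the session ends.)

    # /etc/systemd/logind.conf
    [Login]
    # If yes, processes left over from a login session (SSH ones
    # included) are killed when the session ends, so they can't
    # linger and pin mount units. The Ubuntu 16.04 default is no.
    KillUserProcesses=yes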

In summary, this one might be sufficiently widely infuriating that you'll eventually see some work on it merged upstream.

It seems like a clear victim of code churn (and of people giving up on and/or completely failing to understand the previous design).

It would be interesting to know how Debian has handled this. I know they're even less likely to backport fixes; it's just that their discussions work a bit differently, so there might be less noise about possibly unrelated hardware issues. (I've seen a developer point out that their bug tracker effectively raises the bar on who will even try to report issues, reducing their workload.)

We have the same problem here, although on CentOS 7 VMs (ESXi) with systemd. They are very unreliable when rebooting after our periodic patch cycles and so far we have found no remedy.

What we've found out is that it only affects busy systems and is somehow (possibly?) connected to the swap partition not being able to unmount quickly enough or being locked somehow. We even went as far as defining custom runlevels (aka targets) that encompass all software installed after the blank OS setup. Then our reboot process would first go through shutdown of this software (webservers, DBs, etc.) before actually attempting to reboot. Our hope was that swap would then be sufficiently empty/unused that systemd won't hang, but it has proven very difficult to even test this, because it is such an inconsistent bug.
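(The skeleton of such a target is small. 'site-apps.target' and the drop-in path below are made-up names for illustration; each application service gets tied to the target with PartOf= so that stopping the target stops it.)

    # /etc/systemd/system/site-apps.target
    [Unit]
    Description=All locally installed application software

    # Per-service drop-in, e.g.
    # /etc/systemd/system/apache2.service.d/site-apps.conf:
    #   [Unit]
    #   PartOf=site-apps.target
    #   [Install]
    #   WantedBy=site-apps.target

    # Before the actual reboot:
    #   systemctl stop site-apps.target
    #   systemctl reboot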

By Dan.Astoorian at 2017-09-06 10:32:03:

For what it's worth, I've also experienced this a few times with one of our CentOS7 servers (a Dell PowerEdge R430).

On one of these instances earlier this summer, I left it alone when it failed to reboot, and it eventually did reboot on its own after almost exactly 30 minutes. (I was being particularly patient on that occasion because the reboot was for a BIOS update.)

The server was an NFS and CIFS client (but not a network fileserver), so it's plausible that the delay was related to one of those services, but I really had no way of evaluating what systemd might have been waiting for. It's also a web server, but I can't think of anything Apache could be doing to tie things up.

I spent a short time hunting for 30-minute timeouts in the service definitions, but didn't find anything promising; if anyone knows what takes 30 minutes for systemd and/or the kernel to give up on, that might be a clue.
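(For anyone who wants to repeat the hunt, something like the following loop should print each service's stop timeout; TimeoutStopUSec is the property name systemd uses, even though the values print human-readably.)

    # Survey per-service stop timeouts, looking for outliers:
    for u in $(systemctl list-units --type=service --no-legend | awk '{print $1}'); do
        printf '%s: ' "$u"
        systemctl show -p TimeoutStopUSec "$u"
    done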

--Dan

By K.C. Marshall at 2017-09-06 13:58:48:

Perhaps you could activate a hardware watchdog to kick the machine at the BIOS level if the reboot doesn't finish in 5 or 10 minutes. Another option might be to use the IPMI/iDRAC interface to power cycle the machine when the reboot takes too long to finish. The current solution is a hard boot anyway, so a watchdog- or IPMI-triggered reboot is not much different. There is still some need to monitor the machine to know if it is supposed to be kicked harder.
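(For the IPMI route, the usual incantation is something like the one below; the host, user, and password are placeholders. systemd can also arm the hardware watchdog itself when it starts a reboot, via ShutdownWatchdogSec.)

    # Power cycle a hung machine out of band through its BMC/iDRAC:
    ipmitool -I lanplus -H idrac-hostname -U root -P secret chassis power cycle

    # Or have systemd arm the hardware watchdog on reboot, so a
    # hung shutdown gets forcibly reset. In /etc/systemd/system.conf:
    #   [Manager]
    #   ShutdownWatchdogSec=10min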

By D.F. at 2017-09-06 17:49:31:

I just have to ask: after all the crapulence that systemd has introduced into the Linux world, why do people continue to indulge in this horrible software? Why are there excuses made for it? It's the Microsoft of yesteryear -- 'well, sure... it messed up on something easy and erased all my data, but I love it!'

By Chris Adams at 2017-09-06 19:36:50:

D.F., if this turns out to be related to filesystem unmounting, note that the same problem occurs with Upstart and SysV. There's no simple answer that satisfies everyone: a hard unmount loses data, and waiting too long is effectively a denial of service. The underlying fix would be hardening NFS so it wouldn't fail so easily into an unresponsive state.

From 12.34.36.250 at 2017-09-07 09:33:17:

I don't know, Chris - that seems like more blame shifting. Every time there's a problem with systemd, the response is 'well... it's not really systemd's fault, but something else. The best response is to fix the underlying problem with the foobar service', and that really seems to kick up the angst. At some point, deserved or not, systemd needs to actually own the process of starting, stopping, and rebooting the system without having to rely on fingerpointing.

But that's the gripe some of us have -- systemd is under active development; it's a large, complex piece of software that we're trusting to do the right thing, and it's failing in many cases. It seems like it could've used a few more years of development so that it doesn't feel like we're beta testing it.

By Tiago at 2017-09-07 11:45:36:

Did you try https://freedesktop.org/wiki/Software/systemd/Debugging/#shutdowncompleteseventually?

You could also open a bug against systemd in Ubuntu with the steps you have taken and logs collected so far.
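(Roughly, that page's advice comes down to turning up systemd's logging on the kernel command line and keeping an emergency root shell around; debug-shell.service puts one on tty9, from which you can run 'systemctl list-jobs' while a shutdown is hung.)

    # Add to the kernel command line (e.g. GRUB_CMDLINE_LINUX in
    # /etc/default/grub, then run update-grub):
    #   systemd.log_level=debug systemd.log_target=console console=tty0

    # Keep a root shell on tty9 that survives most of shutdown:
    sudo systemctl enable debug-shell.service
    sudo systemctl start debug-shell.service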
