Why I don't trust transitions to single-user mode

March 24, 2014

When I talked about how avoiding reboots should not become a fetish I mentioned that I trusted rebooting a server more than bringing it to single user mode and then back to multiuser. Today I feel like amplifying this.

The simple version is that it's easy for omissions to hide in the 'stop' handling of services if they are not normally stopped and restarted. When you reboot the machine after the 'stop' stuff runs, the reboot hides these errors. If you don't quite completely clean up /var/run or reset your state or whatever, well, rebooting the machine wipes all of that away and gives your 'start' scripts a clean slate. Similarly, there are potential issues because going from single user to multiuser mode doesn't give services quite the same environment as booting the system or restarting a service in multiuser mode; bugs and omissions could lurk here too.
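To make the /var/run case concrete, here is a minimal sketch of how you might look for leftovers after dropping to single user mode. It assumes a Unix system where services write `*.pid` files directly into `/var/run` with the PID as the first token; the function names are my own for illustration, and real services may keep their state elsewhere or not use PID files at all.

```python
import os
from pathlib import Path


def pid_is_running(pid: int) -> bool:
    """Return True if a process with this PID currently exists (Unix only)."""
    if pid <= 0:
        # Signalling PID 0 or a negative PID targets process groups, so skip it.
        return False
    try:
        os.kill(pid, 0)  # Signal 0 only checks for existence; nothing is sent.
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but belongs to another user.
        return True
    return True


def find_stale_pidfiles(run_dir="/var/run"):
    """List (path, pid) for PID files whose process is no longer running.

    Assumption: each *.pid file holds the PID as its first token.
    Files that cannot be read or parsed are skipped, not reported.
    """
    stale = []
    for path in sorted(Path(run_dir).glob("*.pid")):
        try:
            pid = int(path.read_text().split()[0])
        except (OSError, ValueError, IndexError):
            continue
        if not pid_is_running(pid):
            stale.append((str(path), pid))
    return stale


if __name__ == "__main__":
    for path, pid in find_stale_pidfiles():
        print(f"stale PID file: {path} (pid {pid})")
```

Anything this reports after the 'stop' actions have run is state that a reboot would have silently wiped away but that a transition back to multiuser will inherit. It only catches one kind of leftover, so a clean result doesn't prove the 'stop' handling is complete.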

This is a specific instance of a general cautious view I have. Nothing forces a transition from multiuser to single user and back to multiuser to be correct, since it's not done very often. Therefore I assume that there at least could be omissions. Of course these omissions are bugs, but that's cold comfort if things don't work right.

I also wouldn't be surprised if some services don't even bother to have real 'stop' actions. There are certainly some boot time actions that don't really have a clear inverse, and in general if you expect a service to never be restarted it's at least tempting to not go through all of the hassle. Perhaps I'm being biased by some of our local init service scripts which omit 'stop' actions for this reason.
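If you want a rough idea of how common this is on a given machine, a crude scan of the init scripts is one way to check. The sketch below assumes SysV-style shell scripts in `/etc/init.d` that dispatch on `$1` with a `case` statement; the function name and the regular expression are my own illustration, not anything standard. It is only a heuristic: a script can have a `stop)` arm that does nothing useful, or handle 'stop' in some way this pattern doesn't see.

```python
import re
from pathlib import Path

# Matches a shell `case` arm that includes `stop` as one alternative,
# e.g. `stop)`, `stop|restart)` or `restart|stop)`.
STOP_ARM = re.compile(r"^\s*\(?([\w-]+\|)*stop(\|[\w-]+)*\s*\)", re.MULTILINE)


def scripts_without_stop(init_dir="/etc/init.d"):
    """Return the names of scripts in init_dir with no visible 'stop' arm.

    Heuristic only: it does not check that the arm actually does anything.
    """
    missing = []
    for path in sorted(Path(init_dir).iterdir()):
        if not path.is_file():
            continue
        try:
            text = path.read_text(errors="replace")
        except OSError:
            continue
        if not STOP_ARM.search(text):
            missing.append(path.name)
    return missing


if __name__ == "__main__":
    for name in scripts_without_stop():
        print(f"no 'stop' arm found in: {name}")
```

A script that shows up here is a reasonable candidate for a service that won't survive a trip through single user mode, although the only real test is to try the transition.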

(A related issue with single user mode is an increasing disagreement between various systems about just what services should be running in it. There was a day when single user mode just fsck'd the disks, mounted at least some local filesystems, and gave you a shell. Those days are long over; at this point any number of things may wind up running in order to provide what are considered necessary services.)


Comments on this page:

By Ewen McNeill at 2014-03-24 04:27:57:

I'd echo that observation: I haven't trusted multiuser to single user and back to multiuser transitions for at least a decade. Way, way back they used to be reasonably reliable (eg, back when standard advice was to always go down to single user mode when patching, and return to multiuser mode afterwards). But I've been bitten by, eg, rpcbind/NFS not restarting properly over that transition (early Debian IIRC), so I'll generally reboot after getting down to single user: as you say "fresh boot through multiuser" is by far the best tested (and most optimised) path. About the only exception is hardware which is very slow to start (eg, mandatory 5+ minutes in BIOS/disk scans) where I'm fairly confident that the multiuser/singleuser/multiuser path on that machine is well tested. But usually anything critical has a failover partner, which hides most of that downtime from users anyway.

I expect one could make a local system/set of systems reliable for that path (multi/single/multi) given enough effort. But it doesn't seem worth it outside a cluster of (nearly) identical systems where the hardware boots slowly.

Ewen

Written on 24 March 2014.


Last modified: Mon Mar 24 02:50:59 2014