Wandering Thoughts archives

2017-09-13

System shutdown is complicated and involves policy decisions

I've been a little harsh lately on how systemd has been (not) shutting down our systems, and certainly it has some issues and it could be better. But I want to note that in general and in practice, shutting down a Unix system is a complicated thing that involves tradeoffs and policy decisions; in fact I maintain that it's harder than booting the system. Further, the more full-featured you attempt to make system shutdown, the more policy decisions and tradeoffs you need to make.

(The only way to make system shutdown simple is to have a very minimal view of it and to essentially crash the running system, as original BSD did. This is a valid choice and certainly systems should be able to deal with abrupt crashes, since they do happen, but it isn't necessarily a great one. Your database can recover after a crash-stop, but it will probably be happier if you let it shut down neatly and it may well start faster that way.)

One of the problems that makes shutdown complicated is that on the one hand, stopping things can fail, and on the other hand, when you shut down the system you want and often need for it to actually go down, so overall system shutdown can't fail. Reconciling these conflicting facts requires policy decisions, because there is no clear universal technical answer for what you do if a service shutdown fails (ie the service process or processes remain running), or a filesystem can't be unmounted, or some piece of hardware says 'no, I am not detaching and shutting down'. Do you continue on with the rest of the shutdown process and try again later? Do you start killing processes that might be holding things busy? What do you do about your normal shutdown ordering requirements, for example do you block further services and so on from shutting down just yet, or do you continue on (and perhaps let them make their own decisions about whether they can shut down)?

There are no one size fits all answers to these questions and issues, especially if the init system is essentially blind to the specific nature of the services involved and treats them as generic 'services' with generic 'shutdown' actions. Even in an init system where the answers to these questions can be configured on a per-service or per-item basis, someone has to do that configuration and get it right (which may be complicated by an init system that doesn't distinguish between the different contexts of stopping a specific service, which means that you get to pick your poison).

While it's not trivial, it's not particularly difficult for an init system to reliably shut down machines if and when all of the individual service and item shutdowns go fine and all of the dependencies are fully expressed (and correct), so that everything is stopped in the right order. But this is the easy case. The hard case for all init systems is when something goes wrong, and many init systems have historically had issues here.

(Many implementations of System V init would simply stall the entire system shutdown if an '/etc/init.d/<whatever> stop' operation hung, for example.)

PS: One obvious pragmatic question and problem is how and when you give up on an orderly shutdown of a service and (perhaps) switch over to things like killing processes. Services may legitimately take some time to shut down, in order to flush out data, close databases properly, and so on, but they can also hang during shutdown for all sorts of reasons. This is especially relevant in any init system that shuts down multiple services in parallel, because each service being shut down could suddenly want a bunch of resources.

(One of the fun cases is where you have heavyweight daemons that are all inactive and paged out of RAM, and you ask them to do an orderly shutdown, which suddenly causes everything to try to page back in to your limited RAM. I've been there in a similar situation.)

unix/ShutdownComplicated written at 01:47:11; Add Comment


Page tools: See As Normal.
Search:
Login: Password:
Atom Syndication: Recent Pages, Recent Comments.

This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.