Wandering Thoughts archives

2014-03-17

Rebooting the system if init dies is a hack

I feel like I should say this explicitly: rebooting the system if init dies is a hack. It's the easy thing to do but not the right thing. V7 Unix more or less ignored the possibility of init failing; when BSD started considering this situation they took the easy way out of 'handling' it by just rebooting the system. Everyone since then has copied BSD (probably partly out of compatibility, since 'everyone knows' that if init dies the system reboots, and partly because it's the easy way).

You can argue that if init dies something terrible is going on (especially after the kernel has armored init so that you have to work very hard to terminate it), and this is generally true. But rebooting the system is the lazy way out, especially when the decision is made by the kernel instead of at user level. It may well be sensible to configure your system to immediately start a reboot if init ever dies and is restarted by the kernel, but at that point it's something you control at user level; you might instead ring lots of alarms and see if the system can limp on. And so on. From some perspectives, 'reboot the system if init dies' is the kernel meddling in policy that should be left to other levels.

The right thing is to provide some way to recover from this situation. I outlined two plausible approaches yesterday; there are probably more. Of course this is more work to design and program than just rebooting the machine, but that's common when you do the right thing instead of the easy thing.

It's kind of sad that almost everyone since BSD has simply followed or copied the BSD quick-hack approach (even the people who reimplemented things from scratch, like Linux), but this is pretty typical for Unix. If some Unix did try to do it differently, I suspect that there would be people complaining that that Unix was over-complicating init.

InitDeathAndRebootsII written at 01:33:05

2014-03-16

You don't have to reboot the system if init dies

One of the things that make PID 1 special on many systems is that if it ever exits or dies for any reason, the system will reboot. This behavior was introduced by BSD Unix (V7 ignored the possibility) and makes a certain amount of sense; init is crucial both for reaping orphan processes and for restarting serial port logins. If it goes away, rebooting the system is an easy way to hopefully fix the situation.
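To make the 'reaping orphan processes' job concrete, here is a minimal sketch of the sort of loop an init runs to collect exited children (orphans are reparented to PID 1, so they show up as its children). The function name is made up for illustration, and a real init would normally trigger this from a SIGCHLD handler or its main event loop; this is not taken from any actual init's source.

```python
import os


def reap_exited_children():
    """Collect every child that has already exited, without blocking.

    Returns a list of (pid, raw_wait_status) pairs. The name and the
    return shape are illustrative assumptions, not any real init's API.
    """
    reaped = []
    while True:
        try:
            # -1 means "any child"; WNOHANG means "don't wait if none are done".
            pid, status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # we have no children at all
        if pid == 0:
            break  # children exist, but none have exited yet
        reaped.append((pid, status))
    return reaped
```

If nothing ever makes these waitpid() calls, exited processes pile up as zombies, which is part of why a system with no PID 1 is in trouble unless the kernel takes over the job.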

However, this behavior is not set in stone. There are several alternatives. The first would be to simply have the kernel cope with no PID 1, handling and reaping orphan processes itself internally in some way (and possibly providing some special way for user level to restart a new PID 1). The second is for the kernel to re-exec init as PID 1 if necessary. If PID 1 exits, the kernel would not tear down its process but instead act as if it had done an exec. Ideally this would be accompanied by some way for init to store and then reload important state. Done right this actually provides a great way for init to transition itself into a new version; just record the current state, exit, and let the kernel re-exec the new init binary.
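As a rough illustration of the 'record the current state, exit, and let the kernel re-exec' idea, here is a sketch of the user-level half: saving state atomically and loading it back. Everything here is an assumption for illustration (the function names, the use of JSON, the shape of the state); it does not describe how Solaris or any real init stores its state.

```python
import json
import os
import tempfile


def save_state(state, path):
    """Atomically write init's state so a successor never sees a partial file.

    'state' is assumed to be JSON-serializable; 'path' is a hypothetical
    location that survives init exiting (for example on a tmpfs).
    """
    directory = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".init-state-")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())
        # rename() within one filesystem is atomic on POSIX systems.
        os.replace(tmp_path, path)
    except BaseException:
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise


def load_state(path):
    """Return the saved state, or None if there is nothing usable."""
    try:
        with open(path) as f:
            return json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        return None
```

An init upgrading itself would call save_state() with whatever it is tracking, exit, and rely on the kernel to start the new binary, which would call load_state() early in its startup.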

Perhaps the second behavior sounds odd and crazy. Then I should probably tell you that this is current Solaris behavior and nothing seems to have exploded as a result. In other words we already have an existence proof that it's possible to change the semantics of PID 1 exiting, so we could adopt it elsewhere if desired.

Apart from the innate conservatism of Unixes, I think one reason that other Unixes haven't done this is that it's almost never necessary anyway. Since init not exiting is so crucial today, people have devoted a lot of engineering effort to making sure that it doesn't happen, and have been quite successful at it. Even radically different and complex systems like Upstart and systemd have been extremely stable this way in practice.

(Also, this 're-exec init on failure' behavior needs cooperation from your init, both so that init doesn't always start trying to boot the system when it's executed and so that it journals state periodically so that a new init can pick it up again. This makes it easier to add in certain sorts of Unixes, i.e. the ones where one team can control both kernel changes and init changes.)
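One way the 'don't always try to boot the system' half of that cooperation might look is sketched below: on startup, init checks for a journal written during the current boot and only does a full boot if there isn't one. The journal format, the function name, and the idea of tagging the journal with a per-boot identifier are all assumptions made for this example, not a description of any real init.

```python
import json
import os


def choose_startup_mode(journal_path, current_boot_id):
    """Decide whether a freshly executed init should boot or resume.

    Returns ("resume", state) if a readable journal tagged with this
    boot's identifier exists, otherwise ("boot", {}). The journal is
    assumed to be a JSON object: {"boot_id": ..., "state": {...}}.
    """
    try:
        with open(journal_path) as f:
            journal = json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        return "boot", {}  # no journal, or a damaged one: start from scratch
    if not isinstance(journal, dict):
        return "boot", {}
    if journal.get("boot_id") != current_boot_id:
        return "boot", {}  # stale journal left over from a previous boot
    return "resume", journal.get("state", {})
```

The boot identifier is there so that a journal surviving on disk across a real reboot isn't mistaken for live state; keeping the journal on a filesystem that is wiped at boot would be another way to get the same effect.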

InitDeathAndReboots written at 00:47:56


