Why your Ubuntu server stalls a while on boot if networking has problems

April 9, 2016

Yesterday I wrote about how to shoot yourself in the foot by making a mistake in /etc/network/interfaces. I kept digging into this today, so now I can tell you why the resulting boot stall happens and what you can do about it. The short answer is that it comes from /etc/init/failsafe.conf.

What failsafe.conf is trying to do is kind of hard to explain without a background in Upstart (Ubuntu's 'traditional' init system). A real System V init system is always in a 'runlevel', and this drives what it does (eg it determines which /etc/rcN.d directory to process). Upstart sort of half abandons runlevels; they are not built into Upstart itself and some /etc/init jobs don't use them, but there's a standard Upstart event to set the runlevel and many /etc/init jobs are started and stopped based on this runlevel event. To simplify: Upstart's runlevel stuff is a way of avoiding both specifying real dependencies for /etc/init jobs and handling them for /etc/rcN.d scripts. Instead jobs can just say 'start on runlevel [2345]' and get started once the system has finished its basic boot processing, whatever that is and whatever it takes.
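
As a rough illustration, a job that leans on the runlevel event instead of naming real dependencies might look like the sketch below. The job name and daemon path are made up for the example; only the 'start on' and 'stop on' stanzas are the point.

```
# /etc/init/exampled.conf: hypothetical job, for illustration only
description "example daemon started on the runlevel event"

# No real dependencies are named; the job simply waits for the
# runlevel event to announce one of the normal multi-user runlevels.
start on runlevel [2345]
stop on runlevel [!2345]

# /usr/sbin/exampled is a made-up path.
exec /usr/sbin/exampled
```

A job like this never starts if no 'runlevel N' event is ever emitted, which is why the timing of that event matters so much.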

Since the Upstart runlevel is not built in, something must generate an appropriate 'runlevel N' event during boot at an appropriate time. That thing is /etc/init/rc-sysinit.conf, which in turn must be careful to run only at some appropriate point in Upstart's boot process, once this basic boot processing is done. When is basic boot processing done? Well, the rc-sysinit.conf answer is 'when filesystems are there and static networking is up', which in Upstart terms means when the filesystem(7) and static-network-up Upstart events are emitted by something.

So what happens if networking doesn't come fully up, for instance if your /etc/network/interfaces has a mistake in it? If Upstart left things as they were, your system would just hang in early boot; rc-sysinit.conf would be left waiting for an Upstart event that would never happen. This is what failsafe.conf is there for. It waits a while for networking to come up, and if that doesn't happen it emits a special Upstart event that tells rc-sysinit.conf to go on anyways.
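
On a stock Upstart-era Ubuntu machine, the stanzas that wire this together should look roughly like the excerpts below. Treat this as a sketch rather than a verbatim quote, and check the copies in your own /etc/init; in particular, the event name 'failsafe-boot' is what I believe the special event is called.

```
# /etc/init/rc-sysinit.conf (excerpt; verify against your own copy)
# Normal path: filesystems and static networking are both ready.
# Escape hatch: failsafe.conf gave up waiting and emitted failsafe-boot.
start on (filesystem and static-network-up) or failsafe-boot

# /etc/init/failsafe.conf (excerpt; verify against your own copy)
# Start the countdown once filesystems and the loopback device exist,
# and cancel it if networking comes up or rc-sysinit starts anyway.
start on filesystem and net-device-up IFACE=lo
stop on static-network-up or starting rc-sysinit
emits failsafe-boot
```

The 'stop on' line is what makes the delay invisible on a healthy boot: Upstart stops the failsafe job as soon as static-network-up arrives, so its timer never runs out.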

In the abstract this is a sensible idea. In the concrete, failsafe.conf has a number of problems:

  • the timeout is hardcoded, which means that it's guaranteed to be too long for some people and probably not long enough for others.

  • it doesn't produce any useful messages when it has to delay, and if you're not using Plymouth it's totally silent. Servers typically don't run Plymouth.

  • Upstart as a whole has a very inflexible view of what 'static networking is up' means. It apparently requires that every 'auto' interface listed in /etc/network/interfaces both exist and have link signal (have a cable plugged in and be connected to something); see eg this bug and this bug. You don't get to say 'proceed even without link signal' or 'this interface is optional' or the like.

For Ubuntu versions that use Upstart, you can fix this by changing /etc/init/failsafe.conf to shorten the timeouts and print out actual messages (anything you output with eg echo will wind up on the console). We're in the process of doing this locally; I opted to print out a rather verbose message for my usual reasons.
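
A minimal sketch of such a modified failsafe.conf follows. It assumes the stock start, stop and emits stanzas are left alone; the 10 and 20 second delays and the message wording are arbitrary examples, not what anyone actually runs, and other things in the stock file (such as the Plymouth messages) are left out for brevity.

```
# /etc/init/failsafe.conf: sketch of a locally modified version.
# The delays below are example values, not recommendations.
description "Failsafe Boot Delay (shortened, with console messages)"

start on filesystem and net-device-up IFACE=lo
stop on static-network-up or starting rc-sysinit

emits failsafe-boot

# Send the script's output to the console so the echo lines are visible.
console output

script
    # Give networking a short grace period before saying anything.
    sleep 10
    echo "failsafe: static networking is not up yet."
    echo "failsafe: check /etc/network/interfaces and cabling."
    echo "failsafe: waiting up to 20 more seconds before booting without it."
    sleep 20
    echo "failsafe: giving up on networking and continuing the boot."
    # Tell rc-sysinit.conf to go ahead without static-network-up.
    exec initctl emit --no-wait failsafe-boot
end script
```

If networking does come up during either sleep, Upstart stops the job and the remaining messages are never printed, so a normal boot stays quiet.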

Of course, all of this is going to be inapplicable in the upcoming Ubuntu 16.04, since Ubuntu switched from Upstart to systemd as of 15.04 (cf). However, Ubuntu has put something similar to failsafe.conf into their systemd setup, and thus I expect that we'll wind up making similar modifications to it in some way.

(A true native systemd setup has a completely different and generally more granular way of handling failures to bring up networking, but I don't expect Ubuntu to make that big of a change any time soon.)

Written on 09 April 2016.
Last modified: Sat Apr 9 00:55:25 2016