Why your Ubuntu server stalls a while on boot if networking has problems
Yesterday I wrote on how to shoot yourself in the foot by making
a mistake in
I kept digging into this today, and so now I can tell you why this
happens and what you can do about it. The simple answer is that it
comes from /etc/init/failsafe.conf. What failsafe.conf is trying to
do is kind of hard to explain
without a background in Upstart (Ubuntu's 'traditional' init system).
A real System V init system is always in a 'runlevel', and this
drives what it does (eg it determines which /etc/rcN.d scripts
to process). Upstart sort of half abandons runlevels; they are not
built into Upstart itself and some /etc/init jobs don't use them,
but there's a standard Upstart event to set the runlevel and many
/etc/init jobs are started and stopped based on this runlevel.
Let's simplify that: Upstart's runlevel stuff is a way of avoiding
specifying real dependencies for /etc/init jobs and handling them
for /etc/rcN.d scripts. Instead jobs can just say that they start on
a runlevel and get started once the system has finished its
basic boot processing, whatever that is and whatever it takes.
Since the Upstart runlevel is not built in, something must generate
an appropriate 'runlevel N' event during boot at an appropriate
time. That thing is
/etc/init/rc-sysinit.conf, which in turn
must be careful to run only at some appropriate point in Upstart's
boot process, once this basic boot processing is done. When is basic
boot processing done? Well, the
rc-sysinit.conf answer is 'when
filesystems are there and static networking is up', which in Upstart
terms means when the filesystem and static-network-up Upstart events
are emitted by something.
So what happens if networking doesn't come fully up, for instance
because /etc/network/interfaces has a mistake in it? If Upstart
left things as they were, your system would just hang in early boot;
rc-sysinit.conf would be left waiting for an Upstart event that
would never happen. This is what
failsafe.conf is there for. It
waits a while for networking to come up, and if that doesn't happen
it emits a special Upstart event that tells rc-sysinit.conf to
go on anyways.
In the abstract this is a sensible idea. In the concrete,
failsafe.conf has a number of problems:
- the timeout is hardcoded, which means that it's guaranteed to
be too long for some people and probably not long enough for
others.
- it doesn't produce any useful messages when it has to delay,
and if you're not using Plymouth
it's totally silent. Servers typically don't run Plymouth.
- Upstart as a whole has a very inflexible view of what 'static
networking is up' means. It apparently requires that every 'auto'
interface listed in /etc/network/interfaces both exist and have
link signal (have a cable plugged in and be connected to something);
see eg this bug and this bug. You don't get to say 'proceed even
without link signal' or 'this interface is optional' or the like.
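If you want to see which interfaces might be tripping this up, here is a minimal sketch that lists the 'auto' interfaces in an interfaces file and reports whether each one exists and has link. It only approximates the two conditions described above, using the kernel's /sys/class/net/IFACE/carrier file; it is not what Upstart itself runs. The function names are made up for illustration, and it deliberately ignores any other files that the interfaces file pulls in.

```shell
#!/bin/sh
# Illustrative sketch only: the function names are hypothetical, and this
# approximates "exists and has link"; it is not Upstart's own check.

# Print every interface named on an 'auto' line, one per line.
list_auto_ifaces() {
    awk '$1 == "auto" { for (i = 2; i <= NF; i++) print $i }' "$1"
}

# Print one status line per 'auto' interface.
# $1 is the interfaces file. $2 is the sysfs network directory; it is a
# parameter only so that the function can be tried against a fake tree.
check_auto_ifaces() {
    interfaces_file=$1
    sysfs_net=${2:-/sys/class/net}
    for iface in $(list_auto_ifaces "$interfaces_file"); do
        if [ ! -e "$sysfs_net/$iface" ]; then
            echo "$iface: missing"
        elif [ "$(cat "$sysfs_net/$iface/carrier" 2>/dev/null)" = "1" ]; then
            echo "$iface: link up"
        else
            echo "$iface: no link"
        fi
    done
}
```

Running `check_auto_ifaces /etc/network/interfaces` on an affected machine should point at any interface reported as 'missing' or 'no link' as a likely suspect.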
For Ubuntu versions that use Upstart, you can fix this by changing
/etc/init/failsafe.conf to shorten the timeouts and print out
actual messages (anything you output with eg
echo will wind up
on the console). We're in the process of doing this locally; I
opted to print out a rather verbose message for my usual reasons.
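To make the idea concrete, here is a minimal sketch of a delay that is shorter and says what it is doing. This is an illustration, not the actual change described above and not the stock contents of failsafe.conf; the function name, the message text and the timings are all assumptions, and whatever the job does afterwards to let boot continue is left out.

```shell
#!/bin/sh
# Illustrative sketch only: noisy_delay, its messages and its timings are
# hypothetical, not taken from Ubuntu's failsafe.conf.

# noisy_delay TOTAL STEP
# Sleeps for TOTAL seconds in STEP-second slices (STEP must be at least 1),
# echoing a progress line before each slice and a final line at the end.
noisy_delay() {
    total=$1
    step=$2
    waited=0
    while [ "$waited" -lt "$total" ]; do
        echo "failsafe: waiting for network configuration ($((total - waited))s left)"
        sleep "$step"
        waited=$((waited + step))
    done
    echo "failsafe: booting without full network configuration"
}
```

Something like `noisy_delay 30 10` would print a line every ten seconds for thirty seconds. Whether thirty seconds is right for you is exactly the sort of thing a hardcoded timeout can't know.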
Of course, all of this is going to be inapplicable in the upcoming
Ubuntu 16.04, since Ubuntu has switched from Upstart to systemd.
However Ubuntu has put something similar to failsafe.conf
into their systemd setup and thus I expect that we'll wind up making
similar modifications to it in some way.
(A true native systemd setup has a completely different and generally more granular way of handling failures to bring up networking, but I don't expect Ubuntu to make that big of a change any time soon.)