How to shoot yourself in the foot with /etc/network/interfaces on Ubuntu

April 8, 2016

Today I had one of those self-inflicted learning experiences that I get myself into from time to time. I will start with the summary and then tell you the story of how I did this to myself.

The summary is that errors in /etc/network/interfaces can cause your system to stall silently during boot for a potentially significant amount of time.

One sort of error is a syntax error or an omitted line. Another sort is accidentally giving one of an interface's aliases the same IP address as the interface's primary address. If you do the latter, the errors you get in log files and from tools are weird and don't actually help you.

I discovered this today while doing a test install of a new web server in a VM image. Our standard practice for web server hosts is that we don't make their hostname be the actual website name; instead they have a real hostname and then one or more website names as aliases. On most of our web servers, these are IP aliases. However, we're running short of IP addresses on our primary network, so when I set up this new host I decided to make its single website just be another A record pointing to its single IP address.

When I reached the end of the install process, I'd forgotten this detail; instead I thought the server needed the website name added as an IP alias. So I looked up the IP address for the website name and slavishly added to /etc/network/interfaces something like:

auto eth0:0
     address <IP>
     netmask 255.255.255.0
     network <blah>.0

(The sharp-eyed will notice that there are two errors here.)

Then I rebooted the machine and it just sat there for quite a while. After a couple of reboots and poking several things (eg, trying an older kernel) I wound up looking at interfaces in a rescue shell and noticed my silly mistake. Or rather, my obvious silly mistake: I'd left out the 'iface eth0:0 inet static' before the address et al. So I fixed that and rebooted the machine.

Imagine my surprise when the machine still hung during boot. But this time I let it sit for long enough that the Ubuntu boot process timed out whatever it needed to, and the machine actually came up. When it did, I poked around to try to find out what was wrong and eventually noticed that I had no eth0:0 alias device. This led me to notice that the IP address I was trying to give to eth0:0 was the same address that eth0 already had, at which point I finally figured out what was wrong and was able to fully correct it.
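Both of these mistakes are mechanical enough that a small script could flag them before a reboot. What follows is only a rough sketch of that idea, not how ifupdown itself parses the file. The function name check_interfaces is made up for illustration, the 192.0.2.x addresses are documentation placeholders, and it deliberately ignores things like 'source' includes and IPv6.

```python
from collections import defaultdict


def check_interfaces(text):
    """Return a list of problems found in interfaces-style text.

    Hypothetical helper: it only looks for 'address' lines that are not
    inside an 'iface' stanza and for one address claimed by several stanzas.
    """
    problems = []
    current = None            # iface stanza we are currently inside, if any
    seen = defaultdict(list)  # address -> names of ifaces that claim it

    for lineno, raw in enumerate(text.splitlines(), 1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        words = line.split()
        keyword = words[0]

        if keyword == "iface" and len(words) >= 2:
            current = words[1]
        elif keyword in ("auto", "mapping", "source") or keyword.startswith("allow-"):
            # These start a new stanza, so any previous iface stanza is over.
            current = None
        elif keyword == "address" and len(words) >= 2:
            if current is None:
                problems.append(
                    "line %d: 'address' outside any iface stanza" % lineno
                )
            else:
                # Drop any /prefix so 192.0.2.10/24 matches 192.0.2.10.
                seen[words[1].split("/")[0]].append(current)

    for addr, names in seen.items():
        if len(names) > 1:
            problems.append(
                "address %s is used by %s" % (addr, ", ".join(names))
            )
    return problems


if __name__ == "__main__":
    with open("/etc/network/interfaces") as f:
        for problem in check_interfaces(f.read()):
            print(problem)
```

Run against my original file, something like this would have complained about the stray 'address' line. Run against my first 'fix', it would have complained that eth0 and eth0:0 had the same address.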

The good news is that now I know another place to look if an Ubuntu machine has mysterious 'hang during boot' problems. (Technically it was a stall, but stalling several minutes with no messages about it is functionally equivalent to a hang from the sysadmin perspective.)

(This is why I test my install instructions in virtual machines before going to the bother of getting real hardware set up. Sometimes it winds up feeling overly nitpicky, and sometimes very much not.)

Written on 08 April 2016.

Last modified: Fri Apr 8 01:49:30 2016