How to shoot yourself in the foot with /etc/network/interfaces on Ubuntu
Today I had one of those self-inflicted learning experiences that I get myself into from time to time. I will start with the summary and then tell you the story of how I did this to myself.
The summary is that errors in /etc/network/interfaces can cause your
system to stall silently during boot for a potentially significant
amount of time.
One sort of error is a syntax error or an omitted line. Another sort of error is accidentally giving one of an interface's aliases the same IP address as the interface's primary address. If you do the latter, you will get weird errors in log files and from tools, none of which actually help you work out what is wrong.
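Both sorts of error are mechanical enough that a script can look for them before you reboot. What follows is only a rough sketch in Python, not a real parser for the file format: `check_interfaces` is a name I made up, the 192.0.2.x addresses are documentation placeholders, and the code assumes the simple layout where options follow an `iface` (or `mapping`) line and nothing is continued with a backslash.

```python
# Sketch of a sanity check for an interfaces(5)-style file.
# Assumptions: no backslash continuations, and the same address is
# never legitimately wanted on two interfaces.

# Words that end the current stanza without starting one that takes options.
NON_OPTION_STANZAS = {"auto", "source", "source-directory"}


def check_interfaces(text):
    """Return a list of (line_number, message) problems found in text."""
    problems = []
    seen = {}          # address -> interface that first claimed it
    in_block = False   # True after an 'iface' or 'mapping' line
    current = None     # interface named by the enclosing 'iface' line
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        words = line.split()
        key = words[0]
        if key == "iface":
            in_block = True
            current = words[1] if len(words) > 1 else None
        elif key == "mapping":
            in_block = True
            current = None
        elif key in NON_OPTION_STANZAS or key.startswith("allow-"):
            in_block = False
            current = None
        elif not in_block:
            # An option such as 'address' with no 'iface' line above it.
            problems.append(
                (lineno, f"option '{key}' is not inside an iface stanza"))
        elif key == "address" and current is not None and len(words) > 1:
            addr = words[1].split("/")[0]
            if addr in seen and seen[addr] != current:
                problems.append(
                    (lineno,
                     f"{addr} on {current} is already assigned to {seen[addr]}"))
            else:
                seen.setdefault(addr, current)
    return problems


# Hypothetical file with an alias stanza that lacks its 'iface' line.
BROKEN = """\
auto eth0
iface eth0 inet static
    address 192.0.2.10
    netmask 255.255.255.0

auto eth0:0
    address 192.0.2.10
    netmask 255.255.255.0
    network 192.0.2.0
"""

# Same file with the 'iface' line added; the duplicate address remains.
FIXED = BROKEN.replace(
    "auto eth0:0\n", "auto eth0:0\niface eth0:0 inet static\n")

if __name__ == "__main__":
    for name, text in (("broken", BROKEN), ("fixed", FIXED)):
        for lineno, message in check_interfaces(text):
            print(f"{name}: line {lineno}: {message}")
```

On the first sample the sketch only complains about options outside a stanza; the duplicate address shows up once the `iface` line is in place, so one error can hide behind the other. To check a real file you would pass it the contents of /etc/network/interfaces instead of the sample strings.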
I discovered this today while doing a test install of a new web server in a VM image. Our standard practice for web server hosts is that we don't make their hostname the actual website name; instead they have a real hostname and then one or more website names as aliases. On most of our web servers, these are IP aliases. However, we're running short of IP addresses on our primary network, so when I set up this new host I decided to make its single website just another A record pointing to its single IP address.
When I reached the end of the install process, I'd forgotten this
detail; instead I thought the server needed the website name added as
an IP alias. So I looked up the IP address for the website name and
slavishly added to /etc/network/interfaces something like:
    auto eth0:0
        address <IP>
        netmask 255.255.255.0
        network <blah>.0
(The sharp-eyed will notice that there are two errors here.)
Then I rebooted the machine and it just sat there for quite a while.
After a couple of reboots and poking several things (eg, trying an
older kernel) I wound up looking at interfaces in a rescue shell
and noticed my silly mistake. Or rather, my obvious silly mistake:
I'd left out the 'iface eth0:0 inet static' before the address
et al. So I fixed that and rebooted the machine.
Imagine my surprise when the machine still hung during boot. But
this time I let it sit for long enough that the Ubuntu boot process
timed out whatever it needed to, and the machine actually came up.
When it did, I poked around to try to find out what was wrong and
eventually noticed that I had no eth0:0 alias device. This led
me to notice that the IP address I was trying to give to eth0:0
was the same address that eth0 already had, at which point I
finally figured out what was wrong and was able to fully correct
the configuration.
The good news is that now I know another place to look if an Ubuntu machine has mysterious 'hang during boot' problems. (Technically it was a stall, but stalling several minutes with no messages about it is functionally equivalent to a hang from the sysadmin perspective.)
(This is why I test my install instructions in virtual machines before going to the bother of getting real hardware set up. Sometimes it winds up feeling overly nitpicky, and sometimes very much not.)