Modern Linux can require a link signal before it configures IP addresses

April 6, 2021

I recently had an interesting troubleshooting experience when an Ubuntu 18.04 Dell server would boot but not talk to the network, or in fact even configure its IP address and other networking. I was putting it into production in place of a 16.04 server, which meant I had changed its netplan configuration and recabled it (to reuse the 16.04 server's network wire). I spent some time examining the netplan configuration and trawling logs before I took a second look at the rear of the machine and realized that when I had shuffled the network cable around I had accidentally plugged it into the server's second network port instead of the first one.

What had fooled me about where the problem was is that when I logged in to the machine on the console, ifconfig and ip both reported that the machine didn't have its IP address or other networking set up. Because I come from an era where networking was configured provided that the network device existed at all, that made me assume that something was wrong with netplan or with the underlying networkd configuration it generated. In fact what was going on is that these days nothing may get configured if a port doesn't have a link signal. The inverse is also true; your full IP and network configuration may appear the moment you plug in a network cable and give the port a link signal.
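One quick way to rule this out before digging through netplan is to ask the kernel directly whether a port has link. The following is only a minimal sketch, assuming a Linux system where sysfs exposes a `carrier` file under `/sys/class/net/<interface>/`; the function name `link_state` and its return strings are made up for illustration.

```python
from pathlib import Path


def link_state(iface, sysfs_root="/sys/class/net"):
    """Report whether a network interface currently has link signal.

    Returns one of 'missing', 'admin-down', 'link' or 'no-link'.
    These labels are illustrative, not any standard vocabulary.
    """
    dev = Path(sysfs_root) / iface
    if not dev.is_dir():
        return "missing"
    try:
        carrier = (dev / "carrier").read_text().strip()
    except OSError:
        # The kernel typically refuses to report carrier for an interface
        # that has not been brought up, so the read fails.
        return "admin-down"
    return "link" if carrier == "1" else "no-link"


if __name__ == "__main__":
    root = Path("/sys/class/net")
    if root.is_dir():
        for entry in sorted(root.iterdir()):
            print(f"{entry.name}: {link_state(entry.name)}")
```

Run on the console, this prints one line per interface, which would have shown the first port without link and the second port with it.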

(I think this is due to netplan using systemd's networkd to actually handle network setup, instead of it being something netplan itself was doing.)

People using NetworkManager have been experiencing this for a long time, but I'm more used to static server network configurations that are there from the moment the server boots up and finds its network devices. This behavior is definitely something I'm going to have to remember for future troubleshooting, along with making sure that the network cable is plugged into the port it should be.

This does have some implications for what you can expect to happen if your servers ever boot without the switch they're connected to being powered on. In the past they would boot with IP networking fully configured but just not be able to talk to anything; now they'll boot without IP networking (and some things may wait for some time before they start, although not forever, since systemd-networkd-wait-online, systemd's wait for the network to be online, has a 120-second timeout by default).

(There may be some other implications if networkd also withdraws configured IP addresses when an interface loses link signal, for various reasons including someone unplugging the wrong switch port. I haven't tested this.)


Comments on this page:

From 193.219.181.219 at 2021-04-07 03:58:07:

The systemd-networkd option for customizing this behavior is ConfigureWithoutCarrier=, but it doesn't seem like Netplan exposes this at all.

IgnoreCarrierLoss= defaults to the same value as the above option.
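As a rough sketch of what setting these options could look like: since Netplan doesn't expose them, one possible workaround is a systemd drop-in for the .network unit that netplan generates. The helper below only renders the drop-in text and a candidate path. The `10-netplan-<id>.network` naming and the drop-in file name are assumptions to verify on your own system (look in /run/systemd/network), and both function names are hypothetical.

```python
def render_carrier_dropin(configure_without_carrier=True, ignore_carrier_loss=None):
    """Render a [Network] drop-in fragment for systemd-networkd.

    ignore_carrier_loss=None leaves IgnoreCarrierLoss= unset, so it keeps
    following ConfigureWithoutCarrier= as described above.
    """
    def yes_no(flag):
        return "yes" if flag else "no"

    lines = [
        "[Network]",
        f"ConfigureWithoutCarrier={yes_no(configure_without_carrier)}",
    ]
    if ignore_carrier_loss is not None:
        lines.append(f"IgnoreCarrierLoss={yes_no(ignore_carrier_loss)}")
    return "\n".join(lines) + "\n"


def dropin_path(netplan_id, name="10-carrier.conf"):
    # ASSUMPTION: netplan's networkd backend names its generated unit
    # 10-netplan-<id>.network; check the real file name before relying on this.
    return f"/etc/systemd/network/10-netplan-{netplan_id}.network.d/{name}"


if __name__ == "__main__":
    print(f"# {dropin_path('eno1')}")
    print(render_carrier_dropin(), end="")
```

This deliberately writes nothing to disk; you would review the output, put it in place by hand, and restart systemd-networkd to see whether it behaves the way you want.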

By dozzie at 2021-04-07 04:08:18:

Even more reason to use Debian's ifupdown with predictable behaviour instead of fancy desktop-grade stuff.

Even if you use ifupdown (like I do) you can encounter missing link problems with IPv6, due to duplicate address detection, so you should turn that off if you need reliably configured static IPv6 addresses - details and longer story at https://www.dns.cam.ac.uk/news/2018-03-26-ipv6-dad-die.html

I have been very happy with netplan on servers. We are primarily an Ubuntu shop, but we converted our Debian systems too, for consistency.

The single biggest advantage of netplan is that it handles changes gracefully. For example, with ifupdown, you have to flap (down/up) the interface to add an IP address; with netplan you just “netplan apply” and it works. Of course, one can add it manually, but that is a second step and opens up more opportunities for mistakes. It is nice to know that you made the change through the same mechanism that will run on boot. Also, netplan can catch more configuration mistakes: If you ifdown the interface and then the ifup fails, you are broken (and possibly cut off). With netplan, it will print an error and change nothing. These factors also mean it integrates well with Ansible.

I also like that it uses a declarative syntax for everything, where with ifupdown you have to resort to running arbitrary commands in some cases. Granted, you can run arbitrary commands in netplan with a little extra work through its hook mechanism. We did have to do this for a while. But that was eventually eliminated by requesting a couple features be added to netplan. Canonical was good to work with on this. (I am a paying customer.)

I think this last one is a systemd-networkd thing, but I like the use of stable (per-install) generated MAC addresses for (probably only non-LACP) bonding interfaces. With ifupdown, it would use the MAC of one of the underlying interfaces, so you had a 50% chance that the MAC would change on reboot, which could break things if their ARP cache didn’t update.

ifupdown2 adds the "ifreload" command which applies the absolutely minimally disruptive set of changes. So you don't need to change your configs.

Written on 06 April 2021.

Last modified: Tue Apr 6 23:50:34 2021