Wandering Thoughts archives

2021-11-09

Our new way of waiting for the network to be "up" in systemd's world

Systemd has a long-standing philosophical objection to waiting until the network is up; they have an entire web page on the subject. Nevertheless, we need to do this (like many sysadmins). I've written about this before, and if you're using systemd-networkd, either directly or through Ubuntu's Netplan, you can in theory use systemd-networkd-wait-online.service. Usually it works, but today we discovered that it didn't on some of our Ubuntu 18.04 servers (the specifics of this issue are beyond the scope of this entry). Since we needed a way to fix the issue, we opted to solve our problem with a hammer.

All of our servers have static IP addresses and are always physically connected to the network unless something has gone terribly wrong. This means that for us, what's needed for the network to come up is for the kernel to probe the hardware for the network ports, then for some combination of Ubuntu's Netplan and systemd-networkd to run and configure the interfaces. We don't have to wait for DHCP or for a cable to be plugged in or any of the other things that can make it complicated to decide when the network is "up"; we simply need to wait for other local software to run and settle.

One general sign that all of this probing and configuring has happened is that at least one of the machine's interfaces has an IP(v4) address. So our brute force solution is a script that waits for this to be true, using a core loop of:

while true; do
   # Count the interfaces other than lo that have an IPv4 address.
   n="$(ip -br -4 addr list | grep -cv '^lo ')"
   if [ "$n" -gt 0 ]; then
      break
   fi
   sleep 1
done

We don't care which interface has an IPv4 address, and we don't care how many do, or how many IPv4 addresses have been configured. Since all we're waiting for is for networkd to have run and set things up, we assume it will do all of this basically at the same time. This saves us having to do any parsing of the machine's Netplan or networkd configuration to see how many interfaces and IPv4 addresses there should be.

(If we thought there might be a little bit of a delay between the first IPv4 address being set and networkd finishing all of its configuration, we could add a sleep of a second or two at the end of the script.)
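As a rough sketch of what such a script could look like with the optional settle delay added, here is one way to arrange it. The function names and the settle argument are hypothetical and not taken from our actual script; the counting is split into its own function only so that it can be fed sample `ip -br` output.

```shell
#!/bin/sh
# Hypothetical helper: count the non-loopback lines in
# 'ip -br -4 addr list' output supplied on standard input.
count_v4_interfaces() {
    # grep -c prints 0 but exits non-zero when nothing matches,
    # so the exit status is deliberately ignored.
    grep -cv '^lo ' || true
}

# Hypothetical wrapper: wait until at least one non-loopback interface
# has an IPv4 address, then sleep for an optional settle time given in
# seconds as the first argument (default 0).
wait_for_ipv4() {
    settle="${1:-0}"
    while true; do
        n="$(ip -br -4 addr list | count_v4_interfaces)"
        if [ "$n" -gt 0 ]; then
            break
        fi
        sleep 1
    done
    sleep "$settle"
}
```

A script built this way would end with a call such as `wait_for_ipv4 2` to allow two seconds of settling; how long to wait, if at all, is a guess that would need checking on real hardware.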

This script is then wired to a new systemd .service:

[Unit]
Description=...
Before=network-online.target
After=systemd-networkd.service systemd-networkd-wait-online.service

[Service]
Type=oneshot
RemainAfterExit=True
ExecStart=/opt/...
TimeoutStartSec=60s

[Install]
WantedBy=network-online.target

(The After= settings are there so that the script can spend less time sitting around twiddling its thumbs, and it's more likely that it will succeed on the first check without sleeping at all.)

To be safe, we time out (in the .service unit) after 60 seconds. It shouldn't ever take anywhere near 60 seconds to probe motherboard or PCIe card network hardware and then run networkd (and perhaps Netplan), so if we hit the timeout it's unlikely that waiting longer will change the situation on its own.

(This wouldn't work so neatly for IPv6; we'd have to distinguish between the automatically generated link-local fe80::/10 addresses and global IPv6 addresses that would be set by networkd.)
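For illustration, one way the IPv6 case might be approached is to have ip itself filter by address scope, since link-local addresses have link scope rather than global scope. This is an untested sketch, not something we run; the function names are hypothetical, and it assumes that the addresses networkd sets are all global scope.

```shell
#!/bin/sh
# Hypothetical helper: given 'ip -br -6 addr show scope global' output
# on standard input, count the interfaces other than lo that list at
# least one address (brief output is: name, state, then addresses).
count_global_v6() {
    awk '$1 != "lo" && NF >= 3 { n++ } END { print n + 0 }'
}

# Hypothetical wrapper: succeed once any interface has a global-scope
# IPv6 address. 'scope global' leaves out link-local fe80::/10 addresses.
have_global_v6() {
    n="$(ip -br -6 addr show scope global | count_global_v6)"
    [ "$n" -gt 0 ]
}
```

The `have_global_v6` check would take the place of the IPv4 count in the wait loop.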

We could go further and wait for more indications of networking being fully available. Two obvious ones are for an interface with IPv4 addresses to also have carrier and for a default route to be set. But the latter assumes all of our hosts will always have a default route set (which may not always be the case) and the former is harder to test, so for now we've opted not to do either. In practice, networkd sets the default route very shortly after it sets the IP address on the interface, so there is not much of a window for our script to exit and then other .service units to start running without a default route.

(Also, most of our servers are on a single subnet and all of the boot-time services they need to talk to are on that subnet, so missing a default route usually won't stop us from doing things like synchronizing our time through NTP.)
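If we did want these extra checks, they might look something like the following sketch. We don't do either of these today; the function names are hypothetical, and the carrier test assumes the kernel's usual sysfs layout, where /sys/class/net/<interface>/carrier reads as 1 when the link is up.

```shell
#!/bin/sh
# Hypothetical helper: succeed if 'ip -4 route show default' output,
# supplied on standard input, contains a default route.
has_default_route() {
    grep -q '^default '
}

# Hypothetical helper: succeed if the named interface reports carrier.
# The read fails (and so does this check) if the interface does not
# exist or is administratively down.
has_carrier() {
    [ "$(cat "/sys/class/net/$1/carrier" 2>/dev/null)" = "1" ]
}
```

These would be used as `ip -4 route show default | has_default_route` and `has_carrier eno1`, where the interface name is only an example. The carrier check needs to know which interface to look at, which is part of why it's the harder of the two.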

linux/SystemdNetworkUpHammer written at 23:15:55

