Exploring when the network is up on a machine

December 7, 2020

It's a common sysadmin desire to do things during (Unix) system boot 'once the network is up'. But what do we actually mean by that, in relatively concrete terms? The many possible meanings of this, and the varied environments we can ask the question in, are part of why this can be a tangled issue on Linuxes that use systemd.

To simplify life and because it's the main thing I care about, I'm going to consider only servers with a static set of network devices and fixed networking (fixed at least to the level of 'use DHCP on this specific interface', although we statically configure our IP addresses). Even here, I think that there are actually two broad views of when the network is up, which I will call the server view and the client view.

For the server view, networking is up when the firewall rules have been loaded and all of the system's important IP addresses have been configured. This is the point where you can start daemons (or services) that bind to specific IPs and have them safely serve only the traffic you want them to see. You don't need interfaces to actually be up or routes to be set, because if either of those isn't in place (or isn't working), to the outside world it's as if the machine hasn't booted yet.
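
As a rough illustration of checking the 'important IP addresses are configured' half of the server view, here is a minimal Python sketch. It assumes Linux-like behavior, where binding a socket to an address that isn't configured on any local interface fails with EADDRNOTAVAIL (unless the net.ipv4.ip_nonlocal_bind sysctl is turned on). The function names are hypothetical, and checking that firewall rules are loaded is left out because how you'd tell depends entirely on your firewall setup.

```python
import errno
import socket


def address_is_configured(addr: str) -> bool:
    """Return True if `addr` is currently configured as a local IP address.

    Assumption: binding to a non-local address fails with EADDRNOTAVAIL,
    which is the normal Linux behavior when ip_nonlocal_bind is off.
    """
    family = socket.AF_INET6 if ":" in addr else socket.AF_INET
    with socket.socket(family, socket.SOCK_DGRAM) as s:
        try:
            s.bind((addr, 0))  # port 0: let the kernel pick any free port
        except OSError as e:
            if e.errno == errno.EADDRNOTAVAIL:
                return False  # not (yet) configured on this machine
            raise  # some other problem; don't guess
    return True


def server_view_up(important_addrs: list[str]) -> bool:
    """Hypothetical check: are all of the important IPs configured yet?

    Firewall rules are deliberately not checked here.
    """
    return all(address_is_configured(a) for a in important_addrs)
```

Note that this only answers 'can a daemon bind to these IPs now'; it says nothing about whether traffic can actually reach them, which is exactly the point of the server view.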

For the client view, networking is up when (some) IP addresses have been set, interfaces are up with link signal, and routes are in place (on some systems you need firewall rules loaded too). This is when programs on the machine can reasonably talk to the outside world to make DNS queries, send log messages, check authorization status, fetch resources off the network, or whatever other important interactions they have. Often you don't want programs and services to run before they can do these important things because they'll just fail, sometimes in annoying ways.
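
Part of the client view can be approximated on Linux from /proc/net/route (for routes) and /sys/class/net/IFACE/operstate (for link state). The following is a minimal sketch under those assumptions. It only looks at IPv4, it uses 'there is a usable default route' as a stand-in for 'routes are in place' (which won't be right for every machine), and its function names are hypothetical.

```python
from pathlib import Path
from typing import Iterable, Optional

RTF_UP = 0x0001  # "route is usable" flag, from <linux/route.h>


def has_default_route(route_table: str, iface: Optional[str] = None) -> bool:
    """Return True if /proc/net/route-style text has a usable IPv4 default route.

    Each line after the header is: Iface Destination Gateway Flags RefCnt
    Use Metric Mask MTU Window IRTT, with addresses and flags in hex.
    """
    for line in route_table.splitlines()[1:]:  # skip the header line
        fields = line.split()
        if len(fields) < 8:
            continue
        name, dest, flags, mask = fields[0], fields[1], int(fields[3], 16), fields[7]
        if dest == "00000000" and mask == "00000000" and flags & RTF_UP:
            if iface is None or name == iface:
                return True
    return False


def link_is_up(iface: str, sysfs: Path = Path("/sys/class/net")) -> bool:
    """Return True if the kernel reports the interface's operstate as 'up'.

    'up' normally means the interface is up and has carrier; note that
    some virtual interfaces (including lo) report 'unknown' instead.
    """
    try:
        return (sysfs / iface / "operstate").read_text().strip() == "up"
    except FileNotFoundError:
        return False  # no such interface (yet)


def client_view_up(ifaces: Iterable[str], route_file: Path = Path("/proc/net/route")) -> bool:
    """A Linux-only, IPv4-only approximation of the client view."""
    return all(link_is_up(i) for i in ifaces) and has_default_route(route_file.read_text())
```

Whether you also want firewall rules loaded, or DNS actually answering, before calling this 'up' is a per-machine decision.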

As hinted here, some programs that provide network services are also network clients. For these programs, you often want both versions of 'the network is up' before they start. Even if a service works with just the server view, you probably want the system's time to be set correctly (from an NTP query) and its logs to go to your log host, rather than having it run with the wrong time and its logs going into the void because it can't connect to the log host.

Not all interfaces, IP addresses, and routes are necessarily required for either the server or client views on a particular machine (firewall rules are usually loaded as a unified whole). In some situations, this can even include the interface with the default route; this may be a public facing IP while all of your client-view things use other networks. Of course your service itself won't be 'up' to its clients until the public interface is alive, but you can start programs on the machine before then instead of having to stall.

Given all of this, it's not really surprising that Unix systems have struggled to provide useful versions of 'the network is up'. Any single definition of what that means is going to disappoint some people some of the time. Life is especially hard for people who have some interfaces and IPs that are critical and some that aren't; it's unlikely that any general purpose system will be able to automatically do what we want in that situation. Probably the best we can hope for is a way to tell the system what we mean by 'the network is up' for this particular machine, chosen from among all of the components and options.
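
To make that idea a bit more concrete, here is a hypothetical Python sketch of what such a per-machine declaration might look like. NetState and NetworkUpPolicy are invented names, not an existing API in systemd or anywhere else, and gathering the actual NetState snapshot from the running system is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NetState:
    """A snapshot of what is currently true about this machine's networking."""
    addrs: frozenset[str] = frozenset()
    links_up: frozenset[str] = frozenset()
    default_route: bool = False
    firewall_loaded: bool = False


@dataclass(frozen=True)
class NetworkUpPolicy:
    """Hypothetical per-machine answer to 'what does "the network is up" mean?'.

    Only the components listed here are required; anything else (such as a
    public-facing interface that client-view things don't use) is ignored.
    """
    required_addrs: frozenset[str] = frozenset()
    required_links: frozenset[str] = frozenset()
    need_default_route: bool = False
    need_firewall: bool = True

    def missing(self, state: NetState) -> list[str]:
        """List what this policy still needs before the network counts as up."""
        problems = []
        if self.need_firewall and not state.firewall_loaded:
            problems.append("firewall rules not loaded")
        problems += [f"address {a} not configured" for a in sorted(self.required_addrs - state.addrs)]
        problems += [f"link {link} not up" for link in sorted(self.required_links - state.links_up)]
        if self.need_default_route and not state.default_route:
            problems.append("no default route")
        return problems

    def is_up(self, state: NetState) -> bool:
        return not self.missing(state)
```

A machine could then have separate server-view and client-view policies, and a machine whose public interface isn't needed for client-view things would simply leave that interface out of its client-view policy's required_links.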

(This set of thoughts was sparked by reading Starting services only when the network is ready on Debian/systemd (via).)

PS: If your machines get their IP addresses via DHCP instead of static configuration, the server and client views start to collapse under normal circumstances. You can't have IP addresses for the server view unless you have link connectivity to at least the DHCP server, which often gets you most of the way to the client view too. But the price of this is that if you don't have link connectivity, you don't even have the server view of the network being up; instead, nothing gets anywhere. And on a multi-interface machine you get new and exciting modes of partial failure, where you can talk to one interface's DHCP server but not another's. Whether this results in a system with the server or client view of the network being up depends on what you need the interfaces for.

Written on 07 December 2020.

Last modified: Mon Dec 7 23:06:33 2020