2021-07-12
Problems in the way of straightforward device naming in operating systems
I've recently been writing (again) about Linux network interface names, this time what goes into udev's normal device names. This is a perennial topic in many operating systems; people are forever wanting straightforward and simple names for devices (networks, disk drives, and so on) and forever irritated that operating systems don't seem to be able to deliver this. Unix network device naming makes an illustrative example for everything that adds complexity even without hot-plugged devices.
Once upon a time the name of Unix Ethernet devices was simple; they were called eth0, eth1, eth2, and so on. This is an appealing naming scheme for networks, disk drives, and so on; you number the first Ethernet eth0 and go upward from there. The big problem with this naming scheme is the question of how you decide what the first Ethernet is, and in general how you create an order for them and then keep it stable.
The minimum requirement for any naming scheme is that if the system is rebooted with no other changes, you get the same device names. In any operating system that probes and registers physical devices in parallel, this means you don't want to make the order be the order in which hardware is detected, because that might vary from reboot to reboot due to races. If one piece of hardware or one device driver is a little faster to respond this time around, you don't want what eth0 is to change. Operating systems could probe and register hardware one at a time, but this is unpopular because it can take a while and slow down reboots. Generally this means that you have to order devices based on either how they're connected to the system or some specific characteristics they have.
(The hardware changing how fast it responds may be unlikely with network interfaces, but consider disk drives.)
The next thing you want for a naming scheme is that existing devices don't have their names changed if you add or remove hardware. If you already have eth0 and eth1 and you add two new network cards (each with one interface), you want those two new interfaces to be eth2 and eth3 (in some order). If you later take out the card that is eth2, most people want eth3 to stay eth3. To make this case more tricky, if the card for eth2 fails and you replace it with an identical new card, most people want the new card's network interface to also be eth2, although it will of course have a different unique Ethernet hardware address.
Historically, some operating systems have attempted to implement this sort of long term stable device naming scheme by maintaining a registry of associations between device names and specific hardware. This creates its own set of problems, because now your replacement eth2 is most likely going to be eth4, but if you reinstall the machine from scratch it will be eth2 again. This leads to the third thing you want in a naming scheme, which is that if two machines have exactly the same hardware, they should have the same device names. Well, you may not want this, but system administrators definitely do.
Most modern general purpose computers use PCIe, even if they're not based on x86 CPUs. PCIe has short identifiers for devices that are stable over simple reboots in the form of PCIe bus addresses, but unfortunately it doesn't have short identifiers that are stable over hardware changes. Adding, removing, or changing PCIe hardware can insert or remove PCIe busses in the system's PCIe topology, which will renumber some other PCIe busses (busses that are enumerated after the new PCIe bus you've added). PCIe can have fully stable identifiers for devices, but they aren't short since you have to embed the entire PCIe path to the device.
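As a concrete illustration of why the fully stable identifier isn't short: the entire PCIe path shows up in the sysfs device path for a device, one bus address per bridge hop plus the device itself. A minimal sketch in Python, using a made-up devpath for illustration:

```python
# The stable-but-long identifier for a PCIe device is its full path of
# bus addresses from the root. sysfs encodes this in the device path;
# this example path is made up for illustration.
devpath = "/devices/pci0000:00/0000:00:1c.4/0000:06:00.0"

# Each 'domain:bus:device.function' component is one hop: a PCIe bridge
# or the final device. (The 'pci0000:00' root component has only one ':'.)
hops = [p for p in devpath.split("/") if p.count(":") == 2]
print(hops)  # ['0000:00:1c.4', '0000:06:00.0']
```

Here the last hop is the device itself and everything before it is a bridge; renumbering any bridge's bus changes the device's short bus address, but the path of (device, function) hops stays the same.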
It's possible to reduce and postpone these problems by expanding the namespace of device names. For instance, you can make the device names depend on the hardware driver being used, so instead of eth0 and eth1, you have ix0 and bge0. However, this doesn't eliminate the problem, since you can have multiple pieces of hardware that use the same driver (so you have ix0, ix1, and so on). It also makes life inconvenient for people, because now they have to remember what sort of hardware a given system has. A long time ago, Unix disk device names were often dependent on the specific hardware controller being used, but people found that they rather preferred only having to remember and use sda instead of rl0 versus rk0 versus rp0.
(The third problem is that your device names can change if you replace a network card with a different type of network card; you might go from bnx0 to bge0. It might even be from the same vendor. Intel has had several generations of networking chipsets, which are supported by different drivers on Linux. My office machine has two Intel 1G interfaces, one using the igb driver and one using the e1000e driver.)
Understanding something about udev's normal network device names on Linux
For a long time, systemd's version of udev has attempted to give network interfaces what the systemd people call predictable or stable names. The current naming scheme is more or less documented in systemd.net-naming-scheme, with an older version in their Predictable Network Interface Names wiki page. To understand how the naming scheme is applied in practice by default, you also need to read the description of NamePolicy= in systemd.link(5), and inspect the default .link file, '99-default.link', which might be in either /lib/systemd/network or /usr/lib/systemd/network/. It appears that the current network name policy is generally going to be "kernel database onboard slot path", possibly with 'keep' at the front in addition. In practice, on most servers and desktops, most network devices will be named based on their PCI slot identifier, using systemd's 'path' naming policy.
A PCI slot identifier is what ordinary 'lspci' will show you as the PCIe bus address. As covered in the lspci manpage, the fully general form of a PCIe bus address is <domain>:<bus>:<device>.<function>, and on many systems the domain is always 0000 and is omitted. Systemd turns this into what it calls a "PCI geographical location", which is (translated into lspci's terminology):

prefix [Pdomain] pbus sdevice [ffunction] [nphys_port_name | ddev_port]

The domain is omitted if it's 0 and the function is only present if it's a multi-function device. All of the numbers are in decimal, while lspci presents them in hex. For Ethernet devices, the prefix is 'en'.
(I can't say anything about the 'n' and 'd' suffixes because I've never seen them in our hardware.)
The device portion of the PCIe bus address is very frequently 0, because many Ethernet devices are behind PCIe bridges in the PCIe bus topology. This is how my office workstation is arranged, and how almost all of our servers are. The exceptions are all on bus 0, the root bus, which I believe means that they're directly integrated into the core chipset. This means that in practice the network device name primarily comes from the PCI bus number, possibly with a function number added. This gives 'path' based names of, eg, enp6s0 (bus 6, device 0) or enp1s0f0 and enp1s0f1 (bus 1, device 0, function 0 or 1; this is a dual 10G-T card, with each port being one function).
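The hex-to-decimal translation is easy to trip over, so here is a minimal sketch of the naming rules described above in Python. This is an illustration, not udev's actual code; real udev learns whether a device is multi-function from sysfs, so here that is a parameter you have to supply yourself, and the 'n' and 'd' suffixes aren't handled at all.

```python
# Sketch of systemd's 'path' naming rules for Ethernet devices.
# bdf is an lspci-style '[domain:]bus:device.function' address in hex;
# the numbers in the resulting name are decimal.
def path_name(bdf: str, prefix: str = "en", multifunction: bool = False) -> str:
    parts = bdf.split(":")
    if len(parts) == 3:
        domain_s, bus_s, devfn = parts
    else:
        domain_s = "0000"          # lspci omits an all-zero domain
        bus_s, devfn = parts
    dev_s, func_s = devfn.split(".")

    name = prefix
    if int(domain_s, 16) != 0:
        name += f"P{int(domain_s, 16)}"   # domain omitted when it's 0
    name += f"p{int(bus_s, 16)}s{int(dev_s, 16)}"
    if multifunction or int(func_s, 16) != 0:
        name += f"f{int(func_s, 16)}"     # only on multi-function devices
    return name

print(path_name("06:00.0"))                      # enp6s0
print(path_name("01:00.0", multifunction=True))  # enp1s0f0
print(path_name("01:00.1", multifunction=True))  # enp1s0f1
```

The decimal-versus-hex difference only shows up once your bus numbers go past 9; lspci's bus 0a becomes 'p10' in the device name, which can be surprising the first time you see it.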
(Onboard devices on servers and even desktops are often not integrated into the core chipset and thus not on PCIe bus 0. Udev may or may not recognize them as onboard devices and assign them 'eno<N>' names. Servers from good sources will hopefully have enough correct DMI and other information so that udev can do this.)
As always, the PCIe bus ordering doesn't necessarily correspond to what you think of as the actual order of hardware. My office workstation has an onboard Ethernet port on its ASUS Prime X370-Pro motherboard and an Intel 1G PCIe card, but they are (or would be) enp8s0 and enp6s0 respectively. So my onboard port has a higher PCIe bus number than the PCIe card.
There is an important consequence of this, which is that systemd's default network device names are not stable if you change your hardware around, even if you didn't touch the network card itself. Changing your hardware around can change your PCIe bus numbers, and since the PCIe bus number is most of what determines the network interface name, it will change. You don't have to touch your actual network card for this to happen; adding, changing, or relocating other hardware between physical PCIe slots can trigger changes in bus addresses (primarily if PCIe bridges are added or removed).
(However, adding or removing hardware won't necessarily change existing PCIe bus addresses even if the hardware changed has a PCIe bridge. It all depends on your specific PCIe topology.)
Sidebar: obtaining udev and PCIe topology information
Running 'udevadm info /sys/class/net/<something>' will give you a dump of what udev thinks and knows about any given network interface. The various ID_NET_NAME_* properties give you the various names that udev would assign based on that particular naming policy. The 'enp...' names are ID_NET_NAME_PATH, and on server hardware you may also see ID_NET_NAME_ONBOARD.

(The 'database' naming scheme comes from information in hwdb.)
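If you want these properties programmatically, they're easy to pick out of the output. A small sketch, with a made-up fragment of 'udevadm info' output standing in for the real thing:

```python
# Pull the ID_NET_NAME_* properties out of 'udevadm info' output.
# The sample text below is invented for illustration; run
# 'udevadm info /sys/class/net/<interface>' to get real output.
sample = """\
P: /devices/pci0000:00/0000:00:1c.4/0000:06:00.0/net/enp6s0
E: ID_NET_NAME_MAC=enxaabbccddeeff
E: ID_NET_NAME_PATH=enp6s0
"""

def net_names(udevadm_output: str) -> dict:
    """Map naming policy ('mac', 'path', ...) to the name udev would use."""
    names = {}
    for line in udevadm_output.splitlines():
        if line.startswith("E: ID_NET_NAME_"):
            key, _, value = line[3:].partition("=")
            names[key.removeprefix("ID_NET_NAME_").lower()] = value
    return names

print(net_names(sample))  # {'mac': 'enxaabbccddeeff', 'path': 'enp6s0'}
```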
On modern systems, 'lspci -PP' can be used to show the full PCIe path to a device (or all devices). On Ubuntu 18.04, you can also use sysfs to work through your PCIe topology, in addition to 'lspci -tv'. See also my entry on PCIe bus addresses, lspci, and working out your PCIe bus topology.