Wandering Thoughts archives

2021-07-12

Problems in the way of straightforward device naming in operating systems

I've recently been writing (again) about Linux network interface names, this time what goes into udev's normal device names. This is a perennial topic in many operating systems; people are forever wanting straightforward and simple names for devices (networks, disk drives, and so on) and forever irritated that operating systems don't seem to be able to deliver this. Unix network device naming makes a good illustrative example of everything that adds complexity, even before you consider hot-plugged devices.

Once upon a time the naming of Unix Ethernet devices was simple; they were called eth0, eth1, eth2, and so on. This is an appealing naming scheme for networks, disk drives, and so on; you number the first Ethernet eth0 and go upward from there. The big problem with this naming scheme is the question of how you decide what is the first Ethernet, and in general how you create an order for them and then keep it stable.

The minimum requirement for any naming scheme is that if the system is rebooted with no other changes, you get the same device names. In any operating system that probes and registers physical devices in parallel, this means you don't want to make the order be the order in which hardware is detected, because that might vary from reboot to reboot due to races. If one piece of hardware or one device driver is a little faster to respond this time around, you don't want what eth0 is to change. Operating systems could probe and register hardware one at a time, but this is unpopular because it can take a while and slow down reboots. Generally this means that you have to order devices based on either how they're connected to the system or some specific characteristics they have.

(The hardware changing how fast it responds may be unlikely with network interfaces, but consider disk drives.)

The next thing you want for a naming scheme is that existing devices don't have their names changed if you add or remove hardware. If you already have eth0 and eth1 and you add two new network cards (each with one interface), you want those two new interfaces to be eth2 and eth3 (in some order). If you later take out the card that is eth2, most people want eth3 to stay eth3. To make this case more tricky, if the card for eth2 fails and you replace it with an identical new card, most people want the new card's network interface to also be eth2, although it will of course have a different unique Ethernet hardware address.

Historically, some operating systems have attempted to implement this sort of long term stable device naming scheme by maintaining a registry of associations between device names and specific hardware. This creates its own set of problems, because now your replacement eth2 is most likely going to be eth4, but if you reinstall the machine from scratch it will be eth2 again. This leads to the third thing you want in a naming scheme, which is that if two machines have exactly the same hardware, they should have the same device names. Well, you may not want this, but system administrators definitely do.
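(As a concrete illustration, older Linux systems did exactly this with udev's generated 'persistent net rules', often kept in /etc/udev/rules.d/70-persistent-net.rules. The registry was a file of rules keyed on the Ethernet hardware address; with a made-up address, an entry looked something like:

    SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="00:1a:2b:3c:4d:5e", ATTR{type}=="1", KERNEL=="eth*", NAME="eth2"

A replacement card has a different hardware address, so it doesn't match this rule and instead gets a new rule generated for it with the next free name.)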

Most modern general purpose computers use PCIe, even if they're not based on x86 CPUs. PCIe has short identifiers for devices that are stable over simple reboots in the form of PCIe bus addresses, but unfortunately it doesn't have short identifiers that are stable over hardware changes. Adding, removing, or changing PCIe hardware can insert or remove PCIe busses in the system's PCIe topology, which will renumber some other PCIe busses (busses that are enumerated after the new PCIe bus you've added). PCIe can have fully stable identifiers for devices, but they aren't short since you have to embed the entire PCIe path to the device.
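(To make this concrete, you can see such a full path in Linux's sysfs. A hypothetical network device sitting behind a single PCIe bridge might live at:

    /sys/devices/pci0000:00/0000:00:01.3/0000:06:00.0/

The stable part is the chain of device and function numbers at each hop (the bridge at 00:01.3 and then device 00.0 behind it), which is fixed by the physical wiring; the '06' bus number in the middle is exactly the bit that can get renumbered when you change hardware.)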

It's possible to reduce and postpone these problems by expanding the namespace of device names. For instance, you can make the device names depend on the hardware driver being used, so instead of eth0 and eth1, you have ix0 and bge0. However, this doesn't eliminate the problem, since you can have multiple pieces of hardware that use the same driver (so you have ix0, ix1, and so on). It also makes life inconvenient for people, because now they have to remember what sort of hardware a given system has. A long time ago, Unix disk device names were often dependent on the specific hardware controller being used, but people found that they rather preferred only having to remember and use sda instead of rl0 versus rk0 versus rp0.

(The third problem is that your device names can change if you replace a network card with a different type of network card; you might go from bnx0 to bge0. It might even be from the same vendor. Intel has had several generations of networking chipsets, which are supported by different drivers on Linux. My office machine has two Intel 1G interfaces, one using the igb driver and one using the e1000e driver.)

tech/DeviceNamingProblems written at 23:28:24

Understanding something about udev's normal network device names on Linux

For a long time, systemd's version of udev has attempted to give network interfaces what the systemd people call predictable or stable names. The current naming scheme is more or less documented in systemd.net-naming-scheme, with an older version in their Predictable Network Interface Names wiki page. To understand how the naming scheme is applied in practice by default, you also need to read the description of NamePolicy= in systemd.link(5), and inspect the default .link file, '99-default.link', which might be in either /lib/systemd/network or /usr/lib/systemd/network. It appears that the current network name policy is generally going to be "kernel database onboard slot path", possibly with 'keep' at the front in addition. In practice, on most servers and desktops, most network devices will be named based on their PCI slot identifier, using systemd's 'path' naming policy.
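For reference, on a recent systemd the relevant portion of 99-default.link looks roughly like this (the details vary between systemd versions):

    [Link]
    NamePolicy=keep kernel database onboard slot path
    MACAddressPolicy=persistent

The policies listed in NamePolicy= are tried in order, and the first one that yields a usable name wins.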

A PCI slot identifier is what ordinary 'lspci' will show you as the PCIe bus address. As covered in the lspci manpage, the fully general form of a PCIe bus address is <domain>:<bus>:<device>.<function>, and on many systems the domain is always 0000 and is omitted. Systemd turns this into what it calls a "PCI geographical location", which is (translated into lspci's terminology):

prefix [P<domain>] p<bus> s<device> [f<function>] [n<phys_port_name> | d<dev_port>]

The domain is omitted if it's 0 and the function is only present if it's a multi-function device. All of the numbers are in decimal, while lspci presents them in hex. For Ethernet devices, the prefix is 'en'.

(I can't say anything about the 'n' and 'd' suffixes because I've never seen them in our hardware.)
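As a worked example of the hex versus decimal difference, a common Intel onboard Ethernet port that lspci reports at 00:1f.6 ends up with the path name enp0s31f6:

    lspci: 0000:00:1f.6   (bus 0x00, device 0x1f, function 0x6)
    udev:  enp0s31f6      (p0 = bus 0, s31 = device 31 in decimal, f6 = function 6)

The function is included in the name here because 00:1f is a multi-function device.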

The device portion of the PCIe bus address is very frequently 0, because many Ethernet devices are behind PCIe bridges in the PCIe bus topology and so are typically the only device on their own bus. This is how my office workstation is arranged, and how almost all of our servers are. The exceptions are all on bus 0, the root bus, which I believe means that they're directly integrated into the core chipset. This means that in practice the network device name primarily comes from the PCIe bus number, possibly with a function number added. This gives 'path' based names of, e.g., enp6s0 (bus 6, device 0) or enp1s0f0 and enp1s0f1 (bus 1, device 0, function 0 or 1; this is a dual 10G-T card, with each port being one function).

(Onboard devices on servers and even desktops are often not integrated into the core chipset and thus not on PCIe bus 0. Udev may or may not recognize them as onboard devices and assign them 'eno<N>' names. Servers from good sources will hopefully have enough correct DMI and other information so that udev can do this.)

As always, the PCIe bus ordering doesn't necessarily correspond to what you think of as the actual order of hardware. My office workstation has an onboard Ethernet port on its ASUS Prime X370-Pro motherboard and an Intel 1G PCIe card, but they are (or would be) enp8s0 and enp6s0 respectively. So my onboard port has a higher PCIe bus number than the PCIe card.

There is an important consequence of this: systemd's default network device names are not stable if you change your hardware around, even if you don't touch the network card itself. Adding, changing, or relocating other hardware between physical PCIe slots can change your PCIe bus numbers (primarily if PCIe bridges are added or removed), and since the PCIe bus number is most of what determines the path-based interface name, the name changes with it.

(However, adding or removing hardware won't necessarily change existing PCIe bus addresses even if the hardware changed has a PCIe bridge. It all depends on your specific PCIe topology.)

Sidebar: obtaining udev and PCIe topology information

Running 'udevadm info /sys/class/net/<something>' will give you a dump of what udev thinks and knows about any given network interface. The various ID_NET_NAME_* properties give you the various names that udev would assign based on that particular naming policy. The 'enp...' names are ID_NET_NAME_PATH, and on server hardware you may also see ID_NET_NAME_ONBOARD.
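Abridged and for a hypothetical interface, the output looks something like this (the exact set of E: properties depends on your hardware and systemd version):

    $ udevadm info /sys/class/net/enp6s0
    P: /devices/pci0000:00/0000:00:01.3/0000:06:00.0/net/enp6s0
    E: ID_NET_NAME_MAC=enx001a2b3c4d5e
    E: ID_NET_NAME_PATH=enp6s0
    [...]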

(The 'database' naming scheme comes from information in hwdb.)

On modern systems, 'lspci -PP' can be used to show the full PCIe path to a device (or all devices). On Ubuntu 18.04, you can also work through your PCIe topology using sysfs (a small example follows) and 'lspci -tv'. See also my entry on PCIe bus addresses, lspci, and working out your PCIe bus topology.
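One simple way to use sysfs for this is to resolve the symlink for the interface's device directory; with a hypothetical interface, that looks like:

    $ readlink -f /sys/class/net/enp6s0/device
    /sys/devices/pci0000:00/0000:00:01.3/0000:06:00.0

Each intermediate PCIe address in the resulting path is a bridge that the device sits behind.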

linux/UdevNetworkDeviceNaming written at 00:16:04

