PCI slot based device names are not necessarily stable

February 26, 2014

One of the ways that Linux tries to get stable device names these days is to base them on information about the PCI bus and slot that a particular device is located at. This naming is behind, for example, hardware-based Ethernet names and /dev/disk/by-path/ for SATA and SAS drives. The theory is that since the name describes the PCI(E) location, as long as you don't physically relocate the card the name will stay the same. This is especially useful for things on the motherboard (because you can't move them at all).

The only problem is that this is not necessarily the case. There exists PC hardware where adding, changing, or removing other hardware will change the PCI bus and slot information for your hardware without you touching it at all; this even includes hardware located on the motherboard. Really. And the shifts aren't necessarily small, either. In the case I ran into today, changing from a dual port to a single port PCIE Gigabit card and moving it one card slot to the left changed two SAS disk controllers from PCI 07:00.0 and 08:00.0 to 04:00.0 and 05:00.0. Of course this totally changed how their disks came up in /dev/disk/by-path.

(For more fun, the new single-port Ethernet became 07:00.0 when the old two ports had been 05:00.0 and 06:00.0.)
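To make the dependency concrete, here is a minimal sketch of how a by-path style name embeds the PCI address, and so changes when the bus number does. The helper names and the tail of the sample name (`sas-phy0-lun-0`) are made up for illustration; the exact names under /dev/disk/by-path depend on the udev version and the transport.

```python
import re

# Matches names of the assumed form pci-DDDD:BB:DD.F-<rest>.
_BY_PATH = re.compile(
    r"^pci-(?P<domain>[0-9a-f]{4}):(?P<bus>[0-9a-f]{2}):"
    r"(?P<dev>[0-9a-f]{2})\.(?P<fn>[0-7])-(?P<rest>.+)$"
)


def _parse(name):
    m = _BY_PATH.match(name)
    if m is None:
        raise ValueError(f"not a PCI by-path style name: {name!r}")
    return m


def pci_address_of(name):
    """Return the 'DDDD:BB:DD.F' PCI address embedded in the name."""
    m = _parse(name)
    return f"{m['domain']}:{m['bus']}:{m['dev']}.{m['fn']}"


def name_after_renumbering(name, bus_map):
    """Return the name the same device would get after a bus renumbering.

    bus_map maps old bus numbers to new ones (two hex digits each);
    buses that are not in the map are left alone.
    """
    m = _parse(name)
    new_bus = bus_map.get(m["bus"], m["bus"])
    return f"pci-{m['domain']}:{new_bus}:{m['dev']}.{m['fn']}-{m['rest']}"


if __name__ == "__main__":
    # The bus shift from the post: 07 -> 04 and 08 -> 05.
    shift = {"07": "04", "08": "05"}
    print(name_after_renumbering("pci-0000:07:00.0-sas-phy0-lun-0", shift))
```

Nothing about the disk or the controller changed here; only the bus number did, and that is enough to give the disk a different name.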

The resulting reality is that your PCI based names are only stable if you change no hardware in the system. The moment you change any hardware all bets are off for all hardware. You may get lucky and have some devices keep their current PCI names but you may well not. And I don't think you're necessarily protected against perverse things like two equivalent devices swapping names (or at least one of them winding up with what was the other's old name).

If I'm reading lspci output correctly, what is really going on is that an increasing number of things are behind PCI bridges. These things create additional PCI buses (the first two digits in the PCI device numbering), and some combination of Linux, the system BIOS, and the PCI specification doesn't have a stable assignment for these additional buses. In fact since PCI(E) cards can themselves include additional bridges, a fully stable assignment would be very hard. This is part of what happened in my case; the old dual-port PCIE gigabit card contained not just two Ethernet controllers but two bridges as well (one for each controller) and these forcibly perturbed the numbering of other PCI 'buses' (which were really individual cards behind their own bridges).
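The effect can be sketched with a toy model. This assumes that bus numbers are handed out one at a time, depth first, in the order bridges are discovered; that is a simplification, not a description of what any particular BIOS or kernel does. The topology (root ports `rp1` to `rp4`, empty ports left out) is hypothetical and is not meant to reproduce the exact numbers from my machine.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Device:
    name: str
    # None means an endpoint; a list means a bridge with devices behind it.
    children: Optional[List["Device"]] = None


def enumerate_buses(root_bus_devices):
    """Assign bus numbers depth first, in discovery order.

    Returns {endpoint name: bus number}. Bus 0 is the root bus; every
    bridge encountered takes the next unused number for the bus behind it.
    """
    buses = {}
    next_bus = 1

    def scan(devices, bus):
        nonlocal next_bus
        for dev in devices:
            if dev.children is None:
                buses[dev.name] = bus
            else:
                secondary = next_bus
                next_bus += 1
                scan(dev.children, secondary)

    scan(root_bus_devices, 0)
    return buses


# Hypothetical "before": a dual-port card with one bridge per controller.
OLD = [
    Device("rp1", [Device("br0", [Device("eth0")]),
                   Device("br1", [Device("eth1")])]),
    Device("rp2", [Device("sas0")]),
    Device("rp3", [Device("sas1")]),
]

# Hypothetical "after": a single-port card with no bridges, in another slot.
NEW = [
    Device("rp2", [Device("sas0")]),
    Device("rp3", [Device("sas1")]),
    Device("rp4", [Device("eth0")]),
]

if __name__ == "__main__":
    print("old:", enumerate_buses(OLD))
    print("new:", enumerate_buses(NEW))
```

In this model the two SAS controllers never move, yet their bus numbers drop because the bridges that used to be counted ahead of them are gone.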

PS: This has probably been the case for some time and this is just the first occasion I've run into it. We normally configure machines identically; it just so happened this time around that the first hardware unit we got in was used in part to test the dual-port card while the final unit configuration only needs a single-port card.

Comments on this page:

Yeah, at my work the application volumes for several years were VERITAS Volume Manager controlled because of the joys of Linux device name shuffles. This may be changing, but it was an issue that we dealt with...

By Ewen McNeill at 2014-02-27 19:00:56:

IIRC PCI(e) allows for nearly arbitrary PCI bridge (and hence PCI bus) nesting: the overall result is a tree, more than a bus. However the Linux device numbering code appears to be numbering buses linearly on an "as discovered" basis, rather than a topological basis (eg partial/total order breadth first tree walk, let alone with stable bus IDs): various kinds of hotplug (USB, etc) gets exciting for the same reason, since they can include bridges too...
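As a rough illustration of what a topological identifier could look like, the sketch below takes a sysfs-style device path and keeps only the device.function at each hop, dropping the bus numbers that get assigned on discovery. The function name and the sample paths are hypothetical, and it assumes the device.function numbers at each level stay fixed when other cards change, which I have not verified in general.

```python
import re

_HOST_BRIDGE = re.compile(r"^pci[0-9a-f]{4}:[0-9a-f]{2}$")
_PCI_DEVICE = re.compile(r"^[0-9a-f]{4}:[0-9a-f]{2}:([0-9a-f]{2}\.[0-7])$")


def topological_id(sysfs_path):
    """Reduce a sysfs PCI device path to host bridge plus dev.fn hops.

    '/sys/devices/pci0000:00/0000:00:1c.4/0000:07:00.0' becomes
    'pci0000:00/1c.4/00.0'; the '07' bus number is discarded.
    """
    hops = []
    for part in sysfs_path.strip("/").split("/"):
        if _HOST_BRIDGE.match(part):
            hops = [part]            # start of the PCI part of the path
        elif hops:
            m = _PCI_DEVICE.match(part)
            if m is None:
                break                # left the PCI part (e.g. host0/...)
            hops.append(m.group(1))
    if not hops:
        raise ValueError(f"no PCI host bridge in path: {sysfs_path!r}")
    return "/".join(hops)


if __name__ == "__main__":
    before = "/sys/devices/pci0000:00/0000:00:1c.4/0000:07:00.0"
    after = "/sys/devices/pci0000:00/0000:00:1c.4/0000:04:00.0"
    print(topological_id(before) == topological_id(after))
```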

I think "device names stable over a reboot" (with no hardware changes) is a win over the earlier races, but it's definitely not a complete victory. And I too have relied on, eg, LVM/volume labels for disk device discovery for years as a work around. (Ethernet is just painful: the default udev new-ethernet-device-for-each-MAC behaviour is typically worse than the problem it is curing, especially in single-ethernet devices like VMs with dynamically assigned MACs.)


By CWilson at 2014-03-12 04:40:19:

Try browsing the devices directory. e.g. /sys/class/net/eth0

In mine I found ~eth0/device/label then I did a cat label and saw Ethernet5. This attribute appears to come from CIM data. In my case, it's what the device was named by my VMWare Workstation, and remains static across cloning. Once you have that try:

udevadm info --attribute-walk --path=/sys/class/net/eth0

See whether that attribute is "attrib" or "ENV" and make your rule accordingly.

By cks at 2014-03-12 10:36:42:

Unfortunately almost no device in /sys on our machines has a label entry. In fact the particular machine with these SAS controllers appears to only have a label on one random PCIE Root Port.
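For anyone who wants to check their own machines, a small sketch along these lines will list which PCI devices have a `label` attribute at all. The function name is made up, and the default directory is the usual sysfs location; pass a different directory to test it elsewhere.

```python
import os


def pci_labels(devices_dir="/sys/bus/pci/devices"):
    """Return {PCI address: label} for devices that expose a label file."""
    labels = {}
    for addr in sorted(os.listdir(devices_dir)):
        label_file = os.path.join(devices_dir, addr, "label")
        try:
            with open(label_file) as f:
                labels[addr] = f.read().strip()
        except OSError:
            # Most devices simply have no label attribute; skip them.
            continue
    return labels


if __name__ == "__main__":
    if os.path.isdir("/sys/bus/pci/devices"):
        for addr, label in pci_labels().items():
            print(addr, label)
```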

By CWilson at 2014-03-12 14:07:37:

Run this and see if you see any vendor/model/serial info...

dmesg | grep -A4 scsi | less

Here's one I found on a web posted dmesg and it might look something like this:

scsi0: Adaptec AHA274x/284x/294x (EISA/VLB/PCI-Fast SCSI) 5.1.31/3.2.4

       <Adaptec AIC-7850 SCSI host adapter>

scsi: 1 host. (scsi0:0:4:0) Synchronous at 5.0 Mbyte/sec, offset 8.

  Vendor: HP        Model: HP35480A          Rev: 1009
  Type:   Sequential-Access                  ANSI SCSI revision: 02

Then I would try this, as that one is "0:0:4:0"

udevadm info --attribute-walk --path=/sys/class/scsi_disk/0\:0\:4\:0

And see if you see any unique attributes.

For instance, when doing this with my AHCI compliant laptop the HDD appears as SCSI 0:0:0:0 with an attribute:


From at 2014-03-12 14:14:36:

Oh, guess I should ask: Are you running on any kind of hypervisor?

By cks at 2014-03-12 15:20:23:

This is running directly on physical hardware.

I looked at udevadm output and there's nothing there that I think is useful for this case. Certainly there is nothing there that an out of the box udev configuration uses to create location based disk drive names, which is the real issue.

By CWilson at 2014-03-12 17:16:53:

Well, I'm about out of ideas. TBH, udev drives me nuts too. The type and amount of CIM data it collects seems to be random. I've never seen any docs that highlight whether this comes from extended BIOS info, or if you need to have the Intel Systems Management Interface enabled. You might also need kernel drivers/modules or additional software to acquire additional attributes like "Sysfs Instrumentation" and/or "SMBIOS" highlighted here.

My last suggestion would be to use the udevadm monitor or to reload once the system has been up for a while and check for attribs then. If you find one then you could add a WAIT_FOR_SYSFS to your rules to hold assignment until that attrib populates.

Good Luck -CW

By cks at 2014-03-13 17:10:11:

I don't think this is fixable. As I wrote, PCI(E) enumeration is not a constant thing if hardware changes (even different hardware) and the cards don't particularly have unique identifiers (and certainly none that udev uses). That's just how it is.

By CWilson at 2014-03-14 12:55:12:

FYI - Just saw this while I was making a menuconfig...

ACPI Options -> ACPI Support -> PCI Slot Detection Driver

This driver creates entries in /sys/bus/pci/slots for ALL PCI slots in the system. This can help correlate PCI bus addresses, i.e., segment/bus/device/function tuples, with physical slots in the system.

Defined at drivers/acpi/Kconfig:250

I don't know if that will help, but if it maps out everything you have, empty or not, perhaps that will create some stability in the naming. Assuming of course, you're not already using this kmod and that the functionality has been implemented in either the udev or eudev projects.
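If that driver is loaded, something like the following sketch could be used to see which physical slot a given PCI address currently belongs to. It assumes each directory under /sys/bus/pci/slots has an `address` file holding domain:bus:device (no function); the helper names are made up.

```python
import os


def slot_addresses(slots_dir="/sys/bus/pci/slots"):
    """Return {slot name: 'DDDD:BB:DD'} read from each slot's address file."""
    slots = {}
    if not os.path.isdir(slots_dir):
        return slots
    for slot in sorted(os.listdir(slots_dir)):
        try:
            with open(os.path.join(slots_dir, slot, "address")) as f:
                slots[slot] = f.read().strip()
        except OSError:
            continue
    return slots


def slot_of(pci_address, slots):
    """Return the slot name for an address like '0000:07:00.0', or None."""
    without_function = pci_address.rsplit(".", 1)[0]
    for slot, addr in slots.items():
        if addr == without_function:
            return slot
    return None


if __name__ == "__main__":
    for slot, addr in slot_addresses().items():
        print(slot, addr)
```

This only reports the current mapping. It would still take a udev rule or similar to build names from the slot rather than from the bus number.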

By CWilson at 2014-03-16 02:31:54:

Lol, one more suggestion for you: disable udevd. I was under the impression you were forced to use a dynamic device enumerator under new kernels. Seems that isn't the case...

From: Devices for Servers

mount --bind / /mnt
cp -a /dev/* /mnt/dev
rm /etc/rc.d/rcS.d/{S10udev,S50udev_retry}
umount /mnt

After that you can either mknod for new devices or fire up udevd and copy only the new device to the underlying static dev.

If you already know this then... oops. I'm not trying to be Cap'n Obvious, I just try to pass on what I've learned myself when I see something.

By cks at 2014-03-16 15:25:16:

Disabling udev doesn't help because it isn't udev that is making the devices dynamic, it is the kernel (more specifically, it is that the kernel is using a linear approach to PCI enumeration instead of a tree based one).

By CWilson at 2014-03-18 23:16:59:

I found this on my current Kernel build and it may actually help. It is a 3.13.6 kernel, so I'm not sure if the settings are new or not.

Firmware Drivers -> Export DMI/SMBIOS Identification to Userspace via Sysfs. And, right below it: DMI Table Support in Sysfs. The latter makes raw DMI tables (CIM Data) visible in Sysfs. There's a 3rd option below that for iSCSI devices.

Also, Device Drivers -> SCSI Support -> SCSI Transports and there's several options to export transport information per device. Covers PSCSI, iSCSI, SAS, etc. The interesting thing is the code ID for SAS: SCSI_SAS_ATTRS.

Hopefully this is something new to help sort out just your situation.

By Zev Weiss at 2018-03-29 02:37:04:

Hear hear. I recently set about removing a video card from a machine that shouldn't have had a discrete one to start with (and for which the driver was causing occasional stability problems) only to abort and put it back in when I found that doing so caused a change in the so-called "consistent" name of its ethernet device, and putting the stupid graphics card back where it was was faster and easier than rooting through and updating everything in /etc with the new name it suddenly had. Maddening.

Written on 26 February 2014.

Last modified: Wed Feb 26 23:15:27 2014