PCI slot based device names are not necessarily stable

February 26, 2014

One of the ways that Linux tries to get stable device names these days is to base them on information about the PCI bus and slot where a particular device is located. This naming is behind, for example, hardware-based Ethernet names and /dev/disk/by-path/ for SATA and SAS drives. The theory is that since the name describes the PCI(E) location, it will stay the same as long as you don't physically relocate the card. This is especially useful for things on the motherboard, because you can't move them at all.
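To make the dependency concrete, here is a minimal Python sketch of how a path-based Ethernet name can be derived from a PCI address. It is a simplification of udev's `enp<bus>s<slot>` style of name and assumes a full `domain:bus:slot.function` address. It ignores the domain, the onboard and slot-index names that udev prefers when the firmware supplies them, and some of udev's rules for when a function suffix appears. The function name is made up for illustration.

```python
def predictable_ethernet_name(pci_address: str) -> str:
    """Simplified udev-style 'path' name from an address like "0000:07:00.0".

    Assumption: this only mimics the enp<bus>s<slot>[f<function>] pattern;
    real udev has more rules (domain prefix, onboard/slot names, multifunction).
    """
    _domain, bus_hex, rest = pci_address.split(":")
    slot_hex, func_hex = rest.split(".")
    # PCI addresses are hex; the generated names use decimal numbers.
    bus, slot, func = int(bus_hex, 16), int(slot_hex, 16), int(func_hex, 16)
    name = f"enp{bus}s{slot}"
    if func:  # simplification: only add the function suffix when it is nonzero
        name += f"f{func}"
    return name


if __name__ == "__main__":
    # The bus number is baked into the name, so if the bus changes, so does the name.
    for addr in ("0000:05:00.0", "0000:07:00.0"):
        print(addr, "->", predictable_ethernet_name(addr))
```

The same address also appears as text in /dev/disk/by-path names, which is why a bus renumbering changes those too.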

The only problem is that this is not necessarily the case. There is PC hardware where adding, changing, or removing other hardware will change the PCI bus and slot information for your hardware without you touching it at all; this even includes hardware located on the motherboard. Really. And the shifts aren't necessarily small, either. In the case I ran into today, changing from a dual-port to a single-port PCIE gigabit card and moving it one slot to the left changed two SAS disk controllers from PCI 07:00.0 and 08:00.0 to 04:00.0 and 05:00.0. Of course this completely changed the names their disks got in /dev/disk/by-path.

(For more fun, the new single-port Ethernet card became 07:00.0, while the old card's two ports had been 05:00.0 and 06:00.0.)
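When sorting out a change like this, what you want first is a list of which devices moved. Here is a minimal sketch using the SAS controller addresses above. The labels (`sas0`, `sas1`) are hypothetical; in real use you would key on something that doesn't depend on PCI topology, such as a disk serial number.

```python
# Addresses from the text, keyed by hypothetical labels. A real tool should key
# on a topology-independent identifier (disk serial, MAC address, and so on).
BEFORE = {"sas0": "07:00.0", "sas1": "08:00.0"}
AFTER = {"sas0": "04:00.0", "sas1": "05:00.0"}


def moved_devices(before: dict[str, str], after: dict[str, str]) -> dict[str, tuple[str, str]]:
    """Return {label: (old_address, new_address)} for devices whose address changed."""
    return {
        label: (before[label], after[label])
        for label in before.keys() & after.keys()  # only devices present both times
        if before[label] != after[label]
    }


if __name__ == "__main__":
    for label, (old, new) in sorted(moved_devices(BEFORE, AFTER).items()):
        print(f"{label}: {old} -> {new}")
```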

The resulting reality is that your PCI-based names are only stable if you change no hardware in the system. The moment you change any hardware, all bets are off for all of it. You may get lucky and have some devices keep their current PCI names, but you may well not. And I don't think you're necessarily protected against perverse things like two equivalent devices swapping names (or at least one of them winding up with what was the other's old name).
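The milder version of this perverse case actually happened here: addresses got reused by different devices. This small sketch flags PCI addresses that now belong to a different device than before. The labels are hypothetical, and the dual-port and single-port Ethernet cards get different labels because they are different devices.

```python
# Before/after addresses from the text, keyed by hypothetical labels.
BEFORE = {"sas0": "07:00.0", "sas1": "08:00.0", "eth-a": "05:00.0", "eth-b": "06:00.0"}
AFTER = {"sas0": "04:00.0", "sas1": "05:00.0", "eth-new": "07:00.0"}


def reused_addresses(before: dict[str, str], after: dict[str, str]) -> dict[str, tuple[str, str]]:
    """Return {address: (old_owner, new_owner)} for addresses now held by another device."""
    old_owner = {addr: label for label, addr in before.items()}
    return {
        addr: (old_owner[addr], label)
        for label, addr in after.items()
        if addr in old_owner and old_owner[addr] != label
    }


if __name__ == "__main__":
    for addr, (old, new) in sorted(reused_addresses(BEFORE, AFTER).items()):
        print(f"{addr}: was {old}, now {new}")
```

Anything relying on the old address for 07:00.0 would silently pick up the new Ethernet card instead of the SAS controller.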

If I'm reading lspci output correctly, what is really going on is that an increasing number of things sit behind PCI bridges. Each bridge creates an additional PCI bus (the bus number is the first two hex digits of a PCI address, such as the 07 in 07:00.0), and some combination of Linux, the system BIOS, and the PCI specification doesn't assign stable numbers to these additional buses. In fact, since PCI(E) cards can themselves include additional bridges, a fully stable assignment would be very hard. This is part of what happened in my case; the old dual-port PCIE gigabit card contained not just two Ethernet controllers but two bridges as well (one for each controller), and these forcibly perturbed the numbering of other PCI 'buses' (which were really individual cards behind their own bridges).
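Here is a toy model of the bridge effect, under the assumption (common, but not guaranteed) that the firmware numbers buses depth-first in slot order: each bridge gets the next free bus number, and everything behind it is numbered before its siblings. The topology and all names are hypothetical and much simpler than real PCIe hardware (no switch upstream ports, no bus ranges reserved for hot-plug), but it shows how an extra pair of bridges on one card shifts the bus numbers of unrelated cards.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A device in a toy PCI tree; a bridge creates a new (secondary) bus."""
    name: str
    is_bridge: bool = False
    children: list["Node"] = field(default_factory=list)


def bridge(*children: Node) -> Node:
    return Node("bridge", True, list(children))


def dev(name: str) -> Node:
    return Node(name)


def assign_buses(root: list[Node]) -> dict[str, int]:
    """Number buses depth-first in slot order; return {endpoint name: bus}."""
    next_bus = 1  # bus 0 is the root bus
    buses: dict[str, int] = {}

    def walk(nodes: list[Node], bus: int) -> None:
        nonlocal next_bus
        for node in nodes:
            if node.is_bridge:
                secondary = next_bus
                next_bus += 1
                walk(node.children, secondary)  # descend before numbering siblings
            else:
                buses[node.name] = bus

    walk(root, 0)
    return buses


# One slot bridge per card; the dual-port card adds a bridge per Ethernet controller.
dual_port = [bridge(bridge(dev("eth0")), bridge(dev("eth1"))), bridge(dev("sas0")), bridge(dev("sas1"))]
single_port = [bridge(dev("eth0")), bridge(dev("sas0")), bridge(dev("sas1"))]

if __name__ == "__main__":
    print(assign_buses(dual_port))    # SAS controllers on buses 4 and 5
    print(assign_buses(single_port))  # the same controllers on buses 2 and 3
```

Nothing about the SAS controllers changes between the two runs; only the bridges in front of them in enumeration order do.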

PS: This has probably been the case for some time and this is just the first occasion I've run into it. We normally configure machines identically; it just so happened this time that the first unit we received was partly used to test the dual-port card, while the final configuration only needs a single-port card.

Written on 26 February 2014.
Last modified: Wed Feb 26 23:15:27 2014