Wandering Thoughts archives

2016-04-23

Why I think Illumos/OmniOS uses PCI subsystem IDs

As I mentioned yesterday, PCI has both vendor/device IDs and 'subsystem' vendor/device IDs. Here is what this looks like (in Linux) for a random device on one of our machines here (from 'lspci -vnn', more or less):

04:00.0 Serial Attached SCSI controller [0107]: LSI Logic / Symbios Logic SAS2308 PCI-Express Fusion-MPT SAS-2 [1000:0086] (rev 05)
Subsystem: Super Micro Computer Inc Device [15d9:0691]
[...]

This is the integrated motherboard SAS controller on a SuperMicro motherboard (part of our fileserver hardware). It's using a standard LSI chipset, as reported in the main PCI vendor and device ID, but the subsystem ID says it's from SuperMicro. Similarly, this is an Intel chipset based motherboard so there are a lot of things with standard Intel vendor and device IDs, but SuperMicro specific subsystem vendor and device IDs.
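As an illustration, here is a minimal Python sketch of pulling both ID pairs out of that 'lspci -vnn' output. The sample text is the output shown above; the `parse_ids` helper and its regular expression are my own invention for this example, not part of any tool.

```python
import re

# Sample text copied from the 'lspci -vnn' output shown above.
SAMPLE = """\
04:00.0 Serial Attached SCSI controller [0107]: LSI Logic / Symbios Logic SAS2308 PCI-Express Fusion-MPT SAS-2 [1000:0086] (rev 05)
    Subsystem: Super Micro Computer Inc Device [15d9:0691]
"""

# Matches '[vvvv:dddd]' ID pairs; the class code '[0107]' has no colon
# and so is not matched.
ID_PAIR = re.compile(r"\[([0-9a-f]{4}):([0-9a-f]{4})\]")


def parse_ids(text):
    """Return (slot, main_ids, subsystem_ids) for the first device in text."""
    slot = main = subsys = None
    for line in text.splitlines():
        pairs = ID_PAIR.findall(line)
        if not pairs:
            continue
        if line.lstrip().startswith("Subsystem:"):
            subsys = pairs[-1]
        elif main is None:
            slot = line.split()[0]
            main = pairs[-1]
    return slot, main, subsys


slot, main, subsys = parse_ids(SAMPLE)
print(slot, "main:", main, "subsystem:", subsys)
```

The main pair identifies the LSI chip; the subsystem pair identifies SuperMicro's use of it.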

As far as I know, most systems use the PCI vendor and device IDs and mostly ignore the subsystem vendor and device IDs. It's not hard to see why; the main IDs tell you more about what the device actually is, and there are fewer of them to keep track of. Illumos is an exception, where much of the PCI information you see reported uses subsystem IDs. I believe that a significant reason for this is that Illumos is often attempting to basically fingerprint devices.

Illumos tries hard to have some degree of constant device naming (at least for their definition of it), so that say 'e1000g0' is always the same thing. This requires being able to identify specific hardware devices as much as possible, so you can tie them to the visible system-level names you've established. This is the purpose of /etc/path_to_inst and the systems associated with it; it fingerprints devices on first contact, assigns them an identifier (in the form of a driver plus an instance number), and thereafter tries to keep them exactly the same.
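To make the 'driver plus an instance number' idea concrete, here is a rough Python sketch that reads path_to_inst style entries (a quoted physical path, an instance number, and a quoted driver name) and builds the resulting system-level names. The sample entries are hypothetical: the paths are taken from elsewhere in these entries, but the instance numbers and the 'e1000g' driver binding are made up for illustration, and `parse_path_to_inst` is my own helper name.

```python
import re

# Hypothetical path_to_inst entries; the instance numbers and driver
# name are assumptions made for this example.
SAMPLE = '''\
# comment lines start with '#'
"/pci@0,0/pci8086,e04@2/pci8086,115e@0" 0 "e1000g"
"/pci@0,0/pci8086,e04@2/pci8086,115e@0,1" 1 "e1000g"
'''

# "physical path" instance-number "driver name"
ENTRY = re.compile(r'^"([^"]+)"\s+(\d+)\s+"([^"]+)"$')


def parse_path_to_inst(text):
    """Map system-level names such as 'e1000g0' to physical device paths."""
    names = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        m = ENTRY.match(line)
        if m:
            path, instance, driver = m.groups()
            names[f"{driver}{instance}"] = path
    return names


for name, path in parse_path_to_inst(SAMPLE).items():
    print(name, "->", path)
```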

(From Illumos's perspective the ideal solution would be for each single PCI device to have a UUID or other unique identifier. But such a thing doesn't exist, at least not in general. So Illumos must fake a unique identifier by using some form of fingerprinting.)

If you want a device fingerprint, the PCI subsystem IDs are generally going to be more specific than the main IDs. A whole lot of different cards and motherboards built around this LSI chipset will have 1000:0086 as their PCI vendor and device IDs, after all; that's basically the purpose of having the split. Using the SuperMicro subsystem vendor and device IDs ties it to 'the motherboard SAS controller on this specific type of motherboard', which is much closer to being a unique device identifier.

Note that Illumos's approach more or less explicitly errs on the side of declaring devices to be new. If you shuffle which slots your PCI cards are in, Illumos will declare them all to be new devices and force you to reconfigure things. This is annoying, but it is broadly much more conservative than doing it the other way. Essentially Illumos says 'if I can see that something changed, I'm not going to go ahead and use your existing settings'. Maybe it's a harmless change where you just shuffled card slots, or maybe it's a sign of something more severe. Illumos doesn't know and isn't going to guess; you get to tell it.

(I do wish there were better tools to tell Illumos that certain changes were harmless and expected. It's kind of a pain that eg moving cards between PCI slots can cause such a commotion.)

IllumosWhyUsePCISubsystemIDs written at 02:46:54

2016-04-22

What Illumos/OmniOS PCI device names seem to mean

When working on an OmniOS system, under normal circumstances you'll use friendly device names from /dev and things like dladm (for network devices). However, Illumos-based systems have an underlying hardware based naming scheme (exposed in /devices), and under some circumstances you can wind up dealing with it. When you do, you'll be confronted with relatively opaque names like '/pci@0,0/pci8086,e04@2/pci8086,115e@0' and very little clue what these names actually mean, at least if you're not already an Illumos/Solaris expert.

So let's take just one bit here: pci8086,e04@2. The pci8086,e04 portion is the PCI subsystem vendor and device code, expressed in hex. You'll probably see '8086' a lot, because it's the vendor code for Intel. Then the @2 portion is PCI path information expressed relative to the parent. This can get complicated, because 'path relative to the parent' doesn't map well to the kinds of PCI names you get on Linux from eg 'lspci'. When you see a '@...' portion with a comma, that is what other systems would label as 'device.function'. If there is no comma in the '@...' portion, the function is implicitly 0.

(Note that the PCI subsystem vendor and device is different from the PCI vendor and device. Linux 'lspci -n' shows only the main vendor and device codes, because that's what's important for knowing what sort of thing it is instead of who exactly made it; you have to use 'lspci -vn' to see the subsystem stuff. Illumos's PCI names here are inherently framed as a PCI tree, whereas Linux lspci normally does not show the tree topology, just flat slot numbering. See 'lspci -t' for the tree view.)
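The naming rules described above can be sketched as a small Python parser. This follows my reading of the names as given here and is not an authoritative description of Illumos's format; in particular, treating the '@' portion as hex is an assumption, and the `parse_node` and `parse_path` helpers and their dictionary keys are invented for this example.

```python
import re

# 'pci', an optional 'vendor,device' ID pair, then '@device[,function]'.
NODE = re.compile(
    r"^pci(?:(?P<vendor>[0-9a-f]+),(?P<device>[0-9a-f]+))?"
    r"@(?P<dev>[0-9a-f]+)(?:,(?P<func>[0-9a-f]+))?$"
)


def parse_node(node):
    """Split one path element such as 'pci8086,e04@2' into its parts."""
    m = NODE.match(node)
    if m is None:
        raise ValueError(f"not a PCI node name: {node!r}")
    return {
        "id_vendor": m["vendor"],    # None for the synthetic root 'pci@0,0'
        "id_device": m["device"],
        "device": int(m["dev"], 16),            # assumed to be hex
        "function": int(m["func"] or "0", 16),  # no comma means function 0
    }


def parse_path(path):
    """Parse every element of a path like '/pci@0,0/pci8086,e04@2'."""
    return [parse_node(p) for p in path.strip("/").split("/")]


for element in parse_path("/pci@0,0/pci8086,e04@2/pci8086,115e@0"):
    print(element)
```

Note that nothing in a single element tells you the PCI bus number that Linux lspci would show; that depends on where the element sits in the tree.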

As far as I can tell, in a modern PCI Express setup the physical slot you put a card into will determine the first two elements of the PCI path. '/pci@0,0' is just a (synthetic) PCI root instance, and then '/pci8086,e04@2' is a specific PCI Express Port. However, I'm not sure if one PCI Express Port can feed multiple slots and if it can, I'm not sure how you tell them apart. I'm not quite sure how things work for plain PCI cards, but for onboard PCI devices you get PCI paths like '/pci@0,0/pci15d9,714@1a' where the '@1a' corresponds to what Linux lspci sees as 00:1a.0.

So, suppose that you have a collection of OmniOS servers and you want to know if they have exactly the same PCI Express cards in exactly the same slots (or, say, exactly the same Intel 1G based network cards). If you look at /etc/path_to_inst and see exactly the same PCI paths, you've got what you want. If you look at the paths and see two systems with say:

s1: /pci@0,0/pci8086,e04@2/pci8086,135e@0
s2: /pci@0,0/pci8086,e04@2/pci8086,115e@0

What you have is a situation where the cards are in the same slots (because the first two elements of the path are the same) but they're slightly different generations and Intel has changed the PCI subsystem device code on you (seen in ',135e' versus ',115e'). If you're transplanting system disks from s2 to s1, this can cause problems that you'll need to deal with by editing path_to_inst.
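If you have a lot of paths to check, the comparison above is easy to mechanize. Here is a minimal Python sketch that walks two paths element by element and reports whether each difference is in the position (the '@...' part) or in the IDs (the 'pciVVVV,DDDD' part). The `compare_paths` helper is my own, and it assumes both paths have already been pulled out of each machine's path_to_inst.

```python
def split_node(node):
    """Split 'pci8086,e04@2' into ('pci8086,e04', '2')."""
    name, _, addr = node.partition("@")
    return name, addr


def compare_paths(a, b):
    """Describe, element by element, how two /devices-style paths differ."""
    nodes_a = a.strip("/").split("/")
    nodes_b = b.strip("/").split("/")
    if len(nodes_a) != len(nodes_b):
        return ["paths have different depths"]
    findings = []
    for depth, (na, nb) in enumerate(zip(nodes_a, nodes_b)):
        (name_a, addr_a), (name_b, addr_b) = split_node(na), split_node(nb)
        if addr_a != addr_b:
            findings.append(
                f"element {depth}: different position (@{addr_a} vs @{addr_b})"
            )
        if name_a != name_b:
            findings.append(
                f"element {depth}: different IDs ({name_a} vs {name_b})"
            )
    return findings


s1 = "/pci@0,0/pci8086,e04@2/pci8086,135e@0"
s2 = "/pci@0,0/pci8086,e04@2/pci8086,115e@0"
for finding in compare_paths(s1, s2):
    print(finding)
```

An empty result means the two paths are identical; for s1 and s2 the only difference reported is in the IDs of the last element.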

I don't know what order Illumos uses when choosing how to assign instances (and thus eg network device names) to hardware when you have multiple instances of the same hardware. On a single card with multiple ports it seems consistent that the port with the lower function is assigned first, eg if you have a dual port card where the ports are pci8086,115e@0 and pci8086,115e@0,1, the @0 port will always be a lower instance than the @0,1 port. How multiple cards are handled is not clear to me and I can't reverse engineer it based on our current hardware.

(While we have multiple Intel 1G dual-port cards in our OmniOS fileservers, they are in PCI Express slots that differ both in the PCI subsystem device code and in the PCI path information; we have pci8086,e04@2 as the PCI Express Port for the first card and pci8086,e0a@3,2 for the second. I suspect that the PCI path information ('@2' versus '@3,2') determines things here, but I don't know for sure.)

PS: Yes, all of this is confusing (at least to me). Maybe I need to read up on general principles of PCI, PCI Express, and how all the topology stuff works (the PCI bus world is clearly not flat any more, if it ever was).

IllumosPCIDeviceNaming written at 02:02:19

