Desktop motherboards can have fewer useful PCIe slots than they seem to

December 4, 2019

When I had to get an adapter card for my office machine's second NVMe drive, I worried (in advance) about the card itself slowing down the drive. It turns out that I should have also been concerned about a second issue: what PCIe slot to put it in. My current office machine uses an ASUS Prime X370-Pro motherboard, which has six PCIe slots; three of them are single-lane 'x1' slots (which the manual calls PCIEX1_1 through 3), and three are physically x16 slots (called PCIEX16_1 through 3). In theory, this should make life simple; a good NVMe drive wants 4 PCIe lanes (written as 'x4'), and I have three slots that appear able to provide that.

Of course life is not so simple. For a start, just because a slot is physically a PCIe x16 slot doesn't mean that it supplies all 16 lanes (especially under all conditions), and basically no current desktop motherboard actually provides three true x16 slots. Of the three x16 slots on this motherboard, the third is only PCIe x4, and the first two only provide sixteen lanes between them; you can have one card at x16 or two cards at x8 (and it may be that only the first slot can do x16; the manual isn't entirely clear). The next issue is that the x16 @ x4 slot also shares PCIe lanes, this time with the second and third PCIe x1 slots. If you use a PCIe x1 card in either of those x1 slots, the x16 @ x4 slot becomes an x16 @ x2 slot. Finally, the first PCIe x1 slot is physically close enough to the first x16 slot that a dual width GPU card more or less precludes using it, which is unfortunate since it's the only PCIe x1 slot that doesn't conflict with the x16 @ x4 slot.

My office machine has a Radeon graphics card that happens to be dual width, an x1 Intel gigabit Ethernet card because I need a second network port, and now a PCIe NVMe adapter card that physically requires a PCIe x4 or greater slot and wants to be x4 to work best. The natural layout I used when I put the machine together initially was the Radeon graphics card in the first PCIe x16 slot and the Intel card in one of the two PCIe x1 slots where it would fit (I picked the third, putting it as far away from the Radeon as possible). When I added the NVMe card, I put it in the third PCIe x16 slot (which is physically below the third PCIe x1 slot with the Intel card); it seemed the most natural spot for it, partly because it kept room for air circulation for the fans of the Radeon card. Then I noticed that the second NVMe drive had clearly higher latency (especially write latency) than the first one, started looking, and discovered that it was running at PCIe x2 instead of x4 (because of the Intel Ethernet card).
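One way to catch this sort of silent downtraining on Linux is to compare each device's negotiated link width against its maximum, which the kernel exposes in sysfs ('lspci -vv' reports the same information in its LnkCap and LnkSta lines). Here's a minimal sketch that assumes the standard sysfs attributes; not every device exposes them:

```python
#!/usr/bin/env python3
"""Flag PCIe devices whose negotiated link width is below their maximum."""
from pathlib import Path

def downtrained(current: str, maximum: str) -> bool:
    # sysfs reports link widths as bare lane counts, e.g. "2" and "4".
    return int(current) < int(maximum)

def scan(devroot: str = "/sys/bus/pci/devices") -> None:
    for dev in sorted(Path(devroot).iterdir()):
        cur, mx = dev / "current_link_width", dev / "max_link_width"
        if not (cur.exists() and mx.exists()):
            continue
        c, m = cur.read_text().strip(), mx.read_text().strip()
        if downtrained(c, m):
            print(f"{dev.name}: running x{c} of a possible x{m}")

if __name__ == "__main__" and Path("/sys/bus/pci/devices").is_dir():
    scan()
```

Note that a mismatch here isn't always a problem (some devices deliberately train down when idle to save power), but a drive that should be x4 sitting at x2 is exactly the situation I was in.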

If my graphics card could use x16 and I wanted it to, it might still be possible to make everything work at full speed, but I'd have to move the graphics card down to the second PCIe x16 slot (freeing up the first x1 slot for the Intel card, so the NVMe adapter could run at a full x4 in the third x16 slot) and hope that the second slot can support full x16, not just x8. As it is, my graphics card fortunately only wants x8, which means the simple resolution to my problem is moving the NVMe adapter card to the second PCIe x16 slot. If I wanted to also add a 10G-T Ethernet card, I might be out of luck, because I think those generally want at least x4.

(Our current 10G-T hardware turns out to be somewhat inconsistent on this. Our Intel dual 10G-T cards seem to want x8, but our Linux fileservers claim that their onboard 10G-T ports only want x1 with a link speed of 2.5GT/s.)

All of this is annoying, but the more alarming bit is that it's unlikely to be obvious to people when cards like this PCIe to NVMe adapter are quietly losing PCIe lanes this way. The card will still work, just more slowly than you'd expect, and then perhaps people write reviews saying 'this card is inferior and doesn't deliver full performance for NVMe drives'.

(This also omits the additional issue of whether the PCIe lanes in question are directly connected to the CPU or have to flow through the chipset, which has a limited bandwidth connection to the CPU. This matters on modern machines because you have to go through the CPU to get to RAM, so you can only get so much RAM bandwidth total from all PCIe devices behind the chipset, no matter how many PCIe lanes the chipset claims to provide (even after limitations). See also my older entry on PCIe and how it interacts with modern CPUs.)
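Sysfs can also show you where a device physically hangs off: a device's resolved sysfs path spells out every PCI bridge between the root complex and the device, so you can see which root port (and any chipset or switch bridges) a card sits behind. This is a small sketch assuming the usual Linux sysfs layout; mapping a particular bridge to 'the chipset' still takes the board manual or lspci output:

```python
#!/usr/bin/env python3
"""Show the chain of PCI bridges above each device, from sysfs."""
from pathlib import Path

def bridge_chain(resolved_parts) -> list:
    """Bus addresses of the bridges above a device, nearest the root
    first (the device's own address, the last component, is excluded)."""
    addrs = [p for p in resolved_parts if ":" in p and not p.startswith("pci")]
    return addrs[:-1]

def show(devroot: str = "/sys/bus/pci/devices") -> None:
    for dev in sorted(Path(devroot).iterdir()):
        chain = bridge_chain(dev.resolve().parts)
        print(dev.name, "<-", " <- ".join(chain) if chain else "root complex")

if __name__ == "__main__" and Path("/sys/bus/pci/devices").is_dir():
    show()
```

A long chain doesn't automatically mean 'behind the chipset' (CPU-driven slots sit behind a root port bridge too), but devices sharing one upstream bridge are sharing its bandwidth to the CPU.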

Sidebar: PCIe 3.0 versus 2.0 in slots

The other issue in slots is which PCIe version they support, with PCIe 3.0 being potentially faster than PCIe 2.0. On my motherboard, only slots driven directly by the CPU support PCIe 3.0; slots driven through the AMD X370 chipset are 2.0 only. All of the PCIe x1 slots and the PCIe x16 @ x4 slot are driven by the chipset and so are PCIe 2.0, which may have been another source of my NVMe performance difference. The two full x16 slots are PCIe 3.0 from the CPU, as is the motherboard's M.2 slot.
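The version difference is not small. PCIe 2.0 runs at 5 GT/s per lane with 8b/10b encoding, while PCIe 3.0 runs at 8 GT/s per lane with the much more efficient 128b/130b encoding, so some back-of-the-envelope arithmetic (ignoring protocol overhead beyond line encoding) shows how far my x2 @ 2.0 situation fell short of what the drive could use at x4 @ 3.0:

```python
def lane_mb_s(gt_per_s: float, payload_bits: int, total_bits: int) -> float:
    """Effective per-lane bandwidth in MB/s: line rate times the encoding
    efficiency (8b/10b for PCIe 1.x/2.0, 128b/130b for PCIe 3.0+)."""
    return gt_per_s * 1e9 * payload_bits / total_bits / 8 / 1e6

pcie2 = lane_mb_s(5.0, 8, 10)     # ~500 MB/s per lane
pcie3 = lane_mb_s(8.0, 128, 130)  # ~985 MB/s per lane

print(f"PCIe 2.0 x2: ~{2 * pcie2:.0f} MB/s")
print(f"PCIe 3.0 x4: ~{4 * pcie3:.0f} MB/s")
```

So even at its nominal x4, that chipset-driven slot tops out around 2 GB/s, while the same drive on a CPU-driven PCIe 3.0 x4 link (like the M.2 slot) has nearly 4 GB/s available.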
