Working to understand PCI Express and how it interacts with modern CPUs

October 13, 2017

PCI Express sort of crept up on me while I wasn't looking. One day everything was PCI and AGP, then there was some PCI-X in our servers, and then my then-new home machine had PCIe instead, but I didn't really have anything to put in those slots so I didn't pay attention. With a new home machine in my future, I've been working to finally understand all of this better. Based on my current understanding, there are two sides here: PCI Express itself, and how PCIe lanes connect to the rest of the system.

PCI Express itself is well described by the Wikipedia page. The core unit of PCIe connections is the lane, which carries one set of signals in each direction. Multiple lanes may be used together by the same device (or connection) in order to get more bandwidth, and these lane counts are written with an 'x' prefix, such as 'x4' or 'x16'. For a straightforward PCIe slot or card, the lane count describes both its size and how many PCIe lanes it uses (or wants to use); this is written as, for example, 'PCIe x16'. It's also common to have a slot that's one physical size but provides fewer PCIe lanes; this is commonly written with two lane sizes, eg 'PCIe x16 @ x4' or 'PCIe x16 (x4 mode)'.

While a PCIe device may want a certain number of lanes, that doesn't mean it's going to get them. Lane counts are negotiated by both ends, which in practice means that the system can decide that a PCIe x16 graphics card in an x16 slot is actually only going to get 8 lanes (or fewer). I don't know if in theory all PCIe devices are supposed to work all the way down to one lane (x1), but if so I cynically suspect that in practice there are PCIe devices that can't or won't cope well if their lane count is significantly reduced.
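To make the negotiation idea concrete, here is a minimal sketch in Python. It is a toy model, not the actual PCIe link training procedure: it assumes each end simply advertises a set of lane widths it can run at, and the link comes up at the widest width both ends share. The function name and the example width sets are made up for illustration.

```python
def negotiated_width(device_widths, slot_widths):
    """Return the widest lane count both ends support, or None if none match.

    Toy model (an assumption, not the real link training state machine):
    each end is represented only by the set of widths it can operate at.
    """
    common = set(device_widths) & set(slot_widths)
    return max(common) if common else None


# A hypothetical x16 card that can also run at x8, x4, or x1,
# plugged into a slot that the system has limited to 8 lanes.
print(negotiated_width({1, 4, 8, 16}, {1, 2, 4, 8}))

# A hypothetical card that only copes with x16 or x1 in the same slot
# falls all the way back to a single lane.
print(negotiated_width({1, 16}, {1, 2, 4, 8}))
```

In this model, a device that won't cope with a reduced lane count is simply one whose width set is missing the smaller entries, which leaves it with either a very narrow link or no link at all.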

(PCIe interconnection can involve quite sophisticated switches.)

All of this brings me around to how PCIe lanes connect to things. Once upon a time, the northbridge chip was king and sat at the heart of your PC; it connected to the CPU, it connected to RAM, it connected to your AGP slot (or maybe a PCIe slot). Less important and lower bandwidth things were pushed off to the southbridge. These days, the CPU has dethroned the northbridge by basically swallowing it; a modern CPU directly connects to RAM, integrated graphics, and a limited number of PCIe lanes (and perhaps a few other high-importance things). Additional PCIe lanes, SATA ports, and everything else are connected to the motherboard chipset, which then connects back to the CPU through some interconnect. On modern Intel CPUs, this is Intel's DMI and is roughly equivalent to a four-lane PCIe link; on AMD's modern CPUs, this is apparently literally an x4 PCIe link.

Because you have to get to the CPU to talk to RAM, all PCIe devices that use non-CPU PCIe lanes are collectively choked down to the aggregate bandwidth of the chipset-to-CPU link for DMA transfers. Since SATA ports, USB ports, and so on are also generally connected to the chipset instead of the CPU, your PCIe devices are contending with them too. This is especially relevant with high-speed x4 PCIe devices such as M.2 NVMe SSDs, but I believe it comes up for 10G networking as well (especially if you have multiple 10G ports, where I think you need x4 PCIe 3.0 to saturate two 10G links).
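The 10G arithmetic can be checked with a short Python sketch. The per-lane figures are the standard ones (PCIe 2.0 signals at 5 GT/s with 8b/10b encoding, PCIe 3.0 at 8 GT/s with 128b/130b encoding). The sketch deliberately ignores PCIe packet and protocol overhead, so real usable bandwidth is somewhat lower than these numbers, and the function names are my own.

```python
# Raw signalling rate (GT/s) and line-code efficiency per PCIe generation.
PCIE_GENERATIONS = {
    "2.0": (5.0, 8 / 10),     # 8b/10b encoding
    "3.0": (8.0, 128 / 130),  # 128b/130b encoding
}


def link_gbps(generation, lanes):
    """Approximate one-direction bandwidth of a link in Gbit/s.

    Ignores packet headers and other protocol overhead (an assumption
    made to keep the estimate simple).
    """
    rate, efficiency = PCIE_GENERATIONS[generation]
    return rate * efficiency * lanes


def min_lanes(generation, required_gbps, widths=(1, 2, 4, 8, 16)):
    """Return the narrowest common link width that meets required_gbps."""
    for width in widths:
        if link_gbps(generation, width) >= required_gbps:
            return width
    return None


two_10g_links = 2 * 10.0  # Gbit/s in one direction

print(min_lanes("3.0", two_10g_links))  # 4
print(min_lanes("2.0", two_10g_links))  # 8
```

By this estimate an x2 PCIe 3.0 link (about 15.8 Gbit/s) falls short of two 10G links while x4 (about 31.5 Gbit/s) is enough. That x4 figure is also roughly the whole chipset-to-CPU link, which is why such a card on chipset lanes ends up competing with everything else behind the chipset.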

(I don't know if you can usefully do PCIe transfers from one device to another device directly through the chipset, without touching the CPU and RAM and thus without having to go over the link between the chipset and the CPU.)

Typical Intel desktop CPUs have 16 onboard PCIe lanes, which are almost always connected to an x16 and an x16 @ x8 PCIe slot for your graphics cards. Current Intel motherboard chipsets such as the Z370 have what I've seen quoted as '20 to 24' additional PCIe lanes; these lanes must be used for M.2 NVMe drives, additional PCIe slots, and additional onboard chips that the motherboard vendor has decided to integrate (for example, to provide extra USB 3.1 gen 2 ports or extra SATA ports).
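As a back-of-the-envelope illustration of what this means for the chipset-to-CPU link, here is a small Python sketch of the oversubscription ratio. It assumes the chipset's lanes and the uplink run at the same per-lane speed and that the uplink is equivalent to four lanes, as described above; it also ignores SATA and USB traffic, which would only make the contention worse.

```python
def oversubscription(downstream_lanes, uplink_lanes=4):
    """Ratio of chipset PCIe lanes to the lanes' worth of uplink bandwidth.

    Assumes every lane, downstream and uplink, runs at the same speed.
    """
    return downstream_lanes / uplink_lanes


# The '20 to 24' chipset lanes quoted for current Intel chipsets.
for lanes in (20, 24):
    print(f"{lanes} chipset lanes over an x4-equivalent uplink: "
          f"{oversubscription(lanes):.0f}:1")
```

A ratio of 5:1 or 6:1 only matters if enough chipset-attached devices are busy at the same time, which is presumably rare on a desktop but easy to reach with a couple of NVMe drives under load.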

The situation with AMD Ryzen and its chipsets is more tangled and gets us into the difference between PCIe 2.0 and PCIe 3.0. Ryzen itself has 24 PCIe lanes to Intel's 16, but the Ryzen chipsets seem to have fewer additional PCIe lanes and many of them are slower PCIe 2.0 ones. The whole thing is confusing me, which makes it fortunate that I'm not planning to get a Ryzen-based system for various reasons, but for what it's worth I suspect that Ryzen's PCIe lane configuration is better for typical desktop users.

Unsurprisingly, server-focused CPUs and chipsets have more PCIe lanes and more lanes directly connected to the CPU or CPUs (for multi-socket configurations). Originally this was probably aimed at things like multiple 10G links and large amounts of high-speed disk IO. Today, with GPU computing becoming increasingly important, it's probably more and more being used to feed multiple x8 or x16 GPU card slots with high bandwidth.
