PCIe slot bandwidth can change dynamically (and very rapidly)

December 18, 2019

When I added some NVMe drives to my office machine and started looking into its PCIe setup, I discovered that its Radeon graphics card seemed to be operating at 2.5 GT/s (PCIe 1.0) instead of 8 GT/s (PCIe 3.0). The last time around, I thought I had fixed this just by poking into the BIOS, but in a comment, Alex suggested that this was actually a power-saving measure and not necessarily done by the BIOS. I'll quote the comment in full because it summarizes things better than I can:

Your GPU was probably running at lower speeds as a power-saving measure. Lanes consume power, and higher speeds consume more power. The GPU driver is generally responsible for telling the card what speed (and lane width) to run at, but whether that works (or works well) with the Linux drivers is another question.

It turns out that Alex is right, and what I saw after going through the BIOS didn't quite mean what I thought it did.

To start with the summary, the PCIe bandwidth being used by my graphics card can vary very rapidly, going from 2.5 GT/s up to 8 GT/s and then back down again depending on whether or not the graphics driver needs the card to do anything (or whether the aggregate Linux and X software stack as a whole does, since I don't know where these decisions are being made). The most dramatic and interesting illustration is the difference between two apparently very similar ways of seeing if the Radeon's bandwidth is currently downgraded: automatically scanning through lspci's output with 'lspci -vv | fgrep downgrade', or manually looking through it with 'lspci -vv | less'. When I used less, the Radeon normally showed up as downgraded to 2.5 GT/s. When I used fgrep, other things before the Radeon showed up as downgraded but the Radeon never did; it was always at 8 GT/s.

(Some of those other things have been downgraded to 'x0' lanes, which I suspect means that they've been disabled as unused.)
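The 'fgrep downgrade' check can also be sketched in Python, which makes it easier to keep the device name next to each downgraded link. This is a minimal sketch, assuming the 'LnkSta:' line format that recent pciutils versions print (for example 'Speed 2.5GT/s (downgraded), Width x16 (ok)'); the function name downgraded_links is made up for illustration. Note that a script like this would normally collect all of lspci's output before printing anything, so it is likely to behave like the 'less' case rather than the 'fgrep' one.

```python
import re

# Matches e.g. "LnkSta: Speed 2.5GT/s (downgraded), Width x16 (ok)".
# The "(...)" annotations are optional because older lspci versions omit them.
LNKSTA_RE = re.compile(
    r"LnkSta:\s+Speed\s+(?P<speed>[\d.]+GT/s)(?:\s+\((?P<speed_note>\w+)\))?"
    r",\s+Width\s+(?P<width>x\d+)(?:\s+\((?P<width_note>\w+)\))?"
)


def downgraded_links(lspci_text):
    """Return (device, speed, width) for each link lspci marks as downgraded.

    lspci_text is the full output of 'lspci -vv' as a string.
    """
    results = []
    device = None
    for line in lspci_text.splitlines():
        # Device header lines start in column 0; details are indented.
        if line and not line[0].isspace():
            device = line.strip()
            continue
        m = LNKSTA_RE.search(line)
        if m and "downgraded" in (m["speed_note"], m["width_note"]):
            results.append((device, m["speed"], m["width"]))
    return results


# Possible usage (needs lspci installed, and usually root for full -vv output):
#   import subprocess
#   text = subprocess.run(["lspci", "-vv"], capture_output=True,
#                         text=True, check=True).stdout
#   for device, speed, width in downgraded_links(text):
#       print(speed, width, device)
```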

What I think is happening here is that when I pipe lspci to less, lspci gets the Radeon's bandwidth before any output is written to the screen (less reads it all in a big gulp and then displays it), so the graphics chain is inactive at the time lspci looks. When I use the fgrep pipe, some output is written to the screen before lspci gets to the Radeon, and so the graphics chain raises the Radeon's bandwidth in order to display things. What this suggests is that the graphics chain can and does vary the Radeon's PCIe bandwidth quite rapidly. Another interesting case is that running the venerable glxgears doesn't bring the PCIe bandwidth up from 2.5 GT/s, but running GpuTest's 'fur' test does (it goes to 8 GT/s, as you might expect).
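One way to watch for these rapid changes without going through lspci is to poll the link speed that the Linux kernel exposes in sysfs. The following is a minimal sketch, assuming a kernel that provides the 'current_link_speed' attribute under /sys/bus/pci/devices/; the function names and the device address in the usage comment are hypothetical. The exact text in the file varies between kernel versions, so the sketch treats it as an opaque string and only records when it changes.

```python
import time
from pathlib import Path


def read_link_speed(device_dir):
    """Read the current PCIe link speed string for one device directory."""
    return (Path(device_dir) / "current_link_speed").read_text().strip()


def watch_link_speed(device_dir, samples=50, interval=0.1):
    """Poll the link speed and return the sequence of distinct values seen.

    A new entry is appended only when the value differs from the previous
    sample, so a result like ['2.5 GT/s', '8 GT/s', '2.5 GT/s'] would mean
    the link went up and came back down during the polling window.
    """
    changes = []
    last = None
    for _ in range(samples):
        speed = read_link_speed(device_dir)
        if speed != last:
            changes.append(speed)
            last = speed
        time.sleep(interval)
    return changes


# Possible usage (the PCI address is an example, not a real recommendation):
#   print(watch_link_speed("/sys/bus/pci/devices/0000:01:00.0"))
```

Since the result is only printed at the end, the polling itself should not generate screen output that wakes up the graphics card, although this is an assumption rather than something verified here.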

(It turns out that nVidia's Linux drivers also do this.)

Of course all of this may make seeing whether you're getting full PCIe bandwidth a little bit interesting. It's clearly not enough to just look at your system, even when it's moderately active (I have several X programs that update once a second); you really need to put it under some approximation of full load and then check. So far I've only seen this happen with graphics cards, but who knows what's next (NVMe drives could be one candidate to drop their bandwidth to save power and thus reduce heat).
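The 'put it under load and then check' step can be sketched as a comparison between a device's current link and its advertised maximum, taken while the load is running. This is a minimal sketch, assuming the sysfs attributes 'current_link_speed', 'max_link_speed', 'current_link_width' and 'max_link_width' are present for the device; the function name link_report and the device address in the usage comment are hypothetical. The maximum here is what the device itself advertises, which may not be the same as what the slot can actually do.

```python
from pathlib import Path

# sysfs attribute names, relative to /sys/bus/pci/devices/<address>/
ATTRS = (
    "current_link_speed",
    "max_link_speed",
    "current_link_width",
    "max_link_width",
)


def link_report(device_dir):
    """Return the link attributes plus an 'at_full' flag for one device.

    'at_full' is True only if both the current speed and the current width
    match the maximums the device advertises.
    """
    d = Path(device_dir)
    report = {name: (d / name).read_text().strip() for name in ATTRS}
    report["at_full"] = (
        report["current_link_speed"] == report["max_link_speed"]
        and report["current_link_width"] == report["max_link_width"]
    )
    return report


# Possible usage, run while something like GpuTest's 'fur' test is active:
#   print(link_report("/sys/bus/pci/devices/0000:01:00.0"))
```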
