Reading the Linux cpufreq sysfs interface is (deliberately) slow

March 21, 2024

The Linux kernel has a CPU frequency (management) system, called cpufreq. As part of this, Linux (on supported hardware) exposes various CPU frequency information under /sys/devices/system/cpu, as covered in Policy Interface in sysfs. Reading these files can provide you with some information about the state of your system's CPUs, especially their current frequency (more or less). This information is considered interesting enough that the Prometheus host agent collects (some) cpufreq information by default. However, there is one caution, which is that the kernel apparently deliberately slows down reading this information from /sys (as I learned recently). A comment in the relevant Prometheus code says that this delay is 50 milliseconds, but this comment dates from 2019 and may be out of date now (I wasn't able to spot the slowdown in the kernel code itself).

On a machine with only a few CPUs, reading this information is probably not going to slow things down enough that you really notice. On a machine with a lot of CPUs, the story can be very different. We have one AMD 512-CPU machine, and on this machine reading every CPU's scaling_cur_freq one at a time takes over ten seconds:

; cd /sys/devices/system/cpu/cpufreq
; time cat policy*/scaling_cur_freq >/dev/null
10.25 real 0.07 user 0.00 kernel
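
To see where the time goes, rather than just the total, one option is to time each read individually. The following is a minimal Python sketch of that idea, not a definitive tool; the function name `time_cur_freq_reads` and its `base` parameter are made up for illustration, and it assumes the same `policy*/scaling_cur_freq` layout that the shell command above uses.

```python
import glob
import os
import time


def time_cur_freq_reads(base="/sys/devices/system/cpu/cpufreq"):
    """Read every policy*/scaling_cur_freq under base, timing each read.

    Returns a list of (path, seconds) pairs in sorted path order.
    """
    pattern = os.path.join(base, "policy*", "scaling_cur_freq")
    timings = []
    for path in sorted(glob.glob(pattern)):
        start = time.perf_counter()
        with open(path) as f:
            f.read()  # the read itself is what the kernel appears to delay
        timings.append((path, time.perf_counter() - start))
    return timings
```

Summing the second element of each pair should roughly reproduce the 'real' time that `time cat` reports, and looking at the individual values shows whether every read pays about the same delay or only some of them do.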

On a 112-CPU Xeon Gold server, things are not so bad at 2.24 seconds; a 128-core AMD machine takes 2.56 seconds. A 64-CPU server is down to 1.28 seconds, a 32-CPU one takes 0.64 seconds, and on my 16-CPU and 12-CPU desktops (running Fedora instead of Ubuntu) the time is reported as '0.00 real'.
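
As a back-of-the-envelope check, dividing each of these times by the CPU count gives the per-CPU cost of a read. The sketch below does that arithmetic in Python. It assumes one policy directory (and so one read) per CPU and treats the '128-core' machine as 128 CPUs, neither of which is stated outright above.

```python
# (CPU count, wall-clock seconds) pairs taken from the measurements in the text.
measurements = [(512, 10.25), (128, 2.56), (112, 2.24), (64, 1.28), (32, 0.64)]


def ms_per_cpu(cpus, seconds):
    """Average milliseconds spent per CPU, assuming one read per CPU."""
    return 1000 * seconds / cpus


for cpus, seconds in measurements:
    print(f"{cpus:4d} CPUs: {ms_per_cpu(cpus, seconds):.1f} ms per CPU")
```

Every machine comes out at about 20 ms per CPU, which is lower than the 50 milliseconds mentioned in the Prometheus comment. This is only an inference from these timings, not something confirmed from the kernel source, and the '0.00 real' desktops suggest that not every system pays the delay at all.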

This potentially matters on high-CPU machines where you're running any sort of routine monitoring that tries to read this information, including the Prometheus host agent in its default configuration. The Prometheus host agent reduces the impact of this slowdown somewhat, but it's still noticeably slower to collect all of the system information if we have the 'cpufreq' collector enabled on these machines. As a result of discovering this, I've now disabled the Prometheus host agent's 'cpufreq' collector on anything with 64 cores or more, and we may lower that threshold in the future. We don't have a burning need to see CPU frequency information, and we would like to avoid slow data collection and occasional apparent impacts on the rest of the system.
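
One way to apply this sort of per-machine policy is to work out the host agent's arguments from the CPU count when it is started. The sketch below is a hypothetical helper, not how the collector was actually disabled here. It assumes the host agent is node_exporter, which turns collectors off with flags of the form `--no-collector.<name>`; the name `host_agent_args` is invented, and the threshold of 64 is the one mentioned above.

```python
import os

# Threshold from the text: the 'cpufreq' collector is off at 64 CPUs or more.
CPUFREQ_MAX_CPUS = 64


def host_agent_args(cpu_count, threshold=CPUFREQ_MAX_CPUS):
    """Return extra node_exporter arguments for a machine with cpu_count CPUs."""
    if cpu_count >= threshold:
        # Assumes node_exporter's standard --no-collector.<name> flag form.
        return ["--no-collector.cpufreq"]
    return []


if __name__ == "__main__":
    # os.cpu_count() can return None, so fall back to 1 in that case.
    print(" ".join(host_agent_args(os.cpu_count() or 1)))
```

Keeping the threshold as a parameter makes it easy to lower later without touching the logic.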

(Typical Prometheus configurations magnify the effect of the slowdown because it's common to query ('scrape') the host agent quite often, for example every fifteen seconds. Every time you do this, the host agent re-reads these cpufreq sysfs files and hits this delay.)
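
To put a rough number on this, the sketch below compares the sequential read times measured earlier against a fifteen second scrape interval. It is an upper bound under the assumption that the files are read strictly one at a time; the host agent reduces the impact somewhat, so its real figure will be lower. The function name is made up for illustration.

```python
def scrape_busy_fraction(read_seconds, scrape_interval_seconds):
    """Fraction of each scrape interval spent reading cpufreq files."""
    return read_seconds / scrape_interval_seconds


# (CPU count, sequential read time in seconds) from the measurements in the text.
for cpus, seconds in [(512, 10.25), (112, 2.24), (64, 1.28)]:
    fraction = scrape_busy_fraction(seconds, 15)
    print(f"{cpus} CPUs: {fraction:.0%} of a 15 s interval")
```

For the 512-CPU machine this works out to roughly two thirds of every interval, against under a tenth for the 64-CPU server.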

PS: I currently have no views on how useful the system's CPU frequencies are as a metric, or on how much they might be perturbed by querying them (although the Prometheus host agent deliberately pretends it's running on a single-CPU machine, partly to avoid problems in this area). If you do have views, you might either not collect CPU frequency information anywhere or accept the time cost of collecting it even on high-CPU machines.

Written on 21 March 2024.

Last modified: Thu Mar 21 23:09:03 2024