Reading the Linux cpufreq sysfs interface is (deliberately) slow
The Linux kernel has a CPU frequency (management) system, called
cpufreq.
As part of this, Linux (on supported hardware) exposes various CPU
frequency information under /sys/devices/system/cpu, as covered in
Policy Interface in sysfs.
Reading these files can provide you with some information about the
state of your system's CPUs, especially their current frequency
(more or less). This information is considered interesting enough
that the Prometheus host agent collects (some) cpufreq
information by default. However, there is one caveat, which
is that the kernel apparently deliberately slows down reading
this information from /sys (as I learned recently). A comment in
the relevant Prometheus code says that this delay is 50
milliseconds, but this comment dates from 2019 and may be out of
date now (I wasn't able to spot the slowdown in the kernel code
itself).
On a machine with only a few CPUs, reading this information is probably not going to slow things down enough that you really notice. On a machine with a lot of CPUs, the story can be very different. We have one AMD 512-CPU machine, and on this machine reading every CPU's scaling_cur_freq one at a time takes over ten seconds:
; cd /sys/devices/system/cpu/cpufreq
; time cat policy*/scaling_cur_freq >/dev/null
10.25 real 0.07 user 0.00 kernel
On a 112-CPU Xeon Gold server, things are not so bad at 2.24 seconds; a 128-CPU AMD machine takes 2.56 seconds. A 64-CPU server is down to 1.28 seconds, a 32-CPU one to 0.64 seconds, and on my 16-CPU and 12-CPU desktops (running Fedora instead of Ubuntu) the time is reported as '0.00 real'.
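Dividing each of these times by the CPU count suggests a roughly constant cost per read. Here is a minimal sketch of that arithmetic, assuming each machine has one policy directory per CPU (which these measurements don't directly confirm); the function name is just for illustration.

```python
# (CPU count, wall-clock seconds) pairs from the measurements above.
timings = [(512, 10.25), (128, 2.56), (112, 2.24), (64, 1.28), (32, 0.64)]


def ms_per_cpu(cpus: int, seconds: float) -> float:
    """Average milliseconds per read, assuming one read per CPU."""
    return seconds / cpus * 1000.0


for cpus, seconds in timings:
    print(f"{cpus:4d} CPUs: {ms_per_cpu(cpus, seconds):.1f} ms per read")
```

Every server comes out at about 20 milliseconds per read, which is lower than the 50 milliseconds in the Prometheus comment. That would be consistent with the comment being out of date, although it doesn't prove it, and the desktops reporting '0.00 real' clearly don't follow this pattern at all.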
This potentially matters on high-CPU machines where you're running any sort of routine monitoring that tries to read this information, including the Prometheus host agent in its default configuration. The Prometheus host agent reduces the impact of this slowdown somewhat, but it's still noticeably slower to collect all of the system information if we have the 'cpufreq' collector enabled on these machines. As a result of discovering this, I've now disabled the Prometheus host agent's 'cpufreq' collector on anything with 64 cores or more, and we may reduce that in the future. We don't have a burning need to see CPU frequency information and we would like to avoid slow data collection and occasional apparent impacts on the rest of the system.
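If you want to check whether a particular machine is affected before deciding about the collector, one option is to time each read individually rather than the whole 'cat'. This is a rough sketch, not what the Prometheus host agent itself does; the function name is made up for illustration, and it assumes the same policy*/scaling_cur_freq layout as the command above.

```python
import glob
import os
import time

# Location of the cpufreq policy directories, as used in the command above.
CPUFREQ_DIR = "/sys/devices/system/cpu/cpufreq"


def time_cpufreq_reads(base=CPUFREQ_DIR, name="scaling_cur_freq"):
    """Read <base>/policy*/<name> one file at a time.

    Returns a list of (path, seconds) pairs, one per file that could be read.
    """
    results = []
    for path in sorted(glob.glob(os.path.join(base, "policy*", name))):
        start = time.perf_counter()
        try:
            with open(path) as f:
                f.read()
        except OSError:
            # Skip files that can't be read (permissions, vanished CPUs, etc).
            continue
        results.append((path, time.perf_counter() - start))
    return results


if __name__ == "__main__":
    reads = time_cpufreq_reads()
    total = sum(t for _, t in reads)
    print(f"{len(reads)} files read in {total:.2f} s")
    if reads:
        path, slowest = max(reads, key=lambda r: r[1])
        print(f"average {total / len(reads) * 1000:.1f} ms, "
              f"slowest {slowest * 1000:.1f} ms ({path})")
```

On a machine without cpufreq support this should simply report zero files read.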
(Typical Prometheus configurations magnify the effect of the slowdown because it's common to query ('scrape') the host agent quite often, for example every fifteen seconds. Every time you do this, the host agent re-reads these cpufreq sysfs files and hits this delay.)
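To put rough numbers on this, here is a small sketch of how much of each scrape interval sequential cpufreq reads would occupy. It assumes the reads take as long as the 'cat' timings above and uses the fifteen-second interval as an example; the host agent's real cost will be somewhat lower, since it reduces the impact of the slowdown.

```python
def scrape_fraction(read_seconds: float, interval_seconds: float = 15.0) -> float:
    """Fraction of one scrape interval spent reading cpufreq files."""
    return read_seconds / interval_seconds


# Sequential read times measured above, by machine size.
for label, secs in [("512-CPU", 10.25), ("128-CPU", 2.56), ("64-CPU", 1.28)]:
    print(f"{label}: {scrape_fraction(secs):.0%} of each 15 s interval")
```

On the 512-CPU machine, that works out to about 68% of every interval spent waiting on these reads.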
PS: I currently have no views on how useful the system's CPU frequencies are as a metric, and how much they might be perturbed by querying them (although the Prometheus host agent deliberately pretends it's running on a single-CPU machine, partly to avoid problems in this area). If you do have views, you might either not collect CPU frequency information anywhere or accept the time cost of collecting it even on high-CPU machines.