The Prometheus host agent can disturb Linux CPU frequency measurements
Recently I read "CPU frequency scaling metrics from the node exporter", which talks about how to look at the CPU frequency metrics that the Prometheus host agent gathers and exposes to Prometheus. Naturally this got me to look at the frequencies that my own little Prometheus setup on my home machine had gathered, which gave me a surprise.
Like a lot of desktops, my home machine is idle almost all of the
time, and I can see that reflected in a lot of the statistics that
the Prometheus host agent gathers. But Prometheus reported that my
CPU frequency was hovering up at very high values, often around
4 GHz (and checking confirmed that these were what the host agent
was reporting). Since this didn't match my expectations, I looked
at the direct information in /sys:

    : hawklords.cs ; cd /sys/devices/system/cpu/cpufreq
    : hawklords.cs ; cat policy?/scaling_cur_freq policy??/scaling_cur_freq
    800079
    800210
    800106
    800045
    800091
    800032
    800162
    800644
    801214
    800175
    800060
    800026

That is, my CPUs are sitting at around 800 MHz, which is actually the minimum frequency (scaling_min_freq is 800000, in kHz). That's what I see almost all of the time when my desktop is idle, with brief exceptions.
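Reading a dozen raw kHz values by eye gets tedious, so here is a minimal shell sketch that condenses the same scaling_cur_freq files into one line. The function name freq_summary and its optional directory argument are made up for illustration; nothing here comes from the host agent itself.

```shell
# Summarize the current frequency of every cpufreq policy, in MHz.
# freq_summary is a hypothetical helper name; the optional first
# argument (an assumption, handy for testing) overrides the sysfs
# directory that is read.
freq_summary() {
    dir="${1:-/sys/devices/system/cpu/cpufreq}"
    # Each scaling_cur_freq file holds one integer, in kHz.
    cat "$dir"/policy*/scaling_cur_freq 2>/dev/null |
    awk '
        {
            v = $1 + 0
            if (NR == 1 || v < min) min = v
            if (NR == 1 || v > max) max = v
        }
        END {
            if (NR == 0) { print "no cpufreq data"; exit 1 }
            printf "policies=%d min=%.0f MHz max=%.0f MHz\n", NR, min / 1000, max / 1000
        }'
}
```

Calling freq_summary with no argument reads the real /sys tree. On an idle machine like mine I would expect both the minimum and the maximum to sit close to the scaling_min_freq value.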
My only theory for what's going on with the Prometheus host agent is that this is happening because the host agent is a Go program and is quite parallelized and concurrent. When Prometheus or you ask the host agent for metrics, it immediately goes out to gather them from all of its collectors in parallel, which is likely to make many or all of your CPUs busy and thus push up their frequencies. Apparently my overall system (Linux, the CPU, and whatever BIOS magic is going on) is so good at this that the speed rises fast enough for the host agent to observe it, and then drops again almost immediately once the host agent is done. I suspect that the Prometheus daemon itself also contributes to the CPU usage (since it's receiving the data from the host agent), but I expect that the host agent's multi-CPU usage is the big factor.
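One rough way to probe this theory without involving Prometheus at all is to read the frequencies while deliberately keeping several CPUs busy. The sketch below is only a crude stand-in for the host agent's parallel collectors, not a reproduction of what it does; max_mhz and busy_probe are hypothetical names, and the one second settle time is an arbitrary assumption.

```shell
# Highest current frequency in MHz across all cpufreq policies.
# max_mhz is a hypothetical helper; the optional argument overrides
# the sysfs directory (an assumption, for testing).
max_mhz() {
    cat "${1:-/sys/devices/system/cpu/cpufreq}"/policy*/scaling_cur_freq 2>/dev/null |
    awk '{ v = $1 + 0; if (v > max) max = v } END { printf "%.0f\n", max / 1000 }'
}

# Start N busy loops in the background (default 4), sample the
# highest frequency while they run, then stop them again.
# busy_probe is also a hypothetical name.
busy_probe() {
    n="${1:-4}"
    pids=""
    i=0
    while [ "$i" -lt "$n" ]; do
        ( while :; do :; done ) >/dev/null 2>&1 &
        pids="$pids $!"
        i=$((i + 1))
    done
    sleep 1    # arbitrary pause so frequency scaling can react
    max_mhz "$2"
    kill $pids 2>/dev/null
    wait $pids 2>/dev/null
    return 0
}
```

The experiment is to compare the output of max_mhz on an idle machine with the output of busy_probe 8 (or however many CPUs you have). If the theory is right, the second number should be noticeably higher.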
(The choice of CPU frequency governor likely affects this; my home machine is currently on 'powersave', which is what my Fedora 31 environment defaults to. The CPU frequency driver is intel_pstate.)
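If you want to check the governor and driver on your own machine, both are exposed next to scaling_cur_freq as scaling_governor and scaling_driver. A small sketch, where show_governors is a made-up helper name and the directory argument is an assumption added so it can be pointed somewhere else:

```shell
# Print the cpufreq driver and governor for each policy.
# show_governors is a hypothetical helper name; the optional first
# argument overrides the sysfs directory that is read.
show_governors() {
    dir="${1:-/sys/devices/system/cpu/cpufreq}"
    for p in "$dir"/policy*; do
        # Skip anything that is not a readable policy directory.
        [ -r "$p/scaling_governor" ] || continue
        printf '%s driver=%s governor=%s\n' "${p##*/}" \
            "$(cat "$p/scaling_driver" 2>/dev/null)" \
            "$(cat "$p/scaling_governor")"
    done
}
```

On my home machine I would expect every line to report intel_pstate and powersave, matching what I described above.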
This unfortunately rather reduces the usefulness of the host agent's CPU frequency information on Linux. You can probably use it to look at big exceptions (such as CPUs, cores, or sockets that are persistently out of step with what they should be), but it's clearly not a reliable guide to the normal state of your systems.
PS: I see similar but less drastic effects on my office machine, which has an AMD Ryzen instead of an Intel CPU. Direct examination in /sys suggests that it idles around 1.8 GHz, but the host agent sees it at around 2.7 to 2.9 GHz when idle, with spikes higher still.
PPS: The host agent does sometimes observe low frequencies; it's reported 800 MHz frequencies on each core on my home machine at some point over the past week. It even appears to have seen 800 MHz on all cores at some point.