The Prometheus host agent can disturb Linux CPU frequency measurements

August 18, 2020

Recently I read "CPU frequency scaling metrics from the node exporter", which talks about how to look at the CPU frequency metrics that the Prometheus host agent gathers and exposes to Prometheus. Naturally this got me to look at the frequencies that my own little Prometheus setup on my home machine had gathered, which gave me a surprise.

Like a lot of desktops, my home machine is idle almost all of the time, and I can see that reflected in a lot of the statistics that the Prometheus host agent gathers. But Prometheus reported that my CPU frequency was hovering up at very high values, often around 4 GHz (and checking confirmed that these were what the host agent was reporting). Since this didn't match my expectations, I looked at the direct information in /sys:

: hawklords.cs ; cd /sys/devices/system/cpu/cpufreq
: hawklords.cs ; cat policy?/scaling_cur_freq policy??/scaling_cur_freq
800079
800210
800106
800045
800091
800032
800162
800644
801214
800175
800060
800026

That is, my CPUs are sitting at around 800 MHz (the values in scaling_cur_freq are in kHz), which is actually the minimum frequency (scaling_min_freq is 800000). That's what I see almost all of the time when my desktop is idle, with brief exceptions.
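For repeated checks, the same reading can be done from a small script. This is only a sketch: the function names read_cpufreq_khz and format_report are made up for illustration, and it assumes every policy directory has both scaling_cur_freq and scaling_min_freq, as mine do.

```python
from pathlib import Path


def read_cpufreq_khz(base="/sys/devices/system/cpu/cpufreq"):
    """Return {policy_name: (current_khz, min_khz)} in numeric policy order."""
    policies = sorted(
        Path(base).glob("policy*"),
        # Sort policy2 before policy10 by comparing the numeric suffix.
        key=lambda p: int(p.name[len("policy"):]),
    )
    result = {}
    for policy in policies:
        cur = int((policy / "scaling_cur_freq").read_text())
        low = int((policy / "scaling_min_freq").read_text())
        result[policy.name] = (cur, low)
    return result


def format_report(freqs):
    """Turn the kHz readings into one human-readable line per policy."""
    return [
        f"{name}: {cur / 1000:.0f} MHz (min {low / 1000:.0f} MHz)"
        for name, (cur, low) in freqs.items()
    ]
```

Printing each line of format_report(read_cpufreq_khz()) gives one line per policy. Reading the files one at a time from a single process should disturb the CPUs much less than a parallel collector does, although it is not guaranteed to be free of the effect either.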

My only theory for what's going on with the Prometheus host agent is that this is happening because the host agent is a Go program and is quite parallelized and concurrent. When Prometheus or you ask the host agent for metrics, it immediately goes out to gather them from all of its collectors in parallel, which is likely to make many or all of your CPUs busy and thus push up their frequencies. Apparently my overall system (Linux, the CPU, and whatever BIOS magic is going on) is so good at this that the speed rises fast enough for the host agent to observe it, and then drops again almost immediately once the host agent is done. I suspect that the Prometheus daemon itself also contributes to the CPU usage (since it's receiving the data from the host agent), but I expect that the host agent's multi-CPU usage is the big factor.
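If this theory is right, any request to the host agent will itself produce inflated numbers, so the comparison with /sys has to use the agent's own output. Here is a minimal sketch of pulling the per-CPU values out of the agent's text-format metrics. It assumes the metric is named node_cpu_scaling_frequency_hertz and carries a cpu label, which is what I believe current versions of the host agent expose; parse_scaling_freqs is a hypothetical helper name.

```python
def parse_scaling_freqs(metrics_text, metric="node_cpu_scaling_frequency_hertz"):
    """Return {cpu_label: frequency_in_mhz} from Prometheus text-format metrics.

    Assumes lines of the form:
        node_cpu_scaling_frequency_hertz{cpu="0"} 8.00079e+08
    """
    freqs = {}
    for line in metrics_text.splitlines():
        # Skip comments and other metrics, including the _min/_max variants.
        if not line.startswith(metric + "{"):
            continue
        labels, _, value = line.rpartition("} ")
        cpu = labels.split('cpu="', 1)[1].split('"', 1)[0]
        # The agent reports Hz; convert to MHz for easier comparison.
        freqs[cpu] = float(value.split()[0]) / 1e6
    return freqs
```

The text to feed in is whatever the agent returns from its metrics endpoint (by default /metrics on port 9100, if you haven't changed it). On my home machine I would expect values near 4000 from this even while a direct look at /sys a moment later shows roughly 800.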

(The choice of CPU frequency governor likely affects this; my home machine is currently on 'powersave', which is what my Fedora 31 environment defaults to. The CPU frequency driver is intel_pstate.)

This unfortunately rather reduces the usefulness of the host agent's CPU frequency information on Linux. You can probably use it to look at big exceptions (such as CPUs, cores, or sockets that are persistently out of step with what they should be), but it's clearly not a reliable guide to the normal state of your systems.
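As an illustration of the "big exceptions" use, here is a rough sketch that flags CPUs whose typical reported frequency is far from that of the other CPUs. The function name out_of_step_cpus and the 25% tolerance are arbitrary choices of mine for the example, not anything the host agent or Prometheus provides.

```python
from statistics import median


def out_of_step_cpus(samples, tolerance=0.25):
    """Return the CPUs that look persistently different from the rest.

    samples maps a CPU label to the list of frequencies (in MHz) reported
    for it over many scrapes. A CPU is flagged when its median differs from
    the median of all per-CPU medians by more than `tolerance` (a fraction).
    """
    per_cpu = {cpu: median(vals) for cpu, vals in samples.items() if vals}
    if not per_cpu:
        return []
    typical = median(per_cpu.values())
    return sorted(
        cpu for cpu, mid in per_cpu.items()
        if abs(mid - typical) > tolerance * typical
    )
```

Using medians over many scrapes means the comparison is between CPUs that were all disturbed in the same way, so the inflated absolute numbers matter less than whether one CPU consistently disagrees with the others.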

PS: I see similar but less drastic effects on my office machine, which has an AMD Ryzen instead of an Intel CPU. Direct examination in /sys suggests that it idles around 1.8 GHz, but the host agent sees it around 2.7 to 2.9 GHz when idle, with spikes higher than that.

PPS: The host agent does sometimes observe low frequencies; it's reported 800 MHz frequencies on each core on my home machine at some point over the past week. It even appears to have seen 800 MHz on all cores at once at some point.
