Two views of CPU utilization (a realization)

October 14, 2022

The traditional way to present CPU utilization in metrics dashboards and the like is as a percentage from 0 to 100. This is so common and ordinary that I wrote an entry on generating this from Prometheus CPU metrics without ever questioning things, and the Linux version of top is sometimes mocked for showing process CPU utilizations of over 100% because it considers '100%' to be 'all of one CPU' on multi-CPU machines (which is to say pretty much all of them these days). But recently it struck me that this view of CPU utilization is only one of at least two ways to look at it.

The customary 0% to 100% measure is really a measure of how much of the machine you're using and how much you have left. If you're at 75% CPU utilization, you're using three quarters of the machine and have a quarter of it left (more or less). This is a perfectly fine measure and often what you care about, but it's not the only measure. Another measure is what the Linux 'top' command tells you for processes, which is how much CPU you're using, or to put it another way, how many CPUs you're using. How much CPU you're using is generally going to be a better view into how much work is being done by various things, without having to mentally re-scale a 0% to 100% number to account for things like how 10% of a 4-CPU machine is a lot less work being done than 10% of a 112-CPU machine.
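To make the re-scaling concrete, here is a small sketch of converting between the two views. The function names are made up for illustration, and it assumes you already know the machine's CPU count and treat all CPUs as equal.

```python
def cpus_used(percent_of_machine: float, cpu_count: int) -> float:
    """Convert a 0-100% whole-machine utilization into 'CPUs in use'."""
    return percent_of_machine / 100.0 * cpu_count


def percent_of_machine(cpus: float, cpu_count: int) -> float:
    """Convert 'CPUs in use' back into a 0-100% whole-machine utilization."""
    return cpus / cpu_count * 100.0


# The same 10% figure means very different amounts of work being done.
for count in (4, 112):
    print(f"10% of a {count}-CPU machine is about {cpus_used(10, count):.1f} CPUs")
```

With these numbers, 10% works out to roughly 0.4 CPUs on the 4-CPU machine and roughly 11.2 CPUs on the 112-CPU one.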

Of course 'how many CPUs are we using here' isn't a perfect measure either, unless your CPUs are uniform (ours are far from it, so 100% of a CPU on machine A may be much less actual performance than 100% of a CPU on machine B). But it's a starting point, just as the customary 0-100% of the machine is a customary starting point for how loaded down the machine is. Which starting point you want depends on what questions you're interested in asking or seeing answers for.

As a pragmatic matter, people are often more worried about their machines falling over from being overloaded than they are curious about how much computation they're doing (and we're certainly no exception). This makes the 0-100% CPU utilization measure a good one to look at on a dashboard or the like, and indeed even Linux 'top' displays overall system utilization this way (even as it displays per-process 'utilization' as how much CPU each process is using). But now that I've thought of it, I'm going to keep my mind open about the 'how much CPU are we using' view too, and think about whether I want to look at that at some point (and how best to visualize it).
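Both views can come out of the same raw data. As a rough sketch, assume you have two samples of cumulative per-CPU idle seconds (the sort of counter that /proc/stat or a Prometheus host agent exposes); the function name and the sample numbers here are invented for illustration.

```python
def both_views(idle_before, idle_after, elapsed_seconds):
    """Return (CPUs in use, percent of machine in use) between two samples.

    idle_before and idle_after are lists of cumulative idle seconds,
    one entry per CPU, taken elapsed_seconds apart (hypothetical input).
    """
    cpu_count = len(idle_before)
    # Idle seconds accumulated per second of wall-clock time is the
    # average number of idle CPUs over the interval.
    idle_delta = sum(after - before for before, after in zip(idle_before, idle_after))
    idle_cpus = idle_delta / elapsed_seconds
    busy_cpus = cpu_count - idle_cpus
    return busy_cpus, busy_cpus / cpu_count * 100.0


# Invented samples for a 4-CPU machine, taken 60 seconds apart.
before = [1000.0, 2000.0, 3000.0, 4000.0]
after = [1060.0, 2030.0, 3045.0, 4015.0]
busy, percent = both_views(before, after, 60.0)
print(f"{busy:.2f} CPUs in use, {percent:.1f}% of the machine")
```

The only difference between the two numbers is the final division by the CPU count, so a dashboard could show either one (or both) from the same data.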

Written on 14 October 2022.

Last modified: Fri Oct 14 23:21:51 2022