A realization about the Linux CPU pressure stall information
Modern versions of the Linux kernel have a set of metrics that are
intended to give you a better high-level indicator of where (or
why) your system is 'loaded' than the venerable load average, in
the form of Pressure Stall Information (PSI). The top level
version of this is exposed as three files in /proc/pressure,
called cpu, memory, and io. If your distribution uses cgroup2,
each cgroup also has its own version of these that's specific to the
cgroup.
(Recent versions of systemd will use this information as part of systemd-oomd and probably other things.)
As covered in the documentation, the io and memory PSI files
have two lines, one line for 'some' and one line for 'full', while
cpu has only 'some'. The 'some' metric is for when some (one or
more) tasks were delayed for lack of the resource, while the 'full'
metric is for when all non-idle tasks were delayed at the same time.
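
To make the file format concrete, here is a minimal Python sketch of reading these files. It assumes the line layout described in the kernel's PSI documentation ('some' or 'full', followed by avg10=, avg60=, avg300= and total= fields); the function names parse_psi and read_psi are my own invention for illustration, not part of any library.

```python
from pathlib import Path


def parse_psi(text):
    """Parse the contents of a PSI file into {kind: {field: value}}.

    Assumes lines of the form:
        some avg10=0.00 avg60=0.00 avg300=0.00 total=0
    """
    result = {}
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        kind, fields = parts[0], parts[1:]
        values = {}
        for field in fields:
            key, _, raw = field.partition("=")
            # 'total' is cumulative stall time in microseconds;
            # the avg* fields are percentages over 10s, 60s and 300s.
            values[key] = int(raw) if key == "total" else float(raw)
        result[kind] = values
    return result


def read_psi(resource, base="/proc/pressure"):
    """Read and parse one PSI file, e.g. read_psi('cpu')."""
    return parse_psi(Path(base, resource).read_text())


if __name__ == "__main__":
    for resource in ("cpu", "memory", "io"):
        try:
            print(resource, read_psi(resource))
        except OSError as exc:
            # PSI may be absent or disabled on this kernel.
            print(f"{resource}: unavailable ({exc})")
```

Because the parser keys on whatever line names it finds, it doesn't care whether a given file has only a 'some' line or both. For a cgroup you would point it at that cgroup's directory instead, for example read_psi("cpu.pressure", base="/sys/fs/cgroup/system.slice"), assuming cgroup2 is mounted at /sys/fs/cgroup and that such a cgroup exists on your system.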
When I first read about all of this, I didn't immediately see why
the cpu pressure information only had the 'some' metric; I had
to think about it. The answer is that unlike memory and IO, where
tasks can be entirely stalled with none of them getting any of the
resource, there's always some task getting CPU if there's
demand for it. A 'full' stall on CPU would require that you have
runnable tasks but that nothing was actually being scheduled.
Looking at the total system (for /proc/pressure), it's basically
impossible for CPU to stall this way without a serious kernel
problem; the kernel scheduler would have to be stuck
somehow. However, I'm not sure that this is impossible for an
individual cgroup; since you can arrange a hierarchy of per-cgroup
priorities for CPU time, it wouldn't surprise me if you could
completely starve a victim cgroup. Right now the 'cpu.pressure'
file for cgroups only has the 'some' metric, just like the
/proc/pressure version, but perhaps that will change in the future.
(Linux also has low-priority 'idle' scheduling, as covered in sched(7), so you might be able to manipulate all of the tasks in a cgroup into that so they get starved that way.)