Wandering Thoughts archives

2021-01-20

A realization about the Linux CPU pressure stall information

Modern versions of the Linux kernel have a set of metrics that are intended to give you a better high-level indicator of where (or why) your system is 'loaded' than the venerable load average, in the form of Pressure Stall Information (PSI). The top level version of this is exposed as three files in /proc/pressure, called cpu, memory, and io. If your distribution uses cgroup v2, each cgroup also has its own version of these that's specific to the cgroup.
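As a quick way to look at the top level files, here is a minimal Python sketch. The function name read_pressure and its base parameter are my own inventions for illustration; it assumes a kernel with PSI enabled and simply skips any file it can't read.

```python
from pathlib import Path

# The three top level PSI files described above.
RESOURCES = ("cpu", "memory", "io")

def read_pressure(base="/proc/pressure"):
    """Return {resource: raw file text} for each PSI file that can be read."""
    out = {}
    for name in RESOURCES:
        try:
            out[name] = (Path(base) / name).read_text()
        except OSError:
            # No PSI support (or not Linux at all); skip this file.
            continue
    return out

if __name__ == "__main__":
    for name, text in read_pressure().items():
        print(f"{name}:")
        print(text, end="")
```

The base directory is a parameter so that the same function can be pointed at a cgroup directory or a test directory, although cgroup files are named cpu.pressure and so on rather than plain cpu.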

(Recent versions of systemd will use this information as part of systemd-oomd and probably other things.)

As covered in the documentation, the io and memory PSI files have two lines, one line for 'some' and one line for 'full', while cpu has only 'some'. The 'some' metric is for when some (one or more) tasks were delayed for lack of the resource, while the 'full' metric is for when all tasks were delayed at the same time.
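Each line in these files has the form 'some avg10=0.00 avg60=0.00 avg300=0.00 total=0', where the avg fields are percentages over the last 10, 60, and 300 seconds and total is cumulative stall time in microseconds. A minimal sketch of a parser follows; parse_psi and the sample strings are hypothetical names and made-up numbers, not real measurements. It doesn't assume which lines are present, so it works for cpu as well as for memory and io.

```python
def parse_psi(text):
    """Parse the contents of a PSI file into {kind: {field: value}}."""
    result = {}
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        kind, fields = parts[0], parts[1:]   # kind is 'some' or 'full'
        values = {}
        for field in fields:
            key, _, raw = field.partition("=")
            # 'total' is an integer count of microseconds; avg* are percentages.
            values[key] = int(raw) if key == "total" else float(raw)
        result[kind] = values
    return result

# Made-up sample data in the documented format.
SAMPLE_CPU = "some avg10=1.50 avg60=0.75 avg300=0.20 total=123456\n"
SAMPLE_IO = (
    "some avg10=2.00 avg60=1.00 avg300=0.50 total=500\n"
    "full avg10=0.25 avg60=0.10 avg300=0.00 total=100\n"
)

if __name__ == "__main__":
    print(parse_psi(SAMPLE_CPU))
    print(sorted(parse_psi(SAMPLE_IO)))
```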

When I first read about all of this, I didn't immediately see why the cpu pressure information only had the 'some' metric; I had to think about it. The answer is that unlike memory and IO, where every task can be stalled at once with none of them getting any of the resource, there's always some task getting CPU if there's demand for it. A 'full' stall on CPU would require that you have runnable tasks but that nothing was actually being scheduled.

Looking at the total system (for /proc/pressure), it's basically impossible for CPU to stall this way without a serious kernel problem; the kernel scheduler would have to be stuck somehow. However, I'm not sure that this is impossible for an individual cgroup; since you can arrange a hierarchy of per-cgroup priorities for CPU time, it wouldn't surprise me if you could completely starve a victim cgroup. Right now the 'cpu.pressure' file for cgroups only has the 'some' metric, just like the /proc/pressure version, but perhaps that will change in the future.

(Linux also has low-priority 'idle' scheduling, as covered in sched(7), so you might be able to manipulate all of the tasks in a cgroup into that so they get starved that way.)
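If you want to see which lines a particular cgroup's cpu.pressure has on your own kernel, something like the following sketch will report them. The function names are hypothetical, and the default cgroup directory is only an assumption (cgroup v2 mounted at /sys/fs/cgroup with a system.slice cgroup, as on typical systemd machines); substitute whatever cgroup you care about.

```python
from pathlib import Path

def pressure_kinds(path):
    """Return the set of line labels (such as 'some' or 'full') in a PSI file."""
    kinds = set()
    for line in Path(path).read_text().splitlines():
        parts = line.split()
        if parts:
            kinds.add(parts[0])
    return kinds

def cgroup_cpu_kinds(cgroup_dir="/sys/fs/cgroup/system.slice"):
    """Report which metrics a cgroup's cpu.pressure file exposes.

    The default directory is an assumption about a typical systemd setup.
    """
    return pressure_kinds(Path(cgroup_dir) / "cpu.pressure")

if __name__ == "__main__":
    try:
        print(sorted(cgroup_cpu_kinds()))
    except OSError as e:
        print(f"could not read cpu.pressure: {e}")
```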

linux/PSICpuWhyNoFull written at 00:38:57

