
2025-01-20

The (potential) complexity of good runqueue latency measurement in Linux

Run queue latency is the time between when a Linux task becomes ready to run and when it actually runs. If you want good responsiveness, you want a low run queue latency, so for a while I've been tracking a histogram of it with eBPF, and I put graphs of it up on some Grafana dashboards I look at. Then recently I improved the responsiveness of my desktop with the cgroup V2 'cpu.idle' setting, and questions came up about how this is different from process niceness. When I was looking at those questions, I realized that my run queue latency measurements were incomplete.
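(As an illustration of the basic measurement, not my actual collection code, here is a minimal bpftrace sketch of the 'single global histogram' approach, in the style of the standard runqlat bpftrace tool. It timestamps tasks at the scheduler wakeup tracepoints and measures how long they waited before being switched in.)

    tracepoint:sched:sched_wakeup,
    tracepoint:sched:sched_wakeup_new
    {
        // the task has just become runnable; remember when
        @qtime[args->pid] = nsecs;
    }

    tracepoint:sched:sched_switch
    {
        // a preempted but still runnable task goes back on the run queue
        if (args->prev_state == 0) {    // 0 == TASK_RUNNING
            @qtime[args->prev_pid] = nsecs;
        }

        // if we saw this task become runnable, record how long it waited
        $ns = @qtime[args->next_pid];
        if ($ns) {
            @usecs = hist((nsecs - $ns) / 1000);    // microseconds spent waiting to run
            delete(@qtime[args->next_pid]);
        }
    }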

When I first set up my run queue latency tracking, I wasn't using either cgroup V2 cpu.idle or process niceness, so I set up a single global run queue latency histogram for all tasks, regardless of their priority and scheduling class. Once I started using 'idle' CPU scheduling (and testing the effectiveness of niceness), this resulted in hopelessly muddled data that was effectively meaningless whenever multiple types of scheduling or multiple nicenesses were in play. Running CPU-consuming processes only when the system is otherwise idle is (hopefully) good for the run queue latency of my regular desktop processes, but worse than usual for those 'run only when idle' processes, and there are generally going to be a lot more latency measurements from them than from my desktop processes, so they dominate the combined histogram.

The moment you introduce more than one 'class' of processes for scheduling, you need to split run queue latency measurements up between these classes if you want to really make sense of the results. What these classes are will depend on your environment. I could probably get away with a class for 'cpu.idle' tasks, a class for heavily nice'd tasks, a class for regular tasks, and perhaps a class for (system) processes running with very high priority. If you're doing fair share scheduling between logins, you might need a class per login (or you could ignore run queue latency as too noisy a measure).
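(As a sketch of one way to split things up, again not something I actually run, the sched_switch side of the bpftrace code above could key the histogram by a crude class derived from the kernel priority that the tracepoint reports. Note that this can't see cgroup 'cpu.idle' tasks, since that's a property of the task's cgroup rather than its priority, which is part of why doing this well gets complex.)

    tracepoint:sched:sched_switch
    {
        if (args->prev_state == 0) {    // 0 == TASK_RUNNING
            @qtime[args->prev_pid] = nsecs;
        }

        $ns = @qtime[args->next_pid];
        if ($ns) {
            // Kernel 'prio': below 100 is realtime, 100-120 covers nice <= 0,
            // above 120 is positively nice'd. Cgroup 'cpu.idle' (and SCHED_IDLE)
            // tasks aren't distinguished here; that would take more work.
            $prio = args->next_prio;
            if ($prio < 100) {
                @usecs["highprio"] = hist((nsecs - $ns) / 1000);
            } else if ($prio > 120) {
                @usecs["nice'd"] = hist((nsecs - $ns) / 1000);
            } else {
                @usecs["normal"] = hist((nsecs - $ns) / 1000);
            }
            delete(@qtime[args->next_pid]);
        }
    }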

I'm not sure I'd actually track all of my classes as Prometheus metrics. For my personal purposes, I don't care very much about the run queue latency of 'idle' or heavily nice'd processes, so perhaps I should update my personal metrics gathering to just ignore those. Alternatively, I could write a bpftrace script that gathered the detailed class-by-class data, run it by hand when I was curious, and ignore the issue otherwise (continuing with my 'global' run queue latency histogram, which is at least honest in general).

linux/RunqLatencyComplexity written at 23:16:48;

