The Linux load average does mean something (although maybe not much)
One of the things you'll hear about monitoring your systems is that the load average is not really a metric that you should pay attention to, and so perhaps things like an IMAP server with an elevated load average or a login server with periodic load spikes are not worth caring about. There is something to be said for this, and I've come to think that load average is a secondary indicator, but I also think that the Linux load average can still tell you things that matter.
The first thing the Linux load average may tell you is how many tasks are waiting for IO. The kernel's pressure indicators track how much of the time something is waiting, but they don't tell you how many things are waiting; the load average does, and that might matter. However, the load average only samples this information every five seconds, so it's a relatively coarse indicator.
The second thing the Linux load average may give you is some indication that you had a burst of transient tasks (or transiently active tasks). If you see a spike in the load average but no sign of it in other indicators, then you know that something happened and it can't have lasted very long; for a brief period, you had a lot of tasks that were either runnable or in IO wait. You're probably more likely to see something like this on a big machine with a lot of CPUs, for the simple reason that if you had fewer CPUs, tasks would have started having to wait and you'd see signs of this in other indicators (CPU utilization, CPU and IO pressure, and so on).
(As far as IO goes, remember that Linux's iowait statistic is only a lower bound on multi-CPU machines, which today is almost everything except very small virtual machines.)
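To get a feel for how a short burst shows up, here is a rough sketch of the load average as an exponentially damped average of samples taken every five seconds. This is a floating-point approximation of the kernel's fixed-point calculation, not the kernel's actual code, and the names (`step`, `simulate`, `SAMPLE_INTERVAL`) are made up for illustration.

```python
import math

# The load average is updated from a sample taken roughly every five seconds.
SAMPLE_INTERVAL = 5.0


def step(load, active, period):
    """Fold one sample of 'active' tasks (runnable plus uninterruptible)
    into an exponentially damped average with the given period in seconds."""
    decay = math.exp(-SAMPLE_INTERVAL / period)
    return load * decay + active * (1.0 - decay)


def simulate(samples, period=60.0, start=0.0):
    """Return the load average after each sample in 'samples'."""
    load = start
    history = []
    for active in samples:
        load = step(load, active, period)
        history.append(load)
    return history


# Hypothetical burst: one sample happens to catch 100 active tasks,
# and then the machine is idle for the next minute (12 samples).
history = simulate([100] + [0] * 12)
for i, load in enumerate(history):
    print(f"t={i * SAMPLE_INTERVAL:5.1f}s  1-minute load ~ {load:.2f}")
```

In this model, a single sample that catches 100 active tasks pushes the one-minute load average up by roughly 8, and the spike then takes a while to decay. If the burst falls entirely between two samples, it doesn't show up at all, which is part of why the load average is a coarse indicator.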
Unfortunately, as I discovered, the only way to get high resolution versions of all of the information that goes into the load average is through special interaction with cgroup (and possibly only cgroup v1). Reading /proc/loadavg will give you the instantaneous number of runnable tasks, as will /proc/stat (in 'procs_running'), but the number of uninterruptible tasks is not directly exposed anywhere. The 'procs_blocked' field of /proc/stat counts the number of tasks in IO wait instead of the number in uninterruptible sleep, although perhaps the numbers are often the same.
(The Linux kernel scheduler is sufficiently tangled that it's possible for the two to be basically synonymous, but there may be other commonly encountered ways to get uninterruptible but not running tasks.)
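If you want to look at the instantaneous numbers that are available, a minimal sketch of reading them might look like the following. The function names are hypothetical, and it assumes the usual layout of /proc/loadavg (three load averages, then 'runnable/total', then the most recent PID) and the 'procs_running' and 'procs_blocked' lines of /proc/stat.

```python
def parse_loadavg(text):
    """Parse the contents of /proc/loadavg, e.g. '0.20 0.18 0.12 1/80 11206'."""
    fields = text.split()
    runnable, total = fields[3].split("/")
    return {
        "load1": float(fields[0]),
        "load5": float(fields[1]),
        "load15": float(fields[2]),
        # Instantaneous count of runnable tasks. The process doing the
        # reading is normally included in this number.
        "runnable": int(runnable),
        "total": int(total),
    }


def parse_stat(text):
    """Pull procs_running and procs_blocked out of the contents of /proc/stat."""
    wanted = ("procs_running", "procs_blocked")
    result = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] in wanted:
            result[parts[0]] = int(parts[1])
    return result


def snapshot():
    """Read both files once and return the combined numbers (Linux only)."""
    with open("/proc/loadavg") as f:
        info = parse_loadavg(f.read())
    with open("/proc/stat") as f:
        info.update(parse_stat(f.read()))
    return info
```

Calling snapshot() once a second or so will give you a finer-grained view than the load average does, with the caveat from above that 'procs_blocked' is tasks in IO wait and so is only a stand-in for the number of uninterruptible tasks.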