The many load averages of Unix(es)

February 17, 2016

It turns out that the meaning of 'load average' on Unixes is rather more divergent than I thought it was. So here's the story as I know it.

In the beginning, by which I mean 3BSD, the load average counted how many processes were runnable or in short term IO wait (as a decaying average). The BSD kernel computed this count periodically by walking over the process table; you can see this in, for example, 4.2BSD's vmtotal() function. Unixes that were derived from 4BSD carried this definition of load average forward, which primarily meant SunOS and Ultrix. Sysadmins using NFS back in those days got very familiar with the 'short term IO wait' part of load average, because if your NFS server stopped responding, all of your NFS clients would accumulate lots of processes in IO waits (which were no longer so short term) and their load averages would skyrocket to absurd levels.

(Technically the definition was not 'IO wait', it was 'any process that was sleeping with a non-interruptible priority'. In theory this was only processes in IO wait. Yes, this included processes waiting on NFS IO on NFS mounts marked intr; it's complicated.)
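
To make the 'decaying average' part concrete, here is a minimal sketch of how such an average can be maintained from a periodic count. It is not code from any kernel. The 5 second sampling interval and the 1, 5 and 15 minute windows are assumptions (they are the conventional arrangement, not something this entry establishes), and the function names are made up for illustration.

```python
import math

SAMPLE_INTERVAL = 5.0               # seconds between samples (assumed)
WINDOWS = (60.0, 300.0, 900.0)      # 1, 5 and 15 minute averages (assumed)

# Per-window decay factor: how much of the old average survives one sample.
DECAY = tuple(math.exp(-SAMPLE_INTERVAL / w) for w in WINDOWS)


def update_loadavg(averages, nrun):
    """Fold one sample of the process count into the decaying averages."""
    return tuple(
        avg * decay + nrun * (1.0 - decay)
        for avg, decay in zip(averages, DECAY)
    )


def simulate(samples, start=(0.0, 0.0, 0.0)):
    """Apply update_loadavg() once per sample and return the final averages."""
    averages = start
    for nrun in samples:
        averages = update_loadavg(averages, nrun)
    return averages
```

Under these assumptions, a constant count of 4 for one minute (12 samples) takes the 1 minute figure about 63% of the way from 0 to 4, while the 15 minute figure has covered only about 6% of the distance. That is why a burst of stuck processes shows up in the first number long before the last.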

When Linux implemented the load average (which it did very early, as 0.96c has it), it copied this traditional definition. Linux load average has been 'run queue plus (short term) IO wait' ever since, although the exact mechanics of how it was computed have changed over time to be more efficient.

(Once multiprocessor systems and large numbers of processes showed up, people soon worked out that 'iterate over the entire process table' was not necessarily a good idea.)
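
The difference between the two approaches can be sketched roughly as follows. This is a toy model, not how any particular kernel does it; the RunCounter class and its method names are hypothetical. The point is only that a total adjusted on every state change can be read in constant time, while walking the table costs time proportional to the number of processes on every sample.

```python
class RunCounter:
    """Hypothetical scheduler bookkeeping: track the count as states change."""

    def __init__(self):
        self.states = {}     # pid -> True if the process counts toward load
        self.nrun = 0        # running total, kept current on every change

    def set_counted(self, pid, counted):
        """Record a state change and adjust the running total to match."""
        was = self.states.get(pid, False)
        self.states[pid] = counted
        self.nrun += int(counted) - int(was)

    def count_by_walking(self):
        """The old approach: O(number of processes) on every sample."""
        return sum(1 for counted in self.states.values() if counted)
```

Both methods give the same answer; the load average sampler would simply read nrun instead of calling count_by_walking().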

When Sun executed the great SunOS 4 to Solaris transition, I'm not quite sure what happened to their definition of the load average. At least some sources claim that it was immediately redefined to drop IO waits (which would mean that an NFS client would maintain a low load average even when the NFS server went away). Exactly how Solaris counted up 'runnable processes' apparently changed somewhat in Solaris 10; in theory I think this is not supposed to affect the results materially. By Solaris 10 it seems definite that Solaris does not count processes in IO wait in the load average, and this has been carried forward into Illumos and derivatives.

(I looked at the Illumos source code very briefly and determined that it was complicated enough that it was too much work to understand it for this entry.)

The situation with the *BSDs is messy. I haven't thoroughly investigated historical source trees, but I can't imagine that 386BSD and then NetBSD people immediately changed the 4BSD definition of the load average to drop processes in IO wait. Certainly the FreeBSD 2.0 sources I have handy access to (via a GitHub repo) still count processes in IO wait. Then at some point things get very tangled and some of the available information I could find seems to be wrong. The net result is that FreeBSD split apart from OpenBSD and NetBSD in load average calculations, and OpenBSD and NetBSD are somewhat divergent from each other.

As far as I can decode it, the current state of load average calculations on the three is:

  • In FreeBSD, load average counts only runnable processes, not processes in IO wait. The count of runnable processes is maintained on the fly by the scheduler in code that I'm not going to try to link to.

  • In NetBSD, kern/kern_synch.c's sched_pstats() function counts both runnable processes and all sleeping processes that have slept for less than one second so far (at least that's what I think l_slptime is counting).

  • In OpenBSD, uvm/uvm_meter.c's uvm_loadav() function counts both runnable processes and sleeping processes that are in high priority IO wait and have slept for less than one second so far (assuming I understand p_slptime correctly). This is fewer sleeping processes than NetBSD seems to include.

(Don't ask me what Dragonfly BSD does here.)
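
The three counting rules listed above can be written down as predicates over a simplified process record. This only models the reading of the code given in this entry (with all of its 'as far as I can tell' caveats); it is not a transcription of any kernel source. The Proc class and its field names are hypothetical stand-ins, with slept_seconds playing the role of l_slptime / p_slptime and high_priority_wait standing in for 'high priority IO wait'.

```python
from dataclasses import dataclass


@dataclass
class Proc:
    """Hypothetical process record; field names are illustrative only."""
    runnable: bool = False
    sleeping: bool = False
    slept_seconds: int = 0            # whole seconds asleep so far
    high_priority_wait: bool = False  # stands in for "high priority IO wait"


def counts_freebsd(p):
    # Runnable processes only.
    return p.runnable


def counts_netbsd(p):
    # Runnable, or any sleeper that has slept for less than one second.
    return p.runnable or (p.sleeping and p.slept_seconds < 1)


def counts_openbsd(p):
    # Runnable, or a high priority sleeper that has slept less than one second.
    return p.runnable or (
        p.sleeping and p.high_priority_wait and p.slept_seconds < 1
    )


def load_count(procs, rule):
    """Number of processes that a given rule would count toward the load."""
    return sum(1 for p in procs if rule(p))
```

Given one runnable process, one ordinary sleeper that just went to sleep, one high priority sleeper that just went to sleep, and one high priority sleeper that has been asleep for ten seconds, this model counts 1 under the FreeBSD rule, 3 under the NetBSD rule and 2 under the OpenBSD rule.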

This is all very messy and contradicts some things knowledgeable OpenBSD people have said. Mind you, they said them in 2009, but on the other hand I can't imagine that OpenBSD would have dropped and then restored counting processes in IO wait (and I can't find any sign of that in their CVS logs).

(I don't know what any other commercial Unixes do here, including Mac OS X. Energetic people are encouraged to do their own research.)

The real moral is that the exact definition of 'load average' is a mess today. If you think you care about load average, you should find out how much IO waiting and general sleeping it includes on your system, ideally via actual experimentation.
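
One low-effort way to start experimenting is to log the load average while you run a workload whose behaviour you know. The sketch below uses Python's os.getloadavg(), which is available on Unix and raises OSError if the system can't supply the numbers; the helper names sample_loadavg and report are made up for illustration, and the choice of workload is left to you.

```python
import os
import time


def sample_loadavg(count, interval):
    """Return `count` readings of (1min, 5min, 15min), `interval` seconds apart."""
    readings = []
    for i in range(count):
        if i:
            time.sleep(interval)
        readings.append(os.getloadavg())   # raises OSError if unavailable
    return readings


def report(label, readings):
    """Print each reading on its own line, tagged with a label."""
    for one, five, fifteen in readings:
        print(f"{label}: 1min={one:.2f} 5min={five:.2f} 15min={fifteen:.2f}")
```

For instance, take a baseline with report("idle", sample_loadavg(12, 5)) on a quiet machine, then repeat it while a known number of processes are CPU-bound, and again while a known number are blocked on slow IO. If the second run moves the numbers and the third doesn't, your system is probably not counting IO wait.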

Written on 17 February 2016.


Last modified: Wed Feb 17 02:40:15 2016