Linux's %iowait statistic

The iostat manpage documents %iowait as:

    Show the percentage of time that the CPU or CPUs were idle during which the system had an outstanding disk I/O request.
It turns out that the manpage is wrong, which I found out by reading the kernel source because I was curious about what exactly it measured.
The actual definition of %iowait is the percentage of the time that the system was idle and at least one process was waiting for disk IO to finish. (This is true for both the 2.6 kernel and Red Hat's special 2.4 kernels with better disk IO statistics, including Red Hat Enterprise 3.)

(The actual kernel measure is the amount of time that each CPU has spent in each mode; it shows up in /proc/stat. iostat converts this to percentages.)
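A minimal sketch of the conversion iostat does, assuming the /proc/stat field order documented in proc(5) (user, nice, system, idle, iowait, ...); the tick counts in the example are made up:

```python
# Turn the raw per-mode tick counts from the aggregate 'cpu' line in
# /proc/stat into percentages, the way iostat and friends do.
def stat_percentages(cpu_line):
    fields = cpu_line.split()
    # fields[0] is the label ('cpu'); the rest are tick counts per mode,
    # in the order documented in proc(5).
    names = ["user", "nice", "system", "idle", "iowait"]
    ticks = [int(v) for v in fields[1:1 + len(names)]]
    total = sum(int(v) for v in fields[1:])
    return {n: 100.0 * t / total for n, t in zip(names, ticks)}

# Example with made-up tick counts (on a real system you would read
# the first line of /proc/stat instead):
sample = "cpu 1000 0 500 8000 500 0 0 0"
print(stat_percentages(sample)["iowait"])  # 5.0
```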
The difference may seem picky, but it's important because not all IO causes processes to wait. For example, Linux doesn't immediately flush data written to files to disk; it does it later, in the background, when it's convenient. Under the manpage's definition, this background flushing of data would take a system from %idle into %iowait, as would slowly paging out unused bits of programs.
This means %iowait is roughly the amount of time that your system could have been doing useful work if the disks were faster. A climbing %iowait is a danger sign that your system may be running into an IO bottleneck. A low %iowait is not necessarily an indication that you don't have an IO problem; you also want to look at things like the number of processes shown as blocked ('b' state) in vmstat output.
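The 'b' column in vmstat comes from the procs_blocked line in /proc/stat, which counts processes currently blocked waiting for IO. A minimal sketch of pulling that number out yourself (the live-system usage assumes a Linux /proc filesystem):

```python
# Extract the count of IO-blocked processes from /proc/stat text;
# this is the same number vmstat reports in its 'b' column.
def blocked_processes(stat_text):
    for line in stat_text.splitlines():
        if line.startswith("procs_blocked"):
            return int(line.split()[1])
    return 0

# On a live Linux system you would feed it the real file:
#   with open("/proc/stat") as f:
#       print(blocked_processes(f.read()))
print(blocked_processes("procs_running 2\nprocs_blocked 3\n"))  # 3
```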
(Finding potential disk IO bottlenecks and troubleshooting them is a really big field, so this is in no way comprehensive advice.)