Processes waiting for NFS IO do show in Linux %iowait
Suppose that you have a machine that does decent amounts of both
local disk IO and NFS IO and it's not performing as well as you'd
like. Tools like
vmstat show that it's spending a significant
amount of time in
%iowait while your (local)
disk stats tool is somewhat ambivalent but suggests
that the local disks are often not saturated. Can you safely conclude
that your system is spending a bunch of its time waiting on NFS IO
and this is what the
%iowait numbers are reflecting?
As far as I can tell from both experimentation and some reading of
the kernel source, the answer is yes. Waiting for NFS IO shows up
in %iowait. An NFS client with otherwise inexplicable %iowait
times is thus waiting on NFS IO because your fileservers aren't
responding as fast as it would like.
(Crudely simplifying, from the kernel source it very much looks
like the same mechanisms drive
%iowait as drive things like
vmstat's b column ('busy' processes, processes waiting in
uninterruptible sleep) and in fact the Linux load average itself,
and processes waiting on NFS IO definitely show up in the latter.)
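To make the connection concrete, here's a minimal sketch of how tools like vmstat derive a %iowait percentage from the kernel's counters: the cpu line in /proc/stat reports cumulative ticks for user, nice, system, idle, iowait and so on, and %iowait is the iowait delta over the total delta between two samples. The sample lines below are made up so the sketch runs anywhere rather than only on a live Linux machine.

```python
# Sketch: computing %iowait from two samples of the /proc/stat 'cpu' line.
# Fields after the 'cpu' label are cumulative ticks:
# user nice system idle iowait irq softirq steal guest guest_nice

def parse_cpu_line(line):
    # Return the list of tick counters after the "cpu" label.
    return [int(x) for x in line.split()[1:]]

def iowait_percent(sample1, sample2):
    t1, t2 = parse_cpu_line(sample1), parse_cpu_line(sample2)
    deltas = [b - a for a, b in zip(t1, t2)]
    total = sum(deltas)
    return 100.0 * deltas[4] / total  # index 4 is the iowait field

# Made-up samples taken some interval apart:
s1 = "cpu  1000 0 500 8000 500 0 0 0 0 0"
s2 = "cpu  1100 0 550 8600 750 0 0 0 0 0"
print(round(iowait_percent(s1, s2), 1))  # -> 25.0
```

Since NFS waits count toward the iowait ticks just like local disk waits do, nothing in this arithmetic can tell you which kind of IO the system was waiting on.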
You might wonder why I bothered asking such an obvious question.
The simple answer is that Unix systems have had a historical habit
of not considering remote filesystem IO to be 'real' disk IO. In
the old days it would have been perfectly in character for
%iowait to only reflect IO to real disks. Current manpages for
vmstat, top, and so on do describe
%iowait generally as eg 'time spent
waiting for IO' (without restricting it to disks), but my old habits die hard.
Sidebar: NFS client performance information
It turns out that while I wasn't looking Linux has gained quite
detailed NFS client performance statistics in the form of a whole
barrage of (per-filesystem) stuff that's reported through
/proc/self/mountstats. Unfortunately both documentation on what's
in mountstats and good tools for monitoring it seem to be a bit
lacking, but a lot of information is there if you can dig it out.
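As an example of digging the information out, here's a sketch that pulls per-operation average round-trip times out of the 'per-op statistics' section. To my understanding the counters on each op line are: operations, transmissions, major timeouts, bytes sent, bytes received, and cumulative queue, RTT, and total execute times in milliseconds; treat that as an assumption, and note that the sample text (server name, numbers) is made up rather than taken from a real client.

```python
# Sketch: average per-operation RTT from /proc/self/mountstats content.
# Assumed per-op counter order (see lead-in): ops, ntrans, timeouts,
# bytes_sent, bytes_recv, queue_ms, rtt_ms, execute_ms.

SAMPLE = """\
device fileserver:/export mounted on /mnt/data with fstype nfs statvers=1.1
        per-op statistics
            READ: 1000 1000 0 160000 4096000 20 2500 2600
            WRITE: 500 500 0 2048000 80000 15 4000 4200
"""

def avg_rtt_ms(text):
    # Map each NFS operation name to its average round-trip time in ms.
    result = {}
    in_ops = False
    for line in text.splitlines():
        line = line.strip()
        if line == "per-op statistics":
            in_ops = True
            continue
        if in_ops and ":" in line:
            name, rest = line.split(":", 1)
            nums = [int(x) for x in rest.split()]
            if nums[0]:  # skip operations that were never issued
                result[name] = nums[6] / nums[0]  # cumulative RTT / ops
    return result

print(avg_rtt_ms(SAMPLE))  # -> {'READ': 2.5, 'WRITE': 8.0}
```

On a real client you would read /proc/self/mountstats itself (and sample it twice, since the counters are cumulative since mount).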
Load is a whole system phenomenon
Here's something obvious: load and its companion overload is something that's created by everything that's going on on your system at once. Oh, sure, some subset of the activity can be saturating a particular resource, but in general (and without quota-based things like Linux's cgroups) it is the sum of all activity (or all relevant activity) that matters.
So far this probably all sounds very obvious, and it is. But there's a big corollary: if you want to limit load, you must take a global perspective on activity. If you have ten things that each could create load, you can't limit overall system load just by limiting those ten things individually and in isolation from each other. A 'reasonable load' for one thing by itself is not necessarily reasonable when all ten are loaded at once. If you have no dynamic global system the best you can do is to assign static quotas such that each thing gets a limit of (say) 1/10th of the machine and can't use more even when the system is otherwise idle.
Now this comes with an exception: if all activity funnels through one central point at some point in processing, you can (sometimes) put load limits on that single point and be done. That's because the single point implicitly has a global view of the load; it 'knows' what the global total load is because it sees all traffic.
All of this sounds hopelessly abstract, so let's talk web servers and web applications. Suppose you have a web server serving ten web apps, each of which is handled by its own separate daemon. You want your machine to not explode under load, no matter what the load is. Can you get this by just putting individual limits on each web app (eg 'only so much concurrency at once')? My answer is 'not unless you're going to use low limits', at least if demand for the apps is unpredictable. To do this properly you need some central point to apply a whole system view and whole system limits. One such spot might be the front-end web server; another might be a daemon that handles or at least monitors all web apps at once.
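To sketch what such a central point looks like, here's a toy global admission limiter: one shared concurrency cap that every app's requests pass through, instead of ten independent per-app caps. The names (GlobalLimiter, handle) are hypothetical, invented for illustration; this is the shape of the idea, not any particular server's API.

```python
# Sketch: a single machine-wide concurrency limit shared by all apps,
# rather than ten separate per-app limits that can add up to overload.
import threading

class GlobalLimiter:
    """Admit a request only if the whole machine is under its cap."""
    def __init__(self, max_concurrent):
        self._sem = threading.BoundedSemaphore(max_concurrent)

    def try_admit(self):
        # Non-blocking: refuse (eg with a 503) rather than queue forever.
        return self._sem.acquire(blocking=False)

    def release(self):
        self._sem.release()

limiter = GlobalLimiter(2)  # tiny cap so the effect is visible

def handle(app_name):
    # Every app funnels through the one limiter, whichever app it is.
    if not limiter.try_admit():
        return f"{app_name}: rejected (machine at capacity)"
    try:
        return f"{app_name}: served"
    finally:
        limiter.release()

# Two in-flight requests fit; a third is refused no matter which app asks:
print(limiter.try_admit(), limiter.try_admit(), limiter.try_admit())
```

Because the limiter sees all traffic, it 'knows' the global load; with ten independent per-app limits, each app's limit would have to be set low enough that all ten at their limits simultaneously still fit on the machine.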
In short, now you know why I feel that separate standalone daemons are the wrong approach for scalable app deployment. Separate daemons mean separate limits, and you can't configure those sensibly without risking blowing up your machine under load. The more apps you have the worse this gets (because the smaller each app's 'safe' share of the machine becomes).