Wandering Thoughts archives


Processes waiting for NFS IO do show in Linux %iowait statistics

Suppose that you have a machine that does decent amounts of both local disk IO and NFS IO and it's not performing as well as you'd like. Tools like vmstat show that it's spending a significant amount of time in %iowait while your (local) disk stats tool is somewhat ambivalent but suggests that the local disks are often not saturated. Can you safely conclude that your system is spending a bunch of its time waiting on NFS IO and this is what the %iowait numbers are reflecting?

As far as I can tell from both experimentation and some reading of the kernel source, the answer is yes. Waiting for NFS IO shows up in %iowait. An NFS client with otherwise inexplicable %iowait times is thus waiting on NFS IO because your fileservers aren't responding as fast as it would like.

(Crudely simplifying, from the kernel source it very much looks like the same mechanisms drive %iowait as drive things like vmstat's b column ('busy' processes, processes waiting in uninterruptible sleep) and in fact the Linux load average itself, and processes waiting on NFS IO definitely show up in the latter two.)
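As a concrete illustration of where the number comes from, here is a minimal sketch of computing %iowait the way tools like vmstat do, from two samples of the aggregate 'cpu' line in /proc/stat (assuming the usual Linux field order, where iowait is the fifth numeric field; the sample values in the comments are made up):

```python
# Sketch: derive %iowait from two samples of /proc/stat's aggregate
# 'cpu' line. Linux field order is: user nice system idle iowait irq
# softirq steal [guest guest_nice]; iowait is the 5th numeric field.

def parse_cpu_line(line):
    """Return (iowait_ticks, total_ticks) from a /proc/stat 'cpu' line."""
    fields = [int(f) for f in line.split()[1:]]
    return fields[4], sum(fields)

def iowait_percent(before, after):
    """Percentage of elapsed CPU ticks spent in iowait between samples."""
    io1, tot1 = parse_cpu_line(before)
    io2, tot2 = parse_cpu_line(after)
    delta_total = tot2 - tot1
    return 100.0 * (io2 - io1) / delta_total if delta_total else 0.0
```

In live use you would read the first line of /proc/stat twice with a sleep in between; the point is simply that iowait is one bucket of CPU tick accounting, regardless of whether the IO being waited on is local disk or NFS.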

You might wonder why I bothered asking such an obvious question. The simple answer is that Unix systems have had a historical habit of not considering remote filesystem IO to be 'real' disk IO. In the old days it would have been perfectly in character for %iowait to only reflect IO to real disks. Current manpages for vmstat, top, and so on do describe %iowait generally as eg 'time spent waiting for IO' (without restricting it to disks) but my old habits die hard.

Sidebar: NFS client performance information

It turns out that while I wasn't looking Linux has gained quite detailed NFS client performance statistics in the form of a whole barrage of (per-filesystem) stuff that's reported through /proc/self/mountstats. Unfortunately both documentation on what's in mountstats and good tools for monitoring it seem to be a bit lacking, but a lot of information is there if you can dig it out.

(See nfsiostat from the sysstat package for one thing that reads it. Note that it can be compiled on, say, Ubuntu 10.04 even though 10.04 didn't package it.)
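For what it's worth, digging the per-operation counters out of mountstats by hand isn't too difficult. Here is a rough sketch of a parser; since the exact meanings of the numeric fields aren't well documented, it only assumes that each line in the 'per-op statistics' section is an operation name followed by counters, with the first number being the operation count (the sample data in the test is invented for illustration):

```python
def parse_perop_stats(mountstats_text):
    """Rough parser for the 'per-op statistics' section of one mount's
    /proc/self/mountstats entry. Returns {opname: [counters]}.
    Field meanings beyond the first counter (the operation count) are
    deliberately left uninterpreted, as they aren't well documented."""
    ops = {}
    in_perop = False
    for line in mountstats_text.splitlines():
        line = line.strip()
        if line.startswith("per-op statistics"):
            in_perop = True
            continue
        if in_perop:
            # A blank line or the next mount's 'device' line ends the section.
            if not line or line.startswith("device "):
                in_perop = False
                continue
            name, _, rest = line.partition(":")
            ops[name.strip()] = [int(f) for f in rest.split()]
    return ops
```

Comparing the operation counts between two snapshots taken a few seconds apart gives you per-op rates, which is essentially what nfsiostat does for you.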

linux/NFSIOShowsInIowait written at 23:50:08

Load is a whole system phenomenon

Here's something obvious: load and its companion, overload, are created by everything that's going on on your system at once. Oh, sure, some subset of the activity can be saturating a particular resource, but in general (and without quota-based things like Linux's cgroups) it is the sum of all activity (or all relevant activity) that matters.

So far this probably all sounds very obvious, and it is. But there's a big corollary: if you want to limit load, you must take a global perspective on activity. If you have ten things that each could create load, you can't limit overall system load just by limiting those ten things individually and in isolation from each other. A 'reasonable load' for one thing by itself is not necessarily reasonable when all ten are loaded at once. If you have no dynamic global system the best you can do is to assign static quotas such that each thing gets a limit of (say) 1/10th of the machine and can't use more even when the system is otherwise idle.

Now this comes with an exception: if all activity funnels through one central point at some point in processing, you can (sometimes) put load limits on that single point and be done. That's because the single point implicitly has a global view of the load; it 'knows' what the global total load is because it sees all traffic.

All of this sounds hopelessly abstract, so let's talk web servers and web applications. Suppose you have a web server serving ten web apps, each of which is handled by its own separate daemon. You want your machine to not explode no matter what the load is. Can you get this by just putting individual limits on each web app (eg 'only so much concurrency at once')? My answer is 'not unless you're going to use low limits', at least if demand for the apps is unpredictable. To do this properly you need some central point to apply a whole system view and whole system limits. One such spot might be the front-end web server; another might be a daemon that handles or at least monitors all web apps at once.
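To make the contrast concrete, here is a toy sketch (the names and limit values are made up) of what a central point with a whole-system view looks like: a single shared semaphore caps total concurrency across all apps, instead of each app policing only itself:

```python
import threading

# Toy sketch: ten apps each allowed, say, 20 concurrent requests in
# isolation could hit the machine with 200 at once; one shared limit
# caps the whole system instead. (The value 40 is invented.)
GLOBAL_LIMIT = threading.BoundedSemaphore(40)

def handle_request(app_handler, *args):
    """Run one app's request under the whole-system concurrency cap.
    Blocks (applying backpressure) when the system is at its limit."""
    with GLOBAL_LIMIT:
        return app_handler(*args)
```

The semaphore is the 'central point' of the text: because every request acquires it, it implicitly knows the global total load, which no per-app limit can.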

In short, now you know why I feel that separate standalone daemons are the wrong approach for scalable app deployment. Separate daemons mean separate limits, and you can't configure those sensibly without risking blowing up your machine under load. The more apps you have the worse this gets (because the smaller each one's 'safe' share of the machine becomes).

tech/LoadWholeSystem written at 00:40:04
