2013-09-19
Processes waiting for NFS IO do show in Linux %iowait statistics
Suppose that you have a machine that does decent amounts of both
local disk IO and NFS IO and it's not performing as well as you'd
like. Tools like vmstat show that it's spending a significant
amount of time in %iowait while your (local)
disk stats tool is somewhat equivocal but suggests
that the local disks are often not saturated. Can you safely conclude
that your system is spending a bunch of its time waiting on NFS IO
and this is what the %iowait numbers are reflecting?
As far as I can tell from both experimentation and some reading of
the kernel source, the answer is yes. Waiting for NFS IO shows up
in %iowait. An NFS client with otherwise inexplicable %iowait
times is thus waiting on NFS IO because your fileservers aren't
responding as fast as it would like.
(Crudely simplifying, from the kernel source it very much looks
like the same mechanisms drive %iowait as drive things like
vmstat's b column ('busy' processes, processes waiting in
uninterruptible sleep) and in fact the Linux load average itself,
and processes waiting on NFS IO definitely show up in the latter
two.)
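(If you want to look at the raw numbers yourself, %iowait as reported by vmstat and company comes from the 'iowait' field of the cpu line in /proc/stat, and 'procs_blocked' in the same file is vmstat's b column. Here's a minimal Python sketch of computing %iowait over an interval; the five second interval is just an arbitrary choice.)

    import time

    def cpu_times():
        # The first line of /proc/stat is 'cpu user nice system idle
        # iowait irq softirq ...', in USER_HZ ticks summed over all CPUs.
        with open("/proc/stat") as f:
            return [int(x) for x in f.readline().split()[1:]]

    def iowait_percent(interval=5):
        # %iowait is the change in the iowait field as a share of the
        # change in total CPU time, which is what vmstat and top report.
        before = cpu_times()
        time.sleep(interval)
        after = cpu_times()
        deltas = [a - b for a, b in zip(after, before)]
        return 100.0 * deltas[4] / sum(deltas)

    print("%.1f%% iowait" % iowait_percent())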
You might wonder why I bothered asking such an obvious question.
The simple answer is that Unix systems have had a historical habit
of not considering remote filesystem IO to be 'real' disk IO. In
the old days it would have been perfectly in character for %iowait
to only reflect IO to real disks. Current manpages for vmstat,
top, and so on do describe %iowait generally as eg 'time spent
waiting for IO' (without restricting it to disks) but my old habits die hard.
Sidebar: NFS client performance information
It turns out that while I wasn't looking, Linux has gained quite
detailed NFS client performance statistics in the form of a whole
barrage of (per-filesystem) stuff that's reported through
/proc/self/mountstats. Unfortunately both documentation on what's
in mountstats and good tools for monitoring it seem to be a bit
lacking, but a lot of information is there if you can dig it out.
(See nfsiostat
from the sysstat package for one
thing that reads it. Note that this can be compiled on, say, Ubuntu
10.04 even though 10.04 didn't package it.)
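If you want to dig into mountstats by hand instead, here's a minimal Python sketch that pulls out the per-operation statistics for each NFS mount. It assumes the statvers=1.1 format, where each per-op line is: operation count, transmissions, timeouts, bytes sent, bytes received, and cumulative queue, RTT, and execute times in milliseconds.

    def nfs_mountstats():
        # Each mount starts with a line like:
        #   device fs:/export mounted on /mnt with fstype nfs statvers=1.1
        # and NFS mounts later have a 'per-op statistics' section with
        # one all-numeric line per RPC operation.
        mounts = {}
        current = None
        in_ops = False
        with open("/proc/self/mountstats") as f:
            for line in f:
                words = line.split()
                if not words:
                    continue
                if words[0] == "device":
                    in_ops = False
                    current = None
                    if "fstype" in words and \
                       words[words.index("fstype") + 1].startswith("nfs"):
                        current = words[4]  # the mount point
                        mounts[current] = {}
                elif current and words[0] == "per-op":
                    in_ops = True
                elif current and in_ops and words[0].endswith(":"):
                    mounts[current][words[0][:-1]] = [int(x) for x in words[1:]]
        return mounts

    for mnt, ops in nfs_mountstats().items():
        rd = ops.get("READ")
        if rd and rd[0]:
            print("%s: %d reads, average RTT %.2f ms" %
                  (mnt, rd[0], float(rd[6]) / rd[0]))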
2013-09-09
Why the RPM source and binary package format is superior to Debian .debs
In an earlier entry I mentioned that I felt that RPM was a better package format in practice than Debian .debs. Today it's time to take a stab at justifying this. But to start with, let's talk about what this entry is not about.
In the old days Debian was superior in practice because the whole apt
software suite and ecology had no real equivalent in RPM land. Those
days are long gone. These days yum is as much a part of the RPM
ecology as apt is of the Debian package one, and in fact the rpm
command has been demoted to the same general footnote status that dpkg
has in Debian. Having used both I feel that yum and apt are sufficiently
comparable that I'm not going to quibble back and forth. So this entry
is purely about the package format itself, not the surrounding tools.
There are two sides to this, source packages and binary packages. In the source package world, RPM is in the lead because of the way that it systematically organizes changes to the source and does builds outside of the package's source area. I wrote about this at length here and here on the hassles of working with Debian sources, here on the different tradeoffs of Debian and RPM source packages, and here on why the Debian tradeoff is the wrong one.
For binary packages I'm going to set aside Debian's misstep with x86 multi-arch support, partly because I understand it's increasingly being fixed. In practice there are two important differences between the package formats: RPMs can and generally do depend on files while .debs depend on packages, and Debian packages can ask you questions at install time while RPMs never do. The dependency issue is the lesser one. Depending on packages is brittle in the face of other people reorganizing their packages, and it also requires a somewhat indirect process for determining a package's dependencies when the package is built. Depending on files means that RPMs are agnostic about just what provides those files, and the RPM build process can easily determine most dependencies by direct examination of packaged programs and so on.
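To make the file dependency point concrete: RPM resolves a file path to whatever installed package currently provides it, so a dependency on (say) /bin/sh keeps working no matter how distributions shuffle their packages around. Here's a trivial Python sketch of the query side of this (Python is purely for illustration; 'rpm -q --whatprovides' is the actual mechanism):

    import subprocess

    def whatprovides(path):
        # Ask the RPM database which installed package provides this
        # file; dependencies on file paths resolve through the same
        # index, so they don't care how packages are organized.
        return subprocess.check_output(
            ["rpm", "-q", "--whatprovides", path],
            universal_newlines=True).strip()

    # Prints something like 'bash-4.2.46-35.el7.x86_64'.
    print(whatprovides("/bin/sh"))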
In the abstract, asking questions at install time sounds great or at least harmless. In practice it's a disaster that enables a catalog of bad practices, ranging from people papering over internal divisions through ambushing sysadmins with landmines to drastically complicating attempts to do automated and semi-automated installations and package updates. By contrast RPMs have a very hard rule, enforced by the system, that a package cannot interact with the user at install time at all. Packagers are forced to make decisions and ensure that the package setup is sane, while sysadmins can script and otherwise automate RPM installs without worry.
Theoretically you can turn off all install time questions on Debian (eg with debconf's noninteractive frontend, DEBIAN_FRONTEND=noninteractive). In practice this can't be trusted because it's not the default state of affairs. Oh sure, it's probably considered a bug if a Debian package screws things up on a no-questions install, but do you want to be the first person to find that bug? If you are a sane sysadmin, the answer is 'no' (not unless you have no choice). In practice what is fully supported is what is routine, and in the Debian world that's asking you questions during package installs.
(It's possible that graphical package management is changing this, but I'm not holding my breath.)
2013-09-02
The current weak areas of ZFS on Linux
I've been looking into ZFS on Linux for a while as a potential alternative to Illumos for our next generation of fileservers (FreeBSD is unfortunately disqualified). As part of that I have been working to understand ZoL's current weak areas so that I can better understand where it might cause us problems.
The following is the best current information I have; it comes from reading the ZoL mailing list (and at one point asking the ZoL mailing list this exact question).
The weak areas that I know about:
- Using ZFS as your root filesystem requires wrestling with GRUB,
GRUB scripts, initramfs-building scripts, and support in installers
(if you want to install the system as ZoL-root from the start).
How well this works depends on your distribution; some have good
support (eg Gentoo), others
have third party repositories with prebuilt packages, and still
others leave you on your own.
- There are periodic problem reports about getting ZFS filesystems reliably mounted on boot.
- In some environments ZoL can have problems reliably finding the
disk devices for your pools on boot. This is especially likely
if you use
/dev/sd* device names but apparently sometimes happens to people
who use more stable identifiers. (Apparently part of the likely
solution is to hook ZoL into udev so that as disks are discovered,
ZoL checks to see if a pool now has a full set of devices and can
be brought up.)
- ZoL lacks a number of standard Linux filesystem features, including
support for O_DIRECT, asynchronous IO, and POSIX ACLs. It also
lacks support for issuing TRIM commands to drives (this is apparently
only present in the FreeBSD version of ZFS so far). (A quick way
to probe the O_DIRECT situation is sketched after this list.)
- There is no 'event daemon' to handle events like disks going away.
The most significant result of this is that ZFS pool spares do not
get activated on disk failure (making them basically pointless).
- ZFS's use of kernel memory is not well integrated with the Linux
kernel memory system, resulting in runaway memory usage in some
situations. Apparently metadata intensive workloads (such as
rsync runs) are especially prone to this.
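As a small illustration of the O_DIRECT gap, here's a minimal Python sketch that probes whether a filesystem accepts O_DIRECT opens; the path here is a purely hypothetical file on a pool you care about. A Linux filesystem that doesn't implement O_DIRECT typically fails the open itself with EINVAL.

    import errno
    import os

    def supports_odirect(path):
        # Filesystems without O_DIRECT support fail the open itself
        # with EINVAL; any other error is a real problem, so re-raise.
        try:
            fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_DIRECT, 0o600)
        except OSError as e:
            if e.errno == errno.EINVAL:
                return False
            raise
        os.close(fd)
        os.unlink(path)
        return True

    # '/tank/.odirect-probe' is just a stand-in path on a ZFS pool.
    print(supports_odirect("/tank/.odirect-probe"))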
The last issue, the kernel memory one, deserves more discussion. All of this is what I've gathered from the mailing list and from looking at the ZFS on Linux source code.
To start with, ZFS on Linux is not really ZFS ported to Linux; instead it's mostly the Illumos ZFS code dropped on top of a layer of code to translate and emulate the Solaris kernel APIs that ZFS needs (the SPL, short for 'Solaris Porting Layer'). This includes a great deal of kernel memory handling. The unfortunate result of this is a series of mismatches between what ZFS thinks is going on with kernel memory and what is actually going on, due to the translation and emulation that is required. Through fragmentation that's invisible to ZFS and other issues, ZFS can wind up using a lot more memory for things like the ARC than it is supposed to (because ZFS thinks it's using a lot less memory than it actually is).
(I suspect that ZFS itself still has some degree of the ZFS level fragmentation problems we've seen but that's much less dangerous because it just leaves the ARC smaller than it should be. The ZoL problem is that the ARC and related things can eat all of your RAM and make your kernel explode.)
Whether this happens to you (and how much it affects you) is unpredictable because it depends very much on the details of how your system uses memory. As mentioned, people seem to have problems with metadata heavy workloads but not everyone reporting problems on the ZoL mailing lists is in this situation.
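If you want to keep an eye on this on a live system, ZoL exposes the ARC's own accounting through /proc/spl/kstat/zfs/arcstats. Since the fragmentation is invisible to ZFS, a growing gap between the 'size' figure there and the kernel's actual memory use (eg per /proc/meminfo) is one warning sign. Here's a minimal Python sketch of reading it:

    def arcstats():
        # arcstats is 'name type data' rows after a two-line header;
        # 'size' is what ZFS thinks the ARC is using, while 'c' and
        # 'c_max' are its current and maximum size targets.
        stats = {}
        with open("/proc/spl/kstat/zfs/arcstats") as f:
            for line in f.readlines()[2:]:
                name, _, data = line.split()
                stats[name] = int(data)
        return stats

    st = arcstats()
    print("ARC size %d MiB (target %d MiB, max %d MiB)" %
          (st["size"] >> 20, st["c"] >> 20, st["c_max"] >> 20))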
PS: if you are coming here from Internet searches, please pay attention to the date of this entry. I certainly hope that all of these issues will get dealt with over time.