Wandering Thoughts archives

2013-09-19

Processes waiting for NFS IO do show in Linux %iowait statistics

Suppose that you have a machine that does decent amounts of both local disk IO and NFS IO and it's not performing as well as you'd like. Tools like vmstat show that it's spending a significant amount of time in %iowait while your (local) disk stats tool is somewhat ambivalent but suggests that the local disks are often not saturated. Can you safely conclude that your system is spending a bunch of its time waiting on NFS IO and this is what the %iowait numbers are reflecting?

As far as I can tell from both experimentation and some reading of the kernel source, the answer is yes. Waiting for NFS IO shows up in %iowait. An NFS client with otherwise inexplicable %iowait times is thus waiting on NFS IO because your fileservers aren't responding as fast as it would like.

(Crudely simplifying, from the kernel source it very much looks like the same mechanisms drive %iowait as drive things like vmstat's b column ('busy' processes, processes waiting in uninterruptible sleep) and in fact the Linux load average itself, and processes waiting on NFS IO definitely show up in the latter two.)
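
As a concrete illustration, here's a little sketch of mine (not any standard tool) that lists the processes currently in uninterruptible sleep, ie the 'D' state that vmstat's b column counts; a process stuck waiting on a slow or unresponsive NFS fileserver will normally show up here. The /proc/<pid>/stat field handling follows proc(5).

    #!/usr/bin/python
    # List processes in uninterruptible sleep ('D'), the population that
    # vmstat's 'b' column and the load average count (and that, per the
    # entry above, appears to drive %iowait as well). Per proc(5), the
    # state character is the first field after the parenthesized command
    # name in /proc/<pid>/stat.
    import os

    def blocked_processes():
        procs = []
        for pid in os.listdir("/proc"):
            if not pid.isdigit():
                continue
            try:
                with open("/proc/%s/stat" % pid) as f:
                    data = f.read()
            except (IOError, OSError):
                continue        # the process exited out from under us
            comm = data[data.index("(") + 1 : data.rindex(")")]
            state = data[data.rindex(")") + 1 :].split()[0]
            if state == "D":
                procs.append((pid, comm))
        return procs

    if __name__ == "__main__":
        for pid, comm in blocked_processes():
            print("%s %s" % (pid, comm))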

You might wonder why I bothered asking such an obvious question. The simple answer is that Unix systems have had a historical habit of not considering remote filesystem IO to be 'real' disk IO. In the old days it would have been perfectly in character for %iowait to only reflect IO to real disks. Current manpages for vmstat, top, and so on do describe %iowait generally as eg 'time spent waiting for IO' (without restricting it to disks) but my old habits die hard.

Sidebar: NFS client performance information

It turns out that while I wasn't looking, Linux has gained quite detailed NFS client performance statistics in the form of a whole barrage of (per-filesystem) stuff that's reported through /proc/self/mountstats. Unfortunately both documentation on what's in mountstats and good tools for monitoring it seem to be a bit lacking, but a lot of information is there if you can dig it out.

(See nfsiostat from the sysstat package for one thing that reads it. Note that this can be compiled on, say, Ubuntu 10.04 even though 10.04 didn't package it.)
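
If you want to poke at the raw data yourself, here's a rough sketch of pulling the per-operation statistics out of mountstats. My reading of the statvers=1.1 per-op fields is: operations, transmissions, major timeouts, bytes sent, bytes received, then cumulative queue, RTT, and execute times in milliseconds; verify that against your kernel before trusting the numbers.

    #!/usr/bin/python
    # Rough sketch: dump per-operation NFS client stats from
    # /proc/self/mountstats. The per-op field interpretation (ops,
    # transmissions, major timeouts, bytes sent, bytes received, then
    # cumulative queue / RTT / execute times in ms) is my reading of
    # the statvers=1.1 format; double check it for your kernel.

    def nfs_per_op_stats(path="/proc/self/mountstats"):
        stats = {}
        mount = None
        in_ops = False
        with open(path) as f:
            for line in f:
                words = line.split()
                if not words:
                    continue
                if words[0] == "device":
                    # "device fs:/export mounted on /mnt with fstype nfs ..."
                    mount = None
                    in_ops = False
                    if "fstype" in words:
                        i = words.index("fstype")
                        if i + 1 < len(words) and words[i + 1] in ("nfs", "nfs4"):
                            mount = words[4]
                elif mount and line.strip() == "per-op statistics":
                    in_ops = True
                    stats[mount] = {}
                elif mount and in_ops and words[0].endswith(":"):
                    op = words[0].rstrip(":")
                    stats[mount][op] = [int(x) for x in words[1:]]
        return stats

    if __name__ == "__main__":
        for mount, ops in sorted(nfs_per_op_stats().items()):
            for op, nums in sorted(ops.items()):
                if len(nums) > 6 and nums[0]:
                    # nums[6] should be cumulative RTT in ms (see above)
                    print("%s %s: %d ops, avg RTT %.2f ms"
                          % (mount, op, nums[0], nums[6] / float(nums[0])))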

NFSIOShowsInIowait written at 23:50:08

2013-09-09

Why the RPM source and binary package format is superior to Debian .debs

In an earlier entry I mentioned that I felt that RPM was a better package format in practice than Debian .debs. Today it's time to take a stab at justifying this. But to start with, let's talk about what this entry is not about.

In the old days Debian was superior in practice because the whole apt software suite and ecology had no real equivalent in RPM land. Those days are long gone. These days yum is as much a part of the RPM ecology as apt is of the Debian one, and in fact the rpm command has been demoted to the same general footnote status that dpkg has in Debian. Having used both I feel that yum and apt are sufficiently comparable that I'm not going to quibble back and forth. So this entry is purely about the package format itself, not the surrounding tools.

There are two sides to this: source packages and binary packages. In the source package world, RPM is in the lead because of the way that it systematically organizes changes to the source and does builds outside of the package's source area. I wrote about this at length here and here on the hassles of working with Debian sources, here on the different tradeoffs of Debian and RPM source packages, and here on why the Debian tradeoff is the wrong one.

For binary packages I'm going to set aside Debian's misstep with x86 multi-arch support, partly because I understand it's increasingly being fixed. In practice there are two important differences between the package formats: RPMs can and generally do depend on files while .debs depend on packages, and Debian packages can ask you questions at install time while RPMs never do. The dependency issue is the lesser one. Depending on packages is brittle in the face of other people reorganizing their packages and also requires a somewhat indirect process for determining a package's dependencies when the package is built. Depending on files means that RPMs are agnostic about just what provides those files, and the RPM build process can easily determine most dependencies by direct examination of packaged programs and so on.
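
To make the file dependency point concrete, here's a small sketch that lists what an installed package actually requires, using the rpm Python bindings (usually packaged as rpm-python on Red Hat style systems); it shows roughly the same information as 'rpm -q --requires'. Most of what comes out is file paths like /bin/sh and library sonames, not other package names.

    #!/usr/bin/python
    # Sketch: list the requirements of an installed RPM via the rpm
    # Python bindings. This is roughly what 'rpm -q --requires <pkg>'
    # shows; the point is that the entries are mostly file paths and
    # library sonames rather than package names.
    import sys
    import rpm

    def requires_of(pkgname):
        ts = rpm.TransactionSet()
        for hdr in ts.dbMatch("name", pkgname):
            return sorted(set(hdr[rpm.RPMTAG_REQUIRENAME]))
        return None

    if __name__ == "__main__":
        pkg = sys.argv[1] if len(sys.argv) > 1 else "bash"
        reqs = requires_of(pkg)
        if reqs is None:
            print("%s does not seem to be installed" % pkg)
        else:
            for req in reqs:
                print(req)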

In the abstract, asking questions at install time sounds great or at least harmless. In practice it's a disaster that enables a catalog of bad practices, ranging from people papering over internal divisions through ambushing sysadmins with landmines to drastically complicating attempts to do automated and semi-automated installations and package updates. By contrast RPMs have a very hard rule, enforced by the system, that a package cannot interact with the user at install time at all. Packagers are forced to make decisions and ensure that the package setup is sane, while sysadmins can script and otherwise automate RPM installs without worry.

Theoretically you can turn off all install time questions on Debian. In practice this can't be trusted because it's not the default state of affairs. Oh sure, it's probably considered a bug if a Debian package screws things up on a no-questions install, but do you want to be the first person to find that bug? If you are a sane sysadmin, the answer is 'no' (not unless you have no choice). In practice what is fully supported is what is routine, and in the Debian world that's asking you questions during package installs.

(It's possible that graphical package management is changing this, but I'm not holding my breath.)

RpmFormatOverDebs written at 23:57:34

2013-09-02

The current weak areas of ZFS on Linux

I've been looking into ZFS on Linux for a while as a potential alternative to Illumos for our next generation of fileservers (FreeBSD is unfortunately disqualified). As part of that I have been working to understand ZoL's current weak areas so that I can better understand where it might cause us problems.

The following is the best current information I have; it comes from reading the ZoL mailing list (and at one point asking the ZoL mailing list this exact question).

The weak areas that I know about:

  • Using ZFS as your root filesystem requires wrestling with GRUB, GRUB scripts, initramfs-building scripts, and support in installers (if you want to install the system as ZoL-root from the start). How well this works depends on your distribution; some have good support (eg Gentoo), others have third party repositories with prebuilt packages, and still others leave you on your own.

  • There are periodic problem reports about getting ZFS filesystems reliably mounted on boot.

  • In some environments ZoL can have problems reliably finding the disk devices for your pools on boot. This is especially likely if you use /dev/sd* device names but apparently sometimes happens to people who use more stable identifiers.

    (Apparently part of the likely solution is to hook ZoL into udev so that as disks are discovered ZoL checks to see if a pool now has a full set of devices and can be brought up.)

  • ZoL lacks a number of standard Linux filesystem features, including support for O_DIRECT, asynchronous IO, and POSIX ACLs. It also lacks support for issuing TRIM commands to drives (this is apparently only present in the FreeBSD version of ZFS so far).

  • There is no 'event daemon' to handle events like disks going away. The most significant result of this is that ZFS pool spares do not get activated on disk failure (making them basically pointless).

  • ZFS's use of kernel memory is not well integrated with the Linux kernel memory system, resulting in runaway memory usage in some situations. Apparently metadata intensive workloads (such as rsync runs) are especially prone to this.

The last issue deserves more discussion. What follows is what I've gathered from the mailing list and from looking at the ZFS on Linux source code.

To start with, ZFS on Linux is not really ZFS ported to Linux; instead it's mostly the Illumos ZFS code dropped on top of a layer of code to translate and emulate the Solaris kernel APIs that ZFS needs (the SPL, short for 'Solaris Porting Layer'). This includes a great deal of kernel memory handling. The unfortunate result of this is a series of mismatches between what ZFS thinks is going on with kernel memory and what is actually going on, due to the translation and emulation that is required. Through fragmentation that's invisible to ZFS and other issues, ZFS can wind up using a lot more memory for things like the ARC than it is supposed to (because ZFS thinks it's using a lot less memory than it actually is).

(I suspect that ZFS itself still has some degree of the ZFS level fragmentation problems we've seen but that's much less dangerous because it just leaves the ARC smaller than it should be. The ZoL problem is that the ARC and related things can eat all of your RAM and make your kernel explode.)

Whether this happens to you (and how much it affects you) is unpredictable because it depends very much on the details of how your system uses memory. As mentioned, people seem to have problems with metadata heavy workloads but not everyone reporting problems on the ZoL mailing lists is in this situation.
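
One crude way to keep an eye on this is to watch what the ARC itself reports and compare that against your machine's actual memory situation. Here's a sketch that reads ZoL's arcstats kstat; the /proc/spl/kstat/zfs/arcstats location and the 'size', 'c', and 'c_max' names are what ZoL exposes as far as I know, and because of the accounting mismatch described above the 'size' figure can understate how much memory ZFS is really consuming.

    #!/usr/bin/python
    # Sketch: report the ZFS ARC's current size against its target ('c')
    # and maximum ('c_max') from ZoL's arcstats kstat. Because of the SPL
    # memory accounting issues discussed above, 'size' is what ZFS thinks
    # it is using and may underestimate real memory consumption.

    ARCSTATS = "/proc/spl/kstat/zfs/arcstats"

    def read_arcstats(path=ARCSTATS):
        stats = {}
        with open(path) as f:
            # the first two lines are kstat headers ('... name type data')
            for line in f.readlines()[2:]:
                fields = line.split()
                if len(fields) == 3:
                    stats[fields[0]] = int(fields[2])
        return stats

    if __name__ == "__main__":
        a = read_arcstats()
        gib = 1024.0 ** 3
        print("ARC size  : %6.2f GiB" % (a["size"] / gib))
        print("ARC target: %6.2f GiB (c)" % (a["c"] / gib))
        print("ARC limit : %6.2f GiB (c_max)" % (a["c_max"] / gib))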

PS: if you are coming here from Internet searches, please pay attention to the date of this entry. I certainly hope that all of these issues will get dealt with over time.

ZFSonLinuxWeakAreas written at 22:13:52

