How much space ZFS reserves in your pools varies across versions

Back in my entry on the difference in available pool space between zfs list and zpool list, I noted that one of the reasons the two differ is that ZFS reserves some amount of space internally. At the time, the code said that ZFS should be reserving 1/32nd of the pool's size (while still allowing some things, like ZFS property changes, down to 1/64th of the pool), but our OmniOS fileservers seemed to be reserving only 1/64th of the space (and imposing a hard limit at that point). It turns out that this discrepancy has a simple explanation: ZFS has changed its behavior over time.
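To put concrete numbers on this, here is a little Python sketch of the post-change limits as I understand them: the 1/32nd 'slop' that normal writes can't go below, and the 1/64th floor that administrative operations like property changes can still dip down to. The pool size here is just an example, and the shift value mirrors what I believe is the relevant tunable.

    def zfs_reserved_space(pool_size, slop_shift=5):
        """Approximate ZFS's reserved 'slop' space for a pool.

        slop_shift=5 is the post-change default: 1/32nd of the pool.
        Administrative operations (eg property changes) are allowed
        to use up to half of the slop, ie down to 1/64th of the pool.
        """
        slop = pool_size >> slop_shift    # normal writes fail below this
        admin_floor = slop >> 1           # admin operations fail below this
        return slop, admin_floor

    # Example: a 10 TiB pool
    size = 10 * 2**40
    slop, admin_floor = zfs_reserved_space(size)
    print("slop: %d GiB, admin floor: %d GiB" %
          (slop // 2**30, admin_floor // 2**30))
    # -> slop: 320 GiB, admin floor: 160 GiB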

This change is Illumos issue 4951, 'ZFS administrative commands should use reserved space, not fail with ENOSPC', which landed in roughly July of 2014. When I wrote my original entry in late 2014 I looked at the latest Illumos source code at the time and so saw this change, but of course our ZFS fileservers were using a version of OmniOS that predated the change and so were using the old 1/64th of the pool hard limit.

The change has propagated into various Illumos distributions and other ZFS implementations at different points. In OmniOS it's in up-to-date versions of the r151012 and r151014 releases, but not in r151010 and earlier. In ZFS on Linux, it landed in the 0.6.5 release and was not in 0.6.4. In FreeBSD, this change is definitely in -current (and appears to have arrived very close to when it did in Illumos), but it postdates 10.0's release and I think it first arrived in 10.1.

This change has an important consequence: when you update across it, your pools will effectively shrink, because ZFS goes from reserving 1/64th of their space to reserving 1/32nd of it. If your pools have lots of free space, this isn't a problem. If they have only a modest amount, your users may notice them suddenly shrinking (some of our pools will lose half their free space if we don't expand them). And if your pools are sufficiently close to full, they will instantly become over-full and you'll have to delete things to free up space (or expand the pool on the spot).
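To illustrate the shrinkage with made-up numbers: the visible free space drops by 1/64th of the total pool size, regardless of how much of the pool is actually in use.

    # How much visible free space disappears when the reservation
    # goes from 1/64th (old) to 1/32nd (new) of the pool.
    # All of these numbers are made up for illustration.
    pool_size = 10 * 2**40          # a 10 TiB pool
    old_reserved = pool_size >> 6   # 1/64th: 160 GiB
    new_reserved = pool_size >> 5   # 1/32nd: 320 GiB
    lost = new_reserved - old_reserved

    free_before = 200 * 2**30       # say 200 GiB free under the old limit
    free_after = free_before - lost
    print("lost %d GiB; free space goes from %d GiB to %d GiB" %
          (lost // 2**30, free_before // 2**30, free_after // 2**30))
    # -> lost 160 GiB; free space goes from 200 GiB to 40 GiB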

I believe that you can revert to the old 1/64th limit if you really want to, but unfortunately it's a global setting so you can't do it selectively for some pools while leaving others at the default 1/32nd limit. Thus, if you have to do this you might want to do so only temporarily in order to buy time while you clean up or expand pools.
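I believe the global setting involved is the spa_slop_shift tunable, where the new default of 5 means 1/32nd and 6 gets you the old 1/64th behavior. As an untested sketch, on ZFS on Linux it's exposed as a module parameter that you could flip like this; on Illumos I believe you'd use 'set zfs:spa_slop_shift = 6' in /etc/system instead.

    # Untested sketch: revert to the old 1/64th reservation on a
    # ZFS on Linux system by raising spa_slop_shift from 5 (1/32nd)
    # to 6 (1/64th). Needs root, applies to all pools, and only
    # lasts until reboot unless made persistent.
    PARAM = "/sys/module/zfs/parameters/spa_slop_shift"
    with open(PARAM, "w") as f:
        f.write("6\n")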

(Of course, by now most people may have already dealt with this. We're a bit behind the times in terms of what OmniOS version we're using.)

Sidebar: My lesson learned here

The lesson I've learned from this is that I should probably stop reflexively reading code from the Illumos master repo and instead read the OmniOS code for the branch we're using. Going straight to the current 'master' version is a habit I got into in the OpenSolaris days, when there simply was no source tree that corresponded to the Solaris 10 update whatever that we were running. But these days that's no longer the case and I can read pretty much the real source code for what's running on our fileservers. And I should, just to avoid this sort of confusion.

(Perhaps going to the master source and then getting confused was a good thing in this case, since it's made me familiar with the new state of affairs too. But it won't always go so nicely.)

solaris/ZFSReservedSpaceVaries written at 22:23:55

Our low-rent approach to verifying that NFS mounts are there

Our mail system has everyone's inboxes in an old-fashioned /var/mail style single directory; in fact it literally is /var/mail. This directory is NFS mounted from one of our fileservers, which raises a little question: how can we be sure that it's actually there? Well, there's always going to be a /var/mail directory. But what we care about is that this directory is the actual NFS mounted filesystem instead of the directory on the local root filesystem that is the mount point, because we very much do not want to ever deliver email to the latter.

(Some people may say that limited directory permissions on the mount point should make delivery attempts fail. 'Should' is not a word that I like in this situation, either in 'should fail' or 'that failure should be retried'.)

There are probably lots of clever solutions to this problem involving advanced tricks like embedded Perl bits in the mailer that look at NFS mount state and so on. We opted for a simple and low tech approach: we have a magic flag file in the NFS version of /var/mail, imaginatively called .NFS-MOUNTED. If the flag file is not present, we assume that the filesystem is not mounted and stall all email delivery to /var/mail.
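In outline the check is trivial; here's a Python sketch of the idea (the function name is made up, and our real check lives in our mailer configuration, not in Python):

    import os

    # Sketch: /var/mail always exists as a directory, but the flag
    # file only exists in the real NFS filesystem, not in the bare
    # mount point directory on the local root filesystem.
    def varmail_is_mounted():
        return os.path.exists("/var/mail/.NFS-MOUNTED")

    # If this is false, the mailer stalls (defers) delivery to
    # /var/mail instead of writing into the local root filesystem.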

This scheme is subject to various potential issues (like accidentally deleting .NFS-MOUNTED some day), but it has the great virtue that it is simple and relatively bulletproof. It helps that Exim has robust support for checking whether or not a file exists (although we use a hack for various reasons). The whole thing has worked well and basically transparently, and we haven't removed one of those .NFS-MOUNTED files by accident yet.

(We actually use this trick for several NFS-mounted mail related directories that we need to verify are present before we start trying to do things involving them, not just /var/mail.)

(I mentioned this trick in passing here, but today I feel like writing it up explicitly.)

Sidebar: our alternate approach with user home directories

Since user home directories are NFS mounted, you might be wondering if we also use flag files there to verify that the NFS mounts are present before checking things like .forward files. Because of how our NFS mounts are organized, we use an alternate approach instead. In short, our NFS mounts aren't directly for user home directories; instead they're for filesystems with user home directories in them.

(A user has a home directory like /h/281/cks, where /h/281 is the actual NFS mounted filesystem.)

In this situation it suffices to just check that the user's home directory exists. If it does, the NFS filesystem it is in must be mounted (well, unless someone has done something very perverse). As a useful side bonus, this guards against various other errors (eg, 'user home directory was listed wrong in /etc/passwd').
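Here's a Python sketch of this check (the username and path are just examples):

    import os
    import pwd

    # Sketch: if the user's home directory itself exists, the NFS
    # filesystem containing it must be mounted (and their passwd
    # entry must be sane).
    def home_dir_present(user):
        try:
            home = pwd.getpwnam(user).pw_dir    # eg /h/281/cks
        except KeyError:
            return False
        return os.path.isdir(home)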

sysadmin/VerifyingNFSMounts written at 01:35:05
