Reconsidering a ZFS root filesystem

March 26, 2013

A Twitter conversation from today:

@thatcks: Let's see if fsck can heal this Solaris machine or if I get to reinstall it from scratch. (Thanks to ILOMs I can do this from my desk.)

@bdha: fsck? Solaris? Sadface.

@thatcks: I have horror stories of corrupted zpool.cache files too. I don't know if you can boot a ZFS-root machine in that situation.

@bdha: I've been there. zpool.cache backups saved my ass.

Right now all of our Solaris fileservers have (mirrored) UFS root filesystems instead of ZFS root filesystems, and in the past I've expressed some desire to see that continue in any future ZFS fileservers we build. I've written about why before; the short version is that I've seen situations where /etc/zfs/zpool.cache had to be deleted and recreated, and I'm not sure this is even possible if your root filesystem is itself a ZFS filesystem. Using UFS for root filesystems avoids this chicken-and-egg problem.

(Of course the whole situation around zpool.cache and ZFS pool activation is a little bit mysterious, at least in Solaris.)

Well, actually, that's not the only reason. The other reason is that I still think of ZFS as fragile, as something that will go from 'fine' to 'panics your system' under remarkably little provocation. UFS is much more old-fashioned and will soldier on even under relatively extreme circumstances (whether that's a wise idea is another question). Under most circumstances I would rather have our fileservers limping along than dead (even if the entire root filesystem becomes inaccessible, as an extreme example).

But all of this is basically supposition (and thus superstition). UFS certainly has its own problems (one of which I ran into today on our test server), and I've never actually tried out a modern Illumos-based system with ZFS root, either in normal operation or when I deliberately start breaking things (and I certainly hope that some of the problems I heard about years ago have been dealt with). It may well turn out that ZFS root based systems are easier to deal with and recover than I expect. They certainly have their own benefits (periodic scrubs are reassuring, for example).

(And to be honest, I think it's quite possible that Illumos will only really support a ZFS root well by the time we get to it. It's clear that ZFS root is where all of the enthusiasm is and where most people think we should be going.)

PS: root filesystem snapshots and snapshot rollback are not much of an advantage in our environment, since we basically don't patch our fileservers. Of course periodic snapshots might save us in the face of a corrupt zpool.cache in the live filesystem.
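
(For what it's worth, the zpool.cache backups that @bdha mentions don't need snapshots at all. A minimal sketch of the idea, assuming a hypothetical backup directory and retention count that are not from our actual setup, and doing nothing ZFS-specific beyond copying the file:)

```python
import shutil
import time
from pathlib import Path


def backup_cache(src, backup_dir, keep=7):
    """Copy src into backup_dir under a timestamped name.

    Only the newest `keep` copies are retained. The function name,
    the directory layout and the default of 7 are illustrative
    assumptions, not anything Solaris or Illumos provides.
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")
    src = Path(src)
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)

    # Timestamps in this format sort lexically in chronological order.
    stamp = time.strftime("%Y%m%dT%H%M%S")
    dest = backup_dir / f"{src.name}.{stamp}"
    shutil.copy2(src, dest)  # copy2 also preserves the modification time

    # Prune everything except the newest `keep` copies.
    backups = sorted(backup_dir.glob(f"{src.name}.*"))
    for old in backups[:-keep]:
        old.unlink()
    return dest
```

Run from cron as something like backup_cache("/etc/zfs/zpool.cache", "/var/backups/zfs"), where the second path is made up for illustration. Whether a saved copy can actually be put back in place on a ZFS-root machine that no longer boots is exactly the part I haven't tested.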


Comments on this page:

From 130.63.95.67 at 2013-03-27 08:07:29:

You never patch your servers? Is that all servers or just your file servers? What about your Linux servers? I would like to hear more of your reasoning behind this.

By cks at 2013-03-27 11:29:17:

The long answer is in PatchingAppliancesSystems. The short answer is that we basically have two sorts of machines here, appliances and systems. Systems are generally used and exposed machines and get patched; appliances are more or less sealed black boxes and don't get patched unless we have to. The ZFS fileservers and the iSCSI backends are both appliance machines.

(The one thing that I didn't mention in that entry is that more or less by definition patches are either unimportant or require us to go through a painstaking full validation process. If they don't change something we're using they're unimportant; if they do change something we're using we absolutely have to make sure that everything still works right after the patch.)

From 108.60.100.203 at 2013-03-27 17:27:12:

FYI, recent FreeBSD changes made it possible to boot from ZFS without zpool.cache.

http://lists.freebsd.org/pipermail/freebsd-stable/2012-December/071345.html

Written on 26 March 2013.

Last modified: Tue Mar 26 22:57:18 2013