Reconsidering a ZFS root filesystem

March 26, 2013

A Twitter conversation from today:

@thatcks: Let's see if fsck can heal this Solaris machine or if I get to reinstall it from scratch. (Thanks to ILOMs I can do this from my desk.)

@bdha: fsck? Solaris? Sadface.

@thatcks: I have horror stories of corrupted zpool.cache files too. I don't know if you can boot a ZFS-root machine in that situation.

@bdha: I've been there. zpool.cache backups saved my ass.

Right now all of our Solaris fileservers have (mirrored) UFS root filesystems instead of ZFS root filesystems, and in the past I've expressed some desire to see that continue in any future ZFS fileservers we build. I've written about why before; the short version is that I've seen situations where /etc/zfs/zpool.cache had to be deleted and recreated, and I'm not sure this is even possible if your root filesystem is itself a ZFS filesystem. Using UFS for root filesystems avoids this chicken-and-egg problem.

(Of course the whole situation around zpool.cache and ZFS pool activation is a little bit mysterious, at least in Solaris.)

Well, actually, that's not the only reason. The other reason is that I still think of ZFS as fragile, as something that will go from 'fine' to 'panics your system' under remarkably little provocation. UFS is much more old-fashioned and will soldier on even under relatively extreme circumstances (whether that's a wise idea is another question). Under most circumstances I would rather have our fileservers limping along than dead (even if the entire root filesystem becomes inaccessible, as an extreme example).

But all of this is basically supposition (and thus superstition). UFS certainly has its own problems (one of which I ran into today on our test server), and I've never actually tried out a modern Illumos-based system with a ZFS root, either in normal operation or when I deliberately start breaking things (and I certainly hope that some of the problems I heard about years ago have been dealt with). It may well turn out that ZFS root based systems are easier to deal with and recover than I expect. They certainly have their own benefits (periodic scrubs are reassuring, for example).

(And to be honest, I think it's quite possible that by the time we get to it, a ZFS root will be the only thing that Illumos really supports well. It's clear that ZFS root is where all of the enthusiasm is and where most people think we should be going.)

PS: root filesystem snapshots and snapshot rollback are not much of an advantage in our environment, since we basically don't patch our fileservers. Of course periodic snapshots might save us in the face of a corrupt zpool.cache in the live filesystem.
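
As a rough illustration of the 'zpool.cache backups' idea from the conversation above, here is a minimal sketch in Python of keeping a few rotated copies of the file. This is not something we actually run; the function name, the sequence-numbered backup names, and the retention count are all assumptions made up for the example. It only copies the file when its contents differ from the newest backup.

```python
import hashlib
import shutil
from pathlib import Path


def _digest(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def backup_cache(src, dest_dir, keep=5):
    """Copy src into dest_dir if it differs from the newest backup.

    Backups are named <src name>.NNNNNN with an increasing sequence
    number (a hypothetical naming scheme). Returns the path of the new
    backup, or None if the file was unchanged and nothing was copied.
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")
    src = Path(src)
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)

    # Zero-padded sequence numbers make lexical order match creation order.
    backups = sorted(dest_dir.glob(src.name + "." + "[0-9]" * 6))
    if backups and _digest(backups[-1]) == _digest(src):
        return None  # unchanged since the last backup

    seq = int(backups[-1].suffix[1:]) + 1 if backups else 1
    target = dest_dir / f"{src.name}.{seq:06d}"
    shutil.copy2(src, target)  # copy2 also preserves the modification time

    # Prune the oldest copies beyond the retention limit.
    for old in (backups + [target])[:-keep]:
        old.unlink()
    return target
```

Run from cron, this might be called as backup_cache("/etc/zfs/zpool.cache", "/var/backups/zfs"), where the destination directory is again just an example. Whether such a copy is reachable when a ZFS-root machine won't boot is exactly the open question above, so the destination would ideally be somewhere you can get at from rescue media.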


Last modified: Tue Mar 26 22:57:18 2013