Why I am not enthused about ZFS for my root filesystem

February 25, 2011

One of the changes in Oracle's recent Solaris 11 preview (and in OpenSolaris releases before it) is that your root filesystem must be a ZFS filesystem; it can no longer be a UFS filesystem. While I understand why Oracle did this, it is not a change that leaves me feeling very enthused.

The short version of why I do not like this is that previously, the entire ZFS subsystem could go belly-up and your system could still boot. You might think that ZFS going belly-up entirely should not happen, but the problem there is /etc/zfs/zpool.cache, the system ZFS cachefile. This is a binary file that can only be maintained by ZFS tools, and it at least used to be possible for it to become corrupt. When it became corrupt, your method of fixing it was, uh, to remove it and start again by re-importing your pools.
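That remove-and-reimport dance looks roughly like the following sketch. This assumes root is not on ZFS (so the system still boots), and 'tank' is a hypothetical pool name, not anything from the original setup:

```shell
# Hedged sketch of recovering from a corrupt zpool.cache on a system
# whose root filesystem is on UFS. "tank" is a hypothetical pool name.

# Get the unreadable cachefile out of the way.
rm /etc/zfs/zpool.cache

# Scan attached devices and report pools available for import...
zpool import

# ...then re-import each pool by name, which rewrites zpool.cache.
zpool import tank
```

The chicken and egg problem with a ZFS root is exactly here: if the pool holding /etc/zfs is the one you cannot import, there is nowhere to run these commands from.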

This is generally possible if your actual Solaris system filesystem is on UFS (both the 'removing' bit and the 'starting again' bit). My strong impression is that this is much harder if your root filesystem is ZFS, because you have a little chicken and egg problem.

(In fact at one point the official solution for the devices involved in the root pool changing names was 'boot from a rescue environment'. Yes, really. In an enterprise operating system, with self-identifying filesystems and storage pools. I hope that this has changed since then.)

Possibly the rescue environment has a well-honed solution to this problem (one that gets your root pool back and the system booting to single-user mode so that you can fix everything else), or perhaps this doesn't happen any more. But frankly, Solaris 10 has not impressed me with its resilience in the face of various events, so I am not inclined to trust it here; I would much prefer the simpler, far better tested approach of a UFS root filesystem.

Comments on this page:

From at 2011-02-25 09:56:43:

A cautionary tale
Charles Polisher

By cks at 2011-02-25 10:40:25:

Oh yes indeed. I expect to lose a pool to recoverable corruption at some point over the lifetime of ZFS here. I don't like it, but I'm resigned to it in exchange for the benefits that ZFS gives us.

By trs80 at 2011-02-25 11:21:50:

If you (can) boot from a previous snapshot or boot environment you should be able to get access to an old version of /etc/zfs/zpool.cache, correct? If nothing else they provide a convenient rescue environment to munge the current boot environment's cachefile.
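If the boot environment tooling cooperates, that might look something like this sketch (beadm syntax as in OpenSolaris / Solaris 11; the BE name and mountpoint here are hypothetical):

```shell
# Hedged sketch: recover a known-good zpool.cache from an older boot
# environment. "old-be" is a hypothetical boot environment name.
beadm list                      # find a BE that predates the corruption
beadm mount old-be /mnt         # mount that BE under /mnt
cp /mnt/etc/zfs/zpool.cache /etc/zfs/zpool.cache
beadm unmount old-be
```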

From at 2011-02-25 18:00:33:

The newer versions of zpool that ship with Solaris 10u9 and Solaris 11 Express can import a pool with corrupted metadata by going back to an older version of the uberblock. Some transactions will be lost, but the data will be consistent with the previous point in time.
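That rewind-based recovery is exposed through zpool import's -F flag; a minimal sketch, with a hypothetical pool name:

```shell
# Hedged sketch of recovery-mode import, assuming a pool named "tank".
# -n combined with -F is a dry run: it reports whether rewinding to an
# older uberblock would succeed and roughly how much recent data would
# be discarded, without actually doing it.
zpool import -Fn tank

# If the dry run looks acceptable, perform the rewind for real.
zpool import -F tank
```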

From at 2011-02-27 02:52:25:

Having recovered a pool with a busted slog last year, import -m and -F work pretty well. Keeping backups of zpool.cache so you can reference them during recovery would definitely be a good idea, though (it's in /etc, so presumably your normal system backups will have it already).
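Keeping those cachefile backups is easy to automate; here is a minimal sketch (the function name and default backup location are my own inventions, not anything standard):

```shell
#!/bin/sh
# Sketch: keep a dated copy of the ZFS pool cachefile around for use
# during recovery. The default paths are assumptions about a typical
# Solaris layout; both can be overridden via arguments.
backup_zpool_cache() {
    cache="${1:-/etc/zfs/zpool.cache}"
    backup_dir="${2:-/var/backups/zfs}"
    mkdir -p "$backup_dir" || return 1
    # cp -p preserves the timestamp, which helps later when deciding
    # which backup predates the corruption.
    cp -p "$cache" "$backup_dir/zpool.cache.$(date +%Y%m%d)"
}
```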

I've been running zfsroot in production since 12/2007, and have yet to lose a pool irrevocably, or even have anything I would term a moderate problem. Since the new metaslab allocator went in, I haven't had near-full performance issues either.

The scare late last year was due to my own stupidity: http://mirrorshades.net/post/1485951163

Shit happens, sure, but so far the shit has been rare and the benefits major.

The real problem with ZFS from a systems perspective has always been Live Upgrade and upgrading zones. But of course, that particular problem has been shot in the face.

