== Reconsidering a ZFS root filesystem

A Twitter conversation from today:

> [[@thatcks
> https://twitter.com/thatcks/status/316646480452648961]]: Let's see if
> fsck can heal this Solaris machine or if I get to reinstall it from
> scratch. (Thanks to ILOMs I can do this from my desk.)
>
> [[@bdha https://twitter.com/bdha/status/316646966517964802]]:
> fsck? Solaris? Sadface.
>
> [[@thatcks
> https://twitter.com/thatcks/status/316647348413550592]]: I have horror
> stories of corrupted zpool.cache files too. I don't know if you can
> boot a ZFS-root machine in that situation.
>
> [[@bdha
> https://twitter.com/bdha/status/316675429824069632]]: I've been there.
> zpool.cache backups saved my ass.

Right now all of [[our Solaris fileservers ZFSFileserverSetup]] have (mirrored) UFS root filesystems instead of ZFS root filesystems and in the past I've expressed some desire to see that continue in any future ZFS fileservers we built. [[I've written about why before ZFSForRootGrump]]; the short version is that I've seen situations where _/etc/zfs/zpool.cache_ had to be deleted and recreated, and I'm not sure this is even possible if your root filesystem is a ZFS filesystem. Using UFS for root filesystems avoids this chicken and egg problem.

(Of course the whole situation around _zpool.cache_ and ZFS pool activation is [[a little bit mysterious ZFSPoolActivationI]], at least in Solaris.)

Well, actually, that's not the only reason. The other reason is that [[I still think of ZFS as fragile ZFSShatteringProblem]], as something that will go from 'fine' to 'panics your system' under remarkably little provocation. UFS is much more old-fashioned and will soldier on even under relatively extreme circumstances (whether that's a wise idea is another question). Under most circumstances I would rather have our fileservers limping along than dead (even if the entire root filesystem becomes inaccessible, as an extreme example).

But all of this is basically supposition (and thus superstition). UFS certainly has its own problems (one of which I ran into today on our test server), and I've never actually tried out a modern Illumos-based system with a ZFS root, either in normal operation or when I deliberately start breaking things (and I certainly hope that some of the problems I heard about years ago have been dealt with). It may well turn out that ZFS root based systems are easier to deal with and recover than I expect. They certainly have their own benefits (periodic scrubs are reassuring, for example).

(And to be honest, I think it's quite possible that by the time we get to it, Illumos will only really support a ZFS root well. It's clear that ZFS root is where all of the enthusiasm is and where most people think we should be going.)

PS: root filesystem snapshots and snapshot rollback are not much of an advantage in our environment, since we basically don't patch our fileservers. Of course, periodic snapshots might save us if the _zpool.cache_ in the live filesystem became corrupt.