2018-03-17
Much better ZFS pool recovery is coming (in open source ZFS)
One of the long-standing issues in ZFS has been that while it's usually very resilient, it can also be very fragile if the wrong things get damaged. Classically, ZFS has had two modes of operation: either it would repair any damage or it would completely explode. There was no middle ground of error recovery, and this isn't a great experience; as I wrote once, panicking the system is not an error recovery strategy. In early versions of ZFS there was no recovery at all (you restored from backups); later versions added a feature where you could attempt to recover from damaged metadata by rewinding time, which was better than nothing but not a complete fix.
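For illustration, that rewind recovery is driven through zpool import's -F family of options. Here 'tank' is a stand-in pool name, and I'm going from the zpool manpage rather than from personal experience:

    # dry run: report whether discarding the last few transactions
    # would make the pool importable, without actually rewinding
    zpool import -F -n tank
    # do the rewind recovery for real
    zpool import -F tank
    # last resort: an 'extreme' rewind that searches much further back
    zpool import -F -X tank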
The good news is that that's going to change, probably not too long from now. What you want to read about this is Turbocharging ZFS Data Recovery, by Pavel Zakharov of Delphix, which covers a bunch of work that he's done to make ZFS more resilient and more capable of importing various sorts of damaged pools. Of particular interest is the ability to recover at least some data from a pool that's lost an entire vdev. You can't get everything back, obviously, but ZFS metadata is usually replicated on multiple vdevs, so losing a single vdev will hopefully leave you with enough to at least get the rest of the data out of the pool.
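If you wind up in that situation, the obvious first move is to import the pool read-only, so that nothing further can be written to it, and then copy out whatever you can. A rough sketch, where 'tank', 'last-good', and 'elsewhere' are all hypothetical names:

    # force-import the damaged pool read-only
    zpool import -f -o readonly=on tank
    # a read-only pool can't take new snapshots, so send an existing one
    zfs send tank/data@last-good | ssh elsewhere 'zfs receive backup/data'
    # or just copy files straight off the mounted filesystems
    rsync -a /tank/data/ elsewhere:/backup/data/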
(I saw this article via a retweet by @richardelling.)
All of this is really great news. ZFS has long needed better options for recovery from various pool problems, as well as better diagnostics for failed pool imports, and I'm quite happy that the situation is finally going to be improving.
The article is also interesting for its discussion of the current low-level issues involved in pool importing. For example, until I read it I had no idea how potentially dangerous a ZFS pool vdev change could be, due to how pool configurations are handled during the import process. I'd love to read more details on how pool importing really works and what the issues are (it's a long-standing interest of mine), but sadly I suspect that no one with that depth of ZFS expertise has the kind of time it would take to write such an article.
As far as the timing of these features being available in your ZFS-using OS of choice goes, his article says this:
As of March 2018, it has landed on OpenZFS and Illumos but not yet on FreeBSD and Linux, where I’d expect it to be upstreamed in the next few months. The first OS that will get this feature will probably be OmniOS Community Edition, although I do not have an exact timeline.
If you have a sufficiently important damaged pool, under some circumstances it may be good enough if there is some OS, any OS, that can bring up the pool to recover the data in it. For all that I've had my issues with OmniOS's hardware support, OmniOS CE runs on a fairly decent range of hardware, and you can probably get it going on most modern machines in an emergency.
(And if OmniOS can't talk directly to your disk hardware, there's always iSCSI, as we can testify. There are probably also other options for remote disk access that OmniOS ZFS and zdb can deal with.)
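As an illustration of what zdb can do here, it can be pointed at devices and at exported (or unimportable) pools without going through a real import. The device path and pool name below are made up:

    # dump the ZFS labels on a disk to see what pool it thinks
    # it belongs to and what the pool configuration looks like
    zdb -l /dev/dsk/c2t0d0s0
    # display the configuration of an exported pool, searching for
    # its devices under a specific directory
    zdb -e -p /dev/dsk -C tank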
PS: If you're considering doing this in the future and your normal OS is something other than Illumos, you might want to pay attention to the ZFS feature flags you allow to be set on your pool, since this won't necessarily work if your pool uses features that OmniOS CE doesn't (yet) support. This is probably not going to be an issue for FreeBSD but might be an issue for ZFS on Linux. You probably want to compare the ZoL manpage on ZFS pool features with the Illumos version or even the OmniOS CE version.
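One way to check where you stand, and to be conservative up front, is through zpool itself; a sketch with made-up pool and disk names:

    # see which features exist on a pool and whether each one is
    # 'disabled', 'enabled', or actually 'active'
    zpool get all tank | grep 'feature@'
    # create a pool with all features off except ones you opt in to
    zpool create -d -o feature@lz4_compress=enabled tank c2t0d0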
Sidebar: Current ZFS pool feature differences
The latest Illumos tree has three new ZFS pool features from Delphix: device_removal, obsolete_counts (which enhances device removal), and zpool_checkpoint. These are all fairly recent additions; they appear to have landed in the Illumos tree this January and just recently, although the commits that implement them are dated from 2016.
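Going by the manpages (I haven't used these myself yet), zpool_checkpoint gives you a pool-wide undo for exactly this sort of risky surgery. A hypothetical session, with made-up pool and device names:

    # checkpoint the pool, then do something drastic like removing a vdev
    zpool checkpoint tank
    zpool remove tank c2t0d0
    # if it goes wrong, rewind the entire pool to the checkpoint
    zpool export tank
    zpool import --rewind-to-checkpoint tank
    # if all is well, discard the checkpoint to free its space
    zpool checkpoint -d tank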
ZFS on Linux has four new pool features: large_dnode, project_quota, userobj_accounting, and encryption. Both large dnodes and encryption have to be turned on explicitly, and the other two are read-only compatible, so in theory OmniOS can bring a pool up read-only even with them enabled (and you're going to want to have the pool read-only anyway).
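In concrete terms, 'read-only compatible' should mean that an import like this still works on an OS that doesn't know about those features (assuming large dnodes and encryption haven't been turned on):

    # feature state is 'disabled', 'enabled', or 'active'; as I
    # understand it, only 'active' features matter for imports on
    # another OS
    zpool get feature@project_quota,feature@userobj_accounting tank
    # import read-only, and without mounting anything to be safe
    zpool import -o readonly=on -N tank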