Why ZFS is not good at growing and reshaping pools (or shrinking them)

January 29, 2020

I recently read Mark McBride's Five Years of Btrfs (via), which has a significant discussion of why McBride chose Btrfs over ZFS that boils down to ZFS not being very good at evolving your pool structure. You might doubt this judgment from a Btrfs user, so let me say as both a fan of ZFS and a long term user of it that this is unfortunately quite true; ZFS is not a good choice if you want to modify your pool disk layout significantly over time. ZFS works best if the only change in your pools that you do is replacing drives with bigger drives. In our ZFS environment we go to quite some lengths to be able to expand pools incrementally over time, and while this works it both leaves us with unbalanced pools and means that we're basically forced to use mirroring instead of RAIDZ.

(An unbalanced pool is one where some vdevs and disks have much more data than others. This is less of an issue for us now that we're using SSDs instead of HDs.)

You might sensibly ask why ZFS is not good at this, despite being many years old (and people having had this issue with ZFS for a long time). One fundamental reason is that ZFS is philosophically and practically opposed to rewriting existing data on disk; once written, it wants everything to be completely immutable (apart from copying it to replacement disks, and more or less). But any sort of restructuring or re-balancing of a pool of storage (whether ZFS or Btrfs or whatever) necessarily involves shifting data around; data that used to live on this disk must be rewritten so that it now lives on that disk (and all of this has to be kept track of, directly or indirectly). It's rather difficult to have immutable data but mutable storage layouts.

(In the grand tradition of computer science we can sort of solve this problem with a layer of indirection, where the top layer stays immutable but the bottom layer mutates. This is awkward and doesn't entirely satisfy either side, and is in fact how ZFS's relatively new pool shrinking works.)

This is also the simpler approach for ZFS to take. Not having to support reshaping your storage requires less code and less design (for instance, you don't have to figure out how to reliably keep track of how far along a reshaping operation is). Less code also means less bugs, and bugs in reshaping operations can be catastrophic. Since ZFS was not designed to support any real sort of reshaping, adding it would be a lot of work (in both design and code) and raise a lot of questions, which is a good part of why no one has really tackled this for all of the years that ZFS has been around.

(The official party line of ZFS's design is more or less that you should get your storage right the first time around, or to put it another way, that ZFS was designed for locally attached storage where you start out with a fully configured system rather than incrementally expanding to full capacity over time.)

(This is an aspect of how ZFS is not a universal filesystem. Just as ZFS is not good for all workloads, it's not good for all patterns of growth and system evolution.)

Written on 29 January 2020.
« More badly encoded MIME Content-Disposition headers
Some effects of the ZFS DVA format on data layout and growing ZFS pools »

Page tools: View Source, Add Comment.
Login: Password:
Atom Syndication: Recent Comments.

Last modified: Wed Jan 29 00:20:22 2020
This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.