Why ZFS can't really allow you to add disks to raidz vdevs

June 19, 2016

Today, the only change ZFS lets you make to a raidz vdev once you've created it is to replace a disk with another one. You can't do things like, oh, adding another disk to expand the vdev, which people wish for every so often. On the surface, this is an artificial limitation that could be bypassed if ZFS wanted to, although it wouldn't really do what you want. Underneath the surface, there is an important ZFS invariant that makes it impossible.

What makes this nominally easy in theory is that ZFS raidz vdevs already use variable width stripes. A conventional RAID system uses full width stripes, where all stripes span all disks. When you add another disk, the RAID system has to change how all of the existing data is laid out to preserve this full width; you goes from having the data and parity striped across N disks to having it striped across N+1 disks. But with variable width stripes, ZFS doesn't have this problem; adding an existing disk doesn't require touching any of the existing stripes, even what were full width stripes. All that happens is they go from being full width stripes to being partial width stripes.

However, this is probably not really what you wanted because it doesn't get you as much new space as adding a disk does in a conventional RAID system. In a conventional RAID system, the reshaping involved both minimizes the RAID overhead and gives you a large contiguous chunk of free space at the end of the RAID array. In ZFS, simply adding a disk this way would obviously not do that; all of your old 'full width' stripes are now somewhat inefficient partial width stripes, and much of the free space is going to be scattered about in little bits at the end of those partial width stripes.

In fact, the free space issue is the fatal flaw here. ZFS raidz imposes a minimum size on chunks of free space; they must be large enough that it can write one data block plus its parity blocks (ie N+1, where N is the raidz level). Were we to just add another disk along side existing disks, much of the free space on it could in fact violate this invariant. For example, if the vdev previously had two consecutive full width stripes next to each other, adding a new disk will create a single-block chunk of free space in between them.

You might be able to get around this by immediately marking such space on the new disk as allocated instead of free, but if so you could find that you got almost no extra space from adding the disk. This is probably especially likely on a relatively full pool, which is exactly the situation where you'd like to get space quickly by adding another disk to your existing raidz vdev.

Realistically, adding a disk to a ZFS raidz vdev requires the same sort of reshaping as adding a disk to a normal RAID-5+ system; you really want to rewrite stripes so that they span across all disks as much as possible. As a result, I think we're unlikely to ever see it in ZFS.

Written on 19 June 2016.
« It's easier to shrink RAID disk volumes than to reshape them
A lesson to myself: know your emergency contact numbers »

Page tools: View Source, Add Comment.
Search:
Login: Password:
Atom Syndication: Recent Comments.

Last modified: Sun Jun 19 02:03:01 2016
This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.