General ZFS pool shrinking will likely be coming to Illumos

January 15, 2015

Here is some great news. It started with this tweet from Alex Reece (which I saw via @bdha):

Finally got around to posting the device removal writeup for my first open source talk on #openzfs device removal! <link>

'Device removal' sounded vaguely interesting, but I wasn't entirely sure why it called for a talk, since ZFS can already remove devices. Still, I'll read ZFS-related things when I see them go by on Twitter, so I did. And my eyes popped right open.

This is really about being able to remove vdevs from a pool. In its current state I think the code requires all vdevs to be bare disks, which is not too useful for real configurations. But now that the big initial work has been done, I suspect there will be a rush of people improving it to cover more cases once it goes upstream to mainline Illumos (or before). Even being able to remove bare disks from pools with mirrored vdevs would be a big help for the 'I accidentally added a disk as a new vdev instead of as a mirror' situation that comes up periodically.

(This mistake is the difference between 'zpool add POOL DEV1 DEV2' and 'zpool add POOL mirror DEV1 DEV2'. You spotted the one word added to the second command, right?)
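To make the difference concrete, here is a sketch of the two commands (the pool name and the Illumos-style disk names are hypothetical):

```shell
# Hypothetical pool 'tank' and disks c1t2d0, c1t3d0.

# The mistake: this adds c1t2d0 and c1t3d0 as two separate top-level
# (unmirrored) vdevs, and ZFS stripes data across them.
zpool add tank c1t2d0 c1t3d0

# What was intended: this adds a single new top-level vdev that is a
# mirror of the two disks.
zpool add tank mirror c1t2d0 c1t3d0

# Inspect the layout afterwards; a mirrored vdev appears under its own
# 'mirror-N' heading in the output, while bare disks sit at the top level.
zpool status tank
```

The sting of the mistaken version is exactly what the post describes: once the bare disks are added as top-level vdevs, current ZFS gives you no way to take them back out again.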

While this is not quite the same thing as an in-place reshape of your pool, a fully general version of this would let you move a pool from, say, mirroring to raidz provided that you had enough scratch disks for the transition (either because you are the kind of place that has them around or because you're moving to new disks anyways and you're just arranging them differently).

(While you can do this kind of 'reshaping' today by making a completely new pool and using zfs send and zfs receive, there are some advantages to being able to do it transparently and without interruptions while people are actively using the pool).

This feature has been a wishlist item for ZFS for so long that I'd long since given up on ever seeing it. To have even a preliminary version of it materialize out of the blue like this is simply amazing (and I'm a little bit surprised that this is the first I heard of it; I would have expected an explosion of excitement as the news started going around).

(Note that there may be an important fundamental limitation about this that I'm missing in my initial enthusiasm and reading. But still, it's the best news about this I've heard for, well, years.)


Comments on this page:

By Anton Eliasson at 2015-01-15 02:18:37:

This is exciting news indeed. Perhaps this feature could also be used for migrating ashift=9 vdevs to ashift=12 vdevs in a pool. Then you could move from 512b to 4k sector disks without having to destroy and recreate pools.

By Paul Tötterman at 2015-01-16 00:01:11:

So first steps toward Block Pointer Rewrite support?

By cks at 2015-01-16 23:11:42:

Based on reading the article, it seems that they're taking a completely different approach from rewriting block pointers; if I'm reading it correctly they've instead added a redirection layer, although the 'future work' has some discussion of (partial) block pointer rewriting. Again if I'm reading it right, the redirection layer is currently more or less permanent and will probably have some performance impacts. This isn't as ideal as full block pointer rewrites, but it's a start and I can think of vaguely plausible approaches to fix things up afterwards.
