Some notes on OpenZFS's new 'draid' vdev redundancy type

August 29, 2021

One piece of recent ZFS news is that OpenZFS 2.1.0 contains a new type of vdev redundancy called 'dRAID', which is short for 'distributed RAID'. OpenZFS has a dRAID HOWTO that starts with this summary:

dRAID is a variant of raidz that provides integrated distributed hot spares which allows for faster resilvering while retaining the benefits of raidz. A dRAID vdev is constructed from multiple internal raidz groups, each with D data devices and P parity devices. These groups are distributed over all of the children in order to fully utilize the available disk performance. This is known as parity declustering and it has been an active area of research. [...]
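To make the terminology concrete, here is a minimal sketch of creating such a vdev, using the draid[&lt;parity&gt;][:&lt;data&gt;d][:&lt;children&gt;c][:&lt;spares&gt;s] syntax from the OpenZFS documentation (the pool name and disk names here are hypothetical):

    # double parity (P=2), groups of 8 data disks (D=8),
    # 24 child disks in total, 2 disks worth of distributed spare space
    zpool create tank draid2:8d:24c:2s /dev/sd[a-x]

Here /dev/sd[a-x] is a shell glob for 24 disks; the children count in the draid specification has to match the number of disks you actually hand to zpool.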

However, there are some cautions about draid, starting with this:

Another way dRAID differs from raidz is that it uses a fixed stripe width (padding as necessary with zeros). This allows a dRAID vdev to be sequentially resilvered, however the fixed stripe width significantly affects both usable capacity and IOPS. For example, with the default D=8 and 4k disk sectors the minimum allocation size is 32k. If using compression, this relatively large allocation size can reduce the effective compression ratio. [...]

Needless to say, this also means that the minimum size of files (and symlinks, and directories) is 32 KB, unless they're small enough to perhaps be squeezed into the bonus space in ZFS dnodes.
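To spell out the arithmetic (with the default D=8 and 4 KiB disk sectors):

    minimum allocation = D x sector size
                       = 8 x 4 KiB
                       = 32 KiB

So a 1 KiB file that doesn't fit in its dnode still consumes a full 32 KiB of data space (plus parity), and a record that compresses down to 10 KiB still can't be stored in anything smaller than 32 KiB.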

Another caution is that you apparently can't get draid's fast rebuild speed without having configured spare space in your draid setup. This is implicit in the description of draid, if you read it as saying that the integrated distributed hot spare space is what allows for faster resilvering. Since I believe you can't reshape a draid vdev after creation, you had better include the spare space from the start; otherwise you have something that's inferior to raidz with the same parity.
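If I'm reading the OpenZFS documentation right, the distributed spares are named after their vdev (for the hypothetical draid2 vdev above, they would be draid2-0-0 and draid2-0-1), and you rebuild onto one with an ordinary 'zpool replace'; this rebuild is the fast, sequential part:

    # sdc has failed; rebuild it onto the vdev's first distributed spare
    zpool replace tank sdc draid2-0-0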

According to the Ars Technica article on draid, draid has been heavily tested (and hopefully heavily used in production) in "several major OpenZFS development shops". The article also has its own set of diagrams, along with additional numbers and information; it's well worth reading if you're potentially interested in draid, including for its additional cautions about draid's survivability in the face of multi-device failures.

I don't think we're interested in draid any more than we're interested in raidz. Resilvering time is not our major concern with raidz, and draid retains raidz's other issues, such as full stripe reads. In fact, I'm not sure very many people will be interested in draid. The Ars Technica article starts its conclusion with:

Distributed RAID vdevs are mostly intended for large storage servers—OpenZFS draid design and testing revolved largely around 90-disk systems. At smaller scale, traditional vdevs and spares remain as useful as they ever were.

dRAID is intellectually cool and I'm okay with OpenZFS having it, but I'm not sure it will ever be common. As SATA/SAS SSDs and NVMe drives become more prevalent in storage servers, its advantages over raidz may increasingly go away, except for high-capacity archival servers that still have to use HDs.

As an additional note, the actual draid data layout on disk is quite complicated; Ars Technica points to the detailed comments in the code. Given that ZFS stores locations on disk in the form of ZFS DVAs, which specify the vdev and the "byte offset" into the vdev, you might wonder how DVA offsets work on draid vdevs. Unfortunately I don't know, because the answer appears to be rather complicated based on vdev_draid_xlate(), which isn't surprising given the complicated on-disk layout. I suspect that however draid maps DVA offsets, it has the same implications for growing draid vdevs as DVA offsets do for growing raidz ones (the coming raidz expansion is carefully set up to cope with this).
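For concreteness, a DVA as tools like zdb print it looks like this (the specific hex values here are made up):

    DVA[0]=<0:40000:20000>
    #       |  |     `- allocated size, 0x20000 bytes
    #       |  `------- byte offset into the vdev, 0x40000
    #       `---------- vdev id 0

On an ordinary vdev that byte offset maps to a disk location in a straightforward way; on a draid vdev, vdev_draid_xlate() presumably has to translate it through the distributed layout into per-child disk locations.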

Written on 29 August 2021.