Some notes on OpenZFS's new 'draid' vdev redundancy type
One piece of recent ZFS news is that OpenZFS 2.1.0 contains a new type of vdev redundancy called 'dRAID', which is short for 'distributed RAID'. OpenZFS has a dRAID HOWTO that starts with this summary:
dRAID is a variant of raidz that provides integrated distributed hot spares which allows for faster resilvering while retaining the benefits of raidz. A dRAID vdev is constructed from multiple internal raidz groups, each with D data devices and P parity devices. These groups are distributed over all of the children in order to fully utilize the available disk performance. This is known as parity declustering and it has been an active area of research. [...]
However, there are some cautions about draid, starting with this:
Another way dRAID differs from raidz is that it uses a fixed stripe width (padding as necessary with zeros). This allows a dRAID vdev to be sequentially resilvered, however the fixed stripe width significantly affects both usable capacity and IOPS. For example, with the default D=8 and 4k disk sectors the minimum allocation size is 32k. If using compression, this relatively large allocation size can reduce the effective compression ratio. [...]
Needless to say, this also means that the minimum size of files (and symlinks, and directories) is 32 KB, unless they're so small that they can perhaps be squeezed into bonus space in ZFS dnodes.
Another caution is that you apparently can't get draid's fast rebuild speed without having configured spare space in your draid setup. This is sort of implicitly present in the description of draid, when read to say that the integrated distributed hot spare space is what allows for faster resilvering. Since I believe that you can't reshape a draid vdev after creation, you had better include the spare space from the start; otherwise, you have something that's inferior to raidz with the same parity.
According to the Ars Technica article on draid, draid has been heavily tested (and hopefully heavily used in production) in "several major OpenZFS development shops". The Ars Technica article also has its own set of diagrams, and also additional numbers and information; it's well worth reading if you're potentially interested in draid, including for additional cautions about draid's survivability in the face of multi-device failures.
I don't think we're interested in draid any more than we're interested in raidz. Resilvering time is not our major concern with raidz, and draid keeps the other issues from raidz, like full stripe reads. In fact, I'm not sure very many people will be interested in draid. The Ars Technica article starts its conclusion with:
Distributed RAID vdevs are mostly intended for large storage servers—OpenZFS draid design and testing revolved largely around 90-disk systems. At smaller scale, traditional vdevs and spares remain as useful as they ever were.
dRAID is intellectually cool and I'm okay that OpenZFS has it, but I'm not sure it will ever be common, and as SATA/SAS SSDs and NVMe drives become more prevalent in storage servers, its advantages over raidz may increasingly go away except for high-capacity archival servers that still have to use HDs.
As an additional note, the actual draid data layout on disk is quite complicated; Ars Technica points to the detailed comments in the OpenZFS source code.
Given that ZFS stores locations on disk in the form of ZFS DVAs, which specify the vdev and the "byte offset" into the vdev, you might wonder how DVA offsets work on draid vdevs. Unfortunately I don't know, because the answer appears to be rather complicated based on the code, which isn't surprising given a complicated on-disk layout. I suspect that however draid maps DVA offsets has the same implications for growing draid vdevs as it does for growing raidz ones (the coming raidz expansion is carefully set up to cope with this).
In Go, pointers (mostly) don't go with slices in practice
When I wrote about why it matters that map values are unaddressable in Go, there were a set of Twitter replies from Sean Barrett:
Knowing none of the details & not being a go programmer, I would have guessed that map values aren't addressable because they're in a dynamically-sized hash table so they need to get relocated behind the user's back; getting the address of a value slot would break that.
But I'd also have assumed Go has dynamically-extensible arrays, and the same argument would apply in that case, so maybe not?
This sparked an article about how Go maps store their values and keys, so today I'm writing about the second part of Barrett's reply, about "dynamically-extensible arrays", because the situation here in Go is peculiar (especially from the perspective of a C or C++ programmer trying to extend their intuitions to Go). Put simply, Go has pointers and it has something like dynamically extensible arrays, but in practice you can't use pointers to slices or slice elements. Trying to combine the two is a recipe for pain, confusion, and weird problems.
On the surface, things look straightforward. The Go version of dynamically extensible arrays are slices. Slices and elements of slices are among Go's addressable values, so both of the following pointer creations are legal:
var s []int
s = append(s, 10, 20, 30)
// pointer to a slice element
pe := &s[0]
// and the slice itself
ps := &s
At this moment you can dereference pe and ps and get the results you expect, including if you modify s[0] with eg 's[0] = 100'.
Where things go off the rails is if you do anything else with the slice s, such as:

s = append(s, 50)
// or return the slice from a function:
return s
There are two problems. The first problem, possibly exposed by the
append(), is that slice elements actually live in an anonymous
backing array. Modifying the size of a slice (such as by appending
another element to it) may create a new version of this anonymous
backing array, and when the array is reallocated, any pointers to
the old one aren't updated to point to the new one and so won't see
any changes to it. So if you have the following code:
pe = &s[0]
s = append(s, 50)
s[0] = 100
The value of '*pe' may or may not now be 100, depending on whether append() created a new version of the backing array.
The second problem is that slices themselves are passed, returned, and copied by value, which doesn't quite do what you might think because slices are lightweight things. A slice is a length, a reference to the anonymous backing array, and a capacity. Copying the slice copies these three, but doesn't copy the anonymous backing array itself, which means that many slices can refer to the same anonymous backing array (and yes this can get confusing and create fun problems).
When you take a pointer to a slice, you get a pointer to the current version of this tuple of information for the slice. This pointer may or may not refer to a slice that anyone else is using; for instance:
ps := &s
s = append(s, 50)
At this point, '*ps' may or may not be the same thing as 's', and so it might or might not have the new '50' element at the end.
The more time that passes between taking a pointer to a slice and the slice being further manipulated, the less likely it is that '*ps' points to anything useful. If the slice 's' is returned from a function, the return is copy by value, and so '*ps' no longer points to the live slice that the caller is using, although it might have the same length and refer to the same anonymous backing array.
Update: It's been pointed out that this isn't true in the limited example here. In Go, variables like 's' are storage locations, so while append() may return a different slice value, this different value will overwrite the old one in 's', and the 'ps' pointer will still point to the current version of the slice.
However, this isn't the case if the append() happens in a different function (after you either return 's' or pass it to a function as an argument).
This leads to the situation mentioned on Twitter by Tom Cheng:
> regular GC keeps pointers to the old version alive if necessary.
Wait... what? so if i get a pointer into an array, then resize the array, then get a pointer to the same index, i'll have 2 valid pointers to 2 completely different objects??
(For 'array', read 'dynamically extensible array', so a slice. The answer is yes.)
It's possible to use pointers to slices or to slice elements in limited situations, if you're very careful with what you're doing with them (or know exactly what you're doing and why). But in general, pointers to slices and slice elements don't do what you want.
Honestly, this is a strange and peculiar situation, although Go programmers have acclimatized to it. To programmers from other languages, such as C or C++, the concept of pointers to dynamically extensible arrays seems like a perfectly decent idea that surely should exist and work in Go. Well, it exists, and it "works" in the sense that it yields results and doesn't crash your program, but it doesn't "work" in the sense of doing what you'd actually want.