Wandering Thoughts archives


Some notes on OpenZFS's new 'draid' vdev redundancy type

One piece of recent ZFS news is that OpenZFS 2.1.0 contains a new type of vdev redundancy called 'dRAID', which is short for 'distributed RAID'. OpenZFS has a dRAID HOWTO that starts with this summary:

dRAID is a variant of raidz that provides integrated distributed hot spares which allows for faster resilvering while retaining the benefits of raidz. A dRAID vdev is constructed from multiple internal raidz groups, each with D data devices and P parity devices. These groups are distributed over all of the children in order to fully utilize the available disk performance. This is known as parity declustering and it has been an active area of research. [...]

However, there are some cautions about draid, starting with this:

Another way dRAID differs from raidz is that it uses a fixed stripe width (padding as necessary with zeros). This allows a dRAID vdev to be sequentially resilvered, however the fixed stripe width significantly effects both usable capacity and IOPS. For example, with the default D=8 and 4k disk sectors the minimum allocation size is 32k. If using compression, this relatively large allocation size can reduce the effective compression ratio. [...]

Needless to say, this also means that the minimum size of files (and symlinks, and directories) is 32 KB, unless they're small enough that they can perhaps be squeezed into the bonus space in ZFS dnodes.

Another caution is that you apparently can't get draid's fast rebuild speed without configuring spare space in your draid setup. This is implicit in the description of draid, which can be read as saying that the integrated distributed hot spare space is what allows for faster resilvering. Since I believe you can't reshape a draid vdev after creation, you had better include the spare space from the start; otherwise, you have something that's inferior to raidz with the same parity.

According to the Ars Technica article on draid, draid has been heavily tested (and hopefully heavily used in production) in "several major OpenZFS development shops". The Ars Technica article also has its own set of diagrams, and also additional numbers and information; it's well worth reading if you're potentially interested in draid, including for additional cautions about draid's survivability in the face of multi-device failures.

I don't think we're interested in draid any more than we're interested in raidz. Resilvering time is not our major concern with raidz, and draid keeps the other issues from raidz, like full stripe reads. In fact, I'm not sure very many people will be interested in draid. The Ars Technica article starts its conclusion with:

Distributed RAID vdevs are mostly intended for large storage servers—OpenZFS draid design and testing revolved largely around 90-disk systems. At smaller scale, traditional vdevs and spares remain as useful as they ever were.

dRAID is intellectually cool and I'm okay that OpenZFS has it, but I'm not sure it will ever be common, and as SATA/SAS SSDs and NVMe drives become more prevalent in storage servers, its advantages over raidz may increasingly go away except for high-capacity archival servers that still have to use HDs.

As an additional note, the actual draid data layout on disk is quite complicated; Ars Technica points to the detailed comments in the code. Given that ZFS stores locations on disk in the form of ZFS DVAs, which specify the vdev and the "byte offset" into the vdev, you might wonder how DVA offsets work on draid vdevs. Unfortunately I don't know, because the answer appears to be rather complicated based on vdev_draid_xlate(), which isn't surprising given the complicated on-disk layout. I suspect that however draid maps DVA offsets has the same implications for growing draid vdevs as it does for growing raidz ones (the coming raidz expansion is carefully set up to cope with this).

solaris/ZFSDRaidNotes written at 23:20:12

In Go, pointers (mostly) don't go with slices in practice

When I wrote about why it matters that map values are unaddressable in Go, there were a set of Twitter replies from Sean Barrett:

Knowing none of the details & not being a go programmer, I would have guessed that map values aren't addressable because they're in a dynamically-sized hash table so they need to get relocated behind the user's back; getting the address of a value slot would break that.

But I'd also have assumed Go has dynamically-extensible arrays, and the same argument would apply in that case, so maybe not?

This sparked an article about how Go maps store their values and keys, so today I'm writing about the second part of Barrett's reply, about "dynamically-extensible arrays", because the situation here in Go is peculiar (especially from the perspective of a C or C++ programmer trying to extend their intuitions to Go). Put simply, Go has pointers and it has something like dynamically extensible arrays, but in practice you can't use pointers to slices or slice elements. Trying to combine the two is a recipe for pain, confusion, and weird problems.

On the surface, things look straightforward. The Go version of dynamically extensible arrays are slices. Slices and elements of slices are among Go's addressable values, so both of the following pointer creations are legal:

var s []int
s = append(s, 10, 20, 30)
// pointers to a slice element
// and to the slice itself
pe := &s[0]
ps := &s

At this moment you can dereference pe and ps and get the results you expect, including if you modify s[0] with, e.g., 's[0] = 100'. Where things go off the rails is when you do anything else with the slice s, such as:

s = append(s, 50)
// return the slice from a function
return s

There are two problems. The first problem, possibly exposed by the append(), is that slice elements actually live in an anonymous backing array. Modifying the size of a slice (such as by appending another element to it) may create a new version of this anonymous backing array, and when the array is reallocated, any pointers to the old one aren't updated to point to the new one and so won't see any changes to it. So if you have the following code:

pe = &s[0]
s = append(s, 50)
s[0] = 100

The value of '*pe' may or may not now be 100, depending on whether the append() created a new version of the backing array.

The second problem is that slices themselves are passed, returned, and copied by value, which doesn't quite do what you might think because slices are lightweight things. A slice is a length, a reference to the anonymous backing array, and a capacity. Copying the slice copies these three, but doesn't copy the anonymous backing array itself, which means that many slices can refer to the same anonymous backing array (and yes this can get confusing and create fun problems).

When you take a pointer to a slice, you get a pointer to the current version of this tuple of information for the slice. This pointer may or may not refer to a slice that anyone else is using; for instance:

ps := &s
s = append(s, 50)

At this point, '*ps' may or may not be the same thing as 's', and so it might or might not have the new '50' element at the end. The more time that passes between taking a pointer to a slice and the slice being further manipulated, the less likely it is that 'ps' points to anything useful. If the slice 's' is returned from a function the return is copy by value, and so ps definitely no longer points to the live slice that the caller is using, although ps might have the same length and refer to the same anonymous backing array.

Update: It's been pointed out that this isn't true in the limited example here. In Go, variables like s are storage locations, so although the append() may return a different slice value, this different value will overwrite the old one in s and the ps pointer will still point to the current version of the slice. However, this isn't the case if the append() happens in a different function (after you either return s or pass it to a function as an argument).

This leads to the situation mentioned on Twitter by Tom Cheng:

> regular GC keeps pointers to the old version alive if necessary.

Wait... what? so if i get a pointer into an array, then resize the array, then get a pointer to the same index, i'll have 2 valid pointers to 2 completely different objects??

(For 'array', read 'dynamically extensible array', so a slice. The answer is yes.)

It's possible to use pointers to slices or to slice elements in limited situations, if you're very careful with what you're doing with them (or know exactly what you're doing and why). But in general, pointers to slices and slice elements don't do what you want.

Honestly, this is a strange and peculiar situation, although Go programmers have acclimatized to it. To programmers from other languages, such as C or C++, the concept of pointers to dynamically extensible arrays seems like a perfectly decent idea that surely should exist and work in Go. Well, it exists, and it "works" in the sense that it yields results and doesn't crash your program, but it doesn't "work" in the sense of doing what you'd actually want.

programming/GoSlicesVsPointers written at 00:42:58
