Wandering Thoughts archives


An important little detail of our ZFS spares setup

I've written before about our ZFS spares handling system (see also parts 2 and 3) that we use for our fileservers. In all of those entries, I've casually hand-waved a bit of terminology by calling our spares 'disks'. While they are disks from the perspective of the fileservers, our spares are not separate physical disks on the iSCSI backends (well, not usually, and I'll get to that).

We partition the 2TB physical HDs on the iSCSI backends into a number of standard sized chunks (four, in our case). It is these chunks that are exported to the fileservers and that the fileservers see as disks, and so it is unused chunks that form the pool of 'disks' that can become spares. Our spares system knows how chunks map to physical disks, and so it normally avoids things like using a spare chunk that comes from the same HD that a pool is already using.
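To make the 'avoid the same HD' rule concrete, here is a minimal sketch of how such a check could look. This is not our actual spares code; the `Chunk` type, the `pick_spare` function, and the backend and disk names are all made up for illustration. It is also stricter than the real rule, which only 'normally' avoids sharing a disk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    backend: str  # hypothetical iSCSI backend name
    disk: str     # physical HD the chunk is carved from
    index: int    # which slice of that HD this is


def pick_spare(pool_chunks, spare_chunks):
    """Return the first spare chunk that is on a physical disk the
    pool does not already use, or None if every spare shares a disk."""
    used_disks = {(c.backend, c.disk) for c in pool_chunks}
    for candidate in spare_chunks:
        if (candidate.backend, candidate.disk) not in used_disks:
            return candidate
    return None


# Illustrative data only.
pool = [Chunk("backend1", "disk03", 0), Chunk("backend2", "disk03", 0)]
spares = [Chunk("backend1", "disk03", 2), Chunk("backend1", "disk07", 1)]
chosen = pick_spare(pool, spares)
```

Here the first spare is skipped because it lives on the same HD as one of the pool's existing chunks. A real system would presumably want a fallback for the case where this returns None, instead of leaving a pool without a spare.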

Where this matters is when we come around to the issue of testing your spare disks. When we started allocating chunks to pools on our new fileservers, we made a deliberate decision not to reserve one or more physical disks purely for spare chunks. Instead we smeared the collection of pools across all of the physical disks, which meant that once a fileserver had at least 14 chunks allocated to pools, all physical disks in the fileserver's backends were receiving IO. We had and have spare chunks, but we don't have any spare physical disks; all disks on all backends are actually active, making the issue of testing them relatively moot.

(Our smallest fileserver has 16 chunks allocated at the moment, which is just a bit over the 14 chunk threshold to get all disks busy.)
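The 14 chunk threshold is easy to see with a toy model of smearing. The sketch below assumes 14 physical disks (inferred from the 'fifteenth disk' that comes up later) with four chunks each, and it assumes a plain round-robin allocation order, which may not be how chunks were actually handed out. The function name is invented for the example.

```python
DISKS = 14           # assumption: inferred from the later "fifteenth disk"
CHUNKS_PER_DISK = 4  # from the partitioning described above


def allocate_smeared(n_chunks, n_disks=DISKS, per_disk=CHUNKS_PER_DISK):
    """Hand out n_chunks round-robin across the disks and return a list
    of how many chunks are in use on each disk."""
    if n_chunks > n_disks * per_disk:
        raise ValueError("not enough chunks to satisfy the request")
    in_use = [0] * n_disks
    for i in range(n_chunks):
        in_use[i % n_disks] += 1
    return in_use


# With 13 chunks one disk is still idle; with 14 every disk sees IO.
thirteen = allocate_smeared(13)
fourteen = allocate_smeared(14)
sixteen = allocate_smeared(16)
```

In this model a disk with a zero count is a disk that gets no IO, and the zeros disappear exactly when the fourteenth chunk is allocated.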

Recently we decided that one fileserver had so much space allocated on it that it was running alarmingly low on spare chunks. To deal with this, we added a fifteenth disk to each of its backends and this time specifically reserved the chunks from these disks as spares. We'll never grow pools onto these disks; they now actually are spare disks, not just spare chunks on active disks. This means that we now get to think about testing them (as I alluded to in an earlier entry).
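One way to find the disks that now need this sort of attention is to look for physical disks where every chunk is unallocated. This is a sketch under assumptions, not something we run; the inventory format (a dict mapping a disk name and chunk index to a pool name or None) and the function name are hypothetical.

```python
def fully_spare_disks(chunk_owner):
    """chunk_owner maps (disk, chunk_index) to a pool name, or to None
    for a spare chunk. Return the sorted names of the disks on which
    every chunk is spare, ie disks that see no pool IO at all."""
    pools_on_disk = {}
    for (disk, _index), owner in chunk_owner.items():
        owners = pools_on_disk.setdefault(disk, set())
        if owner is not None:
            owners.add(owner)
    return sorted(d for d, owners in pools_on_disk.items() if not owners)


# Illustrative inventory: disk14 is active with one spare chunk,
# disk15 is a reserved spare disk.
inventory = {
    ("disk14", 0): "pool-a",
    ("disk14", 1): None,
    ("disk15", 0): None,
    ("disk15", 1): None,
}
idle = fully_spare_disks(inventory)
```

A disk like disk14 here has a spare chunk but is exercised by the pool that uses its other chunk, so only disk15 is reported.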

(Smearing our pools across all available physical disks and not reserving disks as pure spares is a policy choice, not a technical requirement. By now I can't remember exactly why we decided to do it this way; possibly we just thought it was easier and we might as well. Since it's possible to shuffle around the chunks that a pool uses, we can always change our minds on this later.)

Sidebar: the exception to this picture is our all-SSD pools

We have one fileserver and pair of backends that have only SSDs. The SSDs that we buy aren't big enough (today) to slice up into chunks, so each SSD is only one 'disk' on the fileserver. This means that the spare SSDs in the backends are genuinely unused. We haven't been worrying about this so far, but we probably should.

solaris/OurSparesSystemIV written at 01:47:43
