Wandering Thoughts archives

2016-06-19

A lesson to myself: know your emergency contact numbers

Let's start with my tweets:

@thatcks: There's nothing quite like getting a weekend alert that a machine room we have network gear in is at 30C and climbing. Probably AC failure.

@thatcks: @isomer There is approximately nothing I can do, too. I'm not even sure who to potentially call, partly because it's not our machine room.

(This is the same machine room that got flooded because of an AC failure, which certainly added a degree of discomfort to the whole situation.)

In some organizations the answer here is 'go to the office and see about doing something, anything'. That is not how we work, for various reasons. It might be different if it was one of our main machine rooms, but an out of hours AC failure in a machine room we only have switches in is not a crisis sufficiently big to drag people to the office.

But, of course, there is a failure and a learning experience here, which is that I don't have any information written down about who to call to get the AC situation looked at by the university's Facilities and Services people. I've been through past machine room AC failures, and at the time I either read the signs we have on machine room doors or worked out (or heard) who to call to get it attended to, but I didn't write it down. Probably I thought that it was either obvious or surely I wouldn't forget it for next time around. Today I found out how well that went.

So, my lesson learned from this incident is that I should fix my ignorance problem once and for all. I should make a file with both in-hours and out-of-hours 'who to contact and/or notify' information for all of the machine rooms we're involved in. Probably we call the same people for a power failure as for an AC failure or another incident, but I should find out for sure and note this down too. Then I should replicate the file to at least my home machine, and probably keep a printout in the office (in case there's a failure in our main machine room, which would take our entire environment down).

(It would be sensible to also have contact information for, say, a failure in our campus backbone connection. I think I know who to try to call there, but I'm not sure and if it fails I won't exactly be able to look things up in the campus directory.)

sysadmin/KnowYourEmergencyNumbers written at 22:54:12

Why ZFS can't really allow you to add disks to raidz vdevs

Today, the only change ZFS lets you make to a raidz vdev once you've created it is to replace a disk with another one. You can't do things like, oh, adding another disk to expand the vdev, which people wish for every so often. On the surface, this is an artificial limitation that could be bypassed if ZFS wanted to, although it wouldn't really do what you want. Underneath the surface, there is an important ZFS invariant that makes it impossible.

What makes this nominally easy in theory is that ZFS raidz vdevs already use variable width stripes. A conventional RAID system uses full width stripes, where all stripes span all disks. When you add another disk, the RAID system has to change how all of the existing data is laid out to preserve this full width; you go from having the data and parity striped across N disks to having it striped across N+1 disks. But with variable width stripes, ZFS doesn't have this problem; adding a new disk doesn't require touching any of the existing stripes, even what were full width stripes. All that happens is they go from being full width stripes to being partial width stripes.

However, this is probably not really what you wanted because it doesn't get you as much new space as adding a disk does in a conventional RAID system. In a conventional RAID system, the reshaping involved both minimizes the RAID overhead and gives you a large contiguous chunk of free space at the end of the RAID array. In ZFS, simply adding a disk this way would obviously not do that; all of your old 'full width' stripes are now somewhat inefficient partial width stripes, and much of the free space is going to be scattered about in little bits at the end of those partial width stripes.
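To put rough numbers on 'somewhat inefficient', here is a small sketch of the arithmetic. It assumes a simplified model where each stripe carries one set of parity sectors for its whole width, and the function name and the four-disk raidz1 example are mine, purely for illustration.

```python
from fractions import Fraction

def data_fraction(width, nparity):
    """Fraction of a stripe's sectors that hold data rather than parity.

    Simplified model: a stripe of `width` sectors carries `nparity`
    parity sectors and the rest is data.
    """
    return Fraction(width - nparity, width)

# Hypothetical example: a 4-disk raidz1 vdev that gains a fifth disk.
old_stripe = data_fraction(4, 1)   # stripes written before the disk was added
new_stripe = data_fraction(5, 1)   # what a reshaped full width stripe would get

print("old stripes left in place:", old_stripe)
print("stripes rewritten at full width:", new_stripe)
```

In this model the old stripes stay at three quarters data while a reshaped stripe would be four fifths data, which is the overhead a conventional RAID reshape recovers and a simple disk addition does not.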

In fact, the free space issue is the fatal flaw here. ZFS raidz imposes a minimum size on chunks of free space; they must be large enough that it can write one data block plus its parity blocks (ie N+1, where N is the raidz level). Were we to just add another disk alongside existing disks, much of the free space on it could in fact violate this invariant. For example, if the vdev previously had two consecutive full width stripes next to each other, adding a new disk will create a single-block chunk of free space in between them.
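The example above can be played out in a toy model. This is only a sketch under loud assumptions: the vdev is a grid of sectors read row by row, True means allocated, and adding a disk just appends one free sector to each row. None of the names here come from ZFS, and real raidz allocation is more involved than this.

```python
from itertools import groupby

def add_disk_naively(rows):
    """Append one free sector (False) to every row; existing data stays put."""
    return [row + [False] for row in rows]

def free_run_lengths(rows):
    """Lengths of contiguous free runs, reading the vdev row by row."""
    linear = [sector for row in rows for sector in row]
    return [len(list(group))
            for allocated, group in groupby(linear)
            if not allocated]

def usable_free_sectors(rows, nparity):
    """Free sectors that sit in runs of at least nparity + 1 sectors."""
    min_run = nparity + 1   # one data block plus its parity blocks
    return sum(n for n in free_run_lengths(rows) if n >= min_run)

# Hypothetical 4-disk raidz1: three full width stripes, then one empty row.
before = [[True] * 4 for _ in range(3)] + [[False] * 4]
after = add_disk_naively(before)

runs = free_run_lengths(after)
print("free runs after adding a disk:", runs)
print("total free sectors:", sum(runs))
print("usable free sectors:", usable_free_sectors(after, nparity=1))
```

In this model the new sector at the end of a full row is a usable piece of free space only if it happens to join up with free space at the start of the next row. The fuller the pool, the less often that happens.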

You might be able to get around this by immediately marking such space on the new disk as allocated instead of free, but if so you could find that you got almost no extra space from adding the disk. This is probably especially likely on a relatively full pool, which is exactly the situation where you'd like to get space quickly by adding another disk to your existing raidz vdev.

Realistically, adding a disk to a ZFS raidz vdev requires the same sort of reshaping as adding a disk to a normal RAID-5+ system; you really want to rewrite stripes so that they span across all disks as much as possible. As a result, I think we're unlikely to ever see it in ZFS.

solaris/ZFSRaidzDiskAddition written at 02:03:01



This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.