Keeping around an index to the disk bays on our iSCSI backends

July 6, 2016

Today, for the first time in a year or perhaps more, we had a disk failure in one of our iSCSI backends (or at least we detected it today). As part of discovering that I was out of practice at dealing with this, I wound up having to hunt around to find our documentation on how iSCSI disk identifiers mapped to drive bays in our 16-bay Supermicro cases. This made me realize that probably we should do better at this.

The gold standard would be to label the actual drive sleds themselves with what iSCSI disk they are, but there are two problems with this. First, there's just not much good space on the front of each drive sled. Second and more severe, the actual assignment isn't tied to the HD's serial number or anything to do with the drive sled, but on what bay the drive sled is in. Labeling the drive sleds thus has the same problem as comments in code, and we must be absolutely certain that each drive sled is put in the proper bay or has its label removed or updated if it's moved. I'm not quite willing to completely believe this any more than I ever completely believe code comments, and that means we're always going to need to double check.

While we have documentation (as mentioned), what we don't have is a printed out version of that documentation in our machine room. The whole thing fits on a single printed page, so there are plenty of obvious places it could go; probably the best place is on the exposed side of one of the fileserver racks. The heart of the documentation is a little chart, so we could cut out a few copies and put them up on the front of some of the racks that hold the fileservers and iSCSI backends. That would make the full documentation accessible when we're in the machine room, and keep the most important part visible when we're in front of the iSCSI backends about to pull a disk.

This isn't the only bit of fileserver and iSCSI backend information that would be handy to have immediately accessible in paper form in the machine room, either. I think I have some thinking, some planning, and some printing out to do in my future.

(We used to keep printed documentation about some things in the machine room, but then time passed, it got out of date or irrelevant, and we removed the old racks that it had previously been attached to. And we're a lot less likely to need things like 'this is theoretically the order to turn things on after a total power failure' than more routine documentation like 'what disk maps to what drive bay'.)

Written on 06 July 2016.
« It turns out that viruses do try to conceal their ZIP files
Some notes on UID and GID remapping in the Illumos/OmniOS NFS server »

Page tools: View Source, Add Comment.
Login: Password:
Atom Syndication: Recent Comments.

Last modified: Wed Jul 6 22:51:43 2016
This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.