A lesson to myself: know your emergency contact numbers

June 19, 2016

Let's start with my tweets:

@thatcks: There's nothing quite like getting a weekend alert that a machine room we have network gear in is at 30C and climbing. Probably AC failure.

@thatcks: @isomer There is approximately nothing I can do, too. I'm not even sure who to potentially call, partly because it's not our machine room.

(This is the same machine room that got flooded because of an AC failure, which certainly added a degree of discomfort to the whole situation.)

In some organizations the answer here is 'go to the office and see about doing something, anything'. That is not how we work, for various reasons. It might be different if it was one of our main machine rooms, but an out of hours AC failure in a machine room we only have switches in is not a crisis sufficiently big to drag people to the office.

But, of course, there is a failure and a learning experience here, which is that I don't have any information written down about who to call to get the AC situation looked at by the university's Facilities and Services people. I've been through past machine room AC failures, and at the time I either read the signs we have on machine room doors or worked out (or heard) who to call to get it attended to, but I didn't write it down. Probably I thought that it was either obvious or surely I wouldn't forget it for next time around. Today I found out how well that went.

So, my lesson learned from this incident is that I should fix my ignorance problem once and for all. I should make a file with both in-hours and out-of-hours 'who to contact and/or notify' information for all of the machine rooms we're involved in. Probably we call the same people for a power failure as for an AC failure or another incident, but I should find out for sure and note this down too. Then I should replicate the file to at least my home machine, and probably keep a printout in the office (in case there's a failure in our main machine room, which would take our entire environment down).

(It would be sensible to also have contact information for, say, a failure in our campus backbone connection. I think I know who to try to call there, but I'm not sure and if it fails I won't exactly be able to look things up in the campus directory.)
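As a rough illustration of what such a contact file could hold, here is a minimal sketch in Python. Everything in it is an assumption made for the example: the room names, the incident types, the 09:00 to 17:00 weekday working hours, and the placeholder strings standing in for real phone numbers. A plain text file would do the same job; the point is only that every room and incident type gets both an in-hours and an out-of-hours entry.

```python
from datetime import datetime, time

# Hypothetical placeholder entries. Real contacts would have to come from
# the signs on the machine room doors, Facilities and Services, and so on.
CONTACTS = {
    ("machine-room-a", "ac"): {
        "in_hours": "PLACEHOLDER: daytime facilities contact",
        "out_of_hours": "PLACEHOLDER: after-hours emergency contact",
    },
    ("machine-room-a", "power"): {
        "in_hours": "PLACEHOLDER: daytime facilities contact",
        "out_of_hours": "PLACEHOLDER: after-hours emergency contact",
    },
    ("campus-backbone", "network"): {
        "in_hours": "PLACEHOLDER: campus network group",
        "out_of_hours": "PLACEHOLDER: campus network on-call",
    },
}

# Assumed working hours: Monday to Friday, 09:00 up to (not including) 17:00.
WORK_START = time(9, 0)
WORK_END = time(17, 0)


def is_in_hours(when):
    """Return True if 'when' falls inside the assumed working hours."""
    return when.weekday() < 5 and WORK_START <= when.time() < WORK_END


def who_to_call(room, incident, when):
    """Look up the contact for a room and incident type at a given time."""
    entry = CONTACTS.get((room, incident))
    if entry is None:
        # A missing entry is exactly the ignorance problem to fix.
        return "UNKNOWN: find out and write it down"
    return entry["in_hours" if is_in_hours(when) else "out_of_hours"]


if __name__ == "__main__":
    # A weekend afternoon alert, like the one that prompted this entry.
    print(who_to_call("machine-room-a", "ac", datetime(2016, 6, 19, 15, 0)))
```

Whatever form the file takes, it is only useful if a copy lives somewhere that survives the failure it describes, which is why the home machine copy and the office printout matter more than the format.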

Written on 19 June 2016.

Last modified: Sun Jun 19 22:54:12 2016