The importance of having full remote consoles on crucial servers

March 24, 2014

One of our fileservers locked up this evening for completely inexplicable reasons (possibly it had simply been up too long). These fileservers are still SunFire X2200s, and I wound up diagnosing the problem and rebooting the server using the X2200's built-in lights out management (the ILOM) and its remote console over IP functionality (often known as 'KVM over IP'). While I could have power cycled the machine without the ILOM (it's on a smart PDU that we can also control), having KVM over IP available did two important things here. The first was that it let me establish that the machine was definitely hung and had not printed any useful messages to the console. The second was that it gave me very strong assurance that I could do almost anything possible to recover the machine if it didn't come up cleanly after the power cycle; not only did I have console access to Solaris, I would have had console access to the GRUB boot menu and the BIOS if necessary (for example to force the boot drive).

I could have gotten some of that with a serial console, perhaps a fair amount of it if the BIOS also supported it. But let's be honest here; even with the BIOS's cooperation, a serial console is not as good or as complete as KVM over IP. And a serial console by itself gives you pretty much none of the out of band management for things like forced power cycles and checking ILOM logs.

I've traditionally considered KVM over IP features to be a nice luxury but not really a necessity. After this incident I'm not sure I agree with that position any more. Certainly for many of our servers they're still not really essential; if one of our login or compute servers has problems, well, we have several of them. But for crucial core servers like fileservers, servers that we can't live without, I think it's a different matter. There we want to be able to do as much as possible remotely and for that KVM over IP is really important. Would I pay extra for it? I'd like to think that I'd now argue for that and say that it's worth some extra money per server (either for a server model that offers it or for license keys to enable it, depending on the server).

(I'd be happy to take KVM over IP on all of our servers but in our money constrained environment I don't think I'd pay extra for it on many of them.)

I'm now also very happy that our new fileserver hardware has full KVM over IP support for free. It wasn't a criterion when we were evaluating hardware so we got lucky here, but I'm glad that we did.

(And I've used our new hardware's SuperMicro KVM over IP and lights out management, so I can say that it works.)

By the way, my personal opinion is that the importance of KVM over IP goes up if your servers are not at your workplace but instead in a colocation facility or the like. Then every physical visit to the servers is a trek, not just the out of hours ones. In an environment with actual ROI, it shouldn't take many sysadmin-hours spent on trips to the data center to equal the extra cost of KVM over IP capable hardware.
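
(To put some shape on that break-even argument, here is a minimal sketch in Python. The function name and every number in it are hypothetical and purely illustrative; they are not figures from our environment or from any vendor's price list.)

```python
import math


def break_even_trips(extra_cost_per_server, servers, hours_per_trip, hourly_cost):
    """Return how many avoided data center trips it takes for KVM over IP
    to pay for itself.

    All inputs are assumptions you supply yourself:
      extra_cost_per_server: added cost of KVM over IP per server
      servers:               number of servers you buy it for
      hours_per_trip:        sysadmin time for one round trip plus the fix
      hourly_cost:           loaded cost of one sysadmin-hour
    """
    total_extra = extra_cost_per_server * servers
    cost_per_trip = hours_per_trip * hourly_cost
    # Round up: a fraction of a trip still means making the whole trip.
    return math.ceil(total_extra / cost_per_trip)


if __name__ == "__main__":
    # Hypothetical example: four fileservers, three-hour trips.
    trips = break_even_trips(
        extra_cost_per_server=150,
        servers=4,
        hours_per_trip=3,
        hourly_cost=60,
    )
    print("avoided trips needed to break even:", trips)
```

This only counts sysadmin time; it ignores the cost of the fileserver being down while someone is in transit, which in practice probably matters more.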

(I've written some praise for KVM over IP before, but back then I was focusing on (re)installs instead of disaster recovery because I hadn't yet had a situation like this happen to me.)
