The challenge of what to set server BIOSes to do on power loss

June 16, 2021

Modern PC BIOSes, including server BIOSes, almost always have a setting for what the machine should do if the power is lost and then comes back. Generally your three options are 'stay powered off', 'turn on', and 'stay in your last state'. Lately I've been realizing that none of them are ideal in our current 'work from home' environment, and the general problem is probably unsolvable without internal remote power control.

In the normal course of events, what we want while working from home is for servers to stay in their last power state. If the power is lost and then comes back, running servers will power back up but servers that we've shut down to take out of service will stay off. If we set servers to 'always turn on', we would have to remember to take servers out of service by powering down their outlet on our smart PDU, not just telling them to halt and power off at the OS level. And of course if we had them set to 'stay powered off', we would have to go in to manually power them up.

But a power loss is not the only case where we might have to take servers down temporarily. We've had one or two scares with machine room air conditioning, and if we had a serious AC issue we would have to (remotely) turn machines off to reduce the heat load. If we turn machines off remotely from the OS level, the BIOS setting of 'stay in your last state' doesn't give us any straightforward way of turning them back on, even with a smart PDU; if we toggle outlet power at the smart PDU, the server BIOS will say 'well I was powered off before so I will stay powered off'. What we need to recover from this situation is what I'm calling internal remote power control, where we can remotely command the machine itself to turn on.

Right now, if we had an AC issue we would probably have to remember to turn machines off through our smart PDUs instead of at the OS level. With our normal BIOS setting of 'stay in your last state', this would let us remotely restart them through the smart PDU afterward, because the BIOS would remember them as having been on. Since this is very different from our normal procedure for powering off machines, I can only hope that we'd remember to do it under the pressure of a serious AC issue.
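To make the interaction between the BIOS setting and how a machine was shut down concrete, here is a minimal sketch in Python. It is only a model of the behaviour described above, not anything a real BIOS exposes; the names `Policy`, `powers_on_when_ac_returns`, and `SCENARIOS` are made up for illustration.

```python
from enum import Enum


class Policy(Enum):
    """The three usual BIOS choices for what to do when power returns."""
    STAY_OFF = "stay powered off"
    TURN_ON = "turn on"
    LAST_STATE = "stay in your last state"


def powers_on_when_ac_returns(policy: Policy, was_running: bool) -> bool:
    """Return True if the server boots when outlet power comes back.

    was_running is the state the BIOS remembers from just before the
    outlet lost power (hypothetical model, see the text above).
    """
    if policy is Policy.TURN_ON:
        return True
    if policy is Policy.STAY_OFF:
        return False
    # LAST_STATE: the machine comes back only if it was on before.
    return was_running


# How the machine was taken down -> what the BIOS remembers afterward.
SCENARIOS = {
    "halted at the OS level, then outlet cycled": False,
    "outlet switched off at the PDU while running": True,
}

if __name__ == "__main__":
    for policy in Policy:
        for name, was_running in SCENARIOS.items():
            boots = powers_on_when_ac_returns(policy, was_running)
            print(f"{policy.value:24} | {name:45} | {'boots' if boots else 'stays off'}")
```

In this model, the only row where 'stay in your last state' lets a smart PDU bring a machine back is the one where the machine was turned off at the PDU in the first place.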

(Smart PDUs have a few issues. First, not all of our machines are on them because we don't have enough of them and enough outlets. Second, when you power off a machine this way you're trusting your mapping between PDU ports and actual machines. We think our mapping is trustworthy, but we'd rather not find out the hard way.)


Comments on this page:

By Liam at UNC at 2021-06-16 09:47:48:

Do you not get Lights-Out Management on your servers? ILO, iDrac, IPMI etc?

By cks at 2021-06-16 11:31:59:

Most of our machines are Dells and we don't pay extra for the good version of their iDrac. As a result, we mostly haven't connected up our Dell IPMIs to the network. Even with a network connection, I think we'd generally need an IPMI client instead of being able to use a web browser or a SSH client to remotely power on the machine.

Possibly we should change this, especially with working from home, but there's a bit of a chicken and egg problem where we'd need to be in the office in order to wire things up (and bringing some of the IPMIs onto the network might need us to take a trip through the BIOS).

From 193.219.181.219 at 2021-06-17 05:10:20:

"I think we'd generally need an IPMI client instead of being able to use a web browser or a SSH client to remotely power on the machine."

This is generally possible using all three methods – web interface, SSH, and IPMI (using ipmitool or ipmi-power). And also using iDRAC's proprietary XML/HTTP API via racadm serveraction.

(iDRAC's SSH interface is a bit strange, but it allows running racadm as an escape hatch.)

HP's iLO allows all of this (and serial-over-LAN via ipmi-console) even without a license. I'm not sure about iDRAC licensing (we only have one Dell server and it has an old iDRAC6 which was fully licensed when I got here), but from looking at the spec sheets I found online, "iDRAC Basic" gets you all of the above – only the graphical KVM features need an Enterprise license, just as with HP.
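(As a rough illustration of the `ipmitool` route mentioned in this comment, here is a small Python sketch that builds and runs a remote chassis power command. It assumes the BMC is reachable over the network with IPMI-over-LAN enabled, that `ipmitool` is installed, and that the password is supplied in the `IPMI_PASSWORD` environment variable, which is what ipmitool's `-E` option reads. The function names, host name, and user name are hypothetical.)

```python
import subprocess
from typing import List

# Subcommands of 'ipmitool chassis power' that this sketch allows.
ALLOWED_ACTIONS = {"status", "on", "off", "cycle", "soft"}


def ipmi_power_command(host: str, user: str, action: str = "status") -> List[str]:
    """Build the argv for a remote 'ipmitool chassis power <action>' call."""
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"unsupported power action: {action!r}")
    return [
        "ipmitool",
        "-I", "lanplus",   # IPMI v2.0 over the network
        "-H", host,        # address of the BMC (iDRAC, iLO, ...), not the host OS
        "-U", user,
        "-E",              # take the password from $IPMI_PASSWORD
        "chassis", "power", action,
    ]


def run_power_command(host: str, user: str, action: str = "status") -> str:
    """Run the command and return ipmitool's output; raises on failure."""
    result = subprocess.run(
        ipmi_power_command(host, user, action),
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()
```

Something like `run_power_command("bmc-server1.example.org", "admin", "on")` would then be the remote 'turn on' that a smart PDU can't provide for a machine that was halted at the OS level.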

By Vitaly at 2021-06-18 08:19:48:

"If we turn machines off remotely from the OS level, the BIOS setting of 'stay in your last state' doesn't give us any straightforward way of turning them back on"

What about using wake-on-LAN for such a scenario?

With magic packets it's safe from accidental wakeups and is supported pretty much everywhere these days. You could also keep one low-powered machine with an "always on" policy to have an entry point into the network in the unlikely scenario of manually shutting down all the machines.
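(For reference, a magic packet is simple enough to build by hand: six 0xFF bytes followed by the target's MAC address repeated sixteen times, usually sent as a UDP broadcast. The Python sketch below assumes wake-on-LAN is enabled in the target's BIOS and NIC and that the sender sits on the same broadcast domain, such as the always-on entry machine suggested above. The function names and the MAC address in the usage note are made up.)

```python
import socket


def magic_packet(mac: str) -> bytes:
    """Build a wake-on-LAN magic packet for a MAC like '00:11:22:33:44:55'."""
    hex_digits = mac.replace(":", "").replace("-", "")
    if len(hex_digits) != 12:
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    mac_bytes = bytes.fromhex(hex_digits)  # raises ValueError on non-hex input
    # Six 0xFF bytes, then the MAC address sixteen times.
    return b"\xff" * 6 + mac_bytes * 16


def send_wol(mac: str, broadcast: str = "255.255.255.255", port: int = 9) -> None:
    """Send the magic packet as a UDP broadcast (port 9 is a common choice)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(magic_packet(mac), (broadcast, port))
```

From the entry machine, `send_wol("00:11:22:33:44:55")` would be the whole 'power on' operation. There is no acknowledgement, so you only find out it worked when the server starts answering again.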

Written on 16 June 2021.


Last modified: Wed Jun 16 00:04:11 2021