A hazard of our old version of OmniOS: sometimes powering off doesn't
Two weeks ago, I powered down all of our OmniOS fileservers that
are now out of production, which is most of them. By that, I mean
that I logged in to each of them via SSH and ran 'poweroff'. The
machines disappeared from the network and I thought nothing more
of it.
This Sunday morning we had a brief power failure. In the aftermath of the power failure, three out of four of the OmniOS fileservers reappeared on the network, which we knew mostly because they sent us some email (there were no bad effects of them coming back). When I noticed them back, I assumed that this had happened because we'd set their BIOSes to 'always power on after a power failure'. This is not too crazy a setting for a production server you want up at all costs because it's a central fileserver, but it's obviously no longer the setting you want once they go out of production.
Today, I logged in to the three that had come back, ran 'poweroff'
on them again, and then later went down to the machine room to pull
out their power cords. To my surprise, when I looked at the physical
machines, they had little green power lights that claimed they were
powered on. When I plugged in a roving display and keyboard to check
their state, I discovered that all three were still powered on and
sitting displaying an OmniOS console message to the effect that they
were powering off. Well, they might have been trying to power off,
but they weren't achieving it.
I rather suspect that this is what happened two weeks ago, and why
these machines all sprang back to life after the power failure. If
OmniOS never actually powered the machines off, even a BIOS setting
of 'resume last power state after a power failure' would have powered
the machines on again, which would have booted OmniOS back up again.
Two weeks ago, I didn't go look at the physical servers or check
their power state through their lights out management interface;
it never occurred to me that 'poweroff' on OmniOS sometimes might
not actually power the machine off, especially when the machines
did drop off the network.
(One out of the four OmniOS servers didn't spring back to life after the power failure, and was powered off when I looked at the hardware. Perhaps its BIOS was set very differently, or perhaps OmniOS managed to actually power it off. They're all the same hardware and the same OmniOS version, but the server that probably managed to power off had no active ZFS pools on our iSCSI backends; the other three did.)
At this point, this is only a curiosity. If all goes well, the last OmniOS fileserver will go out of production tomorrow evening. It's being turned off as part of that, which means that I'm going to have to check that it actually powered off (and I'd better add that to the checklist I've written up).
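For the checklist, one way to verify the power state without walking down to the machine room is to ask the lights out management interface. What follows is only a sketch under assumptions: that the server's BMC speaks IPMI over the network, that 'ipmitool' is installed on the machine you run this from, and that the BMC hostname, user name, and password file are whatever you pass in (all of them are hypothetical here, not our real ones). It relies on 'ipmitool chassis power status' printing a line such as "Chassis Power is on" or "Chassis Power is off".

```python
import subprocess


def parse_chassis_power(output: str) -> bool:
    """Return True if ipmitool's status line says the chassis is on.

    Expects text like "Chassis Power is on" or "Chassis Power is off".
    """
    text = output.strip().lower()
    if text.endswith("is on"):
        return True
    if text.endswith("is off"):
        return False
    # Refuse to guess: an unknown answer is not the same as "off".
    raise ValueError(f"unrecognized power status: {output!r}")


def chassis_is_on(bmc_host: str, user: str, password_file: str) -> bool:
    """Ask the BMC (not the host OS) whether the machine has power.

    bmc_host, user, and password_file are placeholders supplied by the
    caller; this assumes IPMI-over-LAN is enabled on the BMC.
    """
    result = subprocess.run(
        [
            "ipmitool", "-I", "lanplus",
            "-H", bmc_host,
            "-U", user,
            "-f", password_file,  # read the password from a file
            "chassis", "power", "status",
        ],
        capture_output=True,
        text=True,
        check=True,   # raise if ipmitool itself fails
        timeout=30,
    )
    return parse_chassis_power(result.stdout)
```

The point of asking the BMC is that it answers independently of the operating system, so a machine that has dropped off the network but is still sitting at a 'powering off' console message will still report as on. If it does, 'ipmitool chassis power off' (or pulling the power cords) can finish the job.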