Our current usage and views of UPSes (late 2020 edition)

October 11, 2020

Over the time I've been here, we've had rather mixed experiences with UPSes. Initially we used them relatively freely on machines that we felt were important, such as our first generation ZFS fileservers. Unfortunately, after a while the UPSes themselves caused us problems (for example, a spontaneous UPS reset that power cycled the machine attached to it), and we grew much more disenchanted with them.

These days we still use UPSes on some machines, such as our fileservers, but pretty much only on machines with redundant power supplies and good IPMIs. One power supply goes to the UPS and the other goes to line power (via a rack PDU), and we've done as much as we can to make sure that we get notified if either power supply sees a power failure. Most of our servers have only one power supply, so we don't even consider putting them on a UPS (even with an automatic transfer switch, of which we have some from days past).
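As an illustration of the notification side, here is a minimal Python sketch of checking power supply state through an IPMI. It assumes the machine's BMC exposes power supply sensors through ipmitool's 'sdr type "Power Supply"' listing. The sensor names, the sample output, and the strings in PROBLEM_MARKERS are all assumptions for illustration, not what our servers actually report, and what a given BMC says will vary by vendor.

```python
import subprocess

# Reading strings that suggest a supply has lost input power or failed.
# These are assumptions; check what your own BMC actually reports.
PROBLEM_MARKERS = ("ac lost", "failure detected", "input lost")


def troubled_power_supplies(sdr_text):
    """Return (sensor name, reading) pairs that look unhealthy.

    Expects pipe-separated lines in the form
    name | sensor id | status | entity | reading.
    """
    troubled = []
    for line in sdr_text.splitlines():
        fields = [f.strip() for f in line.split("|")]
        if len(fields) < 5:
            continue  # not a sensor line
        name, _sensor_id, status, _entity, reading = fields[:5]
        lowered = reading.lower()
        if status.lower() != "ok" or any(m in lowered for m in PROBLEM_MARKERS):
            troubled.append((name, reading))
    return troubled


def read_power_supply_sensors():
    """Ask the local BMC for its power supply sensors via ipmitool."""
    result = subprocess.run(
        ["ipmitool", "sdr", "type", "Power Supply"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


# Illustrative sample only; not captured from a real machine.
SAMPLE_SDR = """\
PS1 Status       | 64h | ok  | 10.1 | Presence detected
PS2 Status       | 65h | ok  | 10.2 | Presence detected, Power Supply AC lost
"""

if __name__ == "__main__":
    for name, reading in troubled_power_supplies(SAMPLE_SDR):
        print(f"power supply problem: {name}: {reading}")
```

On a real machine you would feed read_power_supply_sensors() into troubled_power_supplies() from whatever periodically runs your checks, and send the result to your alerting system.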

(We don't currently try to monitor or talk to the UPSes we have in use, but we should at least look into that. They're probably capable of it, since they're decent quality rack mounted units.)

Today, this is the only configuration that I feel comfortable with for production use, because it doesn't add any new single points of failure but will probably give us protection from power loss (assuming the UPS is working properly; if it's not, we're no worse off if we lose main power).

We've historically not bought many servers with redundant power supplies, but that's starting to change now that we're working remotely and fixing a server with a dead PSU is much slower and more work. This may push us toward more use of UPSes, although that's not as useful as it looks because our network switches generally only have one power supply and so won't be on a UPS.

(The other issue with putting switches on a UPS is that right now it would cut off our ability to power cycle them remotely, since they generally don't have a full equivalent of an IPMI to enable internal remote power control. There are various hacks possible here, though.)

Some UPSes stop working if their batteries are unhealthy or dead, which is not something we want to have happen with our UPSes. There are multiple ways to implement a UPS, called topologies, which you can find described in (for example) Wikipedia, but I don't know whether any of them actually require this behavior or whether shutting down when the battery is dead is just a choice that UPS vendors make. Our current practice is to replace UPS batteries on a reasonably regular basis, partly to avoid unpleasant surprises like this.

(Once we're actually talking to our UPSes, hopefully they will tell us about this sort of thing.)
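As a rough idea of what talking to a UPS could look like, here is a small Python sketch that assumes the UPS is managed by Network UPS Tools (NUT) and that you have the "key: value" text that NUT's upsc command prints. Whether our UPSes work with NUT is an assumption; nothing above establishes it. The sample output is made up for illustration. The status flags used here (OL for on line power, OB for on battery, LB for low battery, RB for replace battery) are NUT's standard ups.status tokens.

```python
# Map NUT ups.status flags we care about to human-readable warnings.
STATUS_WARNINGS = {
    "RB": "battery needs replacement",
    "OB": "running on battery",
    "LB": "battery is low",
}


def parse_upsc(text):
    """Turn upsc-style 'key: value' lines into a dict of strings."""
    values = {}
    for line in text.splitlines():
        key, sep, value = line.partition(": ")
        if sep:
            values[key.strip()] = value.strip()
    return values


def ups_warnings(values):
    """Return warnings for any worrying flags in ups.status."""
    flags = values.get("ups.status", "").split()
    return [msg for flag, msg in STATUS_WARNINGS.items() if flag in flags]


# Illustrative sample only; not output from one of our UPSes.
SAMPLE_UPSC = """\
battery.charge: 100
battery.runtime: 1800
ups.status: OL RB
"""

if __name__ == "__main__":
    for warning in ups_warnings(parse_upsc(SAMPLE_UPSC)):
        print(f"UPS warning: {warning}")
```

The interesting case for battery health is the RB flag, which NUT reports when the UPS itself says its battery should be replaced. How reliably a given UPS raises it is something you would have to find out in practice.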


Last modified: Sun Oct 11 00:40:37 2020