UPSes: defense against problems, or sources of them?

May 29, 2010

Here is something that we have been forced to think about lately: are UPSes really a good insurance policy against power problems, or are they instead an extra source of problems? In short, does using UPSes really increase your net reliability?

The problem with UPSes used by themselves is that they are another piece of machinery to fail (and they are a moderately complicated piece of machinery at that). And UPSes do fail; for example, we recently had an incident where a UPS reset itself out of the blue, briefly dropping power to everything connected to it (and it was not a power overload situation).

(Even when they don't fail outright, UPS batteries eventually age into uselessness and must be replaced, which generally requires you to take the UPS out of service.)

So the real question is what the MTBF of UPSes is compared to the mean time between power failures. For us, the mean time between power failures seems to be very large and visibly larger than the MTBF of our UPSes; since we put our current crop of UPSes into production we have had no power failures and at least one UPS failure. At the moment this appears to make UPSes a net negative, in that we are more likely to have power problems caused by UPSes than by actual power loss.
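As a rough sketch of this comparison, here is the arithmetic in Python. The MTBF figures and the `ups_coverage` parameter are made-up illustrations, not measurements from our machine room; the model assumes an inline UPS with nothing else between the wall and the machine.

```python
def failures_per_year(mtbf_years: float) -> float:
    """Average failures per year for a component with the given MTBF."""
    if mtbf_years <= 0:
        raise ValueError("MTBF must be positive")
    return 1.0 / mtbf_years


def compare_outage_rates(power_mtbf_years: float,
                         ups_mtbf_years: float,
                         ups_coverage: float = 1.0) -> tuple[float, float]:
    """Return (outages/year without a UPS, outages/year with an inline UPS).

    ups_coverage is the (hypothetical) fraction of power failures that the
    UPS successfully rides through; 1.0 means it covers all of them.
    """
    power_rate = failures_per_year(power_mtbf_years)
    ups_rate = failures_per_year(ups_mtbf_years)
    without_ups = power_rate
    # With an inline UPS, every UPS failure is an outage, plus any power
    # failure that the UPS does not manage to cover.
    with_ups = ups_rate + (1.0 - ups_coverage) * power_rate
    return without_ups, with_ups


# Hypothetical numbers: one power failure per decade, one UPS failure
# every four years.
without_ups, with_ups = compare_outage_rates(10.0, 4.0)
print(f"without UPS: {without_ups:.2f} outages/year")
print(f"with UPS:    {with_ups:.2f} outages/year")
```

With numbers like these the inline UPS more than doubles the expected outage rate. If power failures happen several times a year instead, the same arithmetic comes out strongly in the UPS's favour.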

The way around this is to arrange for the UPS not to be a critical path component, so that if it fails things don't go down. However, this takes extra hardware for every machine; you need dual power supplies or the equivalent, so that you can have the machine still getting power even if the UPS fails. This is generally somewhat expensive.

(You can apparently get external power units that give you dual power sources, so that you can protect even 1U servers, basic switches, and other things that don't normally have an option for dual power supplies.)
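To see why taking the UPS off the critical path helps, here is a minimal sketch that compares three setups by expected downtime. It assumes that mains and UPS failures are independent, that a working UPS rides through every mains outage, and that a dual-fed machine only goes down when both feeds are dead at once. All of the MTBF and repair-time figures are hypothetical.

```python
HOURS_PER_YEAR = 24 * 365


def unavailability(mtbf_hours: float, mttr_hours: float) -> float:
    """Long-run fraction of time a component is down."""
    return mttr_hours / (mtbf_hours + mttr_hours)


def downtime_hours_per_year(mains_u: float, ups_u: float) -> dict[str, float]:
    """Expected yearly downtime for each power configuration.

    mains_u and ups_u are the unavailability fractions of mains power
    and of the UPS respectively.
    """
    return {
        # Machine plugged straight into the wall.
        "no_ups": mains_u * HOURS_PER_YEAR,
        # Machine fed only through the UPS: down whenever the UPS is down.
        "inline_ups": ups_u * HOURS_PER_YEAR,
        # One feed from the wall, one from the UPS: down only when mains
        # is out and the UPS is also out (independence assumed).
        "dual_feed": mains_u * ups_u * HOURS_PER_YEAR,
    }


# Hypothetical figures: mains fails once in ten years for 2 hours,
# the UPS fails once in four years and takes 4 hours to sort out.
mains_u = unavailability(10 * HOURS_PER_YEAR, 2.0)
ups_u = unavailability(4 * HOURS_PER_YEAR, 4.0)
for name, hours in downtime_hours_per_year(mains_u, ups_u).items():
    print(f"{name}: {hours:.5f} hours/year")
```

In this model the dual-fed setup is always better than either single feed, because its downtime is the product of two small fractions. The catch is that you pay for the second power path on every machine.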

When you want to spend extra money, you wind up asking yourself how much extra uptime your money is buying you. If power failures are extremely rare the answer may well be 'not much'. Certainly this issue has given us some things to think about.

(Paying extra for genuine UPS insurance, dual power supplies and all, may be worth it if it lets you run machines in otherwise unsafe configurations for extra performance, for example having disk write caches turned on. But this probably turns it into a question of how much the extra performance is worth to you, not how much the reliability is.)


Comments on this page:

From 83.64.115.202 at 2010-05-30 06:05:28:

If you host your servers in a shared datacenter, you had better buy servers and switches with dual power supplies (even 1U ones). Power outages (in the form of circuit (breaker) failures, UPS maintenance or plain power maintenance) will happen there, more often than not without you knowing.

We've learned this the medium hard way.

From 69.113.211.148 at 2010-05-31 12:10:40:

Relying on a single UPS is never a good idea. You should always plan for redundancy in your datacenter infrastructure. Most people get the network and power components of this right, but fail in other ways -- for example, when the chiller that supplies cold water to the datacenter is taken offline by the same power event. We run four smaller Liebert UPSes peppered in between our CRACs, which ensures we don't kill the entire datacenter for routine maintenance.

Regarding the devices that switch power from multiple sources for machines that can't have full circuit redundancy (either because they have only one power supply or too many power supplies), we had pretty good luck with Pulizzi (now owned by Eaton) units in our old datacenter. You might consider those.

My desktop system is another story. I can't tell you how many times I've kill-switched my whole computer by inadvertently hitting the button on the APC UPS under my desk with my foot.

--Jeff

From 216.16.239.66 at 2010-05-31 13:26:13:

This is a good question but is unanswerable without specific context.

In a data center with dual power sources and supplies, UPSes likely have a smaller MTBF than your power service. On the other hand, anywhere using unfiltered power, power problems are often much more frequent (generally 4-10 times a year where I have lived) than failures of almost any UPS. Thus UPSes still make sense for equipment like office servers, home equipment (networking, servers, and telephony), and network paths (are you backing up your upstream switches and modems?), which often need to be placed in remote locations.

Written on 29 May 2010.


Last modified: Sat May 29 23:22:30 2010