I've changed my thinking about redundant power supplies

August 22, 2013

Back almost at the start of Wandering Thoughts, I wrote an entry in which I was pretty negative on redundant power supplies. Since I'm busy specifying redundant power supplies for our new generation of fileserver hardware, I think it's about time I admitted something: now that I'm older and somewhat wiser, I'm changing my mind. Redundant power supplies can be quite worth it. In fact I was at least partially wrong back then.

(In my defense, at the time I had very little experience with decent server hardware for reasons that do not fit in the margins of this entry but boil down to 'hardware budget? what's that?'. In retrospect this shows quite vividly in parts of that old entry.)

It's still true that in theory there are plenty of bits of hardware that can break in your server (and the power supplies in our servers have been very reliable). But in practice we've suffered several power supply failures (especially in our backend disk enclosures) and they are probably either the first or second most common cause of hardware failures around here. Apart from the spinning rust of system drives, those other bits of fragile hardware have almost never failed for us.

(Also, an increasing amount of server hardware effectively has some amount of redundancy for the other breakage-prone parts. For example, the whole system (CPUs included) may be passively cooled through multi-fan airflow; if one fan fails, alarms go off but there's enough remaining airflow and cooling that the system doesn't die.)

There's also an important second thing that redundant power supplies enable for crucial servers: they let you deal easily with various sorts of UPS issues (as I noted in that entry). As we both want UPSes and have had UPS problems in the past, this is an important issue for us. We have a solution now but it adds an extra point of failure; redundant power supplies would let us get rid of it.
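
To make the UPS point concrete, here is a minimal Python sketch of the reasoning, not a description of any real setup. The server names, UPS names, and the `survivors` helper are all hypothetical and invented for illustration. It assumes each power supply is plugged into exactly one UPS and that a server stays up as long as at least one of its supplies still has power.

```python
# Hypothetical inventory: each server maps to the power feeds (UPSes)
# that its power supplies are plugged into. All names are made up.
SERVERS = {
    "fileserver1": ["ups-a", "ups-b"],  # redundant supplies, one per UPS
    "fileserver2": ["ups-a", "ups-b"],
    "webserver1": ["ups-a"],            # single power supply
}


def survivors(servers, failed_feed):
    """Return the sorted names of servers that still have a live feed."""
    return sorted(
        name
        for name, feeds in servers.items()
        if any(feed != failed_feed for feed in feeds)
    )


# Report what stays up for each possible single UPS failure.
for feed in ("ups-a", "ups-b"):
    print(feed, "fails ->", survivors(SERVERS, feed))
```

In this toy inventory the servers with two supplies on two UPSes ride out the loss of either UPS, while the single-supply server goes down with the one UPS it depends on.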

There is also a pragmatic side to this. In practice hardware with redundant hot swappable power supplies is almost always simply better built in general (power supplies included). Part of our disk enclosure power supply problems likely comes from the fact that the power supplies are generic PC power supplies that have had to power 12 disks on a continuous basis for years. Given our much better experience with server power supplies it seems likely that a better grade of power supply would improve things in general.

(Part of this is probably just that hot-swap server power supplies are less generic and thus more engineered than baseline PC power supplies.)

I'm now all for redundant power supplies in sufficiently important servers. However, I'm still not sure that I'd put redundant power supplies into most of our servers unless I got them essentially for free; many of our servers are not quite that important and for some we already have server-level redundancy.


Comments on this page:

From 203.176.102.130 at 2013-08-22 22:55:01:

Redundant power supplies should be mandatory for new servers, just like redundant hard drives.

By using redundant power supplies you can have a dual feed into the server from two different power sources. So if one UPS goes down, the server stays up.

If you have ever had to speak with technical support to request a spare part and then not had it arrive ("please swap the RAM/CPU/motherboard/case to help us diagnose that it is the power supply issue before we replace...."; yes, no joke, a colleague was asked that), you know that it is just not worth the trouble.

The only exception to this would be your distributed application. If you have 12 servers and only, say, 8 are needed at a time, you could just accept that up to a third of your server farm could fail and still be reasonably OK (naturally rebuilding what has failed). Having said that, you want to know that the 12 or so servers are on different power feeds; otherwise the entire service can go down during a brownout.

All in all, it's worth the extra $1xx for the hard drives and redundant power supplies. CPUs maybe not, but HDDs and power supplies just seem to fail. I would also recommend a spare tape drive in case you need to restore from backup and cannot locate that drive model!

my2c

From 99.27.42.81 at 2013-08-23 19:59:05:

As someone who has qual'ed and spec'ed redundant power supplies for server hardware (as an engineer designing server hardware) in the past, I can assure you that your assertion that they're engineered to higher quality standards than generic PC power supplies is incorrect. I found that all the normal issues with generic PC power supplies also apply to redundant power supply modules -- unstable outputs, failure to actually handle the rated amperage, early demise, etc. There is nothing more discouraging than to have a redundant power supply let out its magic smoke, and then have its peer let out its own magic smoke because it can't handle the load.

The upside is that you can test for this fairly easily (before you put it in production, yank plugs and see if magic smoke comes out), and if you buy premium redundant power supplies rather than the cheapest, they're as good as the premium generic PC power supplies. But for our own racks I will state another use for redundant power supplies -- each side is plugged into a different 20 amp circuit so if a breaker blows or there is otherwise a power issue on one circuit, the rack stays up (well, assuming you have network switches with redundant power supplies, sigh!). It makes me sleep better at night knowing that a power issue short of the whole building going dark isn't going to take out our infrastructure. And who doesn't want to sleep better?

-Eric

Written on 22 August 2013.

Last modified: Thu Aug 22 00:22:01 2013