Thinking about how long our infrastructure will last

November 18, 2010

Something that I've been thinking about recently is when we need to start turning over our fileserver infrastructure or, to put it the other way around, how long we can make the current infrastructure last. Since this could get very theoretical, I'm going to nail down a specific question: is it reasonable to assume that our current infrastructure will last at least five years from our initial deployment? We don't seem likely to run into capacity limits, so one major aspect is how long the hardware and software will last.

(Our initial deployment was about two years ago, depending on how you count things; we started production in September 2008 but only finished the migration from the old fileservers some time in early 2009.)

On the hardware front, our servers are not running into CPU constraints or other hardware limits that would push us towards replacing them with better machines. This leaves the lifetime of the mechanical parts in them (such as fans), and we have both spares and similar servers that have already been running for four years. So the odds are good. The SATA data disks in our backends are more problematic. They're under relatively active load, and asking for five years or more from consumer-grade SATA drives may be a lot. While we have spares, we don't have a complete replacement for all disks, which exposes us to a second-order risk: long-term technology changes.

SATA drives are not going away any time soon, but they seem likely to be changing a lot as vendors move to SATA drives with 4k sectors. It's possible that our current stack of software will not perform very well with such drives, given that other environments have already run into problems. If that happens we could be forced into software changes.

(I don't think 10G Ethernet is a risk here for reasons beyond the scope of this entry.)

On the software front, our software is both out of date and basically frozen (we have very little interest in changing a working environment). However, we aren't going to be able to keep it frozen forever; the likely triggers for forced major software changes would be the end of security updates for the frontends or significant hardware changes (such as 4k sector drives). Both are currently unknowns, but it seems at least possible that we could avoid problems for three more years.

(The backends run RHEL 5, which will have security updates through early 2014. The practical accessibility of Solaris 10 security updates for the frontends is currently quite uncertain, thanks to Oracle.)

One obvious conclusion here is that we should get a 4k sector SATA drive or two in order to test how well our current environment deals with such drives. That way we can at least be aware in advance, even if we aren't prepared.
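(As a sketch of what such a check might look like on a Linux backend: the sysfs paths here are standard, but the device name and the partition start sector are hypothetical examples, not anything from our actual machines.)

```shell
#!/bin/sh
# See what sector sizes a drive advertises; 4k "Advanced Format"
# drives typically report a 512-byte logical sector but a 4096-byte
# physical one. Adjust "sda" for the drive you're actually testing.
dev=sda
for f in logical_block_size physical_block_size; do
    p=/sys/block/$dev/queue/$f
    [ -r "$p" ] && echo "$f: $(cat "$p")"
done

# The classic performance problem on 4k-sector drives is misaligned
# partitions. A quick arithmetic check: a partition starting at the
# traditional DOS start sector 63 is not aligned to 4096 bytes.
start_sector=63
offset=$((start_sector * 512))
if [ $((offset % 4096)) -eq 0 ]; then
    echo "sector $start_sector: aligned"
else
    echo "sector $start_sector: misaligned"
fi
```

Running this against a 4k-sector test drive (and against partitions created by our current tools) would tell us quickly whether we're exposed to the misalignment problem.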

Comments on this page:

From at 2010-11-18 04:30:33:

We have been using enterprise-grade SATA drives for a couple of years now. They are more expensive than standard drives (I'd say 50-100% more expensive for 1TB-2TB drives) but they are designed to be used 24h a day and they often come with a 5 year warranty.

By cks at 2010-11-18 11:46:48:

If our storage costs doubled, we would probably have to rethink our architecture; mirroring would not be anywhere near as attractive, for example.

From at 2010-11-19 05:31:42:

Well, in our case, the difference is more in the range of 15% (for an average server with 6 drives) to 35% (for a cheap RAID enclosure) of the initial hardware cost. I'm not sure how it impacts the total cost over the hardware lifetime (including the cost of buying spare drives later). I guess I should begin making some detailed stats. :-)
