Thinking about how long our infrastructure will last
Something I've been thinking about recently is when we need to start turning over our fileserver infrastructure, or to look at it the other way around, how long we can make the infrastructure last. Since this could get very theoretical, I'm going to nail down a specific question: is it reasonable to assume our current infrastructure will last at least five years from our initial deployment? We don't seem likely to run into capacity limits, so the major remaining issue is how long the hardware and software will last.
(Our initial deployment was about two years ago, depending on how you count things; we started production in September 2008 but only finished the migration from the old fileservers some time in early 2009.)
On the hardware front, our servers are not running into CPU constraints or other hardware limits that would push us towards replacing them with better machines. That leaves the lifetime of their mechanical parts (such as fans). We have spares, and similar servers here have already been running for four years, so the odds are good.
The SATA data disks in our backends are more problematic. They're under relatively active load, and asking five years or more from consumer-grade SATA drives may be a lot. While we have spares, we don't have a complete replacement set for all of the disks, which exposes us to a second-order risk: long-term technology changes.
SATA drives are not going away any time soon, but they seem likely to change a lot as vendors move to drives with 4k sectors. It's possible that our current software stack will not perform well with such drives, given that other environments have already run into problems. If that happens, we could be forced into software changes.
(I don't think 10G Ethernet is a risk here for reasons beyond the scope of this entry.)
On the software front, our software is both out of date and basically frozen (we have very little interest in changing a working environment). However, we can't stay frozen forever; the likely triggers for forced major software changes are the end of security updates for the frontends or significant hardware changes (such as 4k sector drives). The timing of both is currently unknown, but it seems at least possible that we could avoid problems for three more years.
(The backends run RHEL 5, which will have security updates through early 2014. The practical accessibility of Solaris 10 security updates for the frontends is currently quite uncertain, thanks to Oracle.)
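(As a rough sanity check on the dates, here is a small Python sketch of the arithmetic. The specific days are assumptions for illustration: I've picked September 1st for the start of production and January 1st as a conservative stand-in for "early 2014". Only the month and year come from the dates above.)

```python
from datetime import date

def add_years(d, years):
    # Simple year shift; fine here because the date is not February 29th.
    return d.replace(year=d.year + years)

# "September 2008" from the entry; the day of the month is an assumption.
production_start = date(2008, 9, 1)

# The five year target from initial deployment.
five_year_target = add_years(production_start, 5)

# Assumed conservative stand-in for "early 2014" (end of RHEL 5 updates).
rhel5_updates_end = date(2014, 1, 1)

# Do backend security updates outlast the five year target?
backends_covered = five_year_target <= rhel5_updates_end
margin_days = (rhel5_updates_end - five_year_target).days

print(five_year_target, backends_covered, margin_days)
```

(Under these assumptions the backends stay within security updates for the whole five years, with a few months to spare. This says nothing about the frontends, where the end date is the unknown.)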
One obvious conclusion here is that we should get a 4k sector SATA drive or two in order to test how well our current environment deals with such drives. That way we can at least be aware of any problems in advance, even if we aren't prepared for them.
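(As a starting point for such testing, here is a minimal Python sketch of two checks I'd want. It assumes a Linux test machine with a kernel recent enough to expose logical_block_size and physical_block_size under /sys/block/<device>/queue; older kernels may not have these files, in which case the sketch reports None. The function names are my own, and the alignment check assumes that the usual problem is a drive with 4096-byte physical sectors presenting 512-byte logical sectors, where partitions that don't start on a 4096-byte boundary force extra work on writes.)

```python
import os

def read_block_sizes(device, sysfs_root="/sys/block"):
    """Return the sector sizes the kernel reports for a disk such as 'sdb'.

    A value is None if the sysfs file is missing or unreadable.
    """
    queue_dir = os.path.join(sysfs_root, device, "queue")
    sizes = {}
    for name in ("logical_block_size", "physical_block_size"):
        try:
            with open(os.path.join(queue_dir, name)) as f:
                sizes[name] = int(f.read().strip())
        except (OSError, ValueError):
            sizes[name] = None
    return sizes

def looks_like_emulated_4k(sizes):
    # 4k physical sectors presented to the system as 512-byte sectors.
    return (sizes["logical_block_size"] == 512
            and sizes["physical_block_size"] == 4096)

def partition_is_aligned(start_sector, logical_size=512, physical_size=4096):
    # A partition is aligned if its starting byte offset is a multiple
    # of the physical sector size.
    return (start_sector * logical_size) % physical_size == 0

if __name__ == "__main__":
    print(read_block_sizes("sda"))
    # The traditional DOS partition start at sector 63 is not 4k aligned;
    # a start at sector 2048 is.
    print(partition_is_aligned(63), partition_is_aligned(2048))
```

(This only tells you what the drive reports and whether partitions line up with it. It is not a substitute for actually running our real workload against the drive.)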