2016-03-25
There's a relationship between server utilization and server lifetime
In yesterday's entry on how old our servers are, I said that one reason we can keep servers for so long is that our CPU needs are modest, so old servers are perfectly fine. There is another way to put this, namely that almost all of our servers have low CPU utilization. It is this low CPU utilization that makes it practical and even sensible to keep using old, slow servers. Similarly, almost all of our servers have low RAM utilization, low disk IO utilization, and so on.
This leads me to the undoubtedly unoriginal observation that there's an obvious relationship between capacity utilization and how much pressure there is to upgrade your servers. If your servers are at low utilization in all of their dimensions (CPU, RAM, IO bandwidth, network bandwidth, etc), any old machine will do and there is little need to upgrade. But the more your servers approach their capacity limits, the more potential benefit there is in upgrading to new servers that will let you do more with them (especially if you have other constraints that make it hard to just add more servers, such as power or space limits or a need for more capacity in a single machine).
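One way to think about this is that upgrade pressure comes from whichever resource dimension is closest to its limit. Here's a toy sketch of that idea; the dimensions, the utilization numbers, and the 80% cutoff are all invented for illustration, not anything we actually measure:

    # Hypothetical sketch: a server starts to feel upgrade pressure
    # once any one resource dimension nears its capacity limit.
    UTILIZATION = {"cpu": 0.15, "ram": 0.30, "disk_io": 0.10, "net": 0.05}
    PRESSURE_THRESHOLD = 0.80  # made-up cutoff for "approaching capacity"

    def upgrade_pressure(util):
        # The binding constraint is the most-utilized dimension;
        # every other dimension has even more headroom than it does.
        worst = max(util, key=util.get)
        return worst, util[worst] >= PRESSURE_THRESHOLD

    dim, pressured = upgrade_pressure(UTILIZATION)
    print(f"binding dimension: {dim}, upgrade pressure: {pressured}")

A server like this one, with its most-used dimension sitting at 30%, has no real reason to be replaced by faster hardware.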
It follows that anything that increases server utilization can drive upgrades. Containers and other forms of virtualization are an example; they can take N separate low-utilization services, consolidate them all onto the same physical server, and wind up collectively using much more of the hardware.
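The consolidation arithmetic is simple to sketch. This is a deliberately naive model that just adds up CPU shares and ignores contention and peak overlap, and the per-service numbers are invented:

    # Hypothetical sketch: N low-utilization services, each nearly idle
    # on its own machine, add up to real utilization on a single host.
    services = [0.05, 0.08, 0.03, 0.10, 0.07, 0.06]  # each service's CPU share

    # Spread across six machines, each host barely works;
    # packed onto one host, the services collectively use a real fraction of it.
    combined = sum(services)
    print(f"separate hosts: {max(services):.0%} peak; one host: {combined:.0%}")

Six services that individually never crack 10% turn into one server running at 39%, and that server is much closer to caring about how fast its hardware is.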
(And the converse is true; if you eschew containers et al, as we do, you're probably going to have a lot of services with low machine utilization that can thus live happily on old servers as long as the hardware stays reliable.)
In one way all of this seems obvious: of course you put demanding services on new hardware, because that's the fastest and best hardware you have. But I think there's value in taking a system utilization perspective, not just a service performance one; certainly it's a perspective on effective server lifetimes that hadn't really occurred to me before now.
(Concretely, if I see a heavily used server with high overall utilization, it's probably a server with a comparatively short lifetime.)