Our slow turnover of servers and server generations

August 14, 2022

We have long had a habit of upgrading machines between Ubuntu versions either every two years (for most machines that users log in to or use directly) or every four years (although the past two years are an exception). The two-year machines move to every LTS version. The four-year machines upgrade every other LTS version, when their old LTS version is about to fall out of support. The longer version of this is in 'How we handle Ubuntu LTS versions'.

One part of this that I haven't mentioned before now is how it affects the rollout of new generations of the servers we use. Barring exceptional events, once a given version of a server is in production we don't change the physical hardware it's built on. Instead, server hardware only turns over when we reinstall machines from scratch (usually on a new Ubuntu version) or build completely new servers that have no existing version. This means that even important production machines can be running on what is now out-of-date hardware, because it was our most up-to-date hardware when they were built three or four years ago. Less important servers can be on even older hardware, if they were built three or four years ago on what was then our 'previous generation' hardware.

Because we tend to buy hardware in bulk every so often, we may buy a block of new server hardware at time X and then take a year or more to actually deploy all of it. I think that all of our Dell R340s have now been deployed and we have no new-in-box ones sitting around, but we're certainly still working through our boxes of Dell R240s (which we bought toward the end of their availability).

(This is on my mind lately because I pulled two R240s out of their boxes last week to use for upgraded servers, along with reusing an R210 II for a third one.)

When new server generations introduce useful new capabilities, like dedicated BMC network ports, these capabilities can be slow to spread through our fleet and correspondingly slow to get used. Unless we really need or want a new capability for some server, it can take a while before we decide it's widespread enough to be worth investigating and putting to use.

(All of which is to say that we're only now starting to connect dedicated BMC network ports and configure BMC networking by default. Until recently, dedicated BMC network ports weren't pervasive enough for us to even think about it.)

Written on 14 August 2022.