Linux can be really stable under the right circumstances

October 5, 2016

We don't think about our iSCSI backends all that often. Really, we don't think about them at all. They're just kind of there, sitting in racks and quietly working away. They haven't even sent in any SMART complaints about their data disks yet (although I'm sure that'll start happening in another year or two, unless we got really lucky or unlucky with these HDs).

Recently, though, we got email from the IPMI monitoring on one and as a result I wound up logging in to it. This caused me to notice just how long the production iSCSI backends have been up: from 557 days for the hot spare backend to 726 days for a pair used by one fileserver. As it turns out, this uptime is not arbitrary; it dates back to our forced switch from 10G to 1G networking, when we put 1G cards into everything in our fileserver infrastructure. They've been running untouched (and trouble-free) since then, faithfully handling what has undoubtedly been tens or hundreds of terabytes of IO by now.
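(As a rough sketch of the arithmetic, here is how an uptime in days maps back to an approximate boot date. The observation date is an assumption: it uses the date of this entry, while the uptimes may have been read a few days earlier. The function name `boot_date` is purely illustrative.)

```python
from datetime import date, timedelta


def boot_date(observed: date, uptime_days: int) -> date:
    """Return the approximate calendar date a machine booted,
    given the day its uptime was read and the uptime in whole days."""
    return observed - timedelta(days=uptime_days)


# Assumption: uptimes were read on roughly the date of this entry.
observed = date(2016, 10, 5)

# The two uptimes mentioned above: the hot spare and the longest-running pair.
for label, days in [("hot spare backend", 557), ("fileserver pair", 726)]:
    print(f"{label}: up {days} days, booted around {boot_date(observed, days)}")
```

Under that assumption, the longest uptime points back to the autumn of 2014 and the hot spare to the spring of 2015.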

Of course, you can't get this kind of extreme stability if you change things like kernels, so we haven't been changing them. By now there's a whole collection of CentOS 7 updates that they don't have, which is okay (in our view) because these machines are appliances. We have them working and we have them locked down, and we like them just the way they are now. Based on our past experience with the previous generation of backends, they'll probably stay like this until they're decommissioned.

(This is really the rigid tradeoff of uptime; to get a high uptime, you can't touch things even when maybe you should. We shouldn't worship uptimes as a fetish; high uptimes are merely one means of achieving a goal, and avoiding reboots can sometimes cause problems. But for these machines, not touching them (including not rebooting them) is currently the easiest way to achieve our goal of an extremely stable fileserver environment.)

With all of that said, I have to admit that there's something in me that likes seeing large uptimes, especially on Linux machines. It's been a long time since I ran anything that normally got that sort of uptime and it's nice to see it once again, even if I know the cost of getting there.

(My workstations will never get that kind of uptime any more, because getting that kind of uptime requires being well behind the times. Two or three years is a long time in software releases of things that I like.)

