How we could update our iSCSI backends and why we probably won't

October 6, 2016

I mentioned yesterday that we hadn't applied any updates to the iSCSI backends for our fileservers since we installed them. This is probably not an ideal situation for various reasons, missing security updates being one of them, and maybe we should work on changing it. The two problems with any real updates are that we don't want to disturb our fileservice (it's in use all the time) and we don't want to destabilize it. In theory updates shouldn't destabilize things because they should just be bug fixes and so on; in practice, well, not necessarily.

(Since the iSCSI backends are so simple and basically only use the kernel, the risk of instability is relatively low. But we'd likely want to update the kernel to the current CentOS 7 version, so there is some risk.)

Since we have entire spare iSCSI backends, we can in theory reduce the risk and eliminate almost all disruption with some extra work. Well, mostly, because there is a gotcha in our environment at the end of the process, but I'll get to that later. Given a spare backend, the obvious approach is to upgrade the spare backend, use the spare backend to make all of the two-way mirrored vdevs on a fileserver into three-way mirrors, run it for a while under production load to prove it, and then pull each regular backend out, upgrade it, and put it back in. At the end we'd pull the spare backend out and be back to the usual setup but with upgraded backends.
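To make the order of operations concrete, here is a rough Python sketch that only builds the list of steps for one pool; it does not run anything. It assumes things this entry doesn't spell out: that every mirror vdev has exactly one disk from each of the two regular backends, that the spare backend has one matching disk per vdev, and that disks are moved with plain `zpool attach` and `zpool detach`. The pool name and disk names are hypothetical placeholders.

```python
def rolling_upgrade_plan(pool, vdevs, spare_disks):
    """Return the ordered steps for a rolling backend upgrade of one pool.

    vdevs: list of (disk_on_backend_a, disk_on_backend_b) pairs, one pair
           per two-way mirror vdev (assumed layout, see the lead-in).
    spare_disks: one disk on the already-upgraded spare backend per vdev.
    """
    if len(vdevs) != len(spare_disks):
        raise ValueError("need exactly one spare disk per mirror vdev")

    plan = ["# upgrade the spare backend before anything else"]

    # Turn every two-way mirror into a three-way mirror using the spare.
    for (disk_a, _disk_b), spare in zip(vdevs, spare_disks):
        plan.append(f"zpool attach {pool} {disk_a} {spare}")
    plan.append("# wait for resilvers, then run under production load for a while")

    # Pull each regular backend out in turn, upgrade it, and put it back.
    for side, name in enumerate(("A", "B")):
        for pair in vdevs:
            plan.append(f"zpool detach {pool} {pair[side]}")
        plan.append(f"# upgrade regular backend {name}")
        for pair, spare in zip(vdevs, spare_disks):
            plan.append(f"zpool attach {pool} {spare} {pair[side]}")
        plan.append("# wait for resilvers before touching the next backend")

    # Back to the usual two-way mirrors, now on upgraded backends.
    for spare in spare_disks:
        plan.append(f"zpool detach {pool} {spare}")
    return plan


# Hypothetical pool with two mirror vdevs.
for step in rolling_upgrade_plan("tank", [("a1", "b1"), ("a2", "b2")], ["s1", "s2"]):
    print(step)
```

Under these assumptions every vdev gets resilvered three times (once for the spare and once for each regular backend coming back), and each backend has to wait for the previous round of resilvers to finish.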

(In theory we could run into trouble because three-way mirrors amplify writes compared to two-way mirrors. In practice we don't see enough write traffic to saturate the network links, so adding more write traffic is unlikely to hurt.)
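The write amplification here is simple arithmetic, sketched below in Python. The figures are made up purely for illustration (they are not measurements from our fileservers), and the headroom check assumes the worst case where every mirror copy leaves the fileserver over one shared link, which may not match a real network layout.

```python
def iscsi_write_rate(app_write_mb_s, mirror_ways):
    """Total write traffic sent to backends: one copy per mirror side."""
    return app_write_mb_s * mirror_ways


def fits_on_link(app_write_mb_s, mirror_ways, link_capacity_mb_s):
    """True if all mirror copies fit on a single shared link (worst case)."""
    return iscsi_write_rate(app_write_mb_s, mirror_ways) <= link_capacity_mb_s


# Hypothetical numbers: 20 MB/s of application writes, ~110 MB/s usable link.
two_way = iscsi_write_rate(20.0, 2)
three_way = iscsi_write_rate(20.0, 3)
print(two_way, three_way, three_way / two_way)
print(fits_on_link(20.0, 3, 110.0))
```

Going from two-way to three-way mirrors always multiplies the total backend write traffic by 1.5, whatever the starting rate, so what matters is how much headroom there is to begin with.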

Unfortunately we have one fileserver with two special backends that we can't do this for, because they're all-SSD backends. We don't have enough additional SSDs to build a third backend, and of course the pools that are on these SSDs are the pools that are the most important and most demanding of IO (and the SSDs are a different size from our HDs). Even a short downtime to swap in new system disks on one or both backends would be disruptive (and potentially dangerous).

All of which brings me around to the practical reasons against doing this. To wit, we definitely don't want to run one set of backends on different package versions than everything else (ie, non-upgraded SSD backends versus upgraded everything else). We probably don't want to do a disruptive SSD backend upgrade unless we're clearly getting something important from it, not just 'well, let's get up to date on all the CentOS updates'. And even going through upgrades on the regular HD backends would involve weeks of pools resilvering over and over and over again as we moved disks in and out of mirror sets, and repeated resilvering has its own impact on fileserver performance.

It would be nice to have up to date packages on the iSCSI backends (and on the fileservers themselves, for that matter). But it's very hard for me to argue that it's compelling enough to be worth the work and disruption involved.

(All of this leads me to have thoughts on how OmniOS boot environments don't really solve our upgrade problems on the OmniOS side, but that's a topic for another blog entry.)

Written on 06 October 2016.

Last modified: Thu Oct 6 00:08:34 2016