Wandering Thoughts archives

2016-05-28

A problem with using old OmniOS versions: disconnection from the community

One of the less obvious problems with us probably never doing another OmniOS upgrade is that I'm clearly going to become more and more disconnected from the OmniOS community. This is only natural, since most or almost all of the community is using recent versions; as time goes on, those versions and the version we're running are only going to drift more and more apart.

(It's true that OmniOS r151014 is an OmniOS LTS release, supported through early 2018 per here. But in practice I expect that most OmniOS people will be running one of the more up-to-date stable releases instead, since they won't have our upgrade concerns.)

Being disconnected from the community makes me sad, because the OmniOS community is one of the great parts about OmniOS. There are several dimensions to this disconnection. First, the more disconnected I am from the community, the less I'll be able to give back to it, the less I can contribute answers or information or whatever. Giving back to the community is something that I would like to do for all sorts of reasons (including that I plain like being able to contribute).

Second, and obviously, the more distant we are from what the community is running, the less the community can help us with advice and information if we run into issues or just have questions about how best to do something or what the community's experiences are. At best they may be able to tell us how things would look or would be done on a newer version of OmniOS. Of course, some things only change slowly, but I suspect that the gap here is only going to grow over time. I don't want to put too much weight on this; I'm very grateful for the help that the community has given us, but at the same time it's not help that I think we should count on and significantly factor into our plans.

(To put it one way, community help comes from the goodness of its heart and is best considered a pleasant surprise instead of a guarantee or an entitlement. I don't know if all of this makes sense to anyone but me, though.)

Finally, I'll just plain be paying less attention to the community and drifting away from it. It's inevitable; more and more, community discussions will be about things that aren't relevant to our version and that I can't contribute to. If people have problems or questions, I'll only have outdated information or more and more uninformed opinions. That's a recipe for disengagement, even from a nice community.

Having written all of this, I think that what I should do is build one experimental OmniOS server to keep up to date. It doesn't have to use our fileserver hardware; for a lot of things, any old server running OmniOS will serve to keep me at least somewhat current. As a bonus it will provide me with a platform to test things on the current OmniOS version (whatever that is at the time).

(We have enough spare SSDs for our current fileservers so that I could take the test fileserver and build a system SSD set for the current OmniOS, just so I have it around. We did this sort of back and forth OmniOS version testing during our transition to r151014, so we actually have a template for it.)

OmniOSCommunityDisconnect written at 00:35:28

2016-05-22

Our problem with OmniOS upgrades: we'll probably never do any more

Our fileserver infrastructure is currently running OmniOS r151014, and I have recently crystallized the realization that we will probably not upgrade it to a newer version of OmniOS over the remaining lifetime of this generation of server hardware (which I optimistically project to be another two to three years). This is kind of a problem for a number of reasons (and yes, beyond the obvious), but my pessimistic view right now is that it's an essentially intractable one for us.

The core issue with upgrades for us is that in practice they are extremely risky. Our fileservers are a core and highly visible service in our environment; downtime or problems on even a single production fileserver directly impact the ability of people here to get their work done. And we can't even come close to completely testing a new fileserver outside of production. Over and over, we have only found problems (sometimes serious ones) under our real and highly unpredictable production load.

(We can do plenty of fileserver testing outside of production and we do, but testing can't show that production fileservers will be problem free; it can only find (some) problems before production.)

Since upgrades are risky, we need fairly strong reasons to do them. When our existing fileservers are working reasonably well, it's not clear where such strong reasons would come from (barring a few freak events, like a major ixgbe improvement, or the discovery of catastrophic bugs in ZFS or NFS service or the like). On the one hand this is a testimony to OmniOS's current usefulness, but on the other hand, well.

I don't have any answers to this. There probably really aren't any, and I'm wishing for a magic solution to my problems. Sometimes that's just how it goes.

(I'm assuming for the moment that we could do OmniOS version upgrades through new boot environments. We might not be able to, for various reasons (we couldn't last time), in which case the upgrade problem gets worse. Actual system reinstalls, hardware swaps, or other long-downtime operations crank the difficulty of selling upgrades up even more. Our round of upgrades to OmniOS r151014 took about six months from the first server to the last server, for a whole collection of reasons including not wanting to do all servers at once in case of problems.)

OmniOSOurUpgradeProblem written at 23:55:51
