The limits of open source with Illumos and OmniOS

March 27, 2016

I go back and forth on how optimistic I feel about OmniOS and Illumos as a whole. During the up moods, I remember how our fileservers are problem free these days; during the down moods, I remember our outstanding problems. This is an entry written from a down mood perspective.

At this point we have several outstanding problems with OmniOS and Illumos as a whole, such as our ixgbe 10G Ethernet issues and the kernel holding memory. These issues have been officially known for some time, but they remain and as far as I can tell there's been no visible movement towards fixing them. At the same time we have seen other problems be dealt with quite rapidly.

What I read into this is that we have hit the limits of Illumos's open source development. The things that I've seen dealt with promptly are either small, already solved somewhere, or a priority of some paying customer of an Illumos-related company. Our open issues are big and gnarly and (apparently) not being pushed along by anyone who can afford to pay for support; neither revising bits of the kernel memory system nor doing a major update of the ixgbe driver is a small project, after all.

In a bigger open source project such as Linux, there is both more manpower available and more people running into relatively obscure problems such as these. As an example, Linux is popular enough that it's extremely unlikely that a major 10G Ethernet driver would be left to rot in an effectively unusable condition for common hardware. But Illumos simply does not have that kind of manpower and usage; what gets developed and fixed for Illumos is clearly much narrower. The people working on Illumos are great and they have been super-helpful to us where they could, but the limits of where they can be helpful do not extend to doing major unpaid work. And this means that what we can expect from Illumos and OmniOS is limited.

How limited? In my down mood right now, I say that in practice we can expect to get something very close to no support. If something doesn't work, we get to keep all the pieces and (as with our 10G situation) we cannot expect a fix over the lifetime of our fileservers.

(This is the theoretical situation with Linux and FreeBSD until we, say, pay Red Hat for good RHEL support, but not the practical one.)

This makes me think that as nice as OmniOS is on our current fileservers, I won't really be able to recommend it as the OS for our next generation of fileservers in a few years. This goes beyond the concrete issues I wrote about in "the future of OmniOS here without 10G" (or when I initially worried about driver support); it's a general issue of how much confidence I can have about being able to get problems fixed.

(I'm sure that if we had the money for support or consulting work we'd get great support from OmniTI and so on, and we'd probably have fixes for our problems. But we don't have that money and are unlikely to ever have it, so we must rely on the charity of the crowd. And the Illumos crowd is thin.)

PS: Some people might say 'just test the 2018 version of OmniOS a lot before you make the final decision'. Unfortunately, our experiences with 10G ixgbe and other issues make it clear that we simply can't do that well enough. We will experience problems in production that we couldn't find before then.


Comments on this page:

By Anon at 2016-03-27 05:32:30:

This type of issue can definitely happen on Linux too - there are cases where laptop issues go unresolved (look at NVIDIA open source drivers) because the people with the problem aren't also the people who can fix the problem and those who might be able to are overcommitted already. Out of interest, did you ever have to pay Sun for Solaris licenses or support back in the day?

This is the problem with not having deep enough pockets and not having the magical skills to fix it yourself; you can't put an appropriate incentive in front of people to solve the issue for you and you're not in a position to fix it yourself (which says a lot about the probability of self fixing open source if you happen to be working for a Computer Science department).

By Alan at 2016-03-27 05:45:10:

Nvidia open source drivers aren't a great example. W.r.t. the people that do pay for Linux (or directly employ developers, like Google), issues like Chris' are more likely to be a problem for them.

I've been wondering if you were planning on taking a spare fileserver and installing a Linux with zfsonlinux support, then testing that as a replacement. Presumably you could observe issues now and report them upstream well before you actually had to commit to changing over as a result of dead hardware or other obsolescence.

That reminds me -- do you have time cohorts of machines in service? That is, if you had a need for twenty fileservers, do you buy 24 machines in two batches of 12 (or three of 8), so that you have less risk of multiple simultaneous failures, or do you buy as many as you have budget for immediately and hope that doesn't happen?

By Kevin Bowling at 2016-03-28 03:22:51:

This is amusing because it seems like your main beef is Intel drivers, which suck on Linux and FreeBSD too thanks to systemic problems inside Intel. Buy Chelsio cards if you want a better time on Illumos and FreeBSD. I'd recommend the same or SolarFlare on Linux.

Written on 27 March 2016.

Last modified: Sun Mar 27 01:04:34 2016