The limits of open source with Illumos and OmniOS
I go back and forth on how optimistic I feel about OmniOS and Illumos as a whole. During the up moods, I remember how our fileservers are problem free these days; during the down moods, I remember our outstanding problems. This is an entry written from a down mood perspective.
At this point we have several outstanding problems with OmniOS and Illumos as a whole, such as our ixgbe 10G Ethernet issues and the kernel holding memory. These issues have been officially known for some time, but they remain and as far as I can tell there's been no visible movement towards fixing them. At the same time we have seen other problems be dealt with quite rapidly.
What I read into this is that we have hit the limits of Illumos's open source development. The things that I've seen dealt with promptly are either small, already solved somewhere, or a priority of some paying customer of an Illumos-related company. Our open issues are big and gnarly and (apparently) not being pushed along by anyone who can afford to pay for support; revising bits of the kernel memory system or doing a major update of the ixgbe driver are both not small projects, after all.
In a bigger open source project such as Linux, there is both more manpower available and more people running into relatively obscure problems such as these. As an example, Linux is popular enough that it's extremely unlikely that a major 10G Ethernet driver would be left to rot in an effectively unusable condition for common hardware. But Illumos simply does not have that kind of manpower and usage; what gets developed and fixed for Illumos is clearly much more narrow. The people working on Illumos are great and they have been super-helpful to us where they could, but the limits of where they can be helpful do not extend to doing major unpaid work. And this means that what we can expect from Illumos and OmniOS is limited.
How limited? In my down mood right now, I say that in practice we can expect to get something very close to no support. If something doesn't work, we get to keep all the pieces and (as with our 10G situation) we cannot expect a fix over the lifetime of our fileservers.
(This is the theoretical situation with Linux and FreeBSD until we, say, pay Red Hat for good RHEL support, but not the practical one.)
This makes me think that as nice as OmniOS is on our current fileservers, I won't really be able to recommend it as the OS for our next generation of fileservers in a few years. This is beyond the concrete issues I wrote about in the future of OmniOS here without 10G (or when I initially worried about driver support); it's a general issue of how much confidence I can have about being able to get problems fixed.
(I'm sure that if we had the money for support or consulting work we'd get great support from OmniTI and so on, and we'd probably have fixes for our problems. But we don't have that money and are unlikely to ever have it, so we must rely on the charity of the crowd. And the Illumos crowd is thin.)
PS: Some people might say 'just test the 2018 version of OmniOS a lot before you make the final decision'. Unfortunately, our experiences with 10G ixgbe and other issues make it clear that we simply can't do that well enough. We will experience problems in production that we couldn't find before then.