2017-12-23
Our next generation of fileservers will not be based on Illumos
Our current generation of ZFS NFS fileservers is based on OmniOS. We've slowly been working on the design of our next generation for the past few months, and one of the decisions we've made is that unless something really unusual happens, we won't be using any form of Illumos as the base operating system. While we're going to continue using ZFS, we'll be basing our fileservers on either ZFS on Linux or FreeBSD (preferably ZoL, because we already run lots of Linux machines and we don't have any FreeBSD ones).
This is not directly because of uncertainties around OmniOS CE's future (or the lack of an LTS release at the time that I wrote about here; it now has one). There is really no single cause that could change our minds if it was fixed or changed; instead there are multiple contributing factors. Ultimately we made our decision because we are not in love with OmniOS and we no longer think we need to run it in order to get what we really want, which is ZFS with solid NFS fileservice.
However, I feel I need to mention some major contributing factors. The largest single factor is our continued lack of confidence in Illumos's support for Intel 10G-T chipsets. As far as I can tell from the master Illumos source, nothing substantial has changed here since back in 2014, and certainly I don't consider it a good sign that the ixgbe driver still does kernel busy-waits for milliseconds at a time. We consider 10G-T absolutely essential for our next generation of fileservers and we don't want to take chances.
(If you want to see how those busy-waits happen, look at the definition of msec_delay in ixgbe_osdep.h. drv_usecwait is specifically defined to busy-wait; it's designed to be used for microsecond durations, not millisecond ones.)
Another significant contributing factor is our frustrations with OmniOS's KYSTY minimalism, which makes dealing with our OmniOS machines more painful than dealing with our Linux ones (even the Linux ones that aren't Ubuntu based). And yes, having differently named commands does matter. It's possible that another Illumos based distribution could do better here, but I don't think there's a better one for our needs and it would still leave us with our broad issues with Illumos.
It's undeniable that we have more confidence in Linux on the whole than we do in Illumos. Linux is far more widely and heavily used, generally supports more hardware (and does so more promptly), and we've already seen that Intel 10G-T cards work fine in it (we have them in a number of our existing Linux machines, where they run great). Basically the only risk area is ZFS on Linux, and we have FreeBSD as a fallback.
There are some aspects of OmniOS that I will definitely miss, most notably DTrace. Modern Linux may have more or less functional equivalents, but I don't think there's anything that's half as usable. However, I have no sentimental attachment to Solaris or Illumos; I don't hate it, but I won't miss it on the whole, and an all-Linux environment will make my life simpler.
(This decision is only partly related to our decision not to use a SAN in the next generation of fileservers. While we could probably use OmniOS with the local disk setup that we want, not having to worry about Illumos's hardware support for various controller hardware does make our lives simpler.)