One of our costs of using OmniOS was not having 10G networking

May 17, 2019

OmniOS has generally been pretty good to us over the lifetime of our second generation ZFS fileservers, but as we've migrated various filesystems from our OmniOS fileservers to our new Linux fileservers, it's become clear that one of the costs we paid for using OmniOS was not having 10G networking.

We certainly started out intending to have 10G networking on OmniOS; our hardware was capable of it, with Intel 10G-T chipsets, and OmniOS seemed happy to drive them at decent speeds. But early on we ran into a series of fatal problems with the Intel ixgbe driver which we never saw any fixes for. We moved our OmniOS machines (and our iSCSI backends) back to 1G, and they have stayed there ever since. When we made this move, we did not have detailed system metrics on things like NFS bandwidth usage by clients, and anyway almost all of our filesystems were on HDs, so 1G seemed like it should be fine. And indeed, we mostly didn't see obvious and glaring problems, especially right away.

Setting up a metrics system (even only on our NFS clients), and then later moving some filesystems from OmniOS (at 1G) to Linux (at 10G), made it clear that on some filesystems we had definitely been hitting the 1G bandwidth limit, and that doing so had real impacts. The filesystem this was most visible on is the one that holds /var/mail, our central location for people's mailboxes (i.e., their IMAP inboxes). This was always on SSDs even on OmniOS, and once we started really looking it was clearly bottlenecked at 1G. It was one of the early filesystems we moved to the Linux fileservers, and the improvement was very visible. Our IMAP server, which has 10G itself, now routinely has bursts of over 200 Mbps inbound and sometimes sees brief periods of almost saturated network bandwidth. More importantly, the IMAP server's performance is visibly better; it is less loaded and more responsive, especially at busy times.

(A contributing factor to this is that any number of people have very big inboxes, and periodically our IMAP server winds up having to read through all of such an inbox. This creates a very asymmetric traffic pattern, with huge inbound bandwidth from the /var/mail fileserver to the IMAP server but very little outbound traffic.)
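
(As a rough illustration of why reading through a big inbox hurts at 1G, here is a back-of-the-envelope sketch in Python. The 2 GB inbox size is a made-up number for illustration, not a measurement from our systems, and the calculation uses the raw line rate and ignores NFS and TCP overhead, so real transfers would be somewhat slower.)

```python
def transfer_seconds(size_bytes: float, link_bits_per_sec: float) -> float:
    """Best-case time to move size_bytes over a link at its raw line rate."""
    return size_bytes * 8 / link_bits_per_sec

GIGABIT = 1_000_000_000  # bits per second

# Hypothetical inbox size, chosen only for illustration.
HYPOTHETICAL_INBOX_BYTES = 2 * 10**9

at_1g = transfer_seconds(HYPOTHETICAL_INBOX_BYTES, 1 * GIGABIT)
at_10g = transfer_seconds(HYPOTHETICAL_INBOX_BYTES, 10 * GIGABIT)

print(f"1G: {at_1g:.1f}s, 10G: {at_10g:.1f}s")
```

Under these assumptions a single full read of such an inbox ties up the whole 1G link for many seconds, which is time that every other /var/mail client spends waiting.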

It's less clear how much of a cost we paid for HD-based filesystems, but it seems pretty likely that we paid some cost, especially since our OmniOS fileservers were relatively large (too large, in fact). With lots of filesystems, disks, and pools on each fileserver, it seems likely that there would have been periods where each fileserver could have reached inbound or outbound network bandwidth rates above 1G, if they'd had 10G networking.

(And this excludes backups, where it seems quite likely that 10G would have sped things up somewhat. I don't consider backups as important as regular fileserver NFS traffic because they're less time and latency sensitive.)

At the same time, it's quite possible that this cost was still worth paying in order to use OmniOS back then instead of one of the alternatives. ZFS on Linux was far less mature in 2013 and 2014, and I'm not sure how well FreeBSD would have worked, especially if we insisted on keeping a SAN based design with iSCSI.

(If we had had lots of money, we might have tried switching to other 10G networking cards, probably SFP+ ones instead of 10G-T (which would have required switch changes too), or commissioning someone to fix up the ixgbe driver, or both. But with no funds for either, it was back to 1G for us, and the whole thing became one part of why we moved away from Illumos.)

Comments on this page:

By Peter Desnoyers at 2019-05-28 00:20:49:

Note that in 2013-2014 a single disk drive probably had a maximum speed of almost 200MB/s at the outer diameter, and averaged about 150MB/s across all LBAs, so for large files a single spindle could probably saturate 1Gbit/s.

Peter Desnoyers
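
(To spell out the arithmetic in this comment, here is a minimal Python sketch. The drive figures are the approximate ones the comment gives, not measurements, and the 1G figure is the raw line rate, so the usable NFS payload rate would be a bit lower still.)

```python
def mbytes_to_mbits(mbytes_per_sec: float) -> float:
    """Convert a rate in megabytes/sec to megabits/sec (8 bits per byte)."""
    return mbytes_per_sec * 8

LINK_1G_MBITS = 1000  # raw line rate of 1G Ethernet, in megabits/sec

# Approximate per-drive sequential rates quoted in the comment above.
drive_rates_mbytes = {
    "outer-diameter peak": 200,
    "average across LBAs": 150,
}

for label, rate in drive_rates_mbytes.items():
    mbits = mbytes_to_mbits(rate)
    verdict = "exceeds" if mbits > LINK_1G_MBITS else "fits within"
    print(f"{label}: {rate} MB/s = {mbits:.0f} Mbit/s, {verdict} 1G")
```

Both figures come out above 1000 Mbit/s, which is the comment's point: one spindle doing large sequential reads could already fill a 1G link.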

Written on 17 May 2019.

Last modified: Fri May 17 01:11:05 2019