The future of OmniOS here if we can't get 10G-T working on it

December 22, 2014

When I wrote about our long road to getting 10G in production on OmniOS after our problems with it, I mentioned in an aside that the pessimistic version of when we might get our new fileserver environment back to 10G was 'never' and that that would have depressing consequences. Today I've decided to talk about them.

From the start, one of my concerns with Illumos has been hardware support. A failure to get our OmniOS fileservers back to 10G-T would almost certainly be a failure of hardware support, where either the ixgbe driver didn't get updated or the update didn't work well enough. It would also be specifically a failure to support 10G. Both have significant impacts on the future.

We can, I think, survive this generation of fileservers without 10G, although it will hurt (partly because it makes 10G much less useful in other parts of our infrastructure and partly because we spent a bunch of money on 10G hardware). I don't think we can survive the next generation without 10G; in four years 10G-T will likely be much more pervasive and I'm certainly hoping that big SSDs will be cheap enough that they'll become our primary storage. SSDs over 1G networking is, well, not really all that attractive; once you have SSD data rates, you really want better than 1G.
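To put rough numbers on that bandwidth mismatch (the SSD figure and nominal link speeds here are illustrative assumptions, not measurements from our hardware):

```python
# Back-of-the-envelope comparison of link bandwidth vs. SSD throughput.

def link_throughput_mb_s(gigabits_per_second):
    """Convert a nominal link speed in Gbit/s to MB/s (decimal megabytes)."""
    return gigabits_per_second * 1000 / 8

one_g = link_throughput_mb_s(1)     # 125.0 MB/s
ten_g = link_throughput_mb_s(10)    # 1250.0 MB/s
ssd_read_mb_s = 500                 # assumed sequential read rate of one SATA SSD

print(f"1G link:  {one_g:.0f} MB/s")
print(f"10G link: {ten_g:.0f} MB/s")
print(f"A single SSD at {ssd_read_mb_s} MB/s outruns a 1G link by "
      f"{ssd_read_mb_s / one_g:.0f}x")
```

Even one SSD can saturate a 1G link several times over, while 10G leaves headroom for a few of them.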

That basically means the next generation of fileservers could not be OmniOS (maybe unless we do something really crazy); we would have to move to something we felt would give us 10G and the good hardware support we hadn't gotten from Illumos. The possibility of going to a non-Illumos system in four years obviously drains off some amount of interest in investing lots of time in OmniOS now, because there would be relatively little long term payoff from that time. The more we think OmniOS is not going to be used in the next generation, the more we'd switch to running OmniOS purely in maintenance mode.

To some extent all of this kicks into play even if we can move OmniOS back to 10G but just not very fast. If it takes a year or two for OmniOS to get an ixgbe update, sure, it's nice to be running 10G-T for the remainder of the production lifetime of these fileservers, but it's not a good omen for the next generation because we'd certainly like more timely hardware support than that.

(And on bad omens for general hardware support, well, our version of OmniOS doesn't even seem to support the 1G Broadcom ports on our Dell R210 servers.)

Sidebar: I'm not sure if Illumos needs more development in general

I was going to say that lagging hardware support could also be a bad omen for the pace of Illumos development in general, but I'm actually not sure Illumos needs general development (from our perspective). Right now I'm assuming that the great 'ZFS block pointer rewrite' feature will never happen, and I'm honestly not sure there are many other improvements we'd really care very much about. DTrace, the NFS server, and the iSCSI initiator do seem to work fine, I no longer expect ZFS to get any sort of API, and I don't think ZFS is missing any features that we care about very much (and we haven't particularly tripped over any bugs).

(ZFS is also the most likely thing to get further development attention and bugfixes, because it's currently one of the few big killer features of Illumos for many people.)


Comments on this page:

From 62.195.238.236 at 2014-12-23 01:44:31:

you could talk to OmniTI to try and get that driver working with a support contract. Win for you, win for the community.

Obviously, you think that OmniOS/Illumos is a good tool for the job at hand. Have you considered trying to improve the hardware situation whichever way you are comfortable with? I agree that Illumos hardware support is rather limited, but as in most open source projects, shortcomings are generally due to no one being interested in addressing that particular shortcoming here and now. If you can convince someone to care about something, they'll fix the shortcomings. There are different ways of making people interested - you could pay someone to re-sync the Intel driver, you could try to convince the OmniOS devs (I think this would result in priority reshuffling but not immediate action), or you could even try to update the driver yourself (I realize that you're not a kernel dev).

Anyway, just talking about the worst case on your blog may not be helpful as far as getting your hardware to work - regardless of how valuable it is to consider the various scenarios for you as a system admin.

I'd like Illumos to have a future... consider what would have happened if people had given up on Linux back in the 90s because a driver didn't work.

With all this said, I understand your job is to make the systems run.

N.B. I'm an Illumos developer.

By dozzie at 2014-12-23 10:16:12:

@Joseph: So, how is making OmniOS/Illumos developers actually support 10G NICs (or at least one model) going to help cks' confidence in OmniOS/Illumos hardware support in the future? Because this is what the entry is about: cks is starting to worry whether hardware support is good enough for his environment.

Convincing developers to do something is pretty much equal to filing a bug/feature request, paying is not that attractive an option (mentioned in some recent post), and cks hacking on it himself was mentioned in this post.

By cks at 2014-12-23 15:23:57:

As I mentioned in the entry about our long road (back) to 10G, the blocking issue with paying for work on the ixgbe driver is a current lack of confidence that doing so would actually make it work for us. Without a problem reproduction we won't know either if the problem is fixed in any new setup or if the current Linux version of the driver is problem-free for us, so we'd be rolling the dice both on burning a chunk of our limited budget on this and then deploying it in production.

As for the mechanics, I think we'd have to do it in some way other than a straight up support contract with OmniTI because at their listed prices we can't afford to buy support for all of our production fileservers. A few thousand dollars as a one-time cost is in the outside range of what our budget can probably support.

(In a commercial environment this would probably be absurd, but we are not in a commercial environment and at a university there is effectively no ROI on such spending.)

By Anonymous at 2014-12-30 07:07:27:

So I understand you are running single-socket servers. OmniTI support costs $2,500 per server for up to 10 servers, and $1,000 per server for 11+ servers. Plus they offer a 20% discount for educational institutions.

Solaris 11 is $1,000 per server, regardless of how many you need. Is $1k really that much?

Then Solaris 11 has updated drivers, many sites are running it with 10G just fine, not to mention that Oracle x86 servers have Intel 10G Base-T as their on-board NICs and are really well supported.

Another option is to buy support for one fileserver, either Solaris 11 or OmniTI, and try to get your issue fixed that way.

Although I doubt you will get the issue on Solaris 11 - it's most likely already been fixed there.

IMHO Solaris 11 x86 has much better HW support than Illumos, not to mention continued development in it.

By cks at 2014-12-30 15:40:02:

There's a number of reasons why Solaris 11 isn't attractive to us beyond just the cost; I wrote them up in StrikesAgainstSolaris11. The lack of source code alone (and how this cripples DTrace, which has been crucial for us in diagnosing problems) is basically fatal.

From 86.167.138.158 at 2014-12-31 09:03:14:

Lack of source code is often an issue, but frankly not that big a one, and it definitely doesn't make DTrace useless. Then you also have more DTrace providers than in older Solaris releases, which makes access to source code unnecessary in many cases and makes using DTrace actually easier. But yes, I agree that lack of the source code makes things harder sometimes, but not impossible.

For example in Solaris 11 you have iscsi, ip, tcp, udp providers which probably would be useful for you.

Anyway, I was just pointing out that Solaris 11 might solve some or even all of your issues regarding HW support, OS support, long term vendor commitment (I don't agree with your comments in the other entry - if anything Oracle is investing more in Solaris than Sun ever did), etc.

Then regarding the cost, one other option might actually be buying Oracle x86 servers (X5-2?) - perhaps you could get a nice educational discount and the cost would be similar to your current servers, and OS support is 8% of the server cost, which for lower-spec configurations might be just a few hundred dollars per year.

By cks at 2015-01-03 19:53:57:

I strongly disagree with you about DTrace and source code. All of our uses of DTrace to solve serious problems have relied on fbt providers, and the last time I looked at Oracle's Solaris 11 documentation there were still not enough non-fbt tracepoints for things like ZFS and NFS (where we need not just ways to trace what was happening but also ways to recover information like what ZFS pool and filesystem was involved in NFS operations).

(In addition we have other needs for the source code, such as decoding ZFS pool state.)

My earlier entry also omitted the risk of hardware support in Solaris 11 itself. Since I don't expect Oracle to fix anything that doesn't work for us, using Solaris 11 without fully guaranteed supported hardware is risky. As far as hardware, well, our experiences with Oracle's hardware pricing after they took over Sun have not caused me to expect them to be cheap, especially for a fully supported configuration (using, for example, Oracle branded 10G cards).

Is there something I don't understand here? Look at http://wiki.openindiana.org/oi/Ethernet+Networking - since OmniOS is based on illumos, I think there is indeed some 10G support.

You will see two fully supported 10G entries for oi_157a (+)

  • Broadcom NetXtreme II 10GbE controllers
  • Intel X520, Intel X540, Intel 82598, 82599 series (10Gb)

And you will also find one entry for hipster

  • Intel 82597EX 10GbE Ethernet Controller

Question - do you have a 10G switch, or are you trying to directly tie an RJ-45 on a host 10G port to another RJ-45 on a second host 10G port?

In the past I have seen 1G NIC negotiations fail "host to host" with no switch and default down to 100M, yet I got a full 1G when there was a switch between the two hosts.

If you were doing "host to host" A) please tell me what happens when you put a 10G switch in the middle. B) also tell me what happens when you try OmniOS "bloody".

These articles seem to show 10G working on some hardware and OS combos, including OmniOS (last link)

Also if you continue to have issues, try IRC #omnios-discuss on http://webchat.freenode.net/ (I even asked the question for you on IRC - which hopefully will get answered)

http://lists.omniti.com/pipermail/omnios-discuss/2014-November/003614.html

By cks at 2015-01-05 14:59:25:

Jon Strabala: see my writeup of our Intel 10G-T problems, and an earlier story of a weird 10G-T port problem with these machines and OmniOS, and also my email to illumos-developer and subsequent messages.

We are using 10G-T switches, not direct host to host connections.

Our problems are likely chipset specific and thus probably specific to using 10G-T instead of 10G with SFP+ (partly since plenty of people seem to be using OmniOS happily with the latter setup). Intel has multiple 10G chipsets and chipset variations and the ixgbe driver has different code paths for them. Note that it is completely clear that the current ixgbe driver has a hard coded 10-20 msec lock hold (and busy-wait) that fires once a second on all X540-AT2 ports with carrier; see the mailing list thread for details. OmniOS Bloody will not improve this; the ixgbe driver has no relevant updates (and I think no updates at all from our version).
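As a schematic illustration of that pattern (this is not the actual ixgbe code, just a minimal sketch of "take a lock, then busy-wait while holding it" and why it hurts):

```python
import threading
import time

def busy_hold(lock, hold_ms):
    """Acquire `lock` and busy-wait for hold_ms milliseconds while
    holding it; anything else that needs the lock stalls for the whole
    interval. A driver doing this once a second per port adds periodic
    latency spikes to every code path that shares the lock."""
    with lock:
        start = time.monotonic()
        deadline = start + hold_ms / 1000.0
        while time.monotonic() < deadline:
            pass                      # spin, burning a CPU the whole time
    return time.monotonic() - start

lock = threading.Lock()
held = busy_hold(lock, 15)            # mid-range of the 10-20 msec hold
print(f"lock held for {held * 1000:.1f} msec")
```

At 10G speeds, 10-20 msec is a long time: many megabytes of traffic can queue up behind each of those once-a-second stalls.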

(And to answer the obvious question, switching to SFP+ for 10G is not an option for us. It was too expensive initially and would be worse now, since we've bought 10G-T switches.)
