Hardware can be weird, Intel 10G-T X540-AT2 edition

August 8, 2014

Every so often I get a pointed reminder that hardware can be very weird. As I mentioned on Twitter today, we've been having one of those incidents recently. The story starts with the hardware for our new fileservers and iSCSI backends, which is built around SuperMicro X9SRH-7TF motherboards. These have an onboard Intel X540-AT2 chipset that provides two 10G-T ports. The SuperMicro motherboard and BIOS light up these ports no later than when you power the machine on and leave it sitting in the BIOS, and maybe earlier (I haven't tested).

On some but not all of our motherboards, the first 10G-T port lights up (in the BIOS) at 1G instead of 10G. When we first saw this on a board we thought we had a failed board and RMA'd it; the replacement board behaved the same way but when we booted an OS (I believe a Linux) the port came up at 10G and we assumed that all was well. Then we noticed that some but not all of our newly installed OmniOS fileservers had their first port (still) coming up at 1G. At first we thought we had cable issues, but the cables were good.

In the process of testing the situation out, we rebooted one OmniOS fileserver off a CentOS 7 live cd to see if Linux could somehow get 10G out of the hardware. Somewhat to my surprise it could (and a real full 10G at that). More surprising, the port stayed at 10G when we rebooted into OmniOS. It stayed at 10G in OmniOS over a power cycle and it even stayed at 10G after a full power off where we cut power to the entire case for several minutes. Further testing showed that it was sufficient merely to boot the CentOS 7 live cd on an affected server without ever configuring the interface (although it's possible that the live cd configures the interface up to try DHCP and then brings it down again).
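If you want to check for this from the Linux side (say, from a live cd), one approach is to read the negotiated speed that the kernel exposes in sysfs. What follows is only a minimal sketch, assuming a Linux system where /sys/class/net/<interface>/speed reports the link speed in Mb/s; the function names link_speeds and slow_ports are made up for illustration and the 10000 Mb/s threshold is an assumption that matches 10G-T ports. It does not work on OmniOS, where you would look at the output of dladm show-phys instead.

```python
from pathlib import Path


def link_speeds(sysfs_root="/sys/class/net"):
    """Return {interface name: speed in Mb/s, or None if unknown}.

    Assumes the Linux sysfs layout; returns {} if the directory is missing.
    """
    root = Path(sysfs_root)
    if not root.is_dir():
        return {}
    speeds = {}
    for iface in sorted(root.iterdir()):
        try:
            value = int((iface / "speed").read_text().strip())
        except (OSError, ValueError):
            # Virtual interfaces and ports without link often have no
            # readable speed; treat them as unknown.
            value = None
        # Some drivers report -1 when the speed is unknown.
        speeds[iface.name] = value if value is not None and value > 0 else None
    return speeds


def slow_ports(speeds, expected=10000):
    """Names of interfaces with a known speed below `expected` Mb/s."""
    return [name for name, speed in speeds.items()
            if speed is not None and speed < expected]


if __name__ == "__main__":
    current = link_speeds()
    for name, speed in current.items():
        print(name, "unknown" if speed is None else f"{speed} Mb/s")
    for name in slow_ports(current):
        print(f"warning: {name} is below 10G")
```

Note that this flags every interface below the threshold, including ones that are legitimately 1G, so on a real machine you would want to restrict it to the ports you expect to be 10G.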

There's a lot of weirdness here. It'd be one thing for the Linux driver to bring up 10G where the OmniOS one didn't; then it could be that the Linux driver was more comprehensive about setting up the chipset properly. For it to be so firmly persistent is another thing, though; it suggests that Linux is reprogramming something that stays programmed in nonvolatile storage. And then there's the matter of this happening only on some motherboards and only to one port out of two that are driven by the same chipset.

Ultimately, who knows. We're happy because we apparently have a full solution to the problem, one we've actually carried out on all of the machines now because we needed to get them into production.

(As far as we can easily tell, all of the motherboards and the motherboard BIOSes are the same. We haven't opened up the cases to check the screen printing for changes and aren't going to; these machines are already installed and in production.)


Comments on this page:

By Pete at 2014-08-08 11:36:38:

Reminds me how I had to buy a special Ethernet card from a short list supported by VMware.

By z.t. at 2014-11-19 03:32:01:

The Linux driver can update non-volatile memory, though I don't know what this feature is used for: http://lxr.free-electrons.com/source/drivers/net/ethernet/intel/ixgbe/ixgbe_x540.c#L809

By Supermathie at 2014-12-11 17:24:04:

This kind of smells like the first port (the one lighting up at 1Gb) might be initialized by the BMC at 1Gb for shared IPMI/OS access.

Unless I misunderstand and it's not the first "onboard" port.

By cks at 2014-12-11 18:47:56:

I don't think it's a shared port issue as these motherboards have a separate dedicated 1G IPMI port (that's not even visible to the OS, thankfully). I haven't seen any signs that they can ever use a 10G port for the IPMI (although maybe I've missed them).

By Pete at 2015-04-23 21:48:21:

I'm running the exact same motherboard and have the same issue on ESXi 6.

I have dual embedded X540-AT2 on my Supermicro X9SRH-7TF. I'm running driver version 3.21.16iov, but my firmware shows 0x80000260 using the command esxcli network nic get -n vmnic0... which is quite odd.

I also cannot find updated firmware for these NICs anywhere on Supermicro's lovely FTP site, ftp://ftp.supermicro.com/. Any idea how I can identify the firmware version from this hex value, or where I can find a firmware update?

Note: I am running ESXi 6.

By cks at 2015-04-24 13:03:33:

I'm afraid I don't know how to find out the firmware version, in ESXi or anything else. We've never looked into this or into firmware updates.

Written on 08 August 2014.

Last modified: Fri Aug 8 00:52:22 2014