Our SunFire X2100 nVidia Ethernet experiences

February 6, 2009

I mentioned in my last entry that we had seen problems with the onboard nVidia Ethernet ports on our SunFire X2100-based Linux iSCSI backends. Here are the details.

The SunFire X2100s have nVidia motherboards and four onboard Ethernet ports: two nVidia-based ones and two Broadcoms. In our configuration, one Broadcom and one nVidia port are used for iSCSI networking; the other nVidia port is used for general system access, and the second Broadcom port is used only by the integrated service processor. Only the iSCSI ports see any significant traffic volume.

What I ran into was that under heavy streaming iSCSI IO (in other words, more or less continuous TCP traffic at close to wire rate), the nVidia iSCSI port would start reporting:

kernel: eth2: too many iterations (6) in nv_nic_irq.

When this happened, network activity on that port either dropped significantly or stopped entirely, with bad effects on overall iSCSI data rates. The Broadcom iSCSI port had no problems, despite seeing the same level of traffic.
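If you want to see how often this is happening, and on which interfaces, a small Python sketch like the following can count the warnings in a syslog file. It assumes the messages look exactly like the one above; the function and regular expression names are made up for illustration.

```python
import re
from collections import Counter

# Matches the forcedeth warning quoted above, for example
# "kernel: eth2: too many iterations (6) in nv_nic_irq."
# Group 1 captures the interface name (eth2 here).
NV_IRQ_RE = re.compile(r"(\S+): too many iterations \((\d+)\) in nv_nic_irq")


def count_nv_irq_warnings(lines):
    """Return a Counter of 'too many iterations' warnings per interface."""
    counts = Counter()
    for line in lines:
        match = NV_IRQ_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

For example, count_nv_irq_warnings(open("/var/log/messages")) returns a Counter mapping interface names such as eth2 to how many times each one logged the warning. Which file actually holds kernel messages (/var/log/messages, /var/log/kern.log, or something else) depends on your syslog setup.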

My solution was to take a club to the problem by setting a forcedeth module parameter to suppress it; in /etc/modprobe.conf I set:

options forcedeth max_interrupt_work=100

This seems to have made the problem go away; certainly we no longer see either the kernel message or the network slowdowns, including under sustained IO loads.
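The message itself suggests that max_interrupt_work caps how many passes forcedeth's interrupt handler (nv_nic_irq) makes over pending events before it gives up and complains. The following toy Python model is not the driver's code and its names are hypothetical; it only illustrates why continuous near wire-rate traffic, where new work keeps arriving while the handler runs, would keep hitting a small cap, and why raising the cap to 100 makes giving up much rarer.

```python
def service_interrupt(pending_batches, max_work):
    """Simulate one interrupt in a bounded event-servicing loop.

    pending_batches: how many consecutive passes will find new work
    pending (under heavy streaming traffic, packets keep arriving).
    max_work: the cap on passes, standing in for max_interrupt_work.
    Returns (passes_done, gave_up).
    """
    passes = 0
    while pending_batches > 0:
        pending_batches -= 1  # handle one batch of pending RX/TX events
        passes += 1
        if passes > max_work:
            # This is where the real driver logs "too many iterations";
            # the model just stops servicing this interrupt.
            return passes, True
    return passes, False
```

With a cap of 5, a long burst gives up after 6 passes, while a cap of 100 lets the same handler clear any burst shorter than 100 passes. What the real driver does after giving up is not modeled here.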

(Note that we are using the standard forcedeth kernel driver, specifically whatever version is included in the kernel.org 2.6.25.3 kernel; this appears to be version 0.61.)

Sidebar: some references

I haven't found anything that really explains what's going on, assuming that there's even a common cause across all of the cases. Given that this involves various versions of potentially buggy hardware combined with a reverse-engineered driver (because nVidia has been less than helpful), there are a lot of potential problems and causes.

Written on 06 February 2009.

Last modified: Fri Feb 6 00:20:08 2009