The 10G Ethernet performance problem on Linux
It's clear that 10G Ethernet on Linux is not yet in the state that 1G Ethernet is, where you can simply assume that you'll get wire speed unless your hardware is terrible (and sometimes your hardware is terrible, or at least not great). Instead you need to tune things for best performance, and do so beyond the basics of MTU 9000 and large application buffers. There are any number of resources on the web that offer advice on this; for example, I've recently been reading this one [PDF].
(There's also this one from 2008 [PDF slides] that I've seen referred to in a number of places.)
The problem here is simple: the paper I've been reading is from 2009. Things have changed since 2009; in fact, I've seen things change between kernel 3.11.6 and kernel 3.12 (and they changed significantly between Ubuntu 12.04's 3.2.0 kernel and 3.11.6). Much of the other 10G tuning advice I've found on the web is like this, either clearly old or undated but probably old. Because it's old, some (but not all) of it is likely out of date: no longer necessary, no longer applicable, or actively counterproductive. Given the changes I've seen just between 3.11.6 and 3.12, this is probably going to continue to be the case for a while yet; even carefully researched tuning advice written today may not apply in a year.
(At least not to current kernels. If you research tuning advice for, say, a RHEL/CentOS 6 kernel it's likely to stay useful for years because RHEL kernels don't change much.)
This is the 10G Ethernet performance problem on Linux as I see it. Today and for the foreseeable future, getting good performance out of 10G Ethernet on Linux is going to take real work. It's not enough to read some resources and follow their advice, because parts of that advice may be out of date; you're going to have to experiment, ideally under real-life scenarios and not just artificial bandwidth or latency tests.
(Artificial tests can at best verify that under ideal circumstances you can hit wire bandwidth or wire latency. But the tuning you need for them may be different than the tuning you need for your live production load.)
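Since tuning results are tied to the kernel they were measured on, it may help to record the kernel version and the relevant settings alongside each experiment. What follows is only a minimal sketch in Python of that bookkeeping, not a recommendation of what to tune. The list of settings, the `snapshot` and `read_setting` helper names, and the default interface name `eth0` are all assumptions made for illustration.

```python
import json
import platform
from pathlib import Path

# Settings that 10G tuning guides often mention. This particular list is an
# assumption for illustration; substitute whatever you are experimenting with.
SYSCTL_FILES = [
    "/proc/sys/net/core/rmem_max",
    "/proc/sys/net/core/wmem_max",
    "/proc/sys/net/ipv4/tcp_rmem",
    "/proc/sys/net/ipv4/tcp_wmem",
    "/proc/sys/net/ipv4/tcp_congestion_control",
]


def read_setting(path):
    """Return the file's contents as a stripped string, or None if unreadable."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


def snapshot(interface="eth0"):
    """Collect the kernel release plus the current value of each setting.

    The interface name is a placeholder; pass the name of your 10G interface.
    """
    settings = {path: read_setting(path) for path in SYSCTL_FILES}
    mtu_path = f"/sys/class/net/{interface}/mtu"
    settings[mtu_path] = read_setting(mtu_path)
    return {"kernel": platform.release(), "settings": settings}


if __name__ == "__main__":
    # Save this output next to the benchmark numbers for the same run.
    print(json.dumps(snapshot(), indent=2))
```

Keeping a snapshot like this with every set of measurements makes it easier to tell, after a kernel upgrade, whether a change in performance came from your tuning or from the kernel itself.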