Getting maximum 10G Ethernet bandwidth still seems tricky

September 14, 2024

For reasons outside the scope of this entry, I've recently been trying to see how FreeBSD performs on 10G Ethernet when acting as a router or a bridge (both with and without PF turned on). This pretty much requires at least two more 10G test machines, so that the FreeBSD server can be put between them. When I set up these test machines, I didn't think much about them, so I just grabbed two old servers that were handy (well, reasonably handy), stuck a 10G card into each, and set them up. Then I actually started testing their network performance.

I'm used to 1G Ethernet, where long ago it became trivial to achieve full wire bandwidth, even bidirectional full bandwidth (with test programs; there are many things that can cause real programs to not get this). 10G Ethernet does not seem to be like this today; the best I could do was around 950 MBytes a second in one direction, which falls short of 10G's top speed of a bit under 1.2 GBytes a second of TCP payload. In the right circumstances, bidirectional traffic could total just over 1 GByte a second, which is of course nothing like the 2 GBytes a second plus we'd like to see.
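(For illustration, the sort of test program I mean here is something like iperf3. The following is a generic sketch of such a test, not necessarily the exact tool and flags I used, with '10g-server' standing in for the other test machine's address:)

    # on one test machine, run the server side
    iperf3 -s
    # on the other, run four parallel TCP streams for 30 seconds
    iperf3 -c 10g-server -P 4 -t 30
    # the same thing in both directions at once (needs iperf3 3.7 or later)
    iperf3 -c 10g-server -P 4 -t 30 --bidir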

(This isn't a new problem with 10G Ethernet, but I was hoping this had been solved in the past decade or so.)

There are a lot of things that could be contributing to this, such as the speed of the CPU (and perhaps RAM), the specific 10G hardware I was using (including whether it lacked performance-increasing features that more expensive hardware would have had), and Linux kernel or driver issues (although this was Ubuntu 24.04, so I would hope those had been sorted out). I'm especially wondering about CPU limitations, because the kernel's CPU usage seemed quite high during my tests and, as mentioned, these are old servers with old CPUs (different old CPUs, even, one of which seemed to perform a bit better than the other).

(For the curious, one was a Celeron G530 in a Dell R210 II and the other a Pentium G6950 in a Dell R310, both of which date from before 2016 and are something like four generations back from our latest servers (we've moved on slightly since 2022).)
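(As a sketch of one way to check the CPU limitation theory on Linux: watch per-CPU usage while a test is running, since network interrupt and softirq processing often piles up on only a few CPUs. The interface name below is a stand-in:)

    # per-CPU usage once a second; watch the %sys and %soft columns
    mpstat -P ALL 1
    # see which CPUs the NIC's interrupts are landing on
    # (interrupt names usually include the interface name)
    grep enp1s0 /proc/interrupts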

Mostly this is something I'm going to have to remember about 10G Ethernet in the future. If I'm doing anything involving testing its performance, I'll want to use relatively modern test machines, possibly several of them to create aggregate traffic, and then I'll want to start out by measuring the raw performance those machines can give me under the best circumstances. Someday perhaps 10G Ethernet will be like 1G Ethernet for this, but that's clearly not the case today (in our environment).


Comments on this page:

By Aram Akhavan at 2024-09-15 00:33:31:

There are definitely some "gotchas", but I'm a little surprised you weren't able to easily hit the theoretical maximum throughput. I recently did the same experiment but with slightly more modern hardware (an i5-8500 in a Lenovo ThinkCentre Tiny). With both older and newer 10Gb SFP+ NICs, I hit the maximum speed just by running iperf on each of two machines running Debian live. I did build iperf3 from source since the Debian package was old, and I had to use multiple streams (-P), since a single stream would saturate one CPU thread. But that was it. With one command on each machine I got full speed (lower, as expected, with the default 1500 MTU; then higher with jumbo frames). It's definitely possible to be CPU-limited, but it's very obvious in top/htop when that's the case.
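(For anyone reproducing the jumbo frames part of this on Linux, the usual step is roughly the following, with 'enp1s0' as a stand-in interface name; both machines and any switches in between must accept the larger MTU:)

    ip link set dev enp1s0 mtu 9000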

For maximum throughput unfettered by Linux kernel syscall overhead, you have to use io_uring.
