Hardware and (Linux) driver quality can be invisible to non-specialists

July 30, 2021

We have been happy with Intel network hardware for a long time, especially under Linux. We've had a long and good relationship with various generations of Intel 1G networking chipsets and cards, and more recently a good history with Intel 10G-T chipsets in our current generation of fileservers and some other machines (usually via add-on cards). Intel 10G-T cards have also been mostly problem-free on various versions of OpenBSD (we had a weird issue or two with older OpenBSD versions).

Today I learned that Intel's high-speed networking hardware, and possibly their Linux driver software, is not all that well regarded by people with direct, deep knowledge of it, courtesy of this tweet and its accompanying HN comment, which was in turn sparked by a blog post on hunting a serious Intel i40e driver bug. This wasn't a total surprise to me, because I think I've heard some prior grumbling to this effect, but it's still broadly surprising. But notwithstanding our experiences, I don't have any reason to doubt it.

The truth is that hardware and driver quality is hard for most people to observe outside of situations where things utterly fail. If Intel's 10G chipsets or drivers were utterly unreliable trash, we and other people would have noticed long ago, and they hopefully wouldn't be integrated onto motherboards by various reasonably well regarded companies and so on. But this doesn't mean that the hardware and driver are good, just that they've been made to work in common situations. In other words, one reason that Intel's 10G-T chipsets have worked well for us is that we haven't been putting them under particular stress.

(We did test to see that Linux could manage more or less 10G line rate in artificial tests. But we didn't run these for hours or days, and in actual usage there are enough other overheads that I doubt we sustain close to 10G rates for more than a few seconds at a time.)

Does our ignorance matter? Well, maybe. Our Intel 10G-T environment works for us today without visible problems, but that doesn't mean it will work tomorrow or that there aren't 'invisible' problems we aren't noticing (we've had those before). Unfortunately, becoming well educated about this sort of thing is a lot of work in the aggregate; there are a lot of drivers and hardware we'd have to dig into, and it's not clear how we could do it.

(The people who have educated opinions are those who do things like write drivers or put the hardware under significant stress in high powered environments.)

Written on 30 July 2021.
Last modified: Fri Jul 30 23:58:20 2021