2024-08-09
The Broadcom 'bnxt' Ethernet driver and RDMA (in Ubuntu 24.04)
We have a number of Supermicro machines with dual 10G-T Broadcom based networking; specifically what they have is the 'BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet Controller'. Under Ubuntu 22.04, everything is fine with these cards (or at least seems to be in non-production use), using the normal bnxt_en kernel driver module. Unfortunately this is not our experience in Ubuntu 24.04.
In Ubuntu 24.04, these machines also load an additional Broadcom bnxt driver, bnxt_re, which is the 'Broadcom NetXtreme-C/E RoCE' driver. RoCE is short for RDMA over Converged Ethernet, and to confuse you, this driver is found in the 'Infiniband' area of the Linux kernel drivers tree. Unfortunately, on our hardware the 24.04 bnxt_re doesn't work (or maybe the hardware doesn't work and bnxt_re is failing to detect that, although with 'RDMA' in the name of the hardware one sort of suspects it's supposed to work). The driver stalls during boot and spits out kernel messages like:
    bnxt_en 0000:ab:00.0: QPLIB: bnxt_re_is_fw_stalled: FW STALL Detected. cmdq[0xf]=0x3 waited (102721 > 100000) msec active 1
    bnxt_en 0000:ab:00.0 bnxt_re0: Failed to modify HW QP
    infiniband bnxt_re0: Couldn't change QP1 state to INIT: -110
    infiniband bnxt_re0: Couldn't start port
    bnxt_en 0000:ab:00.0 bnxt_re0: Failed to destroy HW QP
    [... more fun ensues ...]
This causes systemd-udev-settle.service to fail:
    udevadm[1212]: Timed out for waiting the udev queue being empty.
    systemd[1]: systemd-udev-settle.service: Main process exited, code=exited, status=1/FAILURE
This then causes Ubuntu 24.04's ZFS services to fail to completely start, which is a bad thing on hardware that we want to use for our ZFS fileservers.
We aren't the only people with this problem, so I was able to find various threads about it on the Internet. These gave me the solution, which is to blacklist the bnxt_re kernel module, but at the time they left me with the mystery of how and why the bnxt_re module was being loaded in the first place.
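The blacklist itself is a one-line modprobe.d configuration entry, 'blacklist bnxt_re'. Per modprobe.d(5), the blacklist keyword makes modprobe ignore all of a module's internal aliases; an explicit 'modprobe bnxt_re' will still load it. Here is a minimal Python sketch that generates such a file. The file name 'blacklist-bnxt_re.conf' is a hypothetical choice, and if bnxt_re also ends up in your initramfs you may need to regenerate that as well.

```python
# Sketch: generate a modprobe.d(5) blacklist file for bnxt_re.
# The file name "blacklist-bnxt_re.conf" is a hypothetical choice; any
# *.conf name under /etc/modprobe.d is read the same way.
from pathlib import Path


def blacklist_conf(modules: list[str]) -> str:
    """Return modprobe.d text that ignores each module's internal aliases."""
    lines = ["# Don't auto-load these modules through their aliases"]
    lines += [f"blacklist {name}" for name in modules]
    return "\n".join(lines) + "\n"


def write_blacklist(modules: list[str],
                    conf_dir: str = "/etc/modprobe.d",
                    filename: str = "blacklist-bnxt_re.conf") -> Path:
    """Write the blacklist file; writing under /etc/modprobe.d needs root."""
    path = Path(conf_dir) / filename
    path.write_text(blacklist_conf(modules))
    return path


if __name__ == "__main__":
    # Print what would be written, without touching /etc.
    print(blacklist_conf(["bnxt_re"]), end="")
```

Writing the text out by hand works just as well; the point is only that the entry names the module, not the 'bnxt_en.rdma' alias.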
The answer is that bnxt_re is being loaded through the second sort of kernel driver module loading. It is an 'auxiliary' module for handling RDMA on top of the normal bnxt_en network driver, and the bnxt_en module basically asks for it to be loaded (which also suggests that at least the module thinks the hardware should be able to do RDMA properly). More specifically, bnxt_en asks for 'bnxt_en.rdma' to be loaded, and that is an alias for bnxt_re. Fortunately you don't have to know all of this in order to block bnxt_re from loading.
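To illustrate how an alias turns into a module name: depmod writes a modules.alias file whose lines have the form 'alias PATTERN MODULE', where patterns may contain shell-style wildcards, and modprobe matches the requested name against them. The sketch below imitates that lookup with Python's fnmatch. It is a simplification (real modprobe also applies modprobe.d configuration, including blacklists, and normally uses the binary index), and the sample file contents are assumptions: the 'auxiliary:bnxt_en.rdma' spelling is my assumption about how the alias appears, and 'example_pci_driver' is entirely made up to show wildcard matching.

```python
# Sketch of modprobe-style alias resolution against modules.alias text.
# Simplified: ignores modprobe.d options, blacklists and modules.alias.bin.
from fnmatch import fnmatchcase


def parse_modules_alias(text: str) -> list[tuple[str, str]]:
    """Parse 'alias PATTERN MODULE' lines into (pattern, module) pairs."""
    pairs = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 3 and fields[0] == "alias":
            pairs.append((fields[1], fields[2]))
    return pairs


def resolve_alias(name: str, pairs: list[tuple[str, str]]) -> list[str]:
    """Return every module whose alias pattern matches the requested name."""
    return [module for pattern, module in pairs if fnmatchcase(name, pattern)]


# Illustrative modules.alias excerpt (assumed, not copied from a real system;
# "example_pci_driver" is hypothetical).
SAMPLE = """\
# Aliases extracted from modules themselves.
alias auxiliary:bnxt_en.rdma bnxt_re
alias pci:v000014E4d*sv*sd*bc*sc*i* example_pci_driver
"""

if __name__ == "__main__":
    print(resolve_alias("auxiliary:bnxt_en.rdma", parse_modules_alias(SAMPLE)))
```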
We don't have any 22.04 installs on this specific hardware any more, so I can't be completely sure what happened under 22.04, but it appears that 22.04 didn't load the bnxt_re module on these servers. Running 'modinfo' on the 22.04 module shows that it doesn't have the bnxt_en.rdma module alias that it has in 24.04, so perhaps you had to load it manually if your hardware had RDMA and you wanted to use it.
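If you want to make the same check on your own systems, 'modinfo -F alias MODULE' prints only the module's alias fields, one per line, and '-k KERNEL' points modinfo at another installed kernel's modules. A small Python sketch of that check follows; the helper names are hypothetical, and matching on the substring 'bnxt_en.rdma' is an assumption about how the alias is spelled.

```python
# Sketch: does a kernel module declare an alias mentioning "bnxt_en.rdma"?
# Relies on `modinfo -F alias MODULE`, which prints one alias per line.
import subprocess
from typing import Optional


def module_aliases(module: str, kernel: Optional[str] = None) -> list[str]:
    """Return the alias fields that modinfo reports for a module."""
    cmd = ["modinfo", "-F", "alias"]
    if kernel:
        cmd += ["-k", kernel]  # inspect a kernel other than the running one
    cmd.append(module)
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return [line.strip() for line in out.splitlines() if line.strip()]


def has_rdma_alias(aliases: list[str]) -> bool:
    """True if any alias refers to the bnxt_en.rdma auxiliary device name."""
    return any("bnxt_en.rdma" in alias for alias in aliases)
```

Usage would be something like has_rdma_alias(module_aliases("bnxt_re")); going by the modinfo results described above, that should be true on 24.04 and false on 22.04.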
(Looking at the kernel source history, it appears that bnxt_re support for using this 'auxiliary driver interface' only appeared in kernel 6.3, much too late for Ubuntu 22.04's normal server kernel, which is based on 5.15.0.)
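As a rough heuristic for whether a given kernel is past that 6.3 cutoff, you can compare the major and minor numbers of its release string as integers. Comparing them as strings would wrongly put 6.10 before 6.3. This is only a sketch: distribution kernels can backport features, so the version number alone isn't conclusive, and the function names are hypothetical.

```python
# Sketch: is a kernel release string at least a given (major, minor)?
# The default cutoff (6, 3) comes from the kernel history noted above.
import platform
import re


def kernel_version(release: str) -> tuple[int, int]:
    """Extract (major, minor) from a release such as '5.15.0-117-generic'."""
    m = re.match(r"(\d+)\.(\d+)", release)
    if not m:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(m.group(1)), int(m.group(2))


def at_least(release: str, wanted: tuple[int, int] = (6, 3)) -> bool:
    """True if the release is at or past `wanted`, compared numerically."""
    return kernel_version(release) >= wanted


if __name__ == "__main__":
    release = platform.release()
    print(release, at_least(release))
```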
One of my lessons learned from this is that in today's Linux kernel environment, drivers may enable additional functionality that you neither asked for nor wanted, just because it's there. We don't use RDMA and never asked for anything related to RoCE, but because the hardware is (theoretically) capable of it, we got it anyway.