Wandering Thoughts archives

2021-05-19

Fedora has significantly fumbled DKMS handling for Linux kernel modules

Recently I blamed DKMS for building one of my kernel modules for the wrong kernel. It turns out that this is not quite what happened, and in fact what happened is Fedora's fault. I found this out when I did yet another unsuccessful kernel upgrade attempt on my work machine, had the module problem happen again, and paid close attention to what was actually in my module directory (partly because this time around I was sure I'd checked for the problem before I'd rebooted).

As part of their customized kmod package, Fedora ships a program, /usr/sbin/weak-modules, which apparently comes from Red Hat Enterprise Linux and which exists to, well, let me quote the comment at the start of the shell script:

weak-modules - determine which modules are kABI compatible with installed kernels and set up the symlinks in /lib/*/weak-updates.

This is the best documentation we have for the program because Fedora has not provided a manual page. Well, actually the current DKMS manual page says something about this, and what it says is so incredible that I'm going to quote it almost in full, with emphasis mine. This is from the section where it talks about various parameters you can set in a module's dkms.conf:

The NO_WEAK_MODULES parameter prevents dkms from creating a symlink into the weak-updates directory, which is the default on Red Hat derivatives. The weak modules facility was designed to eliminate the need to rebuild kernel modules when kernel upgrades occur and relies on the symbols within the kABI.

Fedora does not guaranteed a stable kABI so it should be disabled in the specific module override by setting it to "yes". [...]

Let me translate that for you: Fedora requires all DKMS modules to disable this feature that they themselves have added.
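(As a rough illustration of what "setting it to yes" means in practice, here is a small Python sketch that checks whether the text of a dkms.conf contains such a setting. The function name is made up for this example, and the parsing is an assumption: dkms.conf is really a shell fragment that DKMS sources, so a line-by-line regular expression is only a heuristic. It also only looks for the literal value "yes" that the manual page mentions.)

```python
import re

# Matches an uncommented shell assignment such as NO_WEAK_MODULES="yes".
# Shell assignments have no spaces around "=", so none are allowed here.
_ASSIGNMENT = re.compile(r'^\s*NO_WEAK_MODULES=["\']?([^"\'\s#]*)["\']?')


def disables_weak_modules(dkms_conf_text):
    """Return True if the dkms.conf text sets NO_WEAK_MODULES to "yes".

    Heuristic only: this does not evaluate the file as shell the way
    DKMS does, it just scans for a plain assignment on its own line.
    """
    value = None
    for line in dkms_conf_text.splitlines():
        match = _ASSIGNMENT.match(line)
        if match:
            value = match.group(1)  # a later assignment overrides an earlier one
    return value == "yes"
```

You would feed it the contents of a module's dkms.conf, for example `disables_weak_modules(open(path).read())`, where `path` is wherever that module's dkms.conf lives on your system.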

What happens if you use a DKMS-built module that doesn't do this is that Fedora's weak-modules script may well decide that the version of the module DKMS just built fresh for your newly-installed kernel is compatible with every previous kernel you have. It will then install symlinks from /lib/modules/<old-kernel>/weak-updates/<module>.ko to the very latest module you built. The Fedora versions of depmod, modprobe, and so on will then prefer those symlinks over the properly built copy in /lib/modules/<old-kernel>/extra/<module>.ko, and you will be unhappy because your module is not working.
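(If you want to look for this situation yourself, something like the following Python sketch could do it. It lists symlinks under one kernel's weak-updates directory whose targets live in a different kernel's module tree. The function name and its arguments are invented for this example, and it assumes the /lib/modules/<kernel>/weak-updates layout described above. It only reports what it finds and changes nothing.)

```python
import os
from pathlib import Path


def foreign_weak_updates(modules_root, kernel):
    """Return (symlink, target) pairs from <kernel>/weak-updates whose
    resolved target is outside that kernel's own module tree."""
    modules_root = Path(modules_root)
    own_tree = (modules_root / kernel).resolve()
    weak_dir = modules_root / kernel / "weak-updates"
    found = []
    if not weak_dir.is_dir():
        return found
    for dirpath, _dirnames, filenames in os.walk(weak_dir):
        for name in sorted(filenames):
            link = Path(dirpath) / name
            if not link.is_symlink():
                continue  # a regular file, not a weak-updates symlink
            target = link.resolve()
            if own_tree not in target.parents:
                found.append((link, target))
    return found
```

For the running kernel this would be called as `foreign_weak_updates("/lib/modules", os.uname().release)`. An empty list means no weak-updates symlinks point into another kernel's tree.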

This failure stems from a fundamental decision Fedora has made with its kernel builds. Kernel modules have a piece of information called 'vermagic' (you can see it with modinfo), which is the kernel version and broad environment the kernel module was built for. Fedora will not load a kernel module with a mismatched vermagic, which means that no kernel module can ever be (kABI) compatible across kernel versions.
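(A quick way to see which kernel a given module file was built for is to compare its vermagic against a kernel release string. The Python sketch below does that. It assumes the kernel release is the first whitespace-separated token of the vermagic string, which is how modinfo output normally looks, and it makes no attempt to reproduce the kernel's own loading check. The function names are made up for this example.)

```python
import subprocess


def module_vermagic(module_path):
    """Return the vermagic string that modinfo reports for a module file."""
    result = subprocess.run(
        ["modinfo", "-F", "vermagic", module_path],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


def built_for(vermagic, kernel_release):
    """True if the leading token of vermagic equals kernel_release.

    This compares only the release token and ignores the trailing
    environment flags in the vermagic string.
    """
    fields = vermagic.split()
    return bool(fields) and fields[0] == kernel_release
```

To check a module against the running kernel you could use `built_for(module_vermagic(path), os.uname().release)` after an `import os`, where `path` is the .ko file (or weak-updates symlink) you're suspicious of.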

Fedora shipping the weak-modules script is a sham. It cannot ever work as designed and intended; the weak-modules kABI check is broken on Fedora, and the only net result is that all DKMS modules on Fedora must add a special Fedora-only parameter in order to work right. If they don't, things work (only) as long as you never revert to a previous kernel.

I would file a bug report but I cannot imagine that Fedora would accept it. They already know that this feature doesn't work; it's right there in the DKMS manpage. It's there anyway because, well, I don't know. This is the most user-hostile decision I think I've ever seen Fedora make.

(You might wonder how ZFS on Linux isn't affected by this. The answer is that they specifically turn it off. The dkms.conf for my out-of-tree it87 module doesn't.)

linux/FedoraWeakUpdatesFailure written at 23:48:19

Speculating on why DKMS and other Linux things are large shell scripts

I've mentioned that one of the practical issues with DKMS is that it's a giant Bash script and that Bash isn't the right language for large programs. I'm sure that these issues aren't new to the authors of DKMS, so an obvious question to ask is why they chose to use Bash for DKMS. In fact you could ask this about a surprising number of relatively substantial programs on a typical Linux system. Dracut is 2,000 lines of Bash, for example. The authors of these programs are neither stupid nor crazy. Although I don't know their reasons, I can speculate.

For programs like DKMS and Dracut we can rule out broad portability, since they're already for Linux only. Portability is a likely issue for some large scripts on Linux systems that aren't strictly speaking Linux-only, such as the 12,000 lines of libtool. However, even for Linux-only programs, people may care about avoiding additional dependencies and requirements. According to RPM on my Fedora 33 desktop, Dracut and DKMS each theoretically require only a relatively small collection of core Linux programs and packages.

But I find myself believing that there's another reason for projects like DKMS and Dracut to stick to Bash, no matter if it's an awkward fit, and that's the desire to remain what you could call apolitical in their choice of language, which is to say avoiding having people argue with them about it. If DKMS or Dracut were written in Perl, Python, Ruby, or maybe even C++, there probably would be any number of people who'd be vocally unhappy with the choice and with the authors for making it. Unfortunately, I think that there are very few language environments on Linux that everyone considers acceptable and part of what they feel is the base system. In some ways Linux is quite conservative.

(I may be overly influenced by having been reading the Linux kernel mailing list during the time when there was a proposal to write a better kernel configuration system in Python. Some people were pretty unhappy at the choice of Python, although that was not the only problem with the proposal.)

Bash is not a particularly good language for writing a large program with significant logic that's potentially an important part of maintaining a Linux system. But it's a choice everyone will accept. They may think you're crazy, but they're probably not going to try to keep your program out of their Linux distribution because of what it's written in.

(An advantage of Bash is that I don't think it's changed all that much over time, so you don't have to worry about what version a particular Linux distribution has the way you do have to for Python or some of the other options. In my opinion this is overshadowed by its drawbacks for large programs.)

PS: This isn't an issue that BSD Unixes have, because their model of development means that a BSD can declare something part of the base system if they want to. Linux isn't coherent like this, so people are forced to either stick to the practical union of what distributions accept or start an argument to enlarge that de facto core set. Trying to do this can be contentious, as seen in the recent case of Python cryptography, Rust, and Gentoo.

linux/WhyBashLargeScripts written at 00:05:02



This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.