Wandering Thoughts archives

2021-05-09

DKMS built one of my kernel modules for the wrong kernel

I recently tweeted:

Today I discovered that DKMS has spent some time silently (re)building one of my modules for the wrong kernel because DKMS. Naturally it didn't work. Since it was my sensor monitoring, I didn't notice for a while.

There is a complicated story here. This happened on my office workstation, which needs a very out of tree version of the it87 module in order to read the motherboard sensors. Because of ongoing problems in the 5.11 kernel series with my Radeon RX 550 card, I've been repeatedly upgrading my kernel to the latest Fedora 5.11.x, finding out that the kernel is no good, and falling back to the last-good kernel, which is Fedora's 5.10.23.

Recently (while in the default state of running 5.10.23), I noticed that I didn't have my usual motherboard sensor readings. Examining kernel messages for it87 problems, I immediately found the smoking gun:

[Fri Apr 23 15:17:51 2021] it87: version magic '5.11.15-200.fc33.x86_64 SMP mod_unload ' should be '5.10.23-200.fc33.x86_64 SMP mod_unload '

At that point I had 5.11.15 installed, making it the highest-version kernel, but I was running 5.10.23. This should be a supported system configuration but apparently DKMS somehow installed the 5.11.15 version of it87 (built when I installed that version) into 5.10.23's module area as well as 5.11.15's. So I told DKMS to remove the module and rebuild it. Surprise:

[Sat May  8 17:43:45 2021] it87: version magic '5.11.15-200.fc33.x86_64 SMP mod_unload ' should be '5.10.23-200.fc33.x86_64 SMP mod_unload '

Since 'dkms status' showed that the it87 module was removed before I had DKMS rebuilt it, DKMS knew what the correct kernel version was (because it installed the new it87 module there), but it rebuilt it for the wrong kernel version. What I wound up doing was removing the 5.11.15 RPMs entirely. This finally made DKMS build the module right. Possibly I could also have made DKMS work right by explicitly specifying the kernel version in 'dkms build' and 'dkms install'.

In the future I'm probably going to explicitly specify the kernel version for every DKMS build and install, even if it's the current running kernel. I'm also going to have to check that DKMS installed modules are for the right kernel, especially if I'm building them in some unusual situation. And obviously I now have a mental note to check that all my sensors still work after every reboot.

Somewhat to my surprise, DKMS is actively maintained in the DKMS git repository. But it is still a 3,935 line Bash script (which is up slightly from 2016). It's really a marvel that it works as well as it does, but on the other hand it's somewhat terrifying that so many Linux systems depend on it working reliably.

(One of the fun things about using DKMS on Fedora is that it rebuilds the initramfs for every installed kernel every time you install a DKMS module, regardless of which kernel you're installing the module into and whether or not the module would be included in the initramfs. This takes a substantial amount of time and there's no way to turn it off.)

Update: This turns out to be a significant Fedora issue instead of a DKMS bug, although DKMS could do more to defend against it (since DKMS knows a particular feature can't work on Fedora).

linux/DKMSBuiltForWrongKernel written at 23:17:35; Add Comment

Storing ZFS send streams is not a good backup method

One of the eternally popular ideas for people using ZFS is doing backups by using 'zfs send' and storing the resulting send streams. Although appealing, this idea is a mistake, because ZFS send streams do not have the properties you want for a backup format.

A good backup format is designed for availability. No matter what happens, it should let you extract as much from it as possible, from both full backups and incremental backups. If your backup stream is damaged, you should still be able to find and restore as much as possible, both before and after the damage. If a full backup is missing or destroyed, you should still be able to recover something from whatever incrementals you have. This requires incremental backups to have more information in them than they specifically need, but that's a tradeoff you make for availability.

A better backup format should also be convenient to operate, and one big aspect of this is selective restores. A lot of the time you don't need to restore absolutely everything, you just want to get back one file or some files that you need because they got removed, damaged, or whatever. If you have to a complete restore (both full and incremental) in order get back a single file, you don't have a convenient backup format. Other nice things are, for example, being able to readily get an index of what is captured in any particular backup stream (full or incremental).

Incremental ZFS send streams do not have any of these properties and full ZFS send streams only have a few of them. Neither full nor incremental streams have any resilience against damage to the stream; a stream is either entirely intact or it's useless. Neither has selective restores or readily available indexes. Incremental streams are completely useless without everything they're based on. All of these issues will sooner or later cause you pain if you use ZFS streams as a backup format.

ZFS send streams are great at what they're for, which is replicating ZFS filesystems from one ZFS pool to another in an environment where you can immediately deal with any problems that come up (whether by retrying the send of a corrupted stream, changing what it's based on, or whatever you need to do). The further you pull 'zfs send' away from this happy path, the more problems you're going to have.

(The design decisions of ZFS send streams make a great deal of sense for this purpose. As a replication format they're designed to be easy to generate, easy to receive, and compact, especially for incremental send streams. They have no internal redundancy or recovery from corruption because the best recovery is 'resend the stream to get a completely good one'.)

(This comes up on the ZFS on Linux mailing list periodically and I write replies (eg, also), so it's time to write this down in an entry.)

solaris/ZFSSendNotABackup written at 00:01:12; Add Comment


Page tools: See As Normal.
Search:
Login: Password:
Atom Syndication: Recent Pages, Recent Comments.

This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.