The problems (Open)ZFS can have on new Linux kernel versions

September 5, 2024

Every so often, someone is using a normal released version of OpenZFS on Linux (currently ZFS 2.2.6, which was just released) on a distribution that ships very new kernels (such as Fedora). They then read that their version of ZFS (such as 2.2.5) doesn't list the latest kernel (such as 6.10) as a 'supported platform', and wonder why this is so.

Part of the answer is that OpenZFS developers are cautious people who don't want to list new kernels as officially supported until people have carefully inspected and tested the situation. Even if everything looks good, it's possible that there is some subtle problem in the interface between (Open)ZFS and the new kernel version. But another part of the answer comes down to how the Linux kernel has no stable internal API, which is also part of how you can get subtle problems in new kernels.

The Linux kernel is constantly changing how things work internally. Functions appear or go away (or simply mutate); fields are added to or removed from C structs, or sometimes change their meaning; function arguments change; how you're supposed to do things shifts. It's up to any out of tree code, such as OpenZFS, to keep up with these changes (and that's why you want kernel modules to be in the main Linux kernel if possible, because then other people do some of this work). So merely to compile on a new kernel version, OpenZFS may need to change its own code to match the kernel changes. Sometimes this is simple, requiring almost no changes; other times it leads to a bunch of modifications.
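As a rough illustration of what 'keeping up' involves, here is a toy model in Python rather than real kernel C. Everything in it is hypothetical: `OldKernel`, `NewKernel`, `submit_write` and `compat_submit_write` are invented names, and the signature probe merely stands in for the kind of build-time API check an out of tree module might run. It shows one function whose arguments changed between two kernel versions, plus a small compatibility wrapper so the rest of the module can call it one way.

```python
import inspect


class OldKernel:
    """Hypothetical older kernel: submit_write(device, data)."""

    def submit_write(self, device, data):
        return f"old:{device}:{len(data)}"


class NewKernel:
    """Hypothetical newer kernel: same name, but a flags argument was added."""

    def submit_write(self, device, data, flags):
        return f"new:{device}:{len(data)}:{flags}"


def compat_submit_write(kernel, device, data):
    """Call submit_write() the way this particular kernel expects.

    Probing the signature stands in for a build-time API check.
    """
    params = inspect.signature(kernel.submit_write).parameters
    if "flags" in params:
        # Newer interface: pass an explicit default for the new argument.
        return kernel.submit_write(device, data, flags=0)
    # Older interface: the two-argument form.
    return kernel.submit_write(device, data)


if __name__ == "__main__":
    for kernel in (OldKernel(), NewKernel()):
        print(compat_submit_write(kernel, "sda", b"abcd"))
```

If a third kernel changed the function in some other way, the wrapper would fall through to the two-argument call and fail with a `TypeError`, which is a loose analogue of the module no longer compiling until someone writes another branch.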

(Two examples are the master pull request for 6.10, which had only a few changes, and the larger master pull request for 6.11, which may not even be quite complete yet since 6.11 is not yet released.)

Getting things to compile is merely the first step. The OpenZFS developers need to make sure that they're making the right changes, and they generally also want to check whether anything has changed in a way that doesn't break compilation but still alters behavior. To quote a message from Rob Norris on the ZFS on Linux mailing list:

"Support" here means that the people involved with the OpenZFS are reasonably certain that the traditional OpenZFS goals of stability, durability, etc will hold when used with that kernel version. That usually means the test suites have passed, there's no significant new issues reported, and at least three people have looked at the kernel changes, the matching OpenZFS changes, and thought very hard about it.
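The sort of change that needs people to think hard is one that still compiles. Here is a toy Python sketch of that, with invented names (`QueueLimitsV1`, `QueueLimitsV2`, `max_io`, `chunks_needed`) and an assumed 512-byte sector size: a field keeps its name, but its unit changes from sectors to bytes between two hypothetical kernel versions. None of this is real kernel or OpenZFS code.

```python
from dataclasses import dataclass

SECTOR_SIZE = 512  # assumed sector size for this sketch


@dataclass
class QueueLimitsV1:
    max_io: int  # hypothetical older kernel: limit in 512-byte sectors


@dataclass
class QueueLimitsV2:
    max_io: int  # hypothetical newer kernel: same name, now in bytes


def chunks_needed(limits, nbytes):
    """Module code written against V1: assumes max_io is in sectors."""
    max_bytes = limits.max_io * SECTOR_SIZE
    return -(-nbytes // max_bytes)  # ceiling division


one_mib = 1024 * 1024
old = QueueLimitsV1(max_io=256)        # 256 sectors, i.e. 128 KiB
new = QueueLimitsV2(max_io=256 * 512)  # the same 128 KiB limit, in bytes

print(chunks_needed(old, one_mib))  # splits the write to respect the limit
print(chunks_needed(new, one_mib))  # runs without error, but ignores the limit
```

Both calls run without complaint. Only the first respects the 128 KiB limit; the second treats a byte count as a sector count and sends the whole write as one oversized chunk. No compiler would flag the real-world equivalent, which is why passing test suites and human review of the kernel changes both matter.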

As a practical matter (as Rob Norris notes), this means that development versions of OpenZFS will often build and work on new kernel versions well before those kernels are officially supported. Speaking from personal experience, it's possible to be using kernel versions that are not yet 'supported' without noticing until you hit an RPM version dependency surprise.


Comments on this page:

By Etienne Dechamps at 2024-09-07 12:37:54:

For an example of how Linux changes can cause subtle bugs in ZFS if one is not careful, I am reminded of this bug from the early days of ZFS On Linux, where a faulty kernel API check in ZFS build scripts resulted in write flush operations being effectively disabled on some kernel versions. The end result would be potential data corruption on power loss. Hopefully this makes it a bit clearer why the ZoL maintainers tend to be paranoid about this stuff.

Written on 05 September 2024.