Backporting changes is clearly hard, which is a good reason to avoid it

January 16, 2023

Recently, a significant bug was introduced into the Linux 6.0 stable kernel series in 6.0.16. The bug appeared when a later kernel change was backported to 6.0.16 with an accidental omission (cf). There are a number of things you can draw from this issue, but the big one I take away is that backporting changes is hard. The corollary is that the more changes you ask people to backport (and to more targets), the more likely you are to wind up with bugs, simply through the law of large numbers. The corollary to the corollary is that if you want to keep bugs down, you want to limit the amount of backporting you do or ask for.

(The further corollary is that the more old versions you ask people to support (and the more support you want for those old versions), the more backports you're asking them to do and the more bugs you're going to get.)
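The law of large numbers point can be made concrete with a little arithmetic. If each backport independently has some small chance of introducing a bug, the chance that at least one backport goes wrong climbs quickly with the number of backports. The per-backport error rate here is an assumption picked purely for illustration, not a measured figure:

```python
# Back-of-the-envelope sketch: p is a hypothetical per-backport error
# rate (1% here, chosen only for illustration); n is how many backports
# you ask for.  Assuming independence, the chance that at least one
# backport introduces a bug is 1 - (1 - p)^n.
def p_at_least_one_bug(n, p=0.01):
    """Chance that at least one of n independent backports goes wrong."""
    return 1 - (1 - p) ** n

for n in (10, 100, 500):
    print(f"{n} backports -> {p_at_least_one_bug(n):.1%} chance of at least one bug")
```

Even at a modest per-backport error rate, a few hundred backports make at least one mistake close to a sure thing.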

I can come up with various theories why backporting changes is harder than making changes in the first place. For example, when you backport a change you generally need to understand the context of more code; in addition to understanding the current code before the change and the change itself, now you need to understand the old code that you're backporting to. Current tools may not make it particularly easy to verify that you've gotten all of the parts of a change and have not, as seems to have happened here, dropped a line. And if there's been any significant code reorganization, you may be rewriting the change from scratch instead of porting it, working from the intention of the change (if you fully understand it).

(Here, there is still an inet_csk_get_port() function in 6.0.16, but it doesn't quite look like the version the change was made to, so the textual patch doesn't apply. See the code in 6.0.19 and compare it to the diff in the 6.1 patch or the original mainline commit.)
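The situation above, where the function still exists on the old branch but its surroundings have changed enough that the patch no longer applies, can be reproduced in miniature. This is a sketch assuming git is available on PATH; the file, branch, and function names are all made up for the demonstration:

```python
# Miniature demonstration of a backport that fails to apply textually:
# the fix is written against reworked mainline code, so on the old
# "stable" branch the patch context no longer matches and git reports
# a conflict.  All names here are hypothetical.
import os
import subprocess
import tempfile

def run(*args, check=True):
    return subprocess.run(args, check=check, capture_output=True, text=True)

repo = tempfile.mkdtemp()
os.chdir(repo)
run("git", "init", "-q")
run("git", "config", "user.email", "demo@example.com")
run("git", "config", "user.name", "Demo")

def commit(body, msg):
    with open("port.c", "w") as f:
        f.write(body)
    run("git", "add", "port.c")
    run("git", "commit", "-qm", msg)

old = "int get_port(int p)\n{\n    return p;\n}\n"
reworked = ("int get_port(int hint, int flags)\n{\n"
            "    if (flags)\n        return 0;\n    return hint;\n}\n")
fixed = reworked.replace("return hint;",
                         "return hint < 0 ? 0 : hint;  /* the fix */")

commit(old, "initial")
run("git", "branch", "stable")        # the "old" release branch
commit(reworked, "mainline rework")   # code around the fix changes
commit(fixed, "fix negative ports")   # the change we want to backport
fix_sha = run("git", "rev-parse", "HEAD").stdout.strip()

run("git", "checkout", "-q", "stable")
result = run("git", "cherry-pick", "-x", fix_sha, check=False)
print("clean backport" if result.returncode == 0
      else "backport needs manual work")
```

At this point the backporter is rewriting the change from the fix's intent rather than applying a patch, which is exactly where a line can quietly go missing.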

Some people will say that backports should be done with more care, or that there should be more tests, or some other magical fix. But the practical reality is that they won't be. What we see today is what we're going to continue getting in the future, and that's some amount of errors in backported changes, with the absolute number of errors rising as the number of changes rises. We can't wish this away with theoretical process improvements or by telling people to try harder.

(I don't know if there are more errors in backported changes than there are in changes in general. But generally speaking the changes that are being backported are supposed to be the ones that don't introduce errors, so we're theoretically starting from a baseline of 'no errors before we mangle something in the backport'.)

PS: While I don't particularly like its practical effects, this may make me a bit more sympathetic toward OpenBSD's support policy. OpenBSD has certainly set things up so they make minimal changes to old versions and thus have minimal need to backport changes.


Comments on this page:

By Miksa at 2023-01-17 04:08:03:

I often think about all the supported kernel versions and the duplicated effort that they waste. Upstream supports a bunch of versions, looks like 10 at the moment. And then the distributions support several too, all different.

At work we administer Red Hat, Ubuntu, and a couple of SUSE servers. A quick scan shows 9 different kernel versions, and only Ubuntu 22.04's 5.15 matches with Kernel.org. But even then, Ubuntu's 5.15.0-56 isn't really the same as 5.15.88.

I wish distributions would just choose a Kernel.org longterm kernel that happens to match their release.

They use these supposedly-fancy tools, such as git, and it gets them absolutely nothing for this, it seems. It's clear to me a proper language with proper tools wouldn't have this issue, but the C language is improper and C language programmers are allergic to proper tools.

Personally, I prefer to avoid changing old programs by getting them right the first time, but understand this is an unpopular position.

For example, when you backport a change you generally need to understand the context of more code; in addition to understanding the current code before the change and the change itself, now you need to understand the old code that you're backporting to.

MINIX immediately comes to mind. Even using the C language, it's possible to organize a program into many small parts and avoid these issues. The Linux kernel is a monolithic program in perhaps the worst language seriously used for such programs.

What we see today is what we're going to continue getting in the future, and that's some amount of errors in backported changes, with the absolute number of errors rising as the number of changes rises. We can't wish this away with theoretical process improvements or by telling people to try harder.

I happen to believe a computer is a machine to automate work, and correctly so; why shouldn't this task be automated?


