Modern languages and bad packaging outcomes at scale

January 31, 2025

Recently I read Steinar H. Gunderson's 'Migrating away from bcachefs', where one of the issues mentioned was a strong disagreement between the author of bcachefs and the Debian Linux distribution about how to package and distribute some Rust-based tools that are necessary to work with bcachefs. In the technology circles that I follow, there's a certain amount of disdain for the Debian approach, so today I want to write up how I see the general problem from a system administrator's point of view.

(Saying that Debian shouldn't package the bcachefs tools if they can't follow the wishes of upstream is equivalent to saying that Debian shouldn't support bcachefs. Among other things, this isn't viable for something that's intended to be a serious mainstream Linux filesystem.)

If you're serious about building software under controlled circumstances (and Linux distributions certainly are, as are an increasing number of organizations in general), you want the software build to be both isolated and repeatable. You want to be able to recreate the same software (ideally exactly binary identical, a 'reproducible build') on a machine that's completely disconnected from the Internet and the outside world, and if you build the software again later you want to get the same result. This means that the build process can't download things from the Internet, and if you run it three months from now you should get the same result even if things out there on the Internet have changed (such as third party dependencies releasing updated versions).
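One rough way to check the 'binary identical' part is to build twice and compare digests of the two output trees. The following is a minimal sketch under some assumptions: the names tree_digest and builds_match are made up for illustration, and it only looks at file names and contents, not at metadata such as permissions or timestamps.

```python
import hashlib
from pathlib import Path


def tree_digest(root: Path) -> str:
    """Return one SHA-256 digest covering every file under root.

    Files are visited in sorted order so the digest does not depend on
    the order the filesystem happens to list them in.
    """
    h = hashlib.sha256()
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        rel = path.relative_to(root).as_posix()
        # Mix in the relative name, then the digest of the contents.
        h.update(rel.encode("utf-8") + b"\0")
        h.update(hashlib.sha256(path.read_bytes()).digest())
    return h.hexdigest()


def builds_match(first: Path, second: Path) -> bool:
    """True if two build output trees have identical names and contents."""
    return tree_digest(first) == tree_digest(second)
```

If builds_match() reports a difference between two builds of the same source, something in the build is picking up state from outside the source artifact (the clock, the network, the build host).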

Unfortunately a lot of the standard build tooling for modern languages is not built to do this. Instead it's optimized for building software on Internet-connected machines where you want the latest patchlevel release or even the latest minor version of your third party dependencies, whatever that happens to be today. You can sometimes lock down specific versions of all third party dependencies, but this isn't necessarily the default and so programs may not be set up this way from the start; you have to patch it in as part of your build customizations.

(Some languages are less optimistic about updating dependencies, but developers tend not to like that. For example, Go is controversial for its approach of 'minimum version selection' instead of 'maximum version selection'.)
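To make the difference concrete, here is a deliberately simplified sketch of the two selection policies for a single dependency. It is not how Go's or any other tool's resolver is actually implemented (Go's real algorithm works over the whole module graph); the function names are hypothetical, versions are assumed to be plain 'major.minor.patch' strings, and the required versions are assumed to appear in the list of available releases.

```python
Version = tuple[int, int, int]


def parse(v: str) -> Version:
    """Turn '1.2.12' into (1, 2, 12) so versions compare numerically."""
    major, minor, patch = (int(x) for x in v.split("."))
    return (major, minor, patch)


def minimum_version_selection(required: list[str]) -> str:
    """Pick the highest of the stated minimums, and nothing newer."""
    return max(required, key=parse)


def maximum_version_selection(required: list[str], available: list[str]) -> str:
    """Pick the newest available release that satisfies every minimum
    and stays within the same major version."""
    floor = parse(minimum_version_selection(required))
    candidates = [v for v in available
                  if parse(v) >= floor and parse(v)[0] == floor[0]]
    return max(candidates, key=parse)


required = ["1.2.3", "1.2.7"]
available = ["1.2.3", "1.2.7", "1.2.12", "1.3.0", "2.0.0"]
print(minimum_version_selection(required))             # 1.2.7
print(maximum_version_selection(required, available))  # 1.3.0
```

The first policy gives the same answer no matter what has been released upstream since; the second one changes its answer whenever a new compatible release appears.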

The minimum thing that any serious packaging environment needs to do is capture all of the dependencies for any top level artifact and force the build process to use these (and only these), without reaching out to the Internet to fetch other things (in practice, you're going to block all external access from the build environment). How you do this depends on the build system, but it's usually possible; in Go you might 'vendor' all dependencies to give yourself a self-contained source tree artifact. This artifact never changes the dependency versions used in a build even if they change upstream because you've frozen them as part of the artifact creation process.
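As an illustration of 'these and only these', a build wrapper could refuse to start unless the vendored files exactly match a manifest recorded when the artifact was frozen. This is a sketch under assumptions, not any particular tool's format: check_vendored and the manifest layout (relative file name mapped to SHA-256 hex digest) are invented for the example.

```python
import hashlib
from pathlib import Path


def check_vendored(vendor_dir: Path, manifest: dict[str, str]) -> list[str]:
    """Compare vendored files against a frozen manifest.

    manifest maps a relative file name to its expected SHA-256 hex digest.
    Returns a list of problems; an empty list means the vendored set is
    exactly what was frozen.
    """
    problems = []
    present = {p.relative_to(vendor_dir).as_posix()
               for p in vendor_dir.rglob("*") if p.is_file()}
    # Files the manifest promises but the tree lacks.
    for name in sorted(set(manifest) - present):
        problems.append(f"missing: {name}")
    # Files that appeared without being part of the frozen set.
    for name in sorted(present - set(manifest)):
        problems.append(f"unexpected: {name}")
    # Files that exist but no longer match their recorded digest.
    for name in sorted(present & set(manifest)):
        actual = hashlib.sha256((vendor_dir / name).read_bytes()).hexdigest()
        if actual != manifest[name]:
            problems.append(f"changed: {name}")
    return problems
```

A build script would run this before anything else and stop if the returned list is non-empty.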

(Even if you're not a distribution but an organization building your own software using third-party dependencies, you do very much want to capture local copies of them. Upstream things go away or get damaged every so often, and it can be rather bad to not be able to build a new release of some important internal tool because an upstream decided to retire to goat farming rather than deal with the EU CRA. For that matter, you might want to have local copies of important but uncommon third party open source tools you use, assuming you can reasonably rebuild them.)

If you're doing this on a small scale for individual programs you care a lot about, you can stop there. If you're doing this on a distribution's scale you have an additional decision to make: do you allow each top level thing to have its own version of dependencies, or do you try to freeze a common version? If you allow each top level thing to have its own version, you get two problems. First, you're using up more disk space for at least your source artifacts. Second and worse, now you're on the hook for maintaining, checking, and patching multiple versions of a given dependency if it turns out to have a security issue (or a serious bug).
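To get a feel for how big the second problem is, you could audit how many distinct versions of each dependency are pinned across everything you package. A minimal sketch, assuming you already have each program's pinned dependencies available as a mapping (the data layout and function names here are hypothetical):

```python
from collections import defaultdict


def versions_in_use(programs: dict[str, dict[str, str]]) -> dict[str, set[str]]:
    """Map each dependency to the set of versions pinned across programs."""
    seen: dict[str, set[str]] = defaultdict(set)
    for deps in programs.values():
        for name, version in deps.items():
            seen[name].add(version)
    return dict(seen)


def duplicated(programs: dict[str, dict[str, str]]) -> dict[str, set[str]]:
    """Keep only the dependencies that are pinned at more than one version."""
    return {name: vers for name, vers in versions_in_use(programs).items()
            if len(vers) > 1}
```

Every entry that duplicated() returns is a dependency where a single upstream security release turns into several separate checks.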

Suppose that you have program A using version 1.2.3 of a dependency, program B using 1.2.7, the current version is 1.2.12, and the upstream releases 1.2.13 to fix a security issue. You may have to investigate both 1.2.3 and 1.2.7 to see if they have the bug and then either patch both with backported fixes or force both program A and program B to be built with 1.2.13, even if the versions of these programs that you're using weren't tested and validated with this version (and people routinely break things in patchlevel releases).
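The first step of that scenario can be sketched as a small check that lists which pinned versions predate the fix. The function name is made up for illustration, and note what it doesn't do: a version older than the fix is only a candidate for investigation, since the bug may have been introduced after that version.

```python
def parse(v: str) -> tuple[int, ...]:
    """Turn '1.2.13' into (1, 2, 13) so versions compare numerically."""
    return tuple(int(x) for x in v.split("."))


def needs_attention(pinned: dict[str, str], fixed_in: str) -> dict[str, str]:
    """Return the programs whose pinned version predates the fix."""
    fix = parse(fixed_in)
    return {prog: ver for prog, ver in pinned.items() if parse(ver) < fix}


pinned = {"A": "1.2.3", "B": "1.2.7"}
print(needs_attention(pinned, "1.2.13"))  # {'A': '1.2.3', 'B': '1.2.7'}
```

Both programs come back, and since they're pinned to different versions, that's two separate investigations (and possibly two separate backports) for one upstream fix.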

If you have a lot of such programs it's certainly tempting to put your foot down and say 'every program that uses dependency X will be set to use a single version of it so we only have to worry about that version'. Even if you don't start out this way you may wind up with it after a few security releases from the dependency and the packagers of programs A and B deciding that they will just force the use of 1.2.13 (or 1.2.15 or whatever) so that they can skip the repeated checking and backporting (especially if both programs are packaged by the same person, who has only so much time to deal with all of this). If you do this inside an organization, probably no one in the outside world knows. If you do this as a distribution, people yell at you.

(Within an organization you may also have more flexibility to update program A and program B themselves to versions that might officially support version 1.2.15 of that dependency, even if the program version updates are a little risky and change some behavior. In a distribution that advertises stability and has no way of contacting people using it to warn them or coordinate changes, things aren't so flexible.)
