2012-08-19
Why I don't like the Debian source package format
I'm about to start modifying an Ubuntu package to build a modified version, so I've been reminded of all of the reasons that I don't really like the way Debian source packages work (Ubuntu uses the Debian package format). To explain why, I need to start with a brief description of how Debian source packages are put together.
Like all source package formats, Debian source packages need to contain
the four essential things,
but they do it in an odd way. Source packages generally come in two
different forms; let us call these the distribution form, which is
what you download, and the working form, which is what you expand the
distribution form into. The distribution form of a Debian source
package has three components: a .dsc file of basic metadata, one or
more bundles (usually one as a compressed tar archive) of the upstream
source code, and a bundle of Debian modifications that are applied on
top of it. The working form of Debian source packages is a directory
tree of the source code, which is created by unpacking the upstream
source code and then applying the Debian modifications to it; after
this, the source tree will have a debian/ subdirectory with control
files that describe how to build and package the source.
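As a rough illustration, the three components of the distribution form can usually be told apart by file name alone. The sketch below assumes the conventional suffixes as I understand them (.dsc for the metadata, .orig.tar.* for upstream source, and either .diff.gz or .debian.tar.* for the Debian modifications); the function name and the example file names are made up for this entry.

```python
import re

# Assumed naming conventions; real packages may have files that fit none of these.
UPSTREAM_RE = re.compile(r"\.orig(-[A-Za-z0-9-]+)?\.tar\.[a-z0-9]+$")
DEBIAN_TARBALL_RE = re.compile(r"\.debian\.tar\.[a-z0-9]+$")


def classify_source_files(filenames):
    """Sort the files of a source package's distribution form into roles."""
    roles = {"metadata": [], "upstream": [], "debian_changes": [], "unknown": []}
    for name in filenames:
        if name.endswith(".dsc"):
            roles["metadata"].append(name)
        elif UPSTREAM_RE.search(name):
            roles["upstream"].append(name)
        elif name.endswith(".diff.gz") or DEBIAN_TARBALL_RE.search(name):
            roles["debian_changes"].append(name)
        else:
            roles["unknown"].append(name)
    return roles


if __name__ == "__main__":
    # Hypothetical file names, purely for illustration.
    print(classify_source_files(
        ["hello_2.8-2.dsc", "hello_2.8.orig.tar.gz", "hello_2.8-2.diff.gz"]
    ))
```

Anything that lands in "unknown" is a sign that the package is doing something this simple picture does not cover.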
The largest problem with this is that the working form of Debian source packages has the upstream source already modified. The upstream source is not patched as part of building the binary packages (well, mostly); the upstream source is patched the moment that you even look at the source package. A direct consequence of this is that the control files do not exist until after you have patched the upstream source, or at least they do not exist in any readily accessible format.
Now we get to the mess that is the bundle of Debian changes. Debian
source packages have two formats for this (or at least two dominant,
non-experimental formats); the changes can be either a single giant
patch file (which must include creating the debian/ subdirectory)
or a tarball with the Debian control files and a set of quilt-based
patch files. The
quilt-based format is sane, but it's the newer one. The giant patch file
format is the older one and still quite common, but there are two things
wrong with it.
The first thing wrong with it is that it's terrible as a format for modifications, especially for sysadmins who want to add a change to an existing Debian or Ubuntu package. Let me count the ways:
- a single giant diff is hard to read, especially when it also includes creating the control files in the debian/ directory.
- a single giant diff smashes logically separate changes together into one big mess.
- it's impossible to separate out your own changes so that you can easily apply them on top of the next Debian or Ubuntu package update; you're better off keeping diffs of your changes outside of the source package system and then (re)applying them by hand.
The second thing wrong with it is that all of its problems spawned a
bunch of workarounds, which basically use quilt or something like
it in the debian control files to apply changes as the software is
built. The large scale consequence of this is unpredictability. When
you get and set up a Debian source package as a non-specialist, the
resulting source tree may or may not reflect the Debian modifications
and there is no single procedure that you can follow to make changes to
it (especially maintainable changes).
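One way to take some of the guesswork out of this is to look at the unpacked tree before touching it. The sketch below is a heuristic under my own assumptions, not an official procedure: it reads debian/source/format (treating a missing file as the old 1.0 format, which I believe is the default) and lists any patches named in a quilt-style debian/patches/series file. The function name is hypothetical.

```python
from pathlib import Path


def describe_source_tree(tree):
    """Report the declared source format and any quilt-style patch list."""
    tree = Path(tree)
    format_file = tree / "debian" / "source" / "format"
    series_file = tree / "debian" / "patches" / "series"

    # Assumption: no debian/source/format file means the old 1.0 format.
    if format_file.is_file():
        source_format = format_file.read_text().strip()
    else:
        source_format = "1.0"

    patches = []
    if series_file.is_file():
        for line in series_file.read_text().splitlines():
            line = line.strip()
            # Skip blank lines and comments; keep only the patch name,
            # dropping any options that follow it on the line.
            if line and not line.startswith("#"):
                patches.append(line.split()[0])

    return {"format": source_format, "patches": patches}


if __name__ == "__main__":
    print(describe_source_tree("."))
```

If this reports format 1.0 together with a non-empty patch list, that is a hint (no more than a hint) that the package is applying its patches during the build, so the tree you are looking at may not yet reflect the Debian modifications.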
(A meta-consequence of the introduction of a new and radically different format is that there are plenty of guides on 'how to do things with Debian source packages' out there on the web that haven't been updated to discuss the new format. This is again a problem for non-specialists, who don't know enough to know that the guide is incomplete until something goes wrong. Note that I am barely more than a non-specialist (and most of that is due to writing this entry).)
A smaller irritation but something that is symptomatic of how Debian
does things is how you change the build version of generated packages. A
normal, sane packaging system would have an explicit field for this in
metadata somewhere. In the Debian package format, the build version
is derived by implication: each entry in the Debian changelog for the
package (found in the debian/ subdirectory) has a build version
attached to it, so the build system 'simply' finds the build version of
the top (most recent) changelog entry and uses that.
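To make the 'simply' concrete, here is a minimal sketch of pulling the package name and version out of the top changelog entry. It assumes the usual header line shape of 'package (version) distribution; urgency=...'; the function name and the sample entry are made up, and the real tools (dpkg-parsechangelog, as far as I know) do far more validation than this.

```python
import re

# Header lines start in column 0; change lines and the trailer are indented,
# so they cannot match this pattern.
HEADER_RE = re.compile(r"^(?P<package>\S+) \((?P<version>[^)]+)\) ")


def top_changelog_version(changelog_text):
    """Return (package, version) from the first entry header in a changelog."""
    for line in changelog_text.splitlines():
        match = HEADER_RE.match(line)
        if match:
            return match.group("package"), match.group("version")
    raise ValueError("no changelog entry header found")


if __name__ == "__main__":
    # Hypothetical changelog with a local entry added on top.
    sample = (
        "hello (2.8-2local1) precise; urgency=low\n"
        "\n"
        "  * Local change.\n"
        "\n"
        " -- A Sysadmin <root@example.org>  Sun, 19 Aug 2012 12:00:00 -0400\n"
        "\n"
        "hello (2.8-2) unstable; urgency=low\n"
    )
    print(top_changelog_version(sample))
```

The practical upshot is that changing the build version means adding a new entry at the top of the changelog instead of editing a version field somewhere.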
(On reflection, I've decided not to say anything about the
debian/rules file. I'm a non-expert in working with Debian source
packages and so I suspect that everything about it is much clearer and
more obvious to people who are Debian package building experts. Every
source package system involves a certain amount of arcana and doing the
actual compiling and so on is often where it's concentrated.)