2010-09-15
Why I like modules to be in the Python standard library
Even when there may be perfectly good third party modules for something, I really want there to be a module for it in Python's standard library. Part of the reason is obviously how I find third party modules to be awkward, but another part of it is what I call the selection problem.
The selection problem is the problem of picking a third party module (sometimes even finding it, although PyPI helps with that) and figuring out if it's any good. Simply figuring out the quality of a module is a bunch of work, and the amount of work multiplies drastically if there are several third party modules that all do what I want. Often, the only way I can really tell if a module is going to work well is to actually try using it. Generally this has to be in a real program (I find toy examples both frustrating to write and uninformative), which means that if I have picked poorly I may have wasted a bunch of time and effort. Even if I can rule out a module relatively early, I still had to spend the time to read documentation or skim code or the like, and that time is all wasted.
(And frankly it's frustrating to run into near misses, modules that almost do what I need and almost work. Faced with this, it often at least feels easier to write something from scratch myself if what I want isn't too big.)
When a module has made it into the standard library, I don't have to go through all of this; I can just use the module, secure in the confidence that this is a good implementation of whatever it is that I want to do. Someone else has already done all of this quality assurance work. If there were multiple implementations, the Python people have probably either picked the best one or at least determined that they are more or less equivalent, so I am not missing anything very important by not looking at the other options.
(Yes, sometimes this confidence is misplaced. But generally it's at least close.)
Update: see also WhyInStandardLibraryII for additional comments on the time drain of the selection problem.
An overview of the Debian and RPM source package formats
This is a brief and jaundiced overview of the format of Debian and RPM
source packages, what the Debian and RPM package systems theoretically
use to generate the compiled binary packages that people actually
install. As usual, this applies to all distributions that use the Debian
.deb package format or the Red Hat .rpm package format, although
specific details vary. Also, I'm going to simplify to the common case.
A source RPM contains a specfile, a source tarball, and some number of patches. The specfile describes the package, names the source tarball and the patches, and contains a script that configures and compiles the binaries (I simplify). It can also contain scripts that will be run when the binary package is installed, removed, or upgraded, or on a number of other events. Specfiles support a complicated system of text macros, macro substitution, conditional 'execution' of portions of the specfile (which may wind up omitting or including some patches), and even more peculiar things; these are used to automate a lot of standard parts of the package build process, such as configuring a program that uses standard GNU autoconf.
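To make the 'names the source tarball and the patches' part concrete, here is a rough Python sketch that pulls the Source and Patch tags out of specfile text. The package name and file names in the example are made up for illustration, and the parsing is deliberately naive: it does not expand macros or evaluate conditionals, so a conditionally applied patch is simply listed as if it were unconditional.

```python
import re

# Matches simple "SourceN:" and "PatchN:" tag lines; the number is optional.
TAG_RE = re.compile(r"^(Source|Patch)(\d*)\s*:\s*(\S+)", re.IGNORECASE)


def list_sources_and_patches(spec_text):
    """Return (sources, patches) named by plain tag lines in spec_text.

    Assumption: macros are not expanded and %if blocks are not evaluated,
    so this only sees what is literally written in the specfile.
    """
    sources, patches = [], []
    for line in spec_text.splitlines():
        m = TAG_RE.match(line.strip())
        if not m:
            continue
        kind, _number, value = m.groups()
        if kind.lower() == "source":
            sources.append(value)
        else:
            patches.append(value)
    return sources, patches


# A made-up, heavily trimmed specfile for a hypothetical 'hello' package.
EXAMPLE_SPEC = """\
Name: hello
Version: 1.0
Source0: hello-1.0.tar.gz
Patch0: hello-fix-build.patch
Patch1: hello-manpage.patch

%prep
%setup -q
"""

sources, patches = list_sources_and_patches(EXAMPLE_SPEC)
print(sources)
print(patches)
```

Anything that needs real answers (macros expanded, conditionals resolved) has to go through RPM itself; a text scan like this is only good for a first look at an unfamiliar source RPM.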
There is no fixed layout of where all of these pieces go when a source RPM is unpacked and built; it depends on your local configuration, although some arrangements are more sensible than others.
(Note that those RPM settings have probably gotten slightly broken since 2006, since they seem to now be doing slightly odd things for me. RPM macros have a lot of magic in them.)
A Debian source package contains a description file, a source
tarball, and a patch. After unpacking the source tarball and
applying the patch, there must be a top level subdirectory called
debian. Files in this subdirectory are used to control the rest of
the build and packaging process; although a number are required, the
most important one is debian/rules, which is the Makefile used to
build the package.
(Note that this subdirectory can contain lots of things besides the
Debian package building control files. For instance, if the Debian
package wants to run scripts when it's installed, removed, or so on, it
will usually store the scripts in debian/.)
Much like RPM specfiles and their macros, Debian rules files support
a complicated system of helper programs to do most of the actual
work. A typical Debian rules file cannot be fully understood without
knowing what these programs do (some of this can be deduced from
their names). Debian being Debian, I believe that there are several
generations and versions of these helper programs (and no doubt epic
flamewars have been fought over which ones to use when).
(Debian helper programs are better documented than RPM macros, for various reasons. Or at least more conveniently documented, since they have manpages.)
A Debian rules file may or may not further patch the source in the
process of building it. One style of Debian package rolls both making
any necessary modifications to the package source code and creating
the contents of the debian directory into the initial patch; another
uses the initial patch only to create the debian directory and then,
RPM-like, applies a series of source patches from the debian directory
during the build process. Determining which approach any particular
Debian package uses may require close attention to the rules file,
although if there is a debian/patches directory the odds are good that
this source package uses some version of RPM-like two stage patching.
(In the Debian way, there appear to be at least three different systems for doing such patching, each somewhat different.)
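The debian/patches heuristic is easy to turn into a quick check on an unpacked source tree. What follows is only a sketch in Python: the function name and the strings it returns are my own invention, the tree it runs against is a throwaway one built for the example, and it makes no attempt to work out which of the patching systems is actually in use.

```python
from pathlib import Path
import tempfile


def guess_patch_style(tree):
    """Guess how an unpacked Debian source tree patches the upstream code.

    This only looks at directory layout; the rules file is the real authority.
    """
    debian_dir = Path(tree) / "debian"
    if not (debian_dir / "rules").is_file():
        return "not an unpacked Debian source tree"
    if (debian_dir / "patches").is_dir():
        return "probably two-stage patching"
    return "probably everything in the initial patch"


# Build a fake unpacked tree for a hypothetical 'hello-1.0' package.
with tempfile.TemporaryDirectory() as tmp:
    tree = Path(tmp) / "hello-1.0"
    (tree / "debian" / "patches").mkdir(parents=True)
    (tree / "debian" / "rules").write_text("#!/usr/bin/make -f\n")
    style = guess_patch_style(tree)
    print(style)
```

Both answers are hedged with 'probably' on purpose, since the directory layout only suggests what the rules file will really do.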