2009-11-27
Modern version control systems change your directory layouts
We're in the process of a slow conversion from maintaining files with RCS to mostly keeping them in Mercurial repositories, which has wound up making me think about how we want to structure them and which of our existing directory areas are easy to convert. As a result of this, I have an observation: modern whole-directory VCSes change how sysadmins want to lay out files and directories.
In the old days of single-file version control, it made perfect sense to group files together in directories by function or system, regardless of how they were generated or updated. For example, you might put all of your local MTA-related files in a single directory, with some of them being automatically generated and some of them hand edited. You might even mix in binaries used by scripts and so on.
(The automatically generated files might or might not be maintained under RCS; the hand edited ones definitely would be.)
In a whole-directory VCS, this is a bad idea; you want all of those non-version-controlled files to be in a completely separate directory somewhere that's outside of the repository; you want the repository to only have 'source', not 'compiled things'. Otherwise, at a minimum you're going to wind up with a bunch of file exclusions (and it's my opinion that file exclusions are fragile).
(Unlike with RCS, you want to do this even if you keep the automatically generated files under version control, since an automatic checkin of new versions of the generated files could easily collide with other work you're also doing in the repository.)
There are two approaches for dealing with this that I can think of offhand. You can have your overall configuration know that some files are found in one directory and other files are in another, or you can 'publish' files from the version control repository to the directory that has the automatically generated files. Of these, I prefer the two directories approach on the grounds that it's more robust (you don't have to remember an extra magic step to make changes go live).
(By the way, if you are building such a system you almost certainly don't want to require changes to be committed to the repository in order to be activated. Doing 'check out current version' or the like is tempting, but it's going to make testing much harder.)
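To make the 'two directories' idea concrete, here is a minimal sketch with invented paths and an invented rebuild step (none of this is our actual layout): the hand-edited files live in a version controlled directory, the generated files live somewhere else entirely, and whatever regenerates them only ever writes into that second directory.

  # /local/mail-config     - hand-edited files, a Mercurial repository
  # /local/mail-generated  - generated files, deliberately not a repository
  # a rebuild script (run from cron or by hand) writes only into the
  # generated directory, so the repository never sees 'compiled' output:
  cd /local/mail-generated || exit 1
  getent passwd | awk -F: '{print $1}' | sort > local-users.new
  mv local-users.new local-users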
2009-11-23
Converting a directory from RCS to Mercurial
Suppose that you have a directory full of configuration files that have been there for so long that they're still being maintained with RCS. Further suppose that you would like to change to a modern version control system, say Mercurial, but that you would like to preserve all of your old version history.
Mercurial has no direct support for converting RCS files, but there's a magic trick: a CVS repository is nothing more than a bunch of RCS files in a directory hierarchy plus a thin layer of easily created metadata, and a lot of things (Mercurial included) can convert CVS repositories. So we first make a CVS repository version of our directory, and then convert that repository to Mercurial.
Before you start, you need to clean up your current data by making sure that everything you want to have included in the new repository is under RCS, and that you don't have any lingering RCS ,v files for files that you've taken out of service. If you do have old ,v files and want to preserve their history in the new repository, you'll need to remember to tell Mercurial (or your VCS of choice) that they're deleted after you finish the repository conversion.
(It's relatively common for us to remove the checked out version of a file but keep the ,v file both just in case and for historical purposes. You may be different.)
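Something like the following rough sketch can help with this cleanup check; it lists ,v files whose checked-out working file no longer exists. It assumes your ,v files are in RCS subdirectories and that you run it from the top of the directory tree in question.

  # list ,v files whose working file is gone; rough sketch, adjust to taste
  find . -type d -name RCS -prune -print | while read r; do
    for v in "$r"/*,v; do
      [ -e "$v" ] || continue
      f="$(dirname "$r")/$(basename "$v" ,v)"
      [ -e "$f" ] || echo "no working file for: $v"
    done
  done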
Using the example of a directory (or directory hierarchy) called
nsdata, here are the steps, in two parts. We'll work in /tmp/, for
convenience.
(As always, I must note appropriate disclaimers. You should always carefully test both procedures and end results, and while this has worked for us, I can't promise that it will work for you.)
Creating a CVS repository version of your RCS-controlled directory
- Create an empty CVS repository to get the CVS metadata:

  cvs -d /tmp/scratch-CVS init

- put a copy of the nsdata directory into /tmp/scratch-CVS/nsdata with the tool of your choice (I used rsync, because I use rsync for everything like this). In CVS terminology, this creates a repository module called 'nsdata'.

- Turn it into a correctly laid out CVS repository. You've probably got all of your RCS ,v files in RCS subdirectories, but CVS puts them directly in the directory that the checked-out file goes in. So you need to move all of the ,v files up one directory level, out of their RCS subdirectories:

  cd /tmp/scratch-CVS/
  find nsdata -type d -name RCS -prune | while read r; do mv -i "$r"/* "$r/.."; rmdir "$r"; done

- create a checked out version of the CVS repository:

  mkdir /tmp/scratch-CO; cd /tmp/scratch-CO
  cvs -d /tmp/scratch-CVS co nsdata

  This is where the CVS module terminology becomes important; you are checking out the 'nsdata' module from your CVS repository, which creates a /tmp/scratch-CO/nsdata directory hierarchy.
You should be able to diff -r this checked out CVS module against
your current directory and not see any significant differences. (Your
checked-out version will have CVS directories and not have RCS ones.)
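Concretely, and assuming GNU diff for the --exclude option, that check is something like the following (the second path is a stand-in for wherever your real directory lives):

  cd /tmp/scratch-CO
  diff -r --exclude=CVS --exclude=RCS nsdata /path/to/your/real/nsdata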
If you prefer something besides Mercurial, you can now use the CVS-to-whatever tool of your choice. The rest of this entry is specific to the CVS-to-Mercurial conversion process.
Converting a CVS repository into a Mercurial one
Unfortunately, you're also going to want to do the conversion with the latest version of Mercurial (version 1.4 as of writing this), which may mean that you need to build it yourself. Old versions of Mercurial do a worse job of the conversion, and if they are sufficiently old, they actually don't do it correctly. Once you've converted the repository, you can use the normal system version of Mercurial to work on it.
So, the steps:
- optionally, go through your RCS history to find out all of the Unix userids that have made RCS checkins, and create a file that maps from the Unix userid to something more conventional for Mercurial, such as an email address. See Mercurial's 'hg help convert' for information about the format of this file; let us assume that it is /tmp/authormap.

- create a Mercurial version of your CVS repository:

  hg convert --authors /tmp/authormap --datesort /tmp/scratch-CO/nsdata /tmp/nsdata-hg

  Some Mercurial documentation recommends avoiding --datesort. This is wrong for our particular case; here, your changesets really are in strictly chronological order, and you want the converted repository to reflect this.

  If you are doing the conversion with a self-built copy of the latest Mercurial on Ubuntu 8.04 LTS or any other system which has a pre-1.1 version of Mercurial, you will need to add an extra argument so that you can use the system version of Mercurial on the repository:

  hg --config 'format.usefncache=0' convert ...

  (See here for a discussion of this.)
On Ubuntu 8.04 LTS you definitely want to use the latest Mercurial to do the conversion; Mercurial 0.9.5 has a bug that will give you incorrect file contents (reversing some changes) under some circumstances.
- clean up the repository and check out the current versions of
all files:
cd /tmp/nsdata-hg
hg purge; hg update
(If you did the conversion with a sufficiently modern version of
Mercurial, you don't need the 'hg purge'.)
The end result of this is a new Mercurial repository in /tmp/nsdata-hg
with the full history and the current version of all files in the
repository checked out. You should be able to diff -r this against
the current directory of configuration files and see no important
differences. (The Mercurial repository will have a .hg directory and
not have RCS directories.)
My experience is that the history of the Mercurial repository will show at least some multi-file changesets, although it doesn't seem to capture all of them. I choose to view this as an improvement over having all changes be single-file changes, even if it's not perfect.
(Presumably the conversion process (or CVS) uses various heuristics to decide when changes to multiple files more or less at once actually are a single changeset.)
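If you're curious about what got grouped together, 'hg log -v' on the converted repository will show the files involved in each changeset:

  cd /tmp/nsdata-hg
  hg log -v | less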
Sidebar: resources and credits
I didn't come up with this on my own; a number of web pages provided very valuable information and pointers.
- From RCS to Mercurial. This (like a number of other old sources) uses Tailor, which is no longer necessary; hg convert works better.
- Converting from RCS to Mercurial, which gave me the find incantation and some discussions of other things.
- The official documentation for hg convert.
2009-11-22
RCS versus modern version control systems
Here is something that may shock people: we're still maintaining system files with RCS. And I maintain that this is not as crazy as it sounds, once you dig under the surface, and that system administration is one of the last places where RCS is sensible some of the time.
For all their myriad benefits, the drawback of modern version control systems is that they really want to be used on whole directories (or directory hierarchies). This is generally pretty okay for source code, where you have directories that consist almost entirely of interrelated files, but system administration has lots of situations where either much of a directory's contents will not be managed by your version control, or the files that you are managing in your VCS are not related at all apart from being in the same directory on the same system.
(/etc is the canonical example of both situations. Yes, I know about
things like etckeeper; I
honestly think that they're uncomfortable hacks.)
For all its disadvantages (and it has significant ones), RCS's great
virtue is that it is a single-file version control system, one that
manages individual files instead of entire directories. Thus, it's
both easy and mindless to put just one file in some random directory
under version control. You don't have to set up an elaborate system or
remember to carefully sidestep much of what your VCS defaults to doing;
instead you can just do 'mkdir RCS; ci -l <file>' and not worry about
it.
(And there is the benefit of less taxonomy, in that you don't have to decide what
level of a directory hierarchy you should set up as the repository
root. Quick: do you make all of /etc a repository, or do you want
separate repositories for /etc/thing1, /etc/thing2, and so on?)
Sidebar: how I would get around the cluttered directory problem
What currently strikes me as the best solution is something that I saw
in the Mercurial documentation recently: just tell the repository to
ignore all files except those you've explicitly added. This basically
turns your VCS into an easier to use version of RCS, where the
equivalent of 'mkdir RCS' is to initialize a repository that ignores
everything by default.
This doesn't deal with all of the VCS problems, but it gets you part of the way.
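For Mercurial, a minimal sketch of the idea (using /etc purely as an example location) is an .hgignore that ignores everything; ignore patterns only affect untracked files, so anything you explicitly 'hg add' is still versioned as normal.

  cd /etc && hg init
  printf 'syntax: glob\n*\n' > .hgignore
  hg add .hgignore
  hg commit -m 'ignore everything not explicitly added'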
2009-11-14
How to defer things in Exim
Normally, Exim routers will only accept or fail addresses (or be uninterested in them). This is good enough for normal handling of addresses, but if you are using routers to their full power, there are times when you want to force routers to defer addresses instead. There are two general ways to do this.
(Unsuccessful DNS lookups can cause addresses to defer, but this is not normally under your control.)
The straightforward way is to use a separate router to explicitly defer
the address using the :defer: action of the redirect driver, like
so:
defer_addr:
    driver = redirect
    allow_defer
    data = :defer:stalling
    [... whatever condition needed ...]
Using a separate router is straightforward and makes for clear log messages about what is going on. However, it's not always possible (or desirable) to use a separate router. In that case you can abuse string expansion to cause an expansion failure while expanding some option where this will force the router to defer.
This is moderately tricky for two reasons. First, you cannot just force
string expansion to fail explicitly (via an ${if} or the like),
because explicit failure doesn't wind up causing options to defer this
way; instead, the router generally fails or passes on the address. Only
'natural' expansion failure, for reasons that Exim thinks are outside of
your control, causes the router to defer. The one case that I know of is if you
use ${readfile} on a nonexistent file.
Second, you need to pick a router option where expansion failure
causes a deferral and, ideally, that you are not already using.
The Exim documentation is the final authority on what router
options will do for this (see generic options for routers
and check what each option does on non-forced expansion failure);
the one that I have found useful in our mailer configuration is
address_data. Thus, part of our deliver-to-/var/mail router
looks like:
postbox:
driver = accept
transport = local_delivery
# make sure it's mounted
address_data = ${readfile{/var/mail/.MOUNTED}}
[....]
(Our /var/mail is NFS mounted on the mail server, and obviously
we only want to do deliveries there if it is the real, NFS-mounted
filesystem, not the empty directory that's visible if the mount has
failed for some reason. .MOUNTED is just an empty file.)
The drawback of this approach is that Exim will log alarmed-looking and rather cryptic error messages if the condition ever fails and forces messages to be deferred, so it is best reserved for conditions that you don't expect to happen very often.
2009-11-13
(Ab)using Exim routers for their full power
Officially, as reflected in the documentation, Exim routers are expected to take more or less disjoint sets of addresses; for example, you have one router to do DNS lookups and SMTP for external addresses, one router to handle aliases, one router to expand the .forwards of people with them, and one router to deliver to the mailboxes of people without .forwards. This makes the ordering of the routers relatively unimportant; approached this way, ordering is used mostly to make writing routers more convenient, by letting you be less neurotically careful about exactly what addresses each router applies to.
(There is one exception; traditional .forward handling absolutely requires ordering and cannot be done with router conditions.)
If you want to really do powerful things with Exim routers, you need to go beyond this view. Instead, you should think of routers as (conditional) steps, or decision points, in a peculiar programming language. Not all decision points apply (or potentially apply) to all addresses, but it is entirely natural that multiple routers potentially apply (depending on circumstances) to the same set of addresses; each such router is a step in the conditional handling logic for these addresses.
(This mindset sounds simple when I explain it, but I don't think that it's obvious from the current Exim documentation. I've certainly seen a fair number of 'how to do X' questions asked on the Exim mailing list by people who clearly hadn't made this conceptual leap.)
Once you think of routers this way, ordering becomes important; for routers that handle the same set of addresses, the relative ordering of the routers is the ordering of decision steps about those addresses. Often you have something close to a total order of routers because you will want to do some common things with all addresses.
To make all of this less abstract, here is the list of decisions that our central mail system makes about external addresses, each implemented with a separate router:
- is this a locally generated bounce of a spam message? discard if so
- is this a looping bounce message? discard if so
- is all further handling of this address being manually deferred?
- if this is a spam message, has it exceeded the timeout interval for this address's domain? bounce if so
- route the address with DNS lookups and deliver the message via SMTP
(Some but not all of these apply to internal addresses too.)
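To make one of these steps less abstract as well, here is a hedged sketch of what the manual-defer decision and the final delivery decision might look like as routers; the router names, the lookup file, and the conditions are invented for illustration and are not our actual configuration. Their order in the configuration file is the order the decisions are made in.

  # hypothetical illustration only
  manually_deferred:
    driver = redirect
    allow_defer
    domains = ! +local_domains
    # assumes a 'local_domains' domain list is defined elsewhere
    condition = ${lookup{$domain}lsearch{/etc/exim/manually-deferred}{yes}{no}}
    data = :defer: handling of mail for $domain is manually deferred

  external_smtp:
    driver = dnslookup
    domains = ! +local_domains
    transport = remote_smtp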
Sidebar: why .forward handling requires ordering
I cheated in my example description of Exim routers. Traditional .forward semantics allow you to put your own email address in your .forward again; this means 'deliver to me, bypassing my .forward', which usually winds up putting a copy of the message in /var/mail. If you want to support these semantics under Exim, the router that delivers messages to /var/mail cannot apply only to people who do not have .forwards, and thus has to be ordered after the router that handles .forwards.
(How Exim makes these semantics work is a little bit complicated.)
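For reference, here is a rough sketch of the conventional ordering, in the style of the standard Exim example configuration rather than our actual setup: the .forward router comes first, and the /var/mail router comes after it and applies to everyone with a local account.

  userforward:
    driver = redirect
    check_local_user
    file = $home/.forward
    no_verify

  localuser:
    driver = accept
    check_local_user
    # 'local_delivery' is assumed to be a transport that appends to /var/mail
    transport = local_delivery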
2009-11-12
What makes Exim work as a mailer construction kit
In light of Postfix versus Exim, you might wonder what features make Exim into a mailer construction kit. For me, the easiest way to summarize the answer is to say that Exim has the idea of what I will call a user-written mail processing pipeline (actually two of them, sort of).
By a mail processing pipeline I mean a series of steps that messages go through to decide what will happen to them and how they will be delivered. In many MTAs, this processing pipeline is more or less fixed, with you having opportunities to add a table lookup here or mangle addresses there. In Exim, there is no fixed processing pipeline; you write it entirely from scratch yourself, using relatively generic components to do most of the work. The result is that you have a great deal of flexibility in what happens in those pipelines; in other words, how messages get handled and delivered is to a large extent under your direct control.
(The two drawbacks of this are that you have to write the pipeline yourself and that it is much easier to screw things up in various ways, some of them subtle.)
Conceptually, Exim has two major places with this sort of processing flexibility. The first is deciding how to route an address to one or more delivery destinations; you write a series of what Exim calls 'routers', and then they get used in sequence to process each address in various ways, hopefully ultimately delivering them somewhere.
(The Exim documentation describes routers and this routing process in a way that makes it sound less powerful than it is.)
The other major place with such a processing pipeline is deciding what reply code to give for each SMTP command in an SMTP conversation. In Exim you do this by writing a series of ACL rules for each command, again using relatively generic components to do most of the hard work. These rules can do quite powerful and generic things, and combining them gives you a lot of control over what your mail system accepts and rejects.
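As a small illustration (this mirrors the shape of the stock Exim example configuration rather than anything of ours, and it assumes that the relay_from_hosts and local_domains lists are defined elsewhere), a bare-bones RCPT-time ACL looks something like:

  acl_smtp_rcpt = acl_check_rcpt

  begin acl

  acl_check_rcpt:
    accept  hosts = +relay_from_hosts
    accept  domains = +local_domains
    deny    message = relaying not permitted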
(Exim also gets a fair bit of its general power from its crazy string expansion language; this comes up when writing both routers and SMTP ACL rules.)