The mixed directory/unrelated files VCS problem

December 2, 2009

To follow up an earlier entry, one might sensibly ask what the problem is with using a modern, whole-directory version control system on a directory full of unrelated files, the classical example being /etc.

What you effectively have in this situation is a directory with multiple 'modules' (files and groups of files) that are logically separate and independent from each other. As they're separate modules, these files usually evolve and are developed at least somewhat independently. Putting all of these files into a single repository creates a mess in the same way that developing several different bugfixes at the same time in the same source code repository does.

As with source code, what you wind up with is a repository that can give you time-based snapshots of the state of your directory but doesn't tell you very much about the logic of how individual things developed, at least not in a straightforward way. (You can sort of reconstruct it by restricting various repository operations to just a subset of the files, but then you are working against how the VCS models things instead of with it.)
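(To make the 'restrict operations to a subset of files' idea concrete, here is a minimal sketch in Python. It models a repository history in memory and does not talk to any real VCS; the commit IDs, messages, and file names are all made up for illustration.)

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    ident: str
    message: str
    paths: tuple  # files touched by this commit


# Hypothetical history of a whole-directory repository for /etc.
HISTORY = [
    Commit("c1", "initial import", ("ntp.conf", "resolv.conf", "apache2/httpd.conf")),
    Commit("c2", "switch NTP servers", ("ntp.conf",)),
    Commit("c3", "tune Apache, also fix NTP drift file", ("apache2/httpd.conf", "ntp.conf")),
    Commit("c4", "new resolver", ("resolv.conf",)),
]


def module_history(history, module_paths):
    """Return the commits that touch at least one of the module's files."""
    wanted = set(module_paths)
    return [c for c in history if wanted.intersection(c.paths)]


for commit in module_history(HISTORY, ["ntp.conf"]):
    print(commit.ident, commit.message)
```

The filtered view gives you the NTP commits, but one of them (the hypothetical `c3`) also carries an unrelated Apache change. The filter can hide that; it can't undo it.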

This mess gets worse if you need to synchronize changes across multiple machines. Here, you really have the classic entangled changesets problem, especially since not all 'modules' may apply to a particular system. Updating a system to the current state of, say, your overall NTP configuration becomes a non-trivial operation because you're fighting how the VCS wants you to work.
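(A rough illustration of the entanglement, again as an in-memory Python sketch with invented changeset IDs and file names: given the files a particular machine cares about, each changeset is either cleanly applicable, irrelevant, or entangled with files the machine doesn't want.)

```python
# Hypothetical changesets: ID -> set of files each one modifies.
CHANGESETS = {
    "c1": {"ntp.conf"},
    "c2": {"ntp.conf", "apache2/httpd.conf"},
    "c3": {"apache2/httpd.conf"},
}


def classify(changesets, wanted_files):
    """Sort changesets by how they relate to the files this machine wants.

    Returns (clean, entangled, irrelevant) lists of changeset IDs:
    clean changesets touch only wanted files, entangled ones mix wanted
    and unwanted files, and irrelevant ones touch no wanted files.
    """
    clean, entangled, irrelevant = [], [], []
    for ident, files in changesets.items():
        if files <= wanted_files:
            clean.append(ident)
        elif files & wanted_files:
            entangled.append(ident)
        else:
            irrelevant.append(ident)
    return clean, entangled, irrelevant


clean, entangled, irrelevant = classify(CHANGESETS, {"ntp.conf"})
print("clean:", clean)
print("entangled:", entangled)
print("irrelevant:", irrelevant)
```

Every changeset that lands in the entangled list is one where you have to split things by hand to bring only the NTP part over to a machine that doesn't run Apache.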

With some VCSes (eg, git) I think you could cheat madly and sort of make this work, but I also think that you'd wind up with a lot of heartburn. With others, I don't see how to make this work short of having completely separate repositories for each machine and just shipping patches around by hand.

(Okay, I suppose there are various cherry-picking and transplanting extensions for various VCSes, but I suspect that significant use of them will wind up with an increasingly horribly tangled repository history.)

Comments on this page:

From at 2009-12-02 10:59:23:

This is precisely the sort of problem which configuration management systems are made to solve for you -- you have the separate modules (apache, bind, whatever) and put a config for them on some remote server. Then you version that set of configs, and apply only those which are appropriate for a particular machine instance.

At least then you'll be able to pick which modules you get on the machine.
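(As a rough sketch of that module selection, not tied to any particular configuration management system: the module names, file names, and machine names below are all hypothetical.)

```python
# Hypothetical modules: name -> config files that belong to it.
MODULES = {
    "ntp": ["ntp.conf"],
    "apache": ["apache2/httpd.conf", "apache2/ports.conf"],
    "bind": ["bind/named.conf"],
}

# Hypothetical assignment of modules to machines.
MACHINE_MODULES = {
    "web1": ["ntp", "apache"],
    "dns1": ["ntp", "bind"],
}


def files_for(machine):
    """List the config files a machine should receive, module by module."""
    files = []
    for module in MACHINE_MODULES[machine]:
        files.extend(MODULES[module])
    return files


print(files_for("web1"))
print(files_for("dns1"))
```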

For cherry-picking, you could use tags or branches to ensure that certain machines get certain sets of the files, but that gets messier than is usually necessary (well, I've not had to do this yet, but I like to keep all the configuration-management-managed machines in sync as much as possible).

By cks at 2009-12-03 00:29:43:

In my view, you would have similar problems if you put all of the configuration management system's files in the same repository. With current VCSes, if you want separate modules you need separate repositories and that means you need separate directories.

Some VCSes at least have explicit support for sub-repositories, so you could incorporate all of these separate module repositories into a master 'configuration system' repository and still be able to snapshot particular moments of CM system state.

(Mercurial has subrepos, which are still considered experimental. Git has submodules, which are fully supported. SVN is of course the ancestor of all of this, with svn:externals.)

From at 2009-12-03 05:01:31:

If you can abstract things enough to generate the config files using the configuration management system then you only have to version the generation of the configs (per machine), rather than the config files themselves.

At worst you can template them, but this still means that you'd need different template versions when things really change between versions of the software.
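(A minimal sketch of the templating approach, using Python's standard `string.Template` as a stand-in for whatever templating a real configuration management system provides. The template contents, machine names, and parameters are invented for illustration.)

```python
from string import Template

# Hypothetical template for a tiny ntp.conf.
NTP_TEMPLATE = Template("driftfile $driftfile\n$server_lines")

# Hypothetical per-machine parameters; these are what you would version.
MACHINES = {
    "web1": {
        "driftfile": "/var/lib/ntp/drift",
        "servers": ["ntp1.example.com", "ntp2.example.com"],
    },
    "db1": {
        "driftfile": "/var/db/ntp.drift",
        "servers": ["ntp1.example.com"],
    },
}


def render_ntp_conf(params):
    """Generate the text of ntp.conf from one machine's parameters."""
    server_lines = "".join(f"server {name}\n" for name in params["servers"])
    return NTP_TEMPLATE.substitute(
        driftfile=params["driftfile"],
        server_lines=server_lines,
    )


print(render_ntp_conf(MACHINES["web1"]))
```

Here the template and the per-machine parameters are the versioned inputs, and the generated file is just an output.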

This works for at least Apache, Bind, DHCPd and various other ones with a few Puppet recipes, and it's not too tricky to add more :

From at 2009-12-03 05:10:14:

One thing I was wondering is exactly what problems you've actually been having and trying to solve in this way to lead you to the original post?

Written on 02 December 2009.

Last modified: Wed Dec 2 01:48:00 2009