The Version Control System dependency problem

August 30, 2005

One of the aphorisms of HCI design is that if users keep making the same error with a program, it's the program and not the users that's actually wrong. By this standard, almost all version control systems really need an HCI makeover, because there is one classical mistake that users keep making over and over.

The mistake: make a change, then make an unrelated change, then make a third unrelated change (perhaps fix three different and independent bugs in entirely different files). You get a version diagram that looks like:

Original -> B -> C -> D

Now try to pass just the third change to someone, using the version control system. Almost all version control systems will refuse, saying that the C to D change depends on C (which depends on B), and you can't pass D without its dependencies.

(The 'proper' way to do this is that you should have put each change in a separate branch (or equivalent), then merged them together to get something you can test.)

But these are unrelated changes and the VCS is wrong, because most VCSes have adopted an extremely simplistic idea of dependency: if C comes after B in the same branch, it depends on B. In turn, this simplistic idea clashes with how normal people try to use VCSes before they get inoculated with the version control religion.
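To make the difference concrete, here is a minimal sketch in Python contrasting the two ideas of dependency. It is an illustration only, not how any particular VCS works: the `Change` type, the function names, and the file names are all hypothetical, and "touches the same file" is just a rough stand-in for "is actually related" (changes in different files can still depend on each other).

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    name: str               # named after the version the change produces
    files: frozenset[str]   # files the change touches (hypothetical)


def linear_deps(history: list[Change], target: str) -> list[str]:
    """Strict rule: a change depends on every change before it in the branch."""
    names = [c.name for c in history]
    return names[: names.index(target)]


def overlap_deps(history: list[Change], target: str) -> list[str]:
    """Looser rule: depend only on earlier changes that share files,
    directly or through another dependency."""
    names = [c.name for c in history]
    idx = names.index(target)
    needed_files = set(history[idx].files)
    deps = []
    # Walk backwards so files pulled in by a dependency are tracked too.
    for change in reversed(history[:idx]):
        if change.files & needed_files:
            deps.append(change.name)
            needed_files |= change.files
    return list(reversed(deps))


# Three independent bug fixes in entirely different files (assumed names).
history = [
    Change("B", frozenset({"parser.c"})),
    Change("C", frozenset({"net.c"})),
    Change("D", frozenset({"README"})),
]

print(linear_deps(history, "D"))   # ['B', 'C']
print(overlap_deps(history, "D"))  # []
```

Under the strict rule the third change drags the first two along with it; under the looser rule it can be passed on by itself, which is what the person who made it expected.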

And this is the VCS dependency problem: VCSes are too strict about 'dependencies', so much so that it gets in people's way. Clinging to mathematical purity may have a clean intellectual appeal, but it comes at the expense of practical usability.

In practice it also damages the integrity of the codebase's history. People will make these kinds of mistakes and have to fix them; the only question is whether they will be able to do so inside the VCS or whether they will resort to exporting and importing patches and the like (with the loss of some of the development history).


Comments on this page:

From 69.113.211.148 at 2009-12-02 08:37:17:

This is the same answer that you get working on a big source tree -- the answer is diligence, and attention paid to doing things the right way. For those of us who run small, understaffed or single-person administration teams, this means keeping a logical separation between your local repositories and your configuration management repositories, with special attention paid to the differing roles of each.

When I work on a local machine's repository, I'm explicitly not considering the ability to branch, merge and collect patches, which just introduces more management overhead I'm not interested in. That work is all done in the Puppet modules on the configuration management server, which is where all wide and global changes are developed. What I am focused on instead is the ability to group related changes into a changeset (which is very doable as long as you're diligent about committing often and separating out files into real commits), the ability to roll back the whole environment to a known-good configuration, and the ability to aggregate summaries of all changes and have them displayed in a central location (for which we currently use Redmine and a big set of subprojects). In other words, my local machine changes are explicitly concerned with agile change control and nothing else.

You've got a good point that it's hard to roll back one set of changes and not all others. However, in practice, with adequate testing in a development environment, this shouldn't come up often outside of an emergency situation. By using revision control for tracking changes, it should become very easy to leverage your repository browser to see exactly what broke the service you're concerned with. If you have to manually roll back that change, it should be simple to figure out what needs to be done -- you can pinpoint the prior revision of all affected files, and check out only those files from a particular past commit.

System administration and release engineering have very different roles, and there's nothing wrong with having a single product that's used in very different ways to accomplish very different goals. Revision control systems are very complex, flexible and robust pieces of software, and the way they're used should be optimized to fit the workflow of wherever they're needed. You should, wherever possible, fit the tool to the process, rather than the other way around.

--Jeff

By cks at 2009-12-03 00:10:48:

A site admin note: this comment is probably more associated with another entry than with this one, but I have decided not to use magic powers to relocate it there, since the original author did deliberately post it here.

To the extent that the comment addresses issues I talked about in this entry, I disagree with its view for reasons summarized in the first paragraph of the entry; if people keep making a mistake with your software, the software is wrong. If it takes extraordinary care to use version control software 'right', the version control software is correct only in a mathematical sense.

Written on 30 August 2005.

Last modified: Tue Aug 30 03:21:13 2005