Converting a directory from RCS to Mercurial

November 23, 2009

Suppose that you have a directory full of configuration files that have been there for so long that they're still being maintained with RCS. Further suppose that you would like to change to a modern version control system, say Mercurial, but that you would like to preserve all of your old version history.

Mercurial has no direct support for converting RCS files, but there's a magic trick: a CVS repository is nothing more than a bunch of RCS files in a directory hierarchy plus a thin layer of easily created metadata, and a lot of things (Mercurial included) can convert CVS repositories. So we first make a CVS repository version of our directory, and then convert that repository to Mercurial.

Before you start, you need to clean up your current data by making sure that everything you want to have included in the new repository is under RCS, and that you don't have any lingering RCS ,v files for files that you've taken out of service. If you do have old ,v files and want to preserve their history in the new repository, you'll need to remember to tell Mercurial (or your VCS of choice) that they're deleted after you finish the repository conversion.

(It's relatively common for us to remove the checked out version of a file but keep the ,v file both just in case and for historical purposes. You may be different.)
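
The cleanup check can be partly automated. What follows is only a rough sketch, assuming that your ,v files live in RCS subdirectories next to their working files; the function names are made up for illustration. The first function lists ,v files with no checked-out file, and the second lists files with no ,v file.

```shell
#!/bin/sh
# Sketch: report RCS cleanup candidates under a directory tree.
# Assumes ,v files live in RCS/ subdirectories beside their working files.
# Both function names are hypothetical helpers, not part of RCS.

# Print ,v files whose checked-out working file no longer exists.
list_orphaned_rcs_files() {
    find "$1" -type f -path '*/RCS/*,v' | while IFS= read -r v; do
        dir=$(dirname "$(dirname "$v")")   # directory that holds the RCS/ subdirectory
        name=$(basename "$v" ,v)           # working file name, without the ,v suffix
        [ -e "$dir/$name" ] || printf '%s\n' "$v"
    done
}

# Print regular files that have no RCS/<name>,v file beside them.
list_untracked_files() {
    find "$1" -type d -name RCS -prune -o -type f -print | while IFS= read -r f; do
        [ -e "$(dirname "$f")/RCS/$(basename "$f"),v" ] || printf '%s\n' "$f"
    done
}
```

Running 'list_orphaned_rcs_files nsdata' and 'list_untracked_files nsdata' from the parent directory gives you two lists to go through by hand; neither function changes anything.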

Using the example of a directory (or directory hierarchy) called nsdata, here are the steps, in two parts. We'll work in /tmp/, for convenience.

(As always, I must note appropriate disclaimers. You should always carefully test both procedures and end results, and while this has worked for us, I can't promise that it will work for you.)

Creating a CVS repository version of your RCS-controlled directory

  1. Create an empty CVS repository to get the CVS metadata:
    cvs -d /tmp/scratch-CVS init

  2. Put a copy of the nsdata directory into /tmp/scratch-CVS/nsdata with the tool of your choice (I used rsync, because I use rsync for everything like this). In CVS terminology, this creates a repository module called 'nsdata'.

  3. Turn it into a correctly laid out CVS repository. You've probably got all of your RCS ,v files in RCS subdirectories, but CVS puts them directly in the directory that the checked-out file goes in. So you need to move all of the ,v files up one directory level, out of their RCS subdirectories:
    cd /tmp/scratch-CVS/
    find nsdata -type d -name RCS -prune | while IFS= read -r r; do mv -i "$r"/* "$r/.."; rmdir "$r"; done

  4. Create a checked out version of the CVS repository:
    mkdir /tmp/scratch-CO; cd /tmp/scratch-CO
    cvs -d /tmp/scratch-CVS co nsdata

    This is where the CVS module terminology becomes important; you are checking out the 'nsdata' module from your CVS repository, which creates a /tmp/scratch-CO/nsdata directory hierarchy.

You should be able to diff -r this checked out CVS module against your current directory and not see any significant differences. (Your checked-out version will have CVS directories and not have RCS ones.)
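
One way to keep the expected CVS and RCS directory differences out of the comparison is to exclude them by name. This is a small sketch, not something from the original procedure; it assumes a diff that supports -x (GNU and BSD diff do, but POSIX does not require it), and compare_trees is a made-up name.

```shell
#!/bin/sh
# Sketch: compare two trees while ignoring version-control bookkeeping directories.
# Assumes a diff with -x (GNU or BSD diff); compare_trees is a hypothetical helper.
compare_trees() {
    # Exit status is 0 if the trees match apart from RCS/ and CVS/ directories.
    diff -r -x RCS -x CVS "$1" "$2"
}
```

For example, 'compare_trees /path/to/nsdata /tmp/scratch-CO/nsdata', where /path/to/nsdata stands in for wherever your real directory lives.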

If you prefer something besides Mercurial, you can now use the CVS-to-whatever tool of your choice. The rest of this entry is specific to the CVS-to-Mercurial conversion process.

Converting a CVS repository into a Mercurial one

Unfortunately, you're also going to want to do the conversion with the latest version of Mercurial (version 1.4 as of writing this), which may mean that you need to build it yourself. Old versions of Mercurial do a worse job of the conversion, and if they are sufficiently old, they actually don't do it correctly. Once you've converted the repository, you can use the normal system version of Mercurial to work on it.

So, the steps:

  1. Optionally, go through your RCS history to find out all of the Unix userids that have made RCS checkins, and create a file that maps from the Unix userid to something more conventional for Mercurial, such as an email address. See Mercurial's 'hg help convert' for information about the format of this file; let us assume that it is /tmp/authormap.

  2. Create a Mercurial version of your CVS repository:
    hg convert --authors /tmp/authormap --datesort /tmp/scratch-CO/nsdata /tmp/nsdata-hg

    Some Mercurial documentation recommends avoiding --datesort. This is wrong for our particular case; here, your changesets really are in strictly chronological order, and you want the converted repository to reflect this.

    If you are doing the conversion with a self-built copy of the latest Mercurial on Ubuntu 8.04 LTS or any other system which has a pre-1.1 version of Mercurial, you will need to add an extra argument so that you can use the system version of Mercurial on the repository:

    hg --config 'format.usefncache=0' convert ...

    (See here for a discussion of this.)

    On Ubuntu 8.04 LTS you definitely want to use the latest Mercurial to do the conversion; Mercurial 0.9.5 has a bug that will give you incorrect file contents (reversing some changes) under some circumstances.

  3. Clean up the repository and check out the current versions of all files:
    cd /tmp/nsdata-hg
    hg purge; hg update

(If you did the conversion with a sufficiently modern version of Mercurial, you don't need the 'hg purge'.)
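
For the optional first step, the list of userids can be pulled out of rlog output. The following is a sketch under assumptions: it relies on rlog's usual 'date: ...;  author: ...;' revision lines, both function names are invented, and example.org is a placeholder that you would replace by hand. The 'userid=new name' line format is the one described in 'hg help convert'.

```shell
#!/bin/sh
# Sketch: build a starting point for the author map from rlog output.
# Assumes rlog's usual "date: ...;  author: NAME;  state: ..." revision lines.
# rcs_authors and authormap_skeleton are hypothetical helper names.

# Read rlog output on stdin and print each distinct author once, sorted.
rcs_authors() {
    sed -n 's/^date: [^;]*;[[:space:]]*author: \([^;]*\);.*/\1/p' | sort -u
}

# Turn userids on stdin into "userid=userid <userid@example.org>" lines.
# example.org is a placeholder domain; edit the result before using it.
authormap_skeleton() {
    while IFS= read -r u; do
        printf '%s=%s <%s@example.org>\n' "$u" "$u" "$u"
    done
}
```

Run against the original RCS-controlled directory, this might look like "find nsdata -name '*,v' -exec rlog {} + | rcs_authors | authormap_skeleton > /tmp/authormap", followed by editing /tmp/authormap to put in real names and addresses.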

The end result of this is a new Mercurial repository in /tmp/nsdata-hg with the full history and the current version of all files in the repository checked out. You should be able to diff -r this against the current directory of configuration files and see no important differences. (The Mercurial repository will have a .hg directory and not have RCS directories.)

My experience is that the history of the Mercurial repository will show at least some multi-file changesets, although it doesn't seem to capture all of them. I choose to view this as an improvement over having all changes be single-file changes, even if it's not perfect.

(Presumably the conversion process (or CVS) uses various heuristics to decide when changes to multiple files more or less at once actually are a single changeset.)
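
If you want a rough idea of how many multi-file changesets the conversion produced, you can count them from 'hg log' output. This is just a sketch: it assumes a template that prints the revision number followed by the space-separated {files} list, so file names that contain spaces will be miscounted, and count_multifile_changesets is a made-up name.

```shell
#!/bin/sh
# Sketch: count changesets that touch more than one file.
# Expects lines of the form "REV file1 file2 ..." on stdin.
# File names containing spaces will be miscounted.
count_multifile_changesets() {
    # Field 1 is the revision, so more than two fields means at least two files.
    awk 'NF > 2 { n++ } END { print n + 0 }'
}
```

For example: hg log -R /tmp/nsdata-hg --template '{rev} {files}\n' | count_multifile_changesets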

Sidebar: resources and credits

I didn't come up with this on my own; a number of web pages provided very valuable information and pointers.

This is part of CSpace, and is written by ChrisSiebenmann.

Last modified: Mon Nov 23 23:39:46 2009