Why open source needs distributed version control

August 10, 2005

Centralized versus distributed version control is one of the big discussion topics in the field. Being distributed complicates the software, and CVS and Subversion (the two version control systems most widely used for open source) are both centralized. Recently, Ian Bicking has written two interesting articles on the issue: Centralized vs Decentralized and its part 2.

I believe that open source development needs distributed version control. The argument for why is a bit too long for a comment on Ian Bicking's articles (and besides, this way I have a real editor), so I'm putting it here.

Unless you have a small or really peculiar open source project, you can't give everyone who wants to do non-trivial development core commit access. However, you still want all those people to be using version control, and it needs to let them share their work with collaborators, testers, critics, and so on.

If you let them piggyback on your project's core version control environment, what they need amounts to publicly visible branches (ideally with access controlled by the branch's owner). To use the version control system for their work, a developer obtains a new branch and commits to it. (If you don't let them piggyback on the core, you get a de facto 'distributed' (and also anarchic and uncontrolled) version control 'system'.)

As Ian Bicking has noted, there is no technical reason that a centralized version control system can't support this; it's just that nothing does today.

However, this still leaves at least two big questions: who can create branches, and when do branches get deleted? (You probably don't want to answer 'anyone' and 'never'.)

These are not technical questions; they are questions of policy, and thus of politics. This means that someone or some group has to act as a central gatekeeper, and the people involved need to be respected (i.e., good developers). Because it involves politics, this is not going to be a light or pleasant job. Also, the more developers you get, the more branches and branch issues the central gatekeeper has to deal with, the more grief these people catch, the less time they have for actual development, and so on.

With a distributed version control system, branching is distributed. A developer who wants to branch just does so locally. If they want to share the branch, they do; if they want to give someone access, they do; if they want to keep the branch around or kill it, they do. All of the management overhead of branches falls on the people actually doing the branching; none of it falls on anyone not involved.
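To make the scaling difference concrete, here is a toy Python model. It is a sketch under loud assumptions, not any real version control system's API: every developer performs three branch actions (create, share, delete), and the hypothetical names `GATEKEEPER`, `branch_actions`, `centralized_load` and `distributed_load` exist only for illustration. In the centralized case every action lands on the gatekeeper; in the distributed case each action lands on the developer who made it.

```python
from collections import Counter

# Toy model only: not the API of any real version control system.
# A "branch action" is a (developer, action) pair, e.g. ("dev3", "create").

GATEKEEPER = "gatekeeper"  # hypothetical name for the central branch admin


def branch_actions(n_devs, actions=("create", "share", "delete")):
    """Assume every developer creates, shares and eventually deletes one branch."""
    return [(f"dev{i}", a) for i in range(n_devs) for a in actions]


def centralized_load(actions):
    """Centralized: every branch action needs the gatekeeper's attention."""
    load = Counter()
    for _dev, _action in actions:
        load[GATEKEEPER] += 1
    return load


def distributed_load(actions):
    """Distributed: each developer handles their own branch actions locally."""
    load = Counter()
    for dev, _action in actions:
        load[dev] += 1
    return load


if __name__ == "__main__":
    for n in (10, 100, 1000):
        acts = branch_actions(n)
        central = centralized_load(acts)[GATEKEEPER]
        busiest = max(distributed_load(acts).values())
        print(f"{n} devs: gatekeeper handles {central}, busiest developer handles {busiest}")
```

Under these assumptions the gatekeeper's work grows linearly with the number of developers, while the busiest individual's branch-management work stays constant.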

Thus: a distributed system scales branching excellently as you add more developers, unlike a centralized one.

This is why I believe open source fundamentally needs distributed version control systems.

(And looking at the length and relative incoherence of this, it's a good thing I didn't try to make it a comment on Ian Bicking's blog.)

Sidebar: centralized resource requirements

Centralized version control systems also have the disadvantage that they centralize resource requirements, because everyone has to talk to the master server to do anything (at least anything that involves writes). This also makes the central server a crucial resource; if it's not there, no one can do any version control work.

This means that the resources required to run a central server acceptably keep rising as the number of developers rises. If your project gets popular and you acquire hundreds of new developers eager to help, you may not be able to accommodate them (at least right away). Oops.
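A toy Python model can illustrate both points: every write lands on the master server, and when that server is down nobody can commit. This is a sketch under stated assumptions; `MasterServer`, `LocalRepo`, `central_commits` and `local_commits` are hypothetical names, not the API of CVS, Subversion, or any distributed system.

```python
from dataclasses import dataclass, field

# Toy model only: these classes are hypothetical, not a real VCS API.


@dataclass
class MasterServer:
    """Centralized VCS: every write has to go through this one server."""
    up: bool = True
    writes: int = 0

    def commit(self, dev: str, message: str) -> None:
        if not self.up:
            raise ConnectionError("master server unreachable")
        self.writes += 1  # server load grows with every developer's commits


@dataclass
class LocalRepo:
    """Distributed VCS: each developer commits to their own copy."""
    owner: str
    log: list = field(default_factory=list)

    def commit(self, message: str) -> None:
        self.log.append(message)  # purely local; no server involved


def central_commits(devs, server):
    """Return how many developers managed to commit via the master server."""
    done = 0
    for dev in devs:
        try:
            server.commit(dev, "work in progress")
            done += 1
        except ConnectionError:
            pass  # with the server down, this developer is stuck
    return done


def local_commits(devs):
    """Return how many developers managed to commit to their own repositories."""
    repos = [LocalRepo(dev) for dev in devs]
    for repo in repos:
        repo.commit("work in progress")
    return sum(len(repo.log) for repo in repos)
```

With 500 developers and a working server, `central_commits` succeeds 500 times and the server absorbs all 500 writes; with `MasterServer(up=False)` it succeeds zero times, while `local_commits` still records 500 commits.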

I don't consider the rising resource requirements a real issue, since so far it's always been possible to buy beefier servers. But I do think they're something to keep in mind as one downside of the centralized approach. (On the other hand, centralization has its advantages, such as wider visibility of branches and development. Indeed, some projects that use distributed version control, like the Linux kernel, create their own centralization point so that people can find branches in one spot.)

Written on 10 August 2005.
Last modified: Wed Aug 10 01:55:56 2005