Two different worldviews of version control systems

November 4, 2021

I've come to think that there are two broad ways of viewing the world that are used by most common version control systems. Although the end result can be the same, these worldviews lead to different places and can give people different attitudes, and I happen to think that one is a better representation of reality than the other.

In the first worldview, the version control system manages snapshots of a tree. When you make a new snapshot, you generally provide a freeform textual description of how it relates to one or more existing snapshots. Naturally, the VCS provides a variety of convenient tools for creating a new tree state from existing snapshots, but these are in a deep sense just chrome; the VCS's core view is mostly indifferent to how a tree state comes to be. Git is the classical and practically canonical example of this worldview.

(This worldview expresses an abstract situation. The actual underlying storage of trees may use various forms of deltas.)
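
To make the snapshot worldview concrete, here is a minimal sketch in Python. It is not how Git or any real VCS is implemented; the `SnapshotStore` class, its methods, and the record layout are all hypothetical names invented for illustration. The point it tries to show is that a commit records a whole tree, a freeform message, and the parent snapshots, and nothing about how the tree was produced.

```python
import hashlib
import json


class SnapshotStore:
    """Toy content-addressed store: every commit records a whole tree.

    Hypothetical illustration only; not the design of any real VCS.
    """

    def __init__(self):
        self.objects = {}  # object id -> commit record

    def commit(self, tree, message, parents=()):
        # 'tree' maps path -> file content. The store never asks how the
        # tree was produced; it records the state, the freeform message,
        # and the snapshots this one claims to be related to.
        record = {
            "tree": dict(tree),
            "message": message,
            "parents": list(parents),
        }
        blob = json.dumps(record, sort_keys=True).encode("utf-8")
        oid = hashlib.sha1(blob).hexdigest()
        self.objects[oid] = record
        return oid

    def checkout(self, oid):
        # Getting a tree back is a lookup, not a replay of earlier changes.
        return dict(self.objects[oid]["tree"])


store = SnapshotStore()
first = store.commit({"README": "hello\n"}, "Initial snapshot")
# The rename is only visible in the message; the store just sees a new tree.
second = store.commit({"README.md": "hello\n"}, "Rename README", parents=[first])
```

In this sketch the only place the word 'rename' appears is the commit message; the store itself would be equally happy if the second tree had been typed in from scratch.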

In the second worldview, the version control system manages a series of changes from an initial state. Additional states are at least conceptually reconstructed by replaying the changes. The VCS usually tries to be aware of specific semantic changes, like renaming files, and new changes can be created through VCS operations as well as by manually editing files. VCSes of this nature usually consider it important to capture the intent of tree changes (hence their explicit support for file renaming) and sometimes talk about a patch algebra, which can be badly summarized as a formalization of logic around series of changes ('patches').

(As with the other VCS worldview, the underlying storage format is not necessarily a series of changes from an initial state.)
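
As a contrast, here is an equally minimal sketch of the change worldview, again in Python and again with invented names (`apply_change`, `replay`, and the tuple format for changes are assumptions, not any real VCS's model). History is a list of explicit operations, including a first-class rename, and a tree state is obtained by replaying them from the initial (empty) state.

```python
def apply_change(tree, change):
    """Return a new tree with one recorded change applied.

    Changes are hypothetical tuples: ("add", path, content),
    ("edit", path, content), ("rename", old, new), ("delete", path).
    """
    tree = dict(tree)
    kind = change[0]
    if kind == "add":
        _, path, content = change
        tree[path] = content
    elif kind == "edit":
        _, path, content = change
        if path not in tree:
            raise KeyError(path)
        tree[path] = content
    elif kind == "rename":
        # The rename is recorded as an intent, not inferred from contents.
        _, old, new = change
        tree[new] = tree.pop(old)
    elif kind == "delete":
        _, path = change
        del tree[path]
    else:
        raise ValueError("unknown change kind: %r" % (kind,))
    return tree


def replay(changes, initial=None):
    """Reconstruct a tree state by replaying changes from an initial state."""
    tree = dict(initial or {})
    for change in changes:
        tree = apply_change(tree, change)
    return tree


history = [
    ("add", "README", "hello\n"),
    ("rename", "README", "README.md"),
    ("edit", "README.md", "hello, world\n"),
]
final = replay(history)
```

Any intermediate state is available by replaying a prefix of the list, for example `replay(history[:2])`.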

Both worldviews work, in the sense that they can create the same set of tree states in the same order, with the same textual description of each step forward (ie, each commit message). But my view is that the change worldview is fundamentally a pretense because we can't truly describe changes properly. Our primary tool for describing most changes is the text diff, but that's merely the implementation of a change, not the change itself. We can describe the change in the commit message, but again that's not the change itself.

The snapshot worldview is honest about what the version control system can really provide and what it can't. The VCS can provide snapshots, and it can generate text diffs and other analysis of the difference between two snapshots. But it doesn't claim to capture changes themselves, only their consequences in the circumstances they were made.
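
As a rough illustration of a diff being analysis computed from two snapshots rather than something stored, here is a small Python sketch using the standard library's `difflib`. The `diff_snapshots` function and the dictionary representation of a snapshot are assumptions made for this example.

```python
import difflib


def diff_snapshots(old, new):
    """Return unified-diff lines between two {path: content} snapshots.

    Simplification: a missing file is treated like an empty one, so this
    sketch cannot show the creation or removal of an empty file.
    """
    lines = []
    for path in sorted(set(old) | set(new)):
        before = old.get(path, "")
        after = new.get(path, "")
        if before == after:
            continue
        lines.extend(
            difflib.unified_diff(
                before.splitlines(keepends=True),
                after.splitlines(keepends=True),
                fromfile="a/" + path,
                tofile="b/" + path,
            )
        )
    return lines


old = {"README": "hello\n"}
new = {"README.md": "hello\n"}
patch = diff_snapshots(old, new)
```

Given only the two trees, this naive diff reports the rename as one file losing its content and another gaining the same content. Whether that was 'really' a rename is something the snapshots alone don't say; a tool can guess at it, but the guess is analysis, not a record of what happened.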

PS: The snapshot worldview doesn't preclude an idea of (named) 'branches', because branches can exist at the level of commits. A commit is one sort of metadata about the relationship of a snapshot with other snapshots, and this can include things such as 'this was made on branch X' if the VCS wants it to.

(And even if the VCS doesn't include it, people writing commit messages can.)


Comments on this page:

Good point. I’ve noticed this with API change patches in particular, which are often supposed to be of a “change every place that did things the old way to do it the new way” nature. After any rebase (or merge!), I don’t know whether the other lineage introduced new places that do things the old way, so the API change patch may no longer be comprehensive, and therefore no longer correct. And the VCS cannot help me at all with that. I have to manually redo whatever process I used to find the places that needed to be updated before, and see if it now turns up any new hits. And then update them accordingly, of course, equally without VCS help. If it is really a purely mechanical transformation, the VCS could help, but it isn’t always.

But I hadn’t made the connection that this is the same issue as rename tracking in disguise.

By sam at 2021-11-05 06:49:18:

If you do have a good formalism for diffs, the second worldview can make sense, because you can have formal laws for how to manipulate diffs and what properties they have - this is what Pijul does, and (AIUI) it means merges can be done more cleverly than Git can because of these additional properties.

I notice you didn't give a specific VCS as an example of the second worldview. Did you have one in mind?

By cks at 2021-11-05 15:43:08:

I haven't looked enough at non-Git mainstream VCSes to feel I can speak confidently for their worldviews. For example, Mercurial certainly feels like it's change-focused to me, but I'm not sure that Mercurial people would fully agree with that characterization. There are less mainstream VCSes that seem to put themselves forward this way, such as Pijul and especially Darcs, which explicitly describes itself as "focus[ed] on changes rather than snapshots".

By Walex at 2021-11-09 17:03:40:

«I've come to think that there are two broad ways of viewing the world that are used by most common version control systems.»

That is a bit of a late realization, but the "snapshot" systems like 'git' are not really Version Control Systems, but Content Management Systems (sometimes less generally called Source Code Managers), because they focus on content updates rather than file (name or line) changes.

The "patches" systems like SCCS/RCS and successors are the ones that can be properly called Version Control systems.

«If you do have a good formalism for diffs, the second worldview can make sense, because you can have formal laws for how to manipulate diffs and what properties they have»

That is the old and interesting "Arch" view of "Patch Algebra Theory".

«means merges can be done more cleverly than Git can because of these additional properties.»

But that is not the goal of 'git', which is to do curation of content rather than managing changes to files. The most distinctive goal of 'git' was to enable tracking of content (and its authorship) irrespective of files and file versions, for better handling of copyright issues within the Linux kernel.

«For example, Mercurial certainly feels like it's change-focused to me [...] Darcs, which explicitly describes itself as "focus[ed] on changes rather than snapshots"»

The two categories recently rediscovered by our blogger, VCSes and CMSes, differ not so much in "changes" (patches) vs. "snapshots" as in two other deeper points (related to some filesystem implementations,..):

  • VCSes like SCCS/RCS, CVS/SVN, Mercurial or DARCS focus on actions and files (and have a very strong notion of "branch"), and almost ignore content and states.
  • CMSes like 'git' (and very few other tools) focus on states and content (and don't have an intrinsic notion of "branch") and almost ignore actions and files.

So VCSes track actions on files, lineages as explicit sequences of patches; CMSes track content across files, lineages as implied by references among states.

That is not just how their storage systems are implemented, but pervades their usage philosophy.

Note: one of the tragedies of 'git' is that while it is truly fundamentally a CMS, its "porcelain" layer tries to make it look like a VCS; I have found, while supporting 'git' users, that this leads to big misunderstandings and limitations in 'git' use.

Note: amusingly, a very good tool like Bazaar/Breezy, which is truly fundamentally a VCS, focused on tracking histories of file operations including renames, can also operate transparently as a 'git' "porcelain" layer, on the '.git' storage layer, which is fundamentally designed for CMS operation.

By Walex at 2021-11-09 17:09:59:

«I’ve noticed this with API change patches in particular, which are often supposed to be of a “change every place that did things the old way to do it the new way” nature.»

Ah, that is a completely different topic, it is an example of what are called "cross cutting concerns" (which define what once upon a time was called "aspect oriented programming" as opposed to "object oriented programming"), and as you noted it does not mesh well with text-oriented tools, whether VCSes or CMSes.

But that opens the related topic of what is a "decomposition paradigm" and why they are important and in what they differ, and most programmers (and their managers and the investors who pay their wages) only care about hacking it until it sort-of works, so it is a long-lost area of discussion.

By Walex at 2021-11-10 03:27:30:

«the goal of 'git', which is to do curation of content rather than managing changes to files [...] to enable tracking of content (and its authorship)»

As to that I would like to add, as the core workflow commands (for curating the Linux kernel), git add -i and git rebase -i, and the git blame command.

Note: one of the goals of 'git' as to copyright management was to allow the easy deletion of contributions by specific authors. Dr. Torvalds reasoned that in case of copyright (or patent) disputes for parts of the Linux source the easiest first step would be removal; that was the original purpose of git blame (and relatedly of signed commits, and the pervasive and very useful distinction between author and committer).

By Walex at 2021-11-10 04:39:47:

«The snapshot worldview doesn't preclude an idea of (named) 'branches', because branches can exist at the level of commits. A commit is one sort of metadata about the relationship of a snapshot with other snapshots»

That 'branches' seems to be between "air quotes" and that is appropriate, and I mostly agree with that, as tags and links among commits imply (but merely imply) some sort of "branches".

But because each snapshot is fully self-contained, there is no real notion of "branch" in the sense of a series of related changes; indeed the "branches" in 'git' are just "symbolic links" to a commit, "lineages" of commits can be nameless, and they can have DAG shapes (and all this can lead to enormous confusions for those who try to use 'git' as if it were a VCS with branches defined by sequences of patches).

From a certain point of view it is possible to consider each snapshot in 'git' a self-standing "branch" of the project, and the links among commits to be relationships among "branches". That's why in previous comments I have used the fuzzier term "lineage".

Written on 04 November 2021.
Last modified: Thu Nov 4 23:48:36 2021