Yes, git add makes a difference (no matter what people think)

May 28, 2012

One of the things said about git is that it's less user friendly and takes longer to learn than Mercurial; the first exhibit for this difference is usually git add and by extension git's index. Unfortunately, a common reaction among git fans to both the general issue and git add in particular is a kind of defensive denial, where they hold forth that it's not that difficult and people learn it fine and really, git is user friendly.

You may already have gotten an idea of my views on this. I'm here to tell you, from a mostly outsider perspective, that git add really does make a difference in initial user friendliness, one that makes Mercurial easier to pick up and use for straightforward uses.

(I've used git to a certain extent, for example for my Github stuff, but I am not up to the experienced user level. I'm not really at that level with Mercurial either, partly because I haven't needed to be and partly because I'd rather learn git; Mercurial is easier but I like git more.)

Before people freak out too much, let me be explicit: all of this is about initial user friendliness, the ease of doing straightforward things and picking up the system. In the long run I think that the git index is the right design decision (for a programmer-focused VCS) because it creates an explicit model for doing a number of important but tricky things, a model that can be manipulated and inspected and reasoned about, and once you learn git and use it regularly dealing with the index becomes second nature. But people generally do not defend the index in these terms; instead, they try to maintain with a straight face that it's no real problem for people even at the start.

(If you think that the index does not cause problems for git beginners, I would gently suggest that you trawl through some places where they ask questions.)

The usability problem with git add is not just the need for git add itself as an extra step, it is that the existence of the index has additional consequences that ripple through to using other bits of git. For example, let us take the case of the disappearing diff:

; git diff a
[...]
-hi there
+hi there, jim
; git add a
; git diff
;

If you already know git you know what's going on here (and you're going to reach for 'git diff --cached'). If you're learning git, well, your change just disappeared. Of course this happens the other way around too; 'git diff' shows you nice diffs, then you do 'git commit' and it tells you nothing to commit. Wait, what? The diffs are right there.
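To make the three states explicit, here is a hypothetical toy model (not git's real storage, which uses blobs and trees) that keeps HEAD, the index and the working tree as dicts of whole-file contents. The names `ToyRepo` and `changed` are made up for illustration; the point is only which pair of states each diff command compares.

```python
# Hypothetical toy model of git's three states, assuming whole-file
# contents held in dicts. It only mimics which diffs compare what.

def changed(old, new):
    """Names of files whose content differs between two snapshots."""
    return sorted(n for n in set(old) | set(new) if old.get(n) != new.get(n))


class ToyRepo:
    def __init__(self, files):
        self.head = dict(files)      # the last commit
        self.index = dict(files)     # the staging area, initially equal to HEAD
        self.worktree = dict(files)  # the files on disk

    def diff(self):
        # Like 'git diff': index versus working tree.
        return changed(self.index, self.worktree)

    def diff_cached(self):
        # Like 'git diff --cached': HEAD versus index.
        return changed(self.head, self.index)

    def add(self, name):
        # Like 'git add NAME': copy the working-tree content into the index.
        self.index[name] = self.worktree[name]

    def commit(self):
        # Like plain 'git commit': record the index, not the working tree.
        # An empty list plays the role of "nothing to commit".
        staged = self.diff_cached()
        if staged:
            self.head = dict(self.index)
        return staged


repo = ToyRepo({"a": "hi there\n"})
repo.worktree["a"] = "hi there, jim\n"
print(repo.diff(), repo.diff_cached())   # ['a'] []
repo.add("a")
print(repo.diff(), repo.diff_cached())   # [] ['a']
```

The reverse surprise falls out of the same model: on a fresh `ToyRepo`, editing `worktree` without calling `add()` leaves `diff()` non-empty while `commit()` returns an empty list.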

(There are worse bear traps in the woods for beginners, too, like doing a 'git add' and then further editing the file. Here 'git diff' will show you a diff but it is not what will be committed.)
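A minimal sketch of that trap, again assuming plain dicts of whole-file contents stand in for HEAD, the index and the working tree (an illustration, not git's internals): 'git add' takes a snapshot at that moment, so later edits stay in the working tree only.

```python
# Hypothetical sketch: dicts of whole-file contents for HEAD, the index
# and the working tree.
head = {"a": "hi there\n"}
index = dict(head)
worktree = {"a": "hi there, jim\n"}

index["a"] = worktree["a"]                 # 'git add a' snapshots the content now
worktree["a"] = "hi there, jim and sue\n"  # keep editing after the add

# 'git diff' compares index to working tree, so it shows only the later edit.
shown_by_git_diff = (index["a"], worktree["a"])
# 'git commit' records the index, so the later edit is left behind.
committed = dict(index)

print("git diff shows:", shown_by_git_diff)
print("commit records:", committed["a"])
```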

All of this is a cognitive burden. When you use git, you have to learn and remember the existence of the index and how this affects what you do, and you probably need to take extra steps or pay extra attention to what 'git commit' and so on tell you. This cognitive burden is real, although it can be (and will be) overcome with familiarity and what it enables has important benefits. It is a mistake and a lie to try to pretend otherwise. Honesty in git advocacy is to say straightforwardly that the index is worth it in the end (possibly unless you have simple work patterns).

(A system where the index or its equivalent is an advanced feature, one not exposed by default, really does have a simpler initial workflow. If it's designed competently (and Mercurial is), everything 'just works' the way you expect; hg commit commits what hg diff shows you and so on. In real life this makes a difference to people's initial acceptance of a new VCS, especially if the simple workflow is adequate for almost everything you'll ever do with the system. This is not true of the sort of advanced VCS use that programmers can practice routinely, but it can be of other VCS uses.)

Sidebar: the problem with 'git commit -a'

At this point some people may come out of the woodwork to tell me about git commit -a, or even about creating an alias like 'git checkin' that always forces -a. There are two pragmatic problems with this.

First, the index still exists even if you're trying to pretend otherwise. This means that you can accidentally use the index; you can run git add because something said to, or you can run straight git commit, and so on. All of these will create confusion and cause git to do what (to you) looks like the wrong thing.

(In fact you have to run git add every so often, to add new files.)

Second, it is not at all obvious from simply reading documentation that using git commit -a is a fully reliable way of transmuting git into Mercurial. Maybe it is, maybe it isn't, but as a beginner you don't know (not without doing more research than I myself have done). Because many git operations are fundamentally built around the existence of the index, the safest assumption to make is that the index really does matter and git commit -a is probably an incomplete workaround.

(For example, at the point where you do git add to add a new file you'll become familiar with git diff HEAD in order to get the true diffs for what will be committed when you run git commit -a, which I hope illustrates my point adequately. And maybe there's a better command for doing that, which also illustrates my point because git diff HEAD is what I came up with as a relative git novice.)
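As a hypothetical toy model of this (whole-file contents in dicts; the function names are made up), the sketch below follows git's documented rule that `commit -a` stages modified and deleted tracked files but not new ones, and shows why plain 'git diff' stops being the right pre-commit check once a new file has been added.

```python
# Hypothetical toy model; not git's implementation.

def tracked(head, index):
    # Files git knows about: those in the last commit or in the index.
    return set(head) | set(index)

def git_diff(index, worktree):
    # Like 'git diff': index versus working tree, tracked files only.
    return sorted(n for n in index if index[n] != worktree.get(n))

def git_diff_head(head, index, worktree):
    # Like 'git diff HEAD': last commit versus working tree, tracked files only.
    return sorted(n for n in tracked(head, index) if head.get(n) != worktree.get(n))

def commit_all(head, index, worktree):
    # Like 'git commit -a': stage modified and deleted tracked files
    # (never untracked ones), then return the new commit's contents.
    for name in tracked(head, index):
        if name in worktree:
            index[name] = worktree[name]
        else:
            index.pop(name, None)
    return dict(index)

head = {"a": "old\n"}
index = dict(head)
worktree = {"a": "new\n", "b": "brand new file\n"}

print(commit_all(head, dict(index), worktree))  # 'b' is left out: never added

index["b"] = worktree["b"]                   # 'git add b'
print(git_diff(index, worktree))             # ['a']: 'b' no longer shows up
print(git_diff_head(head, index, worktree))  # ['a', 'b']: what 'commit -a' records
```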


Comments on this page:

From 78.86.151.9 at 2012-05-29 03:18:53:

My advice for people new to git is always to do commits with 'git gui' for a while. The UI of 'git gui' makes the purpose of the index obvious, and its use trivial.

(There may be platforms where 'git gui' is a pain to install. That's not the case on Linux, although distros tend to package it separately from the main git package, so you need to know about it to install it.)

From 195.26.247.141 at 2012-05-29 04:15:04:

git diff shows no changes after add

To be honest, I've never found this to be a problem, and you can always do "git status" to see what is/isn't going to be committed.

With Mercurial I assume that commit (by default) always commits all changes. So if you regularly want to commit only a subset of the changes then you'll need an extra step, or to stash them elsewhere? (I've not used it)

By cks at 2012-05-29 06:14:20:

Mercurial does git-like partial commits through a (standard) extension, the record extension. This seems to be roughly equivalent to 'git commit --interactive', with the same drawback that you have to do it on the fly and can't prepare and check it in advance.

(I suppose the Mercurial way is to make the commit in a private repo, possibly clone the repo to check out the state of everything, and then roll the commit back if it wasn't good.)

By cks at 2012-05-29 06:35:19:

It's probably worth being explicit about one reason why Mercurial-style handling is easier at the start: in Mercurial, the diffs that 'hg diff' shows you are always exactly the changes that 'hg commit' will commit. This leads to a simple workflow:

; hg diff
[silent, so there's no uncommitted changes]
; vi file1 file2 file3
; hg diff
[test as necessary, possibly repeat]
; hg commit

It's not just that this workflow has a step or two less than the git equivalent; it's also that this workflow is tolerant of interruptions at any point without having to remember where you were in the various steps and then potentially repair or complete index manipulations. The git pre-commit check requires multiple commands and is more intricate (especially if you get interrupted or have to change things partway through).

(I think the thorough version would be 'git status; git diff; git add ...; git commit' all as one logical action without interruptions, assuming that you are starting with nothing added to the index. You don't want to do git add until right before you commit so that you can keep using the simple form of git diff to see what you'll commit.)
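For contrast with the git toy models, here is a hypothetical sketch of the Mercurial workflow above, assuming only already-tracked files are edited (new files would still need 'hg add'). With no staging area, diff and commit compare the same two states, so there is no intermediate step to lose track of when interrupted. `ToyHg` is an illustrative name, not a Mercurial API.

```python
# Hypothetical toy model of the index-less workflow; whole-file
# contents in dicts, tracked files only.

def changed(old, new):
    return sorted(n for n in set(old) | set(new) if old.get(n) != new.get(n))

class ToyHg:
    def __init__(self, files):
        self.parent = dict(files)    # the working directory's parent commit
        self.working = dict(files)   # the files on disk

    def diff(self):
        # Like 'hg diff': parent versus working directory.
        return changed(self.parent, self.working)

    def commit(self):
        # Like 'hg commit': record exactly what diff() reports.
        # An empty list stands in for "nothing changed".
        committed = self.diff()
        if committed:
            self.parent = dict(self.working)
        return committed

repo = ToyHg({"file1": "a\n", "file2": "b\n"})
repo.working["file1"] = "a, edited\n"   # 'vi file1 ...', possibly interrupted
shown = repo.diff()
assert repo.commit() == shown           # what you saw is what got committed
```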

From 87.79.236.202 at 2012-05-29 07:38:44:

I’m trying to think of what I do and finding myself not being able to say, until I realise that what I do most often is to actually commit “blindly” and then use git show (aliased git v) to check if the commit was what I expected. That makes sense if you consider how trivially easy Git makes it to throw away a mistaken commit and do it over.

If I am composing a very complex commit then I’ll use git diff --cached (which I’ve aliased as git di, to go with my git d alias for git diff).

Otherwise it really depends. Generally with the use of git status etc as part of the work in general, I’ll have an accurate gut feel for what I’ll be committing, so it feels natural not to be meticulously cautious up front but to just do a quick double-check after the fact.

It amuses me to realise that this is just another aspect of the same theme that for me permeates the use of Git at every level – that it frees me to cruise along instead of planning ahead, by letting me act first and decide later, and letting me back up a step or three when necessary.

The index, of course, is just another part of that.

So during the first paragraph of your article I was poised to argue, but then I quickly agreed as I read on. There absolutely is more to learn with Git than other VCSs due to the index. And as you predicted, I will argue that it is completely worth the cost. I wouldn’t want to work without it any more.

(Does Mercurial have a way to amend the last commit to add or remove bits, and a way to throw away the last commit without affecting the working copy? Together with record that would allow it to get very close to offering the functionality of the index, if with a more awkward interface.)

As for git commit -a – no, this sure won’t turn Git into an index-less system. One big reason is that there is no way whatsoever around using the index for resolving merge conflicts. It does often allow skipping an explicit step in practice, though.

More importantly, however: when it comes to an absolute novice, particularly someone who has never used any VCS before, the -a switch is a very useful didactic tool, as it allows the lessons in teaching Git to be staggered. The index and the extra steps it requires can be postponed until after the novice has a basic familiarity with the system. They will have to learn the index very soon, there is no way around that. But the -a switch can help reduce the amount of detail they have to learn all at once.

The other commenter’s suggestion to start with git gui is also great, though. I had not really thought of that. Making implicit invisible state explicit and visible is always a great way to make abstractions tractable for human brains. (We have an amazing aptitude for forming unconscious abstractions in order to manipulate concrete entities; we’re not nearly as capable at reasoning over abstractions consciously.)

Aristotle Pagaltzis

From 193.62.202.242 at 2012-05-30 04:55:43:

I still miss the way Perforce handled this. (I thought Perforce was very elegant in general, but I suppose I'll never use it again now that free software has caught up with it -- i.e., the DVCSes, I wouldn't say Subversion ever caught up.)

Edited files are added to the default changelist, but you can create additional pending changelists and move opened files between them. Then p4 submit commits the default changelist, while p4 submit -c NUMBER commits the specified changelist.

So by default you get the add-less Mercurial model; but if you are working on several things at once, you can have the equivalent of multiple indices, each with their description/commit message attached while you're working on them.

It's not as general as the git model (changed files as a whole are listed in changelists, so simultaneous disjoint works-in-progress have to involve disjoint sets of files), but it's a very simple and effective way of tracking multiple items of work in a single checked-out tree.
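A hypothetical toy model of the changelist scheme described here, at file granularity. The method names loosely echo 'p4 edit', 'p4 change', 'p4 reopen -c N' and 'p4 submit [-c N]', but this is an illustration, not Perforce's behaviour in detail (real changelist numbers come from the server, for one).

```python
# Hypothetical toy model of Perforce-style pending changelists.

class ToyWorkspace:
    def __init__(self):
        self.pending = {"default": set()}
        self._next = 1

    def edit(self, path):
        # Opened files land in the default changelist.
        self.pending["default"].add(path)

    def new_changelist(self):
        number = self._next
        self._next += 1
        self.pending[number] = set()
        return number

    def reopen(self, path, number):
        # A file lives in exactly one changelist, so move rather than copy.
        for files in self.pending.values():
            files.discard(path)
        self.pending[number].add(path)

    def submit(self, number="default"):
        files = sorted(self.pending[number])
        if number == "default":
            self.pending["default"] = set()
        else:
            del self.pending[number]
        return files

ws = ToyWorkspace()
for path in ("main.c", "util.c", "docs.txt"):
    ws.edit(path)
docs = ws.new_changelist()
ws.reopen("docs.txt", docs)
print(ws.submit())       # ['main.c', 'util.c']
print(ws.submit(docs))   # ['docs.txt']
```

The move-not-copy rule in `reopen` is what forces simultaneous works-in-progress onto disjoint sets of files.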

-- John

From 31.101.48.219 at 2012-05-30 06:23:23:

I have to agree. It took me quite a while to grok the point of the index. I now think that it is a major feature of git and as you say quite a difference from other version control systems. I think that any tutorial about git should really start with trying to explain an index based workflow rather than trying to mimic other systems.

From 204.197.193.163 at 2012-05-30 13:31:21:

The very first thing anyone should learn about using git is to run git status early and often. When learning git, the behavior of git diff initially struck me as a bit odd but because I was in the habit of typing git status often, I never thought anything had disappeared.

Written on 28 May 2012.
Last modified: Mon May 28 23:34:04 2012