Understanding Git's model versus understanding its magic

April 27, 2017

In a comment on my entry on coming to a better understanding of what git rebase does, Ricky suggested I might find Understanding Git Conceptually to be of interest. This provides me with an opportunity to talk about what I think my problem with mastering Git is.

It's worth quoting Charles Duan here:

The conclusion I draw from this is that you can only really use Git if you understand how Git works. Merely memorizing which commands you should run at what times will work in the short run, but it’s only a matter of time before you get stuck or, worse, break something.

I actually feel that I have a relatively good grasp of the technical underpinnings of Git, what many people would call 'how Git works'. To wave my hands a bit, Git is a content addressable store that is used to create snapshots of trees, which are then threaded together in a sequence with commits, and so on and so forth. This lets me nod and go 'of course' about any number of apparently paradoxical things, such as git repositories with multiple initial commits. I don't particularly have this understanding because I worked for it; instead, I mostly have it because I happened to be standing around in the right place at the right time to see Git in its early days.

(There are bits of git that I understand less about the technicalities, like the index. I have probably read a description of the guts of the index at least a few times, but I couldn't tell you off the top of my head how even a simple version of the index works at a mechanical level. It turns out to be covered in this StackOverflow answer; the short version is that the index is a composite of a directory file and a bunch of normal object blobs.)

But in practice Git layers a great deal of magic on top of this technical model of its inner workings. Branches are references to commits (ie, heads) and git advances the reference when you make commits under the right circumstances; simple. Except that some branches have 'upstreams' and are 'remote tracking branches' and so on. All of these pieces of magic are not intrinsic to the technical model (partly because the technical model is a strictly local one), but they are very important for working with Git in many real situations.

It is this magic that I haven't mastered and internalized. For example, I understand what 'git fetch' does to your repository, and I can see why you would want it to update certain branch references so you can find the newly imported commits. But I have to think about why 'git fetch' will update certain branches and not others, and I don't know off the top of my head the settings that control this or how you change them.

It's possible that Git has general patterns in this sort of magic, the way it has general patterns at its technical level. If it does, I have not yet understood enough of the magic to have noticed the general patterns. My personal suspicion is that general patterns do not necessarily exist at this layer, because the commands and operations I think of as part of this layer are actually things that have accreted into Git over time and were written by different people.

(At one point Git had a split between 'porcelain' and 'plumbing', where porcelain was the convenient user interface and was at least partially developed by different people than the core 'plumbing'. And bits of porcelain were developed by different people who had their own mental models for how their particular operation should behave, with git rebase's lack of an option for the branch name of the result being an example.)

In a way my understanding of Git's internals has probably held me back with Git in general, because it's helped to encouraged me to have a lackadaisical attitude about learning Git in general. The result is that I make little surgical strikes on manpages and problems, and once I feel I've solved them well enough I go away again. In this I've been mirroring one of the two ways that I approach new programming languages. I've likely reached the point in Git where I should switch over to thoroughly slogging through some parts of it; one weakness that's become obvious in writing this entry is basically everything to do with remote repositories.

Written on 27 April 2017.
« Coming to a better understanding of what git rebase does
Hardware RAID and the problem of (not) observing disk IO »

Page tools: View Source, Add Comment.
Login: Password:
Atom Syndication: Recent Comments.

Last modified: Thu Apr 27 01:10:48 2017
This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.