2017-05-28
My thoughts on git worktrees for me (and some notes on things I tried)
I recently discovered git worktrees and did some experimentation with using them for stuff that I do. The short summary of my experience so far is that while I can see the appeal for certain sorts of usage cases, I don't think git worktrees are a good fit for my situation and I'm probably going to use completely independent repositories in the future.
My usage case was building my own copies of multiple versions of some project, starting with Go. Especially in the case of a language compiler and its standard library, it's reasonably useful to have the latest development version plus a stable version or two; for example, it gives me an easy way to test if something I'm working on will build on older released versions or if I've let a dependency on some recent bit of the standard library creep in. The initial process of creating a worktree for, say, Go 1.8 is reasonably straightforward:
    cd /some/where/go
    git worktree add -b release-branch.go1.8 ../v1.8 origin/release-branch.go1.8
What proved tricky for me is updating this v1.8 tree when the Go people update Go 1.8, as they do periodically.
My normal way of staying up to date on what changes are happening in the main line of Go is to do 'git pull' in my master repo directory and note the lines that get printed out about fetched updates, eg:
    remote: Finding sources: 100% (64/64)
    remote: Total 64 (delta 23), reused 64 (delta 23)
    Unpacking objects: 100% (64/64), done.
    From https://go.googlesource.com/go
       ffab6ab877..d64c49098c  master     -> origin/master
And then I use 'git log ffab6ab877..d64c49098c' to see what changed. The problem with worktrees is that this information is printed by 'git fetch', and normally 'git fetch' updates all branches, both the mainline and, say, a release branch you're following. So I actively don't want to run 'git pull' or 'git fetch' in the worktree directory, because otherwise I will have to remember to stop and look at the mainline updates it's just fetched and reported to me.
What I wound up doing was running 'git pull' in my main go tree and, if there was an update to origin/release-branch.go1.8 reported, I'd go to my 'v1.8' directory and do 'git merge --ff-only'. This mostly worked (it blew up on me once for reasons I don't understand), but it means that dealing with a worktree is different than dealing with a normal Git repo directory (including an independently cloned repo). Since 'git pull' and other Git commands work 'normally' in a worktree, I have to explicitly remember that I created something as a worktree (or check to see if .git is a file or a directory to know, since 'git status' doesn't helpfully tell you one way or the other).
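The pull-then-merge routine is mechanical enough to sketch as a small shell function. This is only an illustration under assumptions: the name update_release_tree and its three arguments are made up for this example, and it assumes the worktree's branch tracks the remote branch (as a branch created with 'git worktree add -b ... origin/release-branch.go1.8' normally does), so that a bare 'git merge --ff-only' knows what to merge.

```shell
# Hypothetical helper: pull in the main tree, then fast-forward a worktree
# only if the remote-tracking branch it follows actually moved.
# usage: update_release_tree MAIN_DIR WORKTREE_DIR REMOTE_BRANCH
update_release_tree() {
    main_dir=$1
    worktree_dir=$2
    remote_branch=$3    # e.g. origin/release-branch.go1.8

    # Remember where the remote-tracking branch was before the pull.
    before=$(git -C "$main_dir" rev-parse --verify --quiet "$remote_branch") || before=

    # 'git pull' in the main tree fetches all branches and reports what changed.
    git -C "$main_dir" pull || return 1

    after=$(git -C "$main_dir" rev-parse --verify --quiet "$remote_branch") || return 1
    if [ "$before" = "$after" ]; then
        echo "$remote_branch: no update"
        return 0
    fi

    # Show what arrived on the branch, then fast-forward the worktree
    # to its configured upstream.
    [ -n "$before" ] && git -C "$main_dir" --no-pager log --oneline "$before..$after"
    git -C "$worktree_dir" merge --ff-only
}
```

With the layout used here it would be run as 'update_release_tree /some/where/go /some/where/v1.8 origin/release-branch.go1.8'. It doesn't remove the underlying problem: I still have to remember that v1.8 is a worktree and use this instead of a plain 'git pull' there.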
(Given my current moderate level of Git knowledge and experience, I'm going to avoid writing about the good usage cases I think I see for worktrees. Anyway, one of them is documented in the git-worktree manpage; I note that their scenario uses a worktree for a one-shot branch that's never updated from upstream.)
As mentioned, if I want to see if a particular Git repo is a worktree or not I need to do 'ls -ld .git'. If it's a file, I have a worktree. If I have a directory, with how I currently use Git, it's a full repo. 'git worktree list' will list the main repo and worktrees, but it doesn't annotate things with a 'you are here' marker. Obviously if I used worktrees enough I could write a status command to tell me, but then if I was doing that I could probably write a bunch of commands to do what I want in general.
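For illustration, such a status command could be as small as the following sketch. The function name git_tree_kind is invented for this example, and it relies on the same 'is .git a file' test as above, so it inherits the same caveat: a .git file also shows up in other setups (submodules, for one), which it will report as a linked worktree.

```shell
# Hypothetical helper: say whether the current directory is in a linked
# worktree or a full repository, using the '.git is a file' test.
git_tree_kind() {
    # Find the top of the working tree; fail quietly outside of one.
    top=$(git rev-parse --show-toplevel 2>/dev/null) || {
        echo "not inside a git working tree"
        return 1
    }
    if [ -f "$top/.git" ]; then
        echo "linked worktree: $top"
    elif [ -d "$top/.git" ]; then
        echo "full repo: $top"
    else
        echo "unknown layout: $top"
    fi
}
```

Running 'git_tree_kind' anywhere inside the v1.8 tree should then print a 'linked worktree:' line, which is the 'you are here' marker that 'git worktree list' doesn't give me.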
Sidebar: Excessively clever Git configuration hacking (maybe)
Bearing in mind that I don't understand Git as much as I think I may, as far as I can see what branches 'git fetch' fetches is determined from the configuration for the remote for a branch, not from the branch's configuration. There appear to be two options for fiddling things here.
The 'obvious' option is to create a second remote (call it, say, 'v1.8-origin') with the same url as origin but a fetch setting that only fetches the particular branch:
    fetch = refs/heads/release-branch.go1.8:refs/remotes/origin/release-branch.go1.8
Then I'd switch the remote for the release-branch.go1.8 branch to this new remote.
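As a sketch of what this first option might look like when done with 'git config', here is a small shell function. I haven't run this against the real Go repo, so treat it as an assumption about how the pieces fit together; the function name narrow_branch_fetch is made up, and 'v1.8-origin' is just the example remote name from above.

```shell
# Hypothetical helper: point one branch at a second, narrower remote.
# usage: narrow_branch_fetch REPO_DIR BRANCH NEW_REMOTE
narrow_branch_fetch() {
    repo=$1
    branch=$2        # e.g. release-branch.go1.8
    new_remote=$3    # e.g. v1.8-origin (a made-up name)

    # Reuse origin's URL for the new remote.
    url=$(git -C "$repo" config --get remote.origin.url) || return 1
    git -C "$repo" config "remote.$new_remote.url" "$url" || return 1

    # Fetch only this one branch, storing it where origin would have put it.
    git -C "$repo" config "remote.$new_remote.fetch" \
        "refs/heads/$branch:refs/remotes/origin/$branch" || return 1

    # Make the branch use the new remote for 'git fetch' and 'git pull'.
    git -C "$repo" config "branch.$branch.remote" "$new_remote"
}
```

Here that would be 'narrow_branch_fetch /some/where/go release-branch.go1.8 v1.8-origin'. Since worktrees share the main repo's configuration, the settings only need to be made once.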
Git-fetch also has a feature where you can have a per-branch configuration in $GIT_DIR/branches/<branch>; this can be used to name the upstream 'head' (branch) that will be fetched into the local branch. It appears that creating such a file should do the trick, but I can't find people writing about this on the Internet (just many copies of the git-fetch manpage), so I'm wary of assuming that I understand what's going to happen here. Plus, it's apparently a deprecated legacy approach.
(If I understand all of this correctly, either approach would preserve 'git pull' in the main repo (which is on the master branch) always fetching all branches from upstream.)
Specifications are ultimately defined by their implementations
In theory, the literal text of a specification is the final authority on defining what the specification means and requires. In practice, it generally doesn't work out this way; once a specification gets adopted, it ultimately becomes defined by its implementations. Regardless of what the actual text says, if everyone, or most people, or just dominant implementations do something or have some (mis-)interpretation of the specification, those things become the specification in practice.
If your implementation doesn't conform to the wrong things that other implementations do, you can expect to have problems interoperating with those other implementations, and they almost always have more practical power than you do. You can appeal to the specification all you want, but it's not going to get you anywhere.
People actually using the implementations generally care most that they interoperate, and they don't really care about why they do or don't. A new implementation that refuses to interoperate may or may not be 'correct' by the specification (many people are not well placed to know for sure), but it certainly isn't very useful to most people and it's not likely to get many users in the absence of other factors.
(Of course there can always be other factors. It's sometimes possible to give people no choice about using a particular (new) implementation or very strongly tilt them towards it, and if you do this with a big enough pool of people, your new implementation can rapidly become a dominant one. The browser wars in the late 90s are one example of this effect in action, as are browser engines on mobile platforms today.)
One corollary of this is that it's quite important to write a clear and good specification. Such a specification maximizes the chances that all implementations will do the same thing and that what they do will match what you wrote. Conversely, the more confusing and awkward the specification, the more initial chaos there will be in implementations and the more random and divergent from your intentions the eventual actual in-practice 'standard' is likely to be.
(If your specification is successful, enough of the various people involved will wind up implementing some common behavior so they can interoperate. This behavior does not necessarily have much relationship to what you intended; instead it's likely to be based on some combination of common misunderstandings, early implementations that set the stage for everyone else to copy, and what people settled on as the most useful way to behave.)
(I've sort of written about this before in the context of programming language specifications.)