Wandering Thoughts archives

2017-05-31

Why one git fetch default configuration bit is probably okay

I've recently been reading the git fetch manpage reasonably carefully as part of trying to understand what I'm doing with limited fetches. If you do this, you'll run across an interesting piece of information about the <refspec> argument, including in its form as the fetch = setting for remotes. The basic syntax is '<src>:<dst>', and the standard version that is created by any git clone gives you:

fetch = +refs/heads/*:refs/remotes/origin/*

You might wonder about that + at the start, and I certainly did. Well, it's special magic. To quote the documentation:

The remote ref that matches <src> is fetched, and if <dst> is not empty string, the local ref that matches it is fast-forwarded using <src>. If the optional plus + is used, the local ref is updated even if it does not result in a fast-forward update.

(Emphasis mine.)

When I read this my eyebrows went up, because it sounded dangerous. There's certainly lots of complicated processes around 'git pull' if it detects that it can't fast-forward what it's just fetched, so allowing non-fast-forward fetches (and by default) certainly sounded like maybe it was something I wanted to turn off. So I tried to think carefully about what's going on here, and as a result I now believe that this configuration is mostly harmless and probably what you want.

The big thing is that this is not about what happens with your local branch, eg master or rel-1.8. This is about your repo's copy of the remote branch, for example origin/master or origin/rel-1.8. And it is not even about the branch, because branches are really 'refs', symbolic references to specific commits. git fetch maintains refs (here under refs/remotes/origin) for every branch that you're copying from the remote, and one of the things that it does when you fetch updates is update these refs. This lets the rest of Git use them and do things like merge or fast-forward remote updates into your local remote-tracking branch.

So git fetch's documentation is talking about what it does to these remote-branch refs if the branch on the remote has been rebased or rewound so that it is no longer a more recent version of what you have from your last update of the remote. With the + included in the <refspec>, git fetch always updates your repo's ref for the remote branch to match whatever the remote has; basically it overwrites whatever ref you used to have with the new ref from the remote. After a fetch, your origin/master or origin/rel-1.8 will always be the same as the remote's, even if the remote rebased, rewound, or did other weird things. You can then go on to fix up your local branch in a variety of ways.

(To be technical your origin/master will be the same as origin's master, but you get the idea here.)

This makes the + a reasonable default, because it means that 'git fetch' will reliably mirror even a remote that is rebasing and otherwise having its history rewritten and its branches changed around. Without the +, 'git fetch' might transfer the new and revised commits and trees from your remote but it wouldn't give you any convenient reference for them for you to look at them, integrate them, or just reset your local remote-tracking branch to their new state.

(Without the '+', 'git fetch' won't update your repo's remote-branch refs. I don't know if it writes the new ref information anywhere, perhaps to .git/FETCH_HEAD, or if it just throws it away, possibly after printing out commit hashes.)
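One way to see what actually happens is to try it in a scratch area. What follows is a minimal sketch, not anything from the Git documentation: it builds a throwaway 'upstream' repository and a clone of it under a temporary directory (all of the names here are made up for illustration), rewrites the upstream history, and then fetches first without the + and then with it. My expectation is that the fetch without the + is refused and leaves origin/master alone, while the fetch with the + overwrites it.

```shell
#!/bin/sh
# Scratch experiment: how 'git fetch' treats a rewritten upstream branch
# with and without the '+' in the remote's fetch refspec.
# All repository names and paths here are hypothetical.
set -eu

work=$(mktemp -d)

# A throwaway "upstream" repository with two commits on master.
git init -q "$work/upstream"
cd "$work/upstream"
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.invalid"
echo one >file
git add file
git commit -q -m one
echo two >file
git commit -q -am two

git clone -q "$work/upstream" "$work/clone"
old=$(git -C "$work/clone" rev-parse origin/master)

# Rewrite upstream history: drop commit 'two' and add 'three' instead.
git reset -q --hard HEAD~1
echo three >file
git commit -q -am three
new=$(git rev-parse master)

cd "$work/clone"

# 1. Without the '+': the non-fast-forward update should be refused.
git config remote.origin.fetch 'refs/heads/*:refs/remotes/origin/*'
if git fetch origin; then plain_status=0; else plain_status=$?; fi
after_plain=$(git rev-parse origin/master)

# 2. With the '+' (the default from 'git clone'): the ref is overwritten.
git config remote.origin.fetch '+refs/heads/*:refs/remotes/origin/*'
git fetch origin
after_plus=$(git rev-parse origin/master)

echo "old upstream master:      $old"
echo "new upstream master:      $new"
echo "fetch without + exited:   $plain_status"
echo "origin/master without +:  $after_plain"
echo "origin/master with +:     $after_plus"
```

The messages that 'git fetch' itself prints in the first case are worth reading too, since they're what you'd actually see if you ran without the + for real.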

Sidebar: When I can imagine not using a '+'

The one thing that using a '+' does is that it sort of allows a remote to effectively delete past history out of your local repo, something that's not normally possible in a DVCS and potentially not desirable. It doesn't do this directly, but it starts an indirect process of it and it certainly makes the old history somewhat annoying to get at.

Git doesn't let a remote directly delete commits, trees, and objects. But unreferenced items in your repo are slowly garbage-collected after a while and when you update your remote-branch refs after a non-ff fetch, the old commits that the pre-fetch refs pointed to start becoming more and more unreachable. I believe they live on in the reflog for a while, but you have to know that they're missing and to look.

If you want to be absolutely sure that you notice any funny business going on in an upstream remote that is not supposed to modify its public history this way, not using '+' will probably help. I'm not sure if it's the easiest way to do this, though, because I don't know what 'git fetch' does when it detects a non-ff fetch like this.

(Hopefully git fetch complains loudly instead of failing silently.)
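If what you want is mostly to notice a rewritten upstream, an alternative to dropping the '+' is to keep the default and check afterward whether the remote-branch ref moved in a fast-forward way. This is only a sketch under my own assumptions: check_fetch is a made-up helper, not a Git command, and it leans on 'git merge-base --is-ancestor' to ask whether the old value of the ref is an ancestor of the new one. The scratch repositories are there purely to exercise it.

```shell
#!/bin/sh
# Sketch: fetch with the default '+' refspec, then report whether a
# remote-tracking ref moved in a non-fast-forward way.
set -eu

# check_fetch REPO REF (hypothetical helper): fetch from origin in REPO
# and print "ok" if REF's old value is an ancestor of its new value,
# otherwise print a "rewritten" line that includes the old commit.
check_fetch() {
    before=$(git -C "$1" rev-parse "$2")
    git -C "$1" fetch -q origin
    if git -C "$1" merge-base --is-ancestor "$before" "$2"; then
        echo "ok"
    else
        echo "rewritten: $2 was $before"
    fi
}

# Throwaway upstream and clone to try it on.
work=$(mktemp -d)
git init -q "$work/upstream"
cd "$work/upstream"
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.invalid"
echo one >file
git add file
git commit -q -m one
git clone -q "$work/upstream" "$work/clone"

# An ordinary new commit upstream is a fast-forward.
echo two >file
git commit -q -am two
first=$(check_fetch "$work/clone" origin/master)

# Amending that commit rewrites already-fetched history.
git commit -q --amend -m "two, rewritten"
second=$(check_fetch "$work/clone" origin/master)

echo "after a normal update:  $first"
echo "after an amended commit: $second"
```

Printing the old commit hash matters here, because once the ref has been overwritten that hash is the easy way to get back to the old history.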

GitFetchMagicPlus written at 00:44:13

2017-05-29

Configuring Git worktrees to limit what's fetched on pulls

Yesterday I wrote about my practical problem with git worktrees, which is to limit what is fetched from the remote when I do 'git pull' in one (as opposed to the main repo). I also included a sidebar with a theory on how to do this with some Git configuration madness. In a spirit of crazed experimentation I've now put this theory into practice and it appears to actually work. Unfortunately the way I know how to do this requires some hand editing of your .git/config, rather than using commands like 'git remote' to do this for you. However, I don't fully understand what I'm doing here (and that's one reason I'm putting in lots of notes to myself).

Here's my process:

  1. Create a new worktree as normal, based from the origin branch you want:

    git worktree add -b release-branch.go1.8 ../v1.8 origin/release-branch.go1.8
    

    Because we used -b, this will also create a local remote-tracking branch, release-branch.go1.8, that tracks origin's release-branch.go1.8 branch.

    If you already have a release-branch.go1.8 branch (perhaps you've checked it out in your main repo at some point or previously created a worktree for it), this is just:

    git worktree add ../v1.8 release-branch.go1.8
    

  2. Create a new remote for your upstream repo to fetch just this upstream branch:

    git remote add -t release-branch.go1.8 origin-v1.8 https://go.googlesource.com/go
    

    Because we set it up to track only a specific remote branch, 'git fetch' for this remote will only fetch updates for the remote's release-branch.go1.8 branch, even though it has the same URL as our regular origin remote (which will normally fetch all branches).

  3. Edit .git/config to change the fetch = line for origin-v1.8 to fetch the branch into refs/remotes/origin/release-branch.go1.8, which is the fetch destination for your origin remote. That is:

    fetch = +refs/heads/release-branch.go1.8:refs/remotes/origin/release-branch.go1.8
    

    By fetching into refs/remotes/origin like this, my understanding is that we avoid doing duplicate fetches. Whether we do 'git fetch' in our worktree or in the master repo, we'll be updating the same remote branch reference and so we'll only fetch updates for this (remote) branch once. I believe that if you don't do this, 'git pull' or 'git fetch' in the worktree will always report the new updates; you'll never 'lose' an update for the branch by doing a 'git pull' in the master. However I think you may wind up doing extra transfers.

    (This can be done with git config but I'd rather edit .git/config by hand.)

  4. Edit .git/config again to change the 'remote =' line for your release-branch.go1.8 branch to be origin-v1.8 instead of origin.

    By forcing the remote for the branch, we activate git fetch's restriction on what remote branches will be fetched when we do a 'git pull' or 'git fetch' in a tree with that branch checked out (here, our worktree, but it could be the master repo).

    If you prefer, you can set this with 'git config' instead of by hand editing:

    git config branch.release-branch.go1.8.remote origin-v1.8
    

We can see that this works by comparing 'git fetch -v --dry-run' in the worktree and in the master repo. In the worktree, it will report just an attempt to update origin/release-branch.go1.8. In the master repo, it will (normally) report an attempt to update everything.

Because everything is attached to our branch configuration for the (local) release-branch.go1.8 branch, not the worktree, this will survive removing and then re-creating the worktree. This may be a feature, or it may be a drawback, since it means that if you delete the worktree and check out release-branch.go1.8 in the master repo, 'git pull' will start only updating it (and not updating master and other branches as well). We can change back to the normal state of things by updating the remote for the branch back to the normal origin remote:

git config branch.release-branch.go1.8.remote origin

(In general you can flip the state of the branch back and forth as you want. I don't think Git gets confused, although you may.)
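The four-step process in this entry can be done entirely with commands, using 'git config' in place of hand editing. Here is a sketch of that which runs against a throwaway local 'upstream' repository standing in for https://go.googlesource.com/go, so it can be tried without touching a real Go tree. The scratch paths are made up, and I'm assuming that 'git config' on remote.origin-v1.8.fetch has the same effect as editing the fetch = line by hand. At the end it checks that a plain 'git fetch' in the worktree moved origin/release-branch.go1.8 but left origin/master alone.

```shell
#!/bin/sh
# Sketch: set up a worktree whose 'git fetch' only updates one branch,
# demonstrated against a scratch upstream (all paths are hypothetical).
set -eu

branch=release-branch.go1.8
narrow=origin-v1.8

# Scratch upstream with master and the release branch.
work=$(mktemp -d)
git init -q "$work/upstream"
cd "$work/upstream"
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.invalid"
echo one >file
git add file
git commit -q -m one
git branch "$branch"

git clone -q "$work/upstream" "$work/go"
cd "$work/go"

# Step 1: create the worktree from the origin branch.
git worktree add -b "$branch" ../v1.8 "origin/$branch" >/dev/null 2>&1
# Step 2: a second remote that tracks only this branch.
git remote add -t "$branch" "$narrow" "$work/upstream"
# Step 3: fetch into origin's namespace instead of the new remote's.
git config "remote.$narrow.fetch" \
    "+refs/heads/$branch:refs/remotes/origin/$branch"
# Step 4: point the local branch at the narrow remote.
git config "branch.$branch.remote" "$narrow"

master_before=$(git rev-parse origin/master)

# Upstream moves forward on both master and the release branch.
cd "$work/upstream"
echo two >file
git commit -q -am "master update"
git checkout -q "$branch"
echo three >file
git commit -q -am "release update"
release_new=$(git rev-parse "$branch")

# A plain fetch in the worktree should only touch the release branch.
cd "$work/v1.8"
git fetch -q
master_after=$(git rev-parse origin/master)
release_after=$(git rev-parse "origin/$branch")

echo "origin/master before: $master_before"
echo "origin/master after:  $master_after"
echo "origin/$branch after: $release_after"
```

Running 'git fetch' in the scratch master repo afterward should then pick up the master update as usual.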

GitWorktreeLimitedPulling written at 22:42:43

2017-05-28

My thoughts on git worktrees for me (and some notes on things I tried)

I recently discovered git worktrees and did some experimentation with using them for stuff that I do. The short summary of my experience so far is that while I can see the appeal for certain sorts of usage cases, I don't think git worktrees are a good fit for my situation and I'm probably going to use completely independent repositories in the future.

My usage case was building my own copies of multiple versions of some project, starting with Go. Especially in the case of a language compiler and its standard library, it's reasonably useful to have the latest development version plus a stable version or two; for example, it gives me an easy way to test if something I'm working on will build on older released versions or if I've let a dependency on some recent bit of the standard library creep in. The initial process of creating a worktree for, say, Go 1.8 is reasonably straightforward:

cd /some/where/go
git worktree add -b release-branch.go1.8 ../v1.8 origin/release-branch.go1.8

What proved tricky for me is updating this v1.8 tree when the Go people update Go 1.8, as they do periodically. My normal way of staying up to date on what changes are happening in the main line of Go is to do 'git pull' in my master repo directory, note the lines that get printed out about fetched updates, eg:

remote: Finding sources: 100% (64/64)
remote: Total 64 (delta 23), reused 64 (delta 23)
Unpacking objects: 100% (64/64), done.
From https://go.googlesource.com/go
   ffab6ab877..d64c49098c  master     -> origin/master

And then I use 'git log ffab6ab877..d64c49098c' to see what changed. The problem with worktrees is that this information is printed by 'git fetch', and normally 'git fetch' updates all branches, both the mainline and, say, a release branch you're following. So I actively don't want to run 'git pull' or 'git fetch' in the worktree directory, because otherwise I will have to remember to stop and look at the mainline updates it's just fetched and reported to me.

What I wound up doing was running 'git pull' in my main go tree and if there was an update to origin/release-branch.go1.8 reported, I'd go to my 'v1.8' directory and do 'git merge --ff-only'. This mostly worked (it blew up on me once for reasons I don't understand), but it means that dealing with a worktree is different than dealing with a normal Git repo directory (including an independently cloned repo). Since 'git pull' and other Git commands work 'normally' in a worktree, I have to explicitly remember that I created something as a worktree (or check to see if .git is a directory to know, since 'git status' doesn't helpfully tell you one way or the other).

(In my current moderate level of Git knowledge and experience, I'm going to avoid writing about the good usage cases I think I see for worktrees. Anyway, one of them is documented in the git-worktree manpage; I note that their scenario uses a worktree for a one-shot branch that's never updated from upstream.)

As mentioned, if I want to see if a particular Git repo is a worktree or not I need to do 'ls -ld .git'. If it's a file, I have a worktree. If I have a directory, with how I currently use Git, it's a full repo. 'git worktree list' will list the main repo and worktrees, but it doesn't annotate things with a 'you are here' marker. Obviously if I used worktrees enough I could write a status command to tell me, but then if I was doing that I could probably write a bunch of commands to do what I want in general.
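A status command along those lines doesn't have to be much. This is a minimal sketch of the '.git is a file' check wrapped up as a shell function; tree_kind is a name I made up, and the scratch repository and worktree exist only to show it working. Note that a file .git can also mean other things (a submodule checkout, for example), so this is only right for setups like mine.

```shell
#!/bin/sh
# Sketch: report whether a directory is in a linked worktree or in an
# ordinary (main) repository checkout, based on what .git is.
set -eu

# tree_kind DIR (hypothetical helper): print "worktree" if the top level
# of DIR's working tree has a .git file, and "main" otherwise.
tree_kind() {
    top=$(git -C "$1" rev-parse --show-toplevel)
    if [ -f "$top/.git" ]; then
        echo worktree
    else
        echo main
    fi
}

# Scratch repository plus one worktree to try it on.
work=$(mktemp -d)
git init -q "$work/main"
cd "$work/main"
git config user.name "Example"
git config user.email "example@example.invalid"
echo one >file
git add file
git commit -q -m one
git worktree add -b side ../wt >/dev/null 2>&1

echo "main checkout: $(tree_kind "$work/main")"
echo "worktree:      $(tree_kind "$work/wt")"
```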

Sidebar: Excessively clever Git configuration hacking (maybe)

Bearing in mind that I don't understand Git as much as I think I may, as far as I can see what branches 'git fetch' fetches are determined from the configuration for the remote for a branch, not from the branch's configuration. There appear to be two options for fiddling things here.

The 'obvious' option is to create a second remote (call it, say, 'v1.8-origin') with the same url as origin but a fetch setting that only fetches the particular branch:

fetch = refs/heads/release-branch.go1.8:refs/remotes/origin/release-branch.go1.8

Then I'd switch the remote for the release-branch.go1.8 branch to this new remote.

Git-fetch also has a feature where you can have a per-branch configuration in $GIT_DIR/branches/<branch>; this can be used to name the upstream 'head' (branch) that will be fetched into the local branch. It appears that creating such a file should do the trick, but I can't find people writing about this on the Internet (just many copies of the git-fetch manpage), so I'm wary of assuming that I understand what's going to happen here. Plus, it's apparently a deprecated legacy approach.

(If I understand all of this correctly, either approach would preserve 'git pull' in the main repo (which is on the master branch) always fetching all branches from upstream.)

GitWorktreeThoughts written at 23:08:19

2017-05-12

Where bootstrapping Go with a modern version of Go has gotten faster

Since Go 1.5, building Go from source requires an existing 'bootstrap' Go compiler. For at least a while, the fastest previous Go version to use for this was Go 1.4, the last version written in C and also the version that generally compiled Go source code the fastest. When I wrote up my process of building Go from source, I discovered that using Go 1.7.5 or Go 1.8.1 was actually now a bit faster than using Go 1.4. I mentioned this on Twitter and because the general slowdown in how fast Go compiles code has been one of Dave Cheney's favorite issues, I tagged him in my Tweet. Dave Cheney found that result surprising, so I decided to dig more into the details by adding some crude instrumentation to the process of building Go from source.

Building Go from source has four steps, and this is how I understand them:

  1. ##### Building Go bootstrap tool.

    This builds cmd/dist. It uses your bootstrap version of Go.

  2. ##### Building Go toolchain using <bootstrap go>.

    This builds a bunch of 'bootstrap/*' stuff with cmd/dist, again using your bootstrap Go. My understanding is that this is a minimal Go compiler, assembler, and linker that omits various things in order to guarantee that it can be compiled under Go 1.4.

  3. ##### Building go_bootstrap for host, linux/amd64.

    I believe that this builds the go tool itself and various associated bits and pieces using the bootstrap/* compiler and so on built in step 2. In particular, this does not appear to rebuild the step 2 compiler with itself.

    (There is code to do this in cmd/dist, but it is deliberately disabled.)

  4. ##### Building packages and commands for linux/amd64.

    This builds and rebuilds everything; the full Go compiler, toolchain, go program and its sub-programs, and the entire standard library. I believe it uses the go program from step 3 but the compiler, assembler, and linker from step 2.

If I'm understanding this correctly, this means that as late as step 4 you're still building Go code using a compiler compiled by your initial bootstrap compiler, such as Go 1.4. However, you're using the current Go compiler from stage 3 onwards, not the bootstrap compiler itself; the stage 2 code is the last thing compiled by your bootstrap compiler (and so the last place its compilation speed matters).

So now to timings. I tested building an almost-current version of Go tip (it identifies itself as '+e5bb5e3') using three different bootstrap Go versions: Go 1.4, Go 1.8.1, and Go tip (+482da51). I timed things on a quite powerful server with 96 GB of RAM, Xeon E5-2680 CPUs, and 32 (hyperthreaded) cores. On this server, using Go tip gives a make.bash time of about 24 seconds total, using Go 1.8.1 a time of about 28.5 seconds total, and Go 1.4 a total time of almost 40 seconds. But a more interesting question is where the time is going and which bootstrap compiler wins where:

  • For stage 1, Go 1.4 is still the fastest and Go 1.8.1 the slowest of the three. However this stage takes only a tiny amount of time.

  • For stage 2, Go tip is fastest, followed by Go 1.4, then Go 1.8.1. Go 1.4 uses by far the lowest 'user' time, so the other Go versions are covering up speed issues by using more CPUs.

  • For stage 3, Go tip is slightly faster than Go 1.8.1, and Go 1.4 is clearly third.

  • For stage 4, Go tip and Go 1.8.1 are tied and Go 1.4 is way behind, taking about twice as long (23 seconds versus 11.5 seconds).

My best guess at what is causing Go 1.4 to be slower here is that it simply produces less optimized code than Go 1.8.1 and Go tip. As far as I can see, even the stage 4 compilation is still done using a Go compiler, assembler, and linker that were compiled with the bootstrap compiler, so if the bootstrap compiler produces slow code, they will run slower (despite all three bootstrap compilers compiling the same Go code). This is most visible in stage 4, because stage 4 (re)builds by far the most Go code. Go 1.4's compilation speed no longer helps here because we're not compiling with Go 1.4 itself; we're compiling with the 1.4-built but current (and thus generally slower) Go compiler toolchain.

(I think this explains why stage 3 and stage 4 are so close between Go 1.8.1 and Go tip; there probably is far less difference in code optimization between the two than between either and Go 1.4.)

Based on this, I would expect Go build times to be most clearly improved by a more recent bootstrap compiler on platforms with relatively bad code optimization in Go 1.4. My impression is that ARM may be one such platform.

If you're wondering why Go tip is so much faster than Go 1.8.1 on stage 2, the answer is probably the recently landed changes for Go issue #15756, 'cmd/compile: parallelize compilation'. As of this commit, concurrent backend compilation is enabled by default in the Go tip. Some quick testing suggests that this is responsible for almost all of the speed advantage of Go tip over Go 1.8.1.

(If you want to test this, note that stage 3 and stage 4 will normally use this too, at least if you're testing by building a Go git version after this commit landed. I don't know of an easy way to disable concurrent compilation only in the bootstrap compiler.)

Sidebar: Typical real and user times, in seconds

Here is a little table of typical wall clock ('real') and user mode times, as reported by time, for building with various different bootstrap compilers. In each table cell, the real time is first, then the user time (which is almost always larger).

bootstrap:  Go 1.4       Go 1.8.1     Go tip
stage 1     0.7 / 0.6    1.2 / 1.3    0.8 / 1.4
stage 2     6.6 / 9.8    9.1 / 19.4   4.8 / 19.2
stage 3     7.9 / 15.2   6.8 / 16.1   6.4 / 15.4
stage 4     24.4 / 75.9  11.2 / 84.8  11.6 / 84.5

(The stage 4 numbers between Go 1.8.1 and Go tip are too close to call from run to run. Possibly the stage 3 numbers are as well and I'm basically fooling myself to see a difference.)

Disclaimer: These numbers are not gathered with anything approaching statistical rigor, because I don't have that much energy and make.bash (and cmd/dist) don't make it particularly easy for an outsider to get this sort of data.
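For anyone who wants rough per-stage wall clock numbers without modifying cmd/dist, here is one crude approach. It is a sketch and not the instrumentation I used for the table: stamp_stages is a made-up helper that reads build output on standard input and, each time it sees one of the '#####' stage banner lines, prints how many whole seconds have passed since the previous banner. It only gives wall clock time, not user time, and it assumes that the banners still start with '#####'.

```shell
#!/bin/sh
# Sketch: crude wall clock timing of make.bash stages by watching for
# the '#####' banner lines in its output.

# stamp_stages (hypothetical helper): copy each banner line from stdin,
# prefixed with the seconds elapsed since the previous banner (or since
# the start, for the first one). Other lines are dropped.
stamp_stages() {
    prev=$(date +%s)
    while IFS= read -r line; do
        case $line in
        '#####'*)
            now=$(date +%s)
            printf '+%ss %s\n' "$((now - prev))" "$line"
            prev=$now
            ;;
        esac
    done
}

# Stand-in for real build output, so the helper can be tried by itself.
sample=$(printf '%s\n' '##### Building Go bootstrap tool.' 'cmd/dist' \
    '##### Building packages and commands for linux/amd64.' |
    stamp_stages)
printf '%s\n' "$sample"
```

In real use this would be './make.bash 2>&1 | stamp_stages' in go/src. The number printed on each banner is the time taken by the stage before it, so the last stage needs 'time' or a final timestamp to measure.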

For my own memory, if nothing else, all builds were done with everything in /tmp, which is a RAID-0 stripe of two 500 GB Seagate Constellation ST9500620NS SATA drives. With 96 GB, I expect that basically all static data was in kernel disk buffers in RAM all the time, but some things may have been written to disk.

GoBuildWhereTimeGoes written at 02:39:24

2017-05-10

Building the Go compiler from source from scratch (on Unix)

Unlike some languages which are a real tedious pain to build from source, Go is both easy and interesting to build from source, even (and especially) for the latest development version. Building from source can be especially convenient if you want your own personal copy of a current version of Go (or the very latest version) on a system where you don't have permissions required to install system packages or write to /usr/local. I've seen various recipes for building Go this way, but here is the one I now recommend that you use, with some commentary on why I'm doing it this way.

First off, to build Go you need a working C compiler environment and a reasonably current version of git. Arranging for these is beyond the scope of these instructions; I'm just going to assume that you can build programs in general. Building current versions of Go also requires a working Go compiler, so the from-scratch process of building Go from source needs another working Go compiler. The easiest and currently best source of this second Go compiler is a prebuilt package from the Go people.

My process goes like this:

  1. Make a bootstrap area that you'll use for the bootstrap Go compiler, and fetch the latest prebuilt Go 1.8 package from the official Go downloads area:

    mkdir bootstrap
    cd bootstrap
    wget https://.../<whatever>.tar.gz
    tar -xf <whatever>.tar.gz
    

    You specifically want Go 1.8 (1.8.1 as I write this) because Go compile times took a nose dive from Go 1.5 onwards (the first version of the compiler that was written in Go instead of C) and only recently recovered. It used to be clearly slower to bootstrap Go with versions of Go from 1.5 onwards, but it's now actually slightly faster to do so with Go 1.8.1 instead of with Go 1.4, at least on 64-bit Linux x86.

    (I wound up testing this as part of writing this entry and surprised myself. I used to use Go 1.4 as the bootstrap compiler; I'm now switching to Go 1.8. A quick test suggests that Go 1.7 is also slightly faster than Go 1.4 for this, but Go 1.8 is faster than Go 1.7 so you might as well use it.)

    If your system already has a system version of Go 1.8, you can use that. If the latest version of Go is more recent than Go 1.8 (on your system or released by the Go people or both), it might be better for this. Go 1.9 is probably going to compile Go programs faster than Go 1.8, but predicting the future beyond it is hard.

  2. Get a Git clone of the current master repository:

    cd /some/where
    git clone https://go.googlesource.com/go go
    

  3. Create a little script to build your master version of Go using the version of Go in the bootstrap area; this script lives in go/src. I call my script make-all.bash, and a simple version looks like this:

    #!/bin/bash
    GOROOT_BOOTSTRAP=/some/where/bootstrap/go
    export GOROOT_BOOTSTRAP
    ./all.bash
    

    You can do this by hand but it gets to be a pain to remember the correct setting for $GOROOT_BOOTSTRAP and scripts capture knowledge.

    If you're using a system version of Go instead of your own bootstrap version, the $GOROOT_BOOTSTRAP setting you want is:

    GOROOT_BOOTSTRAP=$(/usr/bin/go env GOROOT)
    

    Or perhaps /usr/local/bin/go, or even /usr/local/go/bin/go.

  4. Build the latest version of Go with this script:

    cd go/src
    ./make-all.bash
    

    You can now add /some/where/go/bin to your path, or symlink the programs there into $HOME/bin if you prefer.

    (As with most compilers, Go does a two-stage build; first it builds itself with your bootstrap Go, and then it rebuilds itself with itself.)

When you want to (re)build the latest version of Go, you simply 'git pull' to update the master tree and then repeat step four.

Future versions of Go will make all of this somewhat easier because they'll permit you to download prebuilt binaries but put them anywhere you want without hassles. Today, it requires somewhat awkward gyrations to download one of the distribution packages but not put it in /usr/local/go, which creates more than one reason to build your own version of Go from source.

Sidebar: Building specific versions of Go

Since the development tree sometimes breaks or has things in it that you don't actually want to use, you may also want to keep around your own copy of, say, the latest officially released Go version, which is Go 1.8.x as I write this. You can do this as a Git worktree derived from your master go repository:

cd /some/where/go
git worktree add -b release-branch.go1.8 ../v1.8 origin/release-branch.go1.8
cd ../v1.8/src
cp ../../go/src/make-all.bash .
./make-all.bash

('git branch -r' in your go repo will be useful here. I believe this tree can be updated when the Go people release new updates for Go 1.8, although I'm not completely sure of the best Git way to do it.)

This is different from the binary release that you downloaded to /some/where/bootstrap/go, because it doesn't require any weird steps to use. You can just add /some/where/v1.8/bin at the start of your $PATH and then everything just works, unlike the bootstrap copy, which requires you to set $GOROOT to use it.

By the way, yes, once you build your own version of Go 1.8, you can use it as the bootstrap compiler for the latest development version of Go.

(Even more recursive setups are possible. My version of Go 1.8 that I'm now using as my bootstrap Go compiler was actually bootstrapped with the latest Go development version, because why not.)

GoBuildFromSource written at 03:17:33
