2017-05-31
Why one git fetch default configuration bit is probably okay
I've recently been reading the git fetch manpage reasonably carefully
as part of trying to understand what I'm doing with limited fetches.
If you do this, you'll run across an interesting piece of information
about the <refspec> argument, including in its form as the fetch =
setting for remotes. The basic syntax is '<src>:<dst>', and the
standard version that is created by any git clone gives you:
fetch = +refs/heads/*:refs/remotes/origin/*
You might wonder about that + at the start, and I certainly did.
Well, it's special magic. To quote the documentation:
The remote ref that matches <src> is fetched, and if <dst> is not empty string, the local ref that matches it is fast-forwarded using <src>. If the optional plus + is used, the local ref is updated even if it does not result in a fast-forward update.
(Emphasis mine.)
When I read this my eyebrows went up, because it sounded dangerous.
There are certainly lots of complicated processes around 'git pull'
if it detects that it can't fast-forward what it's just fetched,
so allowing non-fast-forward fetches (and by default) certainly
sounded like maybe it was something I wanted to turn off. So I tried
to think carefully about what's going on here, and as a result I now
believe that this configuration is mostly harmless and probably what
you want.
The big thing is that this is not about what happens with your local
branch, eg master or rel-1.8. This is about your repo's copy
of the remote branch, for example origin/master or origin/rel-1.8.
And it is not even about the branch, because branches are really
'refs', symbolic references to specific commits. git fetch maintains
refs (here under refs/remotes/origin) for every branch that you're
copying from the remote, and one of the things that it does when
you fetch updates is update these refs. This lets the rest of Git
use them and do things like merge or fast-forward remote updates
into your local remote-tracking branch.
So git fetch's documentation is talking about what it does to
these remote-branch refs if the branch on the remote has been rebased
or rewound so that it is no longer a more recent version of what
you have from your last update of the remote. With the + included
in the <refspec>, git fetch always updates your repo's ref for
the remote branch to match whatever the remote has; basically it
overwrites whatever ref you used to have with the new ref from the
remote. After a fetch, your origin/master or origin/rel-1.8
will always be the same as the remote's, even if the remote rebased,
rewound, or did other weird things. You can then go on to fix up
your local branch in a variety of ways.
(To be technical your origin/master will be the same as origin's
master, but you get the idea here.)
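A small experiment makes this concrete. The sketch below is not taken from the Git documentation; it builds a throwaway 'upstream' repository and a clone of it in a temporary directory, rewinds and rewrites upstream's master, and then checks that a plain fetch moves origin/master to match. The directory names and the g helper are made up for the demo.

```shell
set -e
work=$(mktemp -d)
cd "$work"

# Hypothetical helper: run git with a throwaway identity so that
# commits work even where no user.name/user.email is configured.
g() { git -c user.name=demo -c user.email=demo@example.invalid "$@"; }

# A stand-in 'remote' with two commits on master.
git init -q upstream
git -C upstream symbolic-ref HEAD refs/heads/master
g -C upstream commit -q --allow-empty -m 'one'
g -C upstream commit -q --allow-empty -m 'two'

git clone -q upstream clone
refspec=$(git -C clone config --get remote.origin.fetch)
echo "default refspec: $refspec"

# Rewrite upstream history: drop 'two' and replace it.
git -C upstream reset -q --hard HEAD~1
g -C upstream commit -q --allow-empty -m 'rewritten'

# With the '+' refspec the non-fast-forward update is accepted.
git -C clone fetch -q origin
upstream_tip=$(git -C upstream rev-parse master)
tracking_tip=$(git -C clone rev-parse origin/master)
echo "upstream master: $upstream_tip"
echo "origin/master:   $tracking_tip"
```

The clone's own local master is untouched by all of this; only the remote-tracking ref moves.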
This makes the + a reasonable default, because it means that 'git
fetch' will reliably mirror even a remote that is rebasing and
otherwise having its history rewritten and its branches changed
around. Without the +, 'git fetch' might transfer the new and
revised commits and trees from your remote but it wouldn't give you
any convenient reference for them for you to look at them, integrate
them, or just reset your local remote-tracking branch to their new
state.
(Without the '+', 'git fetch' won't update your repo's remote-branch
refs when the update isn't a fast-forward. I don't know if it writes
the new ref information anywhere, perhaps to .git/FETCH_HEAD, or if
it just throws it away, possibly after printing out commit hashes.)
Sidebar: When I can imagine not using a '+'
The one thing that using a '+' does is that it sort of allows a
remote to effectively delete past history out of your local repo,
something that's not normally possible in a DVCS and potentially
not desirable. It doesn't do this
directly, but it starts an indirect process of it and it certainly
makes the old history somewhat annoying to get at.
Git doesn't let a remote directly delete commits, trees, and objects. But unreferenced items in your repo are slowly garbage-collected after a while and when you update your remote-branch refs after a non-ff fetch, the old commits that the pre-fetch refs pointed to start becoming more and more unreachable. I believe they live on in the reflog for a while, but you have to know that they're missing and to look.
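To see the 'still there but harder to reach' effect, here is a minimal sketch under the same kind of assumptions: a throwaway local upstream and clone in a temporary directory, with made-up names. It records the old origin/master tip, does a forced fetch after upstream rewrites history, and then checks that the old commit object still exists in the clone while no longer being part of origin/master's history. The final command just prints whatever reflog entries your Git keeps for the remote-tracking ref.

```shell
set -e
work=$(mktemp -d)
cd "$work"
# Hypothetical helper: git with a throwaway commit identity.
g() { git -c user.name=demo -c user.email=demo@example.invalid "$@"; }

git init -q upstream
git -C upstream symbolic-ref HEAD refs/heads/master
g -C upstream commit -q --allow-empty -m 'one'
g -C upstream commit -q --allow-empty -m 'two'
git clone -q upstream clone
old_tip=$(git -C clone rev-parse origin/master)

# Upstream throws away 'two'; the clone then does a forced (non-ff) fetch.
git -C upstream reset -q --hard HEAD~1
g -C upstream commit -q --allow-empty -m 'rewritten'
git -C clone fetch -q origin

# The old commit object is still in the clone's object store ...
if git -C clone cat-file -e "$old_tip^{commit}"; then
    present=yes
else
    present=no
fi
# ... but it is no longer an ancestor of origin/master.
if git -C clone merge-base --is-ancestor "$old_tip" origin/master; then
    in_history=yes
else
    in_history=no
fi
echo "old tip present: $present, in origin/master history: $in_history"

# Show whatever reflog entries exist for the remote-tracking ref.
git -C clone reflog show origin/master || true
```

In this toy setup the clone's local master still points at the old commit, so nothing here would actually be garbage-collected; the point is only that origin/master no longer leads to it.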
If you want to be absolutely sure that you notice any funny business
going on in an upstream remote that is not supposed to modify its
public history this way, not using '+' will probably help. I'm not
sure if it's the easiest way to do this, though, because I don't know
what 'git fetch' does when it detects a non-ff fetch like this.
(Hopefully git fetch complains loudly instead of failing silently.)
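The open question here can be checked with an experiment. The sketch below uses the same sort of throwaway setup (a local stand-in upstream and a clone, all names made up), but rewrites the clone's refspec to drop the '+' before upstream rewrites its history. As far as I know, Git versions in common use report the ref as rejected (non-fast-forward), leave origin/master alone, and exit with a non-zero status; treat that as an assumption to verify on your own Git rather than a guarantee.

```shell
set -e
work=$(mktemp -d)
cd "$work"
# Hypothetical helper: git with a throwaway commit identity.
g() { git -c user.name=demo -c user.email=demo@example.invalid "$@"; }

git init -q upstream
git -C upstream symbolic-ref HEAD refs/heads/master
g -C upstream commit -q --allow-empty -m 'one'
g -C upstream commit -q --allow-empty -m 'two'
git clone -q upstream clone

# Replace the default refspec with one that has no leading '+'.
git -C clone config remote.origin.fetch 'refs/heads/*:refs/remotes/origin/*'
before=$(git -C clone rev-parse origin/master)

# Upstream rewrites its public history.
git -C upstream reset -q --hard HEAD~1
g -C upstream commit -q --allow-empty -m 'rewritten'

# Capture the messages and exit status instead of letting 'set -e' abort.
if msg=$(git -C clone fetch origin 2>&1); then status=0; else status=$?; fi
after=$(git -C clone rev-parse origin/master)

printf '%s\n' "$msg"
echo "fetch exit status: $status"
echo "origin/master before: $before"
echo "origin/master after:  $after"
```

If the exit status is non-zero, a script or cron job wrapping 'git fetch' has a simple way to notice rewritten upstream history.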
2017-05-29
Configuring Git worktrees to limit what's fetched on pulls
Yesterday I wrote about my practical problem with git worktrees, which
is to limit what is fetched from the remote when I do 'git pull' in
one (as opposed to the main repo).
I also included a sidebar with a theory on how to do this with some
Git configuration madness. In a spirit of crazed experimentation
I've now put this theory into practice and it appears to actually
work. Unfortunately the way I know how to do this requires some
hand editing of your .git/config, rather than using commands like
'git remote' to do this for you. However, I don't fully understand
what I'm doing here (and that's one reason I'm putting in lots of
notes to myself).
Here's my process:
- Create a new worktree as normal, based on the origin branch you want:
    git worktree add -b release-branch.go1.8 ../v1.8 origin/release-branch.go1.8
  Because we used -b, this will also create a local remote-tracking branch, release-branch.go1.8, that tracks origin's release-branch.go1.8 branch.
  If you already have a release-branch.go1.8 branch (perhaps you've checked it out in your main repo at some point or previously created a worktree for it), this is just:
    git worktree add ../v1.8 release-branch.go1.8
- Create a new remote for your upstream repo to fetch just this upstream branch:
    git remote add -t release-branch.go1.8 origin-v1.8 https://go.googlesource.com/go
  Because we set it up to track only a specific remote branch, 'git fetch' for this remote will only fetch updates for the remote's release-branch.go1.8 branch, even though it has the same URL as our regular origin remote (which will normally fetch all branches).
- Edit .git/config to change the fetch = line for origin-v1.8 to fetch the branch into refs/remotes/origin/release-branch.go1.8, which is the fetch destination for your origin remote. That is:
    fetch = +refs/heads/release-branch.go1.8:refs/remotes/origin/release-branch.go1.8
  By fetching into refs/remotes/origin like this, my understanding is that we avoid doing duplicate fetches. Whether we do 'git fetch' in our worktree or in the master repo, we'll be updating the same remote branch reference and so we'll only fetch updates for this (remote) branch once. I believe that if you don't do this, 'git pull' or 'git fetch' in the worktree will always report the new updates; you'll never 'lose' an update for the branch by doing a 'git pull' in the master. However I think you may wind up doing extra transfers.
  (This can be done with git config but I'd rather edit .git/config by hand.)
- Edit .git/config again to change the 'remote =' line for your release-branch.go1.8 branch to be origin-v1.8 instead of origin.
  By forcing the remote for the branch, we activate git fetch's restriction on what remote branches will be fetched when we do a 'git pull' or 'git fetch' in a tree with that branch checked out (here, our worktree, but it could be the master repo).
  If you prefer, you can set this with 'git config' instead of by hand editing:
    git config branch.release-branch.go1.8.remote origin-v1.8
We can see that this works by comparing 'git fetch -v --dry-run'
in the worktree and in the master repo. In the worktree, it will
report just an attempt to update origin/release-branch.go1.8.
In the master repo, it will (normally) report an attempt to update
everything.
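The whole recipe can be rehearsed offline before trying it on a real clone. The following is a sketch under stated assumptions: it uses a throwaway local repository as a stand-in for the Go upstream (so the remote URL is a temporary directory, not go.googlesource.com), does both configuration steps with 'git config' instead of hand editing, and then checks that a plain 'git fetch' in the worktree updates only the release branch's remote-tracking ref. The g helper and the directory names are made up for the demo.

```shell
set -e
work=$(mktemp -d)
cd "$work"
# Hypothetical helper: git with a throwaway commit identity.
g() { git -c user.name=demo -c user.email=demo@example.invalid "$@"; }
rel=release-branch.go1.8

# Local stand-in for the upstream repo: master plus one release branch.
git init -q upstream
git -C upstream symbolic-ref HEAD refs/heads/master
g -C upstream commit -q --allow-empty -m 'initial'
git -C upstream branch "$rel"

# Step 1: the main clone plus a worktree for the release branch.
git clone -q upstream go
git -C go worktree add -b "$rel" ../v1.8 "origin/$rel" >/dev/null 2>&1

# Step 2: a second remote, same URL, tracking only the release branch.
git -C go remote add -t "$rel" origin-v1.8 "$work/upstream"

# Step 3: fetch into origin's namespace instead of origin-v1.8's.
git -C go config remote.origin-v1.8.fetch \
    "+refs/heads/$rel:refs/remotes/origin/$rel"

# Step 4: point the local branch at the new remote.
git -C go config "branch.$rel.remote" origin-v1.8

# New upstream commits appear on both branches.
g -C upstream commit -q --allow-empty -m 'master update'
git -C upstream checkout -q "$rel"
g -C upstream commit -q --allow-empty -m 'release update'

# A plain fetch in the worktree should touch only the release branch.
git -C v1.8 fetch -q
rel_local=$(git -C go rev-parse "origin/$rel")
rel_up=$(git -C upstream rev-parse "$rel")
master_local=$(git -C go rev-parse origin/master)
master_up=$(git -C upstream rev-parse master)
echo "release branch up to date: $([ "$rel_local" = "$rel_up" ] && echo yes || echo no)"
echo "origin/master still stale: $([ "$master_local" != "$master_up" ] && echo yes || echo no)"
```

Running 'git fetch' in the go directory afterwards (where master is checked out and still uses origin) would bring origin/master up to date as usual.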
Because everything is attached to our branch configuration for the
(local) release-branch.go1.8 branch, not the worktree, this will
survive removing and then re-creating the worktree. This may be
a feature, or it may be a drawback, since it means that if you
delete the worktree and check out release-branch.go1.8 in the
master repo, 'git pull' will start only updating it (and not
updating master and other branches as well). We can change back
to the normal state of things by updating the remote for the
branch back to the normal origin remote:
git config branch.release-branch.go1.8.remote origin
(In general you can flip the state of the branch back and forth as you want. I don't think Git gets confused, although you may.)
2017-05-28
My thoughts on git worktrees for me (and some notes on things I tried)
I recently discovered git worktrees and did some experimentation with using them for stuff that I do. The short summary of my experience so far is that while I can see the appeal for certain sorts of usage cases, I don't think git worktrees are a good fit for my situation and I'm probably going to use completely independent repositories in the future.
My usage case was building my own copies of multiple versions of some project, starting with Go. Especially in the case of a language compiler and its standard library, it's reasonably useful to have the latest development version plus a stable version or two; for example, it gives me an easy way to test if something I'm working on will build on older released versions or if I've let a dependency on some recent bit of the standard library creep in. The initial process of creating a worktree for, say, Go 1.8 is reasonably straightforward:
cd /some/where/go
git worktree add -b release-branch.go1.8 ../v1.8 origin/release-branch.go1.8
What proved tricky for me is updating this v1.8 tree when the Go
people update Go 1.8, as they do periodically.
My normal way of staying up to date on what changes are happening
in the main line of Go is to do 'git pull' in my master repo
directory, note the lines that get printed out about fetched updates,
eg:
remote: Finding sources: 100% (64/64)
remote: Total 64 (delta 23), reused 64 (delta 23)
Unpacking objects: 100% (64/64), done.
From https://go.googlesource.com/go
   ffab6ab877..d64c49098c  master     -> origin/master
And then I use 'git log ffab6ab877..d64c49098c' to see what
changed. The problem with worktrees is that this information is
printed by 'git fetch', and normally 'git fetch' updates all
branches, both the mainline and, say, a release branch you're
following. So I actively don't want to run 'git pull' or 'git
fetch' in the worktree directory, because otherwise I will have
to remember to stop and look at the mainline updates it's just
fetched and reported to me.
What I wound up doing was running 'git pull' in my main go tree
and if there was an update to origin/release-branch.go1.8 reported,
I'd go to my 'v1.8' directory and do 'git merge --ff-only'. This
mostly worked (it blew up on me once for reasons I don't understand),
but it means that dealing with a worktree is different than dealing
with a normal Git repo directory (including an independently cloned
repo). Since 'git pull' and other Git commands work 'normally' in a
worktree, I have to explicitly remember that I created something as a
worktree (or check to see if .git is a directory to know, since 'git
status' doesn't helpfully tell you one way or the other).
(In my current moderate level of Git knowledge and experience, I'm
going to avoid writing about the good usage cases I think I see for
worktrees. Anyway, one of them is documented in the git-worktree
manpage; I note that their scenario uses a worktree for a one-shot
branch that's never updated from upstream.)
As mentioned, if I want to see if a particular Git repo is a worktree
or not I need to do 'ls -ld .git'. If it's a file, I have a
worktree. If I have a directory, with how I currently use Git,
it's a full repo. 'git worktree list' will list the main repo
and worktrees, but it doesn't annotate things with a 'you are here'
marker. Obviously if I used worktrees enough I could write a status
command to tell me, but then if I was doing that I could probably
write a bunch of commands to do what I want in general.
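Such a status command could be quite small. Here is one possible sketch, not something from the git-worktree manpage: a hypothetical whereami function that compares 'git rev-parse --git-dir' (the per-tree Git directory) with 'git rev-parse --git-common-dir' (the shared one). In the main repo the two resolve to the same directory; in a linked worktree they differ. The demo creates a scratch repo and worktree in a temporary directory to exercise it.

```shell
set -e
# Hypothetical helper: say whether the current directory belongs to the
# main working tree or to a linked worktree.
whereami() {
    gitdir=$(cd "$(git rev-parse --git-dir)" && pwd -P)
    common=$(cd "$(git rev-parse --git-common-dir)" && pwd -P)
    if [ "$gitdir" = "$common" ]; then
        echo "main repo"
    else
        echo "linked worktree"
    fi
}

# Demo in a scratch area: one repo with one commit and one worktree.
work=$(mktemp -d)
cd "$work"
git init -q main
git -C main -c user.name=demo -c user.email=demo@example.invalid \
    commit -q --allow-empty -m 'initial'
git -C main worktree add ../wt >/dev/null 2>&1

in_main=$(cd main && whereami)
in_wt=$(cd wt && whereami)
echo "main/: $in_main"
echo "wt/:   $in_wt"
```

This is meant to be run from the top of a working tree; it says nothing useful outside a Git repository, where 'git rev-parse' simply fails.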
Sidebar: Excessively clever Git configuration hacking (maybe)
Bearing in mind that I may not understand Git as well as I think I
do, as far as I can see what branches 'git fetch' fetches are
determined from the configuration for the remote for a branch, not
from the branch's configuration. There appear to be two options for
fiddling things here.
The 'obvious' option is to create a second remote (call it, say,
'v1.8-origin') with the same url as origin but a fetch
setting that only fetches the particular branch:
fetch = refs/heads/release-branch.go1.8:refs/remotes/origin/release-branch.go1.8
Then I'd switch the remote for the release-branch.go1.8 branch to
this new remote.
Git-fetch also has a feature where you can have a per-branch
configuration in $GIT_DIR/branches/<branch>; this can be used
to name the upstream 'head' (branch) that will be fetched into the
local branch. It appears that creating such a file should do the
trick, but I can't find people writing about this on the Internet
(just many copies of the git-fetch manpage), so I'm wary of
assuming that I understand what's going to happen here. Plus, it's
apparently a deprecated legacy approach.
(If I understand all of this correctly, either approach would
preserve 'git pull' in the main repo (which is on the master
branch) always fetching all branches from upstream.)
2017-05-12
Where bootstrapping Go with a modern version of Go has gotten faster
Since Go 1.5, building Go from source requires an existing 'bootstrap' Go compiler. For at least a while, the fastest previous Go version to use for this was Go 1.4, the last version written in C and also the version that generally compiled Go source code the fastest. When I wrote up my process of building Go from source, I discovered that using Go 1.7.5 or Go 1.8.1 was actually now a bit faster than using Go 1.4. I mentioned this on Twitter and because the general slowdown in how fast Go compiles code has been one of Dave Cheney's favorite issues, I tagged him in my Tweet. Dave Cheney found that result surprising, so I decided to dig more into the details by adding some crude instrumentation to the process of building Go from source.
Building Go from source has four steps, and this is how I understand them:
- ##### Building Go bootstrap tool.
  This builds cmd/dist. It uses your bootstrap version of Go.
- ##### Building Go toolchain using <bootstrap go>.
  This builds a bunch of 'bootstrap/*' stuff with cmd/dist, again using your bootstrap Go. My understanding is that this is a minimal Go compiler, assembler, and linker that omits various things in order to guarantee that it can be compiled under Go 1.4.
- ##### Building go_bootstrap for host, linux/amd64.
  I believe that this builds the go tool itself and various associated bits and pieces using the bootstrap/* compiler and so on built in step 2. In particular, this does not appear to rebuild the step 2 compiler with itself.
  (There is code to do this in cmd/dist, but it is deliberately disabled.)
- ##### Building packages and commands for linux/amd64.
  This builds and rebuilds everything; the full Go compiler, toolchain, go program and its sub-programs, and the entire standard library. I believe it uses the go program from step 3 but the compiler, assembler, and linker from step 2.
If I'm understanding this correctly, this means that as late as step 4 you're still building Go code using a compiler compiled by your initial bootstrap compiler, such as Go 1.4. However, you're using the current Go compiler from stage 3 onwards, not the bootstrap compiler itself; the stage 2 code is the last thing compiled by your bootstrap compiler (and so the last place its compilation speed matters).
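One crude way to get per-stage timings out of a build like this is to timestamp the stage banner lines as they go by. The sketch below is an assumption about how that could be done, not the instrumentation actually used for the numbers in this entry: a hypothetical stamp_stages filter that prefixes every line starting with '##### ' with the seconds elapsed since the filter started. The demo feeds it a few canned lines instead of a real build, so it runs without a Go tree.

```shell
set -e
# Hypothetical filter: copy stdin to stdout, prefixing each '##### '
# banner line with the whole seconds elapsed since the filter started.
# Uses 'date +%s', which is widely available but not strictly POSIX.
stamp_stages() {
    start=$(date +%s)
    while IFS= read -r line; do
        case "$line" in
            '##### '*)
                now=$(date +%s)
                printf '[+%ds] %s\n' "$((now - start))" "$line"
                ;;
            *)
                printf '%s\n' "$line"
                ;;
        esac
    done
}

# Demo on canned input instead of a real build.
out=$(printf '%s\n' \
    '##### Building Go bootstrap tool.' \
    'cmd/dist' \
    '##### Building packages and commands for linux/amd64.' \
  | stamp_stages)
printf '%s\n' "$out"
stamped=$(printf '%s\n' "$out" | grep -c '^\[+[0-9]*s] ##### ')
total=$(printf '%s\n' "$out" | wc -l)
```

In real use this would sit on the end of a pipeline such as './make.bash 2>&1 | stamp_stages'. Whole-second resolution is coarse for a stage that takes under a second, and it gives wall clock time only, not user time.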
So now to timings. I tested building an almost-current version of
Go tip (it identifies itself as '+e5bb5e3') using three different
bootstrap Go versions: Go 1.4, Go 1.8.1, and Go tip (+482da51). I
timed things on a quite powerful server with 96 GB of RAM, Xeon
E5-2680 CPUs, and 32 (hyperthreaded) cores. On this server,
using Go tip gives a make.bash time of about 24 seconds total,
using Go 1.8.1 a time of about 28.5 seconds total, and Go 1.4 a
total time of almost 40 seconds. But a more interesting question
is where the time is going and which bootstrap compiler wins where:
- For stage 1, Go 1.4 is still the fastest and Go 1.8.1 the slowest
of the three. However this stage takes only a tiny amount of time.
- For stage 2, Go tip is fastest, followed by Go 1.4, then Go 1.8.1.
Go 1.4 uses by far the lowest 'user' time, so the other Go versions
are covering up speed issues by using more CPUs.
- For stage 3, Go tip is slightly faster than Go 1.8.1, and Go 1.4 is
clearly third.
- For stage 4, Go tip and Go 1.8.1 are tied and Go 1.4 is way behind, taking about twice as long (23 seconds versus 11.5 seconds).
My best guess at what is causing Go 1.4 to be slower here is that it simply produces less optimized code than Go 1.8.1 and Go tip. As far as I can see, even the stage 4 compilation is still done using a Go compiler, assembler, and linker that were compiled with the bootstrap compiler, so if the bootstrap compiler produces slow code, they will run slower (despite all three bootstrap compilers compiling the same Go code). This is most visible in stage 4, because stage 4 (re)builds by far the most Go code. Go 1.4's compilation speed no longer helps here because we're not compiling with Go 1.4 itself; we're compiling with the 1.4-built but current (and thus generally slower) Go compiler toolchain.
(I think this explains why stage 3 and stage 4 are so close between Go 1.8.1 and Go tip; there probably is far less difference in code optimization between the two than between either and Go 1.4.)
Based on this, I would expect Go build times to be most clearly improved by a more recent bootstrap compiler on platforms with relatively bad code optimization in Go 1.4. My impression is that ARM may be one such platform.
If you're wondering why Go tip is so much faster than Go 1.8.1 on stage 2, the answer is probably the recently landed changes for Go issue #15756, 'cmd/compile: parallelize compilation'. As of this commit, concurrent backend compilation is enabled by default in the Go tip. Some quick testing suggests that this is responsible for almost all of the speed advantage of Go tip over Go 1.8.1.
(If you want to test this, note that stage 3 and stage 4 will normally use this too, at least if you're testing by building a Go git version after this commit landed. I don't know of an easy way to disable concurrent compilation only in the bootstrap compiler.)
Sidebar: Typical real and user times, in seconds
Here is a little table of typical wall clock ('real') and user mode
times, as reported by time, for building with various different
bootstrap compilers. In each table cell, the real time is first,
then the user time (which is almost always larger).
bootstrap: | Go 1.4 | Go 1.8.1 | Go tip |
stage 1 | 0.7 / 0.6 | 1.2 / 1.3 | 0.8 / 1.4 |
stage 2 | 6.6 / 9.8 | 9.1 / 19.4 | 4.8 / 19.2 |
stage 3 | 7.9 / 15.2 | 6.8 / 16.1 | 6.4 / 15.4 |
stage 4 | 24.4 / 75.9 | 11.2 / 84.8 | 11.6 / 84.5 |
(The stage 4 numbers between Go 1.8.1 and Go tip are too close to call from run to run. Possibly the stage 3 numbers are as well and I'm basically fooling myself to see a difference.)
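One way to read the stage 2 row is to divide user time by real time, which gives a rough idea of how many CPUs each bootstrap compiler kept busy (rough because it ignores system time). The snippet below just does that arithmetic on the three stage 2 cells copied from the table; the labels are made up for the demo.

```shell
# Columns: label, real seconds, user seconds (stage 2 row of the table).
ratios=$(awk '{ printf "%s %.2f\n", $1, $3 / $2 }' <<'EOF'
Go-1.4 6.6 9.8
Go-1.8.1 9.1 19.4
Go-tip 4.8 19.2
EOF
)
printf '%s\n' "$ratios"
# Go-1.4 1.48
# Go-1.8.1 2.13
# Go-tip 4.00
```

On these numbers Go 1.4 uses about one and a half CPUs in stage 2 while Go tip uses about four, which fits the observation that the newer versions are buying wall clock speed with extra CPU time.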
Disclaimer: These numbers are not gathered with anything approaching
statistical rigor, because I don't have that much energy and make.bash
(and cmd/dist) don't make it particularly easy for an outsider to
get this sort of data.
For my own memory, if nothing else, all builds were done with
everything in /tmp, which is a RAID-0 stripe of two 500 GB Seagate
Constellation ST9500620NS SATA drives. With 96 GB, I expect that
basically all static data was in kernel disk buffers in RAM all the
time, but some things may have been written to disk.
2017-05-10
Building the Go compiler from source from scratch (on Unix)
Unlike some languages which are a real tedious pain to build from
source, Go is both easy and interesting to build from source, even
(and especially) for the latest development version. Building from
source can be especially convenient if you want your own personal
copy of a current version of Go (or the very latest version) on a
system where you don't have permissions required to install system
packages or write to /usr/local. I've seen various recipes for
building Go this way, but here is the one I now recommend that you
use, with some commentary on why I'm doing it this way.
First off, to build Go you need a working C compiler environment
and a reasonably current version of git. Arranging for these is
beyond the scope of these instructions; I'm just going to assume
that you can build programs in general. Building current versions
of Go also requires a working Go compiler, so the from scratch
process of building Go from source needs another working Go compiler.
The easiest and currently best source of this second Go compiler
is a prebuilt package from the Go people.
My process goes like this:
- Make a bootstrap area that you'll use for the bootstrap Go compiler, and fetch the latest prebuilt Go 1.8 package from the official Go downloads area:
    mkdir bootstrap
    cd bootstrap
    wget https://.../<whatever>.tar.gz
    tar -xf <whatever>.tar.gz
  You specifically want Go 1.8 (1.8.1 as I write this) because Go compile times took a nose dive from Go 1.5 onwards (the first version of the compiler that was written in Go instead of C) and only recently recovered. It used to be clearly slower to bootstrap Go with versions of Go from 1.5 onwards, but it's now actually slightly faster to do so with Go 1.8.1 instead of with Go 1.4, at least on 64-bit Linux x86.
  (I wound up testing this as part of writing this entry and surprised myself. I used to use Go 1.4 as the bootstrap compiler; I'm now switching to Go 1.8. A quick test suggests that Go 1.7 is also slightly faster than Go 1.4 for this, but Go 1.8 is faster than Go 1.7 so you might as well use it.)
  If your system already has a system version of Go 1.8, you can use that. If the latest version of Go is more recent than Go 1.8 (on your system or released by the Go people or both), it might be better for this. Go 1.9 is probably going to compile Go programs faster than Go 1.8, but predicting the future beyond it is hard.
- Get a Git clone of the current master repository:
    cd /some/where
    git clone https://go.googlesource.com/go go
- Create a little script to build your master version of Go using the version of Go in the bootstrap area; this script lives in go/src. I call my script make-all.bash, and a simple version looks like this:
    #!/bin/bash
    GOROOT_BOOTSTRAP=/some/where/bootstrap/go
    export GOROOT_BOOTSTRAP
    ./all.bash
  You can do this by hand but it gets to be a pain to remember the correct setting for $GOROOT_BOOTSTRAP and scripts capture knowledge.
  If you're using a system version of Go instead of your own bootstrap version, the $GOROOT_BOOTSTRAP setting you want is:
    GOROOT_BOOTSTRAP=$(/usr/bin/go env GOROOT)
  Or perhaps /usr/local/bin/go, or even /usr/local/go/bin/go.
- Build the latest version of Go with this script:
    cd go/src
    ./make-all.bash
  You can now add /some/where/go/bin to your path, or symlink the programs there into $HOME/bin if you prefer.
  (As with most compilers, Go does a two-stage build; first it builds itself with your bootstrap Go, and then it rebuilds itself with itself.)
When you want to (re)build the latest version of Go, you simply
'git pull' to update the master tree and then repeat step four.
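If you move between machines where the bootstrap Go sometimes lives in your own bootstrap area and sometimes is the system Go, the choice of $GOROOT_BOOTSTRAP can be folded into the build script. This is only a sketch of one way to do it; pick_bootstrap is a made-up helper name, and the demo below uses a fake 'go' file in a temporary directory purely so that it runs on a machine without Go installed.

```shell
set -e
# Hypothetical helper: print the GOROOT to bootstrap with. Prefer a
# private bootstrap tree; otherwise fall back to the 'go' on $PATH.
pick_bootstrap() {
    own=$1
    if [ -x "$own/bin/go" ]; then
        printf '%s\n' "$own"
    elif command -v go >/dev/null 2>&1; then
        go env GOROOT
    else
        echo "no bootstrap Go found" >&2
        return 1
    fi
}

# Demo with a fake bootstrap tree (not a real Go install).
fake=$(mktemp -d)
mkdir -p "$fake/go/bin"
printf '#!/bin/sh\n' > "$fake/go/bin/go"
chmod +x "$fake/go/bin/go"

chosen=$(pick_bootstrap "$fake/go")
echo "GOROOT_BOOTSTRAP=$chosen"
```

In a real make-all.bash the call would be something like GOROOT_BOOTSTRAP=$(pick_bootstrap /some/where/bootstrap/go), followed by the export and ./all.bash lines from the simple version.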
Future versions of Go will make all of this somewhat easier because
they'll permit you to download prebuilt binaries but put them
anywhere you want without hassles. Today, it requires somewhat
awkward gyrations to download one of the distribution packages but
not put it in /usr/local/go, which creates more than one reason to
build your own version of Go from source.
Sidebar: Building specific versions of Go
Since the development tree sometimes breaks or has things in it
that you don't actually want to use, you may also want to keep
around your own copy of, say, the latest officially released Go
version, which is Go 1.8.x as I write this. You can do this as a
Git worktree derived from your master go repository:
cd /some/where/go
git worktree add -b release-branch.go1.8 ../v1.8 origin/release-branch.go1.8
cd ../v1.8/src
cp ../../go/src/make-all.bash .
./make-all.bash
('git branch -r' in your go repo will be useful here. I believe
this tree can be updated when the Go people release new updates for
Go 1.8, although I'm not completely sure of the best Git way to do
it.)
This is different from the binary release that you downloaded to
/some/where/bootstrap/go, because it doesn't require any weird
steps to use. You can just add /some/where/v1.8/bin at the start
of your $PATH and then everything just works, unlike the bootstrap
copy, which requires you to set $GOROOT to use it.
By the way, yes, once you build your own version of Go 1.8, you can use it as the bootstrap compiler for the latest development version of Go.
(Even more recursive setups are possible. My version of Go 1.8 that I'm now using as my bootstrap Go compiler was actually bootstrapped with the latest Go development version, because why not.)