Where bootstrapping Go with a modern version of Go has gotten faster
Since Go 1.5, building Go from source requires an existing 'bootstrap' Go compiler. For at least a while, the fastest previous Go version to use for this was Go 1.4, the last version written in C and also the version that generally compiled Go source code the fastest. When I wrote up my process of building Go from source, I discovered that using Go 1.7.5 or Go 1.8.1 was actually now a bit faster than using Go 1.4. I mentioned this on Twitter, and because the general slowdown in how fast Go compiles code has been one of Dave Cheney's favorite issues, I tagged him in my tweet. He found that result surprising, so I decided to dig more into the details by adding some crude instrumentation to the process of building Go from source.
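(For concreteness, a single test run here is just pointing make.bash at a particular bootstrap toolchain via $GOROOT_BOOTSTRAP and timing it; the paths below are illustrative, not anything canonical.)

```sh
cd /tmp/go/src                        # a checkout of the Go source tree (illustrative path)
export GOROOT_BOOTSTRAP=/opt/go1.8.1  # or a Go 1.4 tree, or a previously built Go tip tree
time ./make.bash
```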
Building Go from source has four steps, and this is how I understand them:
##### Building Go bootstrap tool.
This is cmd/dist; it is built using your bootstrap version of Go.
##### Building Go toolchain using <bootstrap go>.
This builds a bunch of 'bootstrap/*' stuff with cmd/dist, again using your bootstrap Go. My understanding is that this is a minimal Go compiler, assembler, and linker that omits various things in order to guarantee that it can be compiled under Go 1.4.
##### Building go_bootstrap for host, linux/amd64.
I believe that this builds the go tool itself and various associated bits and pieces using the bootstrap/* compiler and so on built in step 2. In particular, this does not appear to rebuild the step 2 compiler with itself. (There is code to do this in cmd/dist, but it is deliberately disabled.)
##### Building packages and commands for linux/amd64.
This builds and rebuilds everything: the full Go compiler, toolchain, the go program and its sub-programs, and the entire standard library. I believe it uses the go program from step 3 but the compiler, assembler, and linker from step 2.
If I'm understanding this correctly, this means that as late as step 4 you're still building Go code using a compiler compiled by your initial bootstrap compiler, such as Go 1.4. However, you're using the current Go compiler from stage 3 onwards, not the bootstrap compiler itself; the stage 2 code is the last thing compiled by your bootstrap compiler (and so the last place its compilation speed matters).
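(As a sketch of the sort of crude instrumentation involved, though not necessarily exactly what I did, you can timestamp every line of make.bash's output and read off the gaps between the stage banners quoted above:)

```sh
# Prefix each line of build output with a timestamp; the per-stage times are
# the gaps between successive 'Building ...' banner lines. Forking date(1)
# per line is sloppy, but it's good enough at this resolution.
./make.bash 2>&1 | while IFS= read -r line; do
    printf '%s  %s\n' "$(date +%T.%N)" "$line"
done
```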
So now to timings. I tested building an almost-current version of
Go tip (it identifies itself as '+e5bb5e3') using three different
bootstrap Go versions: Go 1.4, Go 1.8.1, and Go tip (+482da51). I
timed things on a quite powerful server with 96 GB of RAM, Xeon
E5-2680 CPUs, and 32 (hyperthreaded) cores. On this server,
using Go tip gives a
make.bash time of about 24 seconds total,
using Go 1.8.1 a time of about 28.5 seconds total, and Go 1.4 a
total time of almost 40 seconds. But a more interesting question
is where the time is going and which bootstrap compiler wins where:
- For stage 1, Go 1.4 is still the fastest and Go 1.8.1 the slowest
of the three. However, this stage takes only a tiny amount of time.
- For stage 2, Go tip is fastest, followed by Go 1.4, then Go 1.8.1.
Go 1.4 uses by far the lowest 'user' time, so the other Go versions
are covering up speed issues by using more CPUs.
- For stage 3, Go tip is slightly faster than Go 1.8.1, and Go 1.4 is clearly the slowest of the three.
- For stage 4, Go tip and Go 1.8.1 are tied and Go 1.4 is way behind, taking about twice as long (23 seconds versus 11.5 seconds).
My best guess at what is causing Go 1.4 to be slower here is that it simply produces less optimized code than Go 1.8.1 and Go tip. As far as I can see, even the stage 4 compilation is still done using a Go compiler, assembler, and linker that were compiled with the bootstrap compiler, so if the bootstrap compiler produces slow code, they will run slower (despite all three bootstrap compilers compiling the same Go code). This is most visible in stage 4, because stage 4 (re)builds by far the most Go code. Go 1.4's compilation speed no longer helps here because we're not compiling with Go 1.4 itself; we're compiling with the 1.4-built but current (and thus generally slower) Go compiler toolchain.
(I think this explains why stage 3 and stage 4 are so close between Go 1.8.1 and Go tip; there probably is far less difference in code optimization between the two than between either and Go 1.4.)
Based on this, I would expect Go build times to be most clearly improved by a more recent bootstrap compiler on platforms with relatively bad code optimization in Go 1.4. My impression is that ARM may be one such platform.
If you're wondering why Go tip is so much faster than Go 1.8.1 on stage 2, the answer is probably the recently landed changes for Go issue #15756, 'cmd/compile: parallelize compilation'. As of this commit, concurrent backend compilation is enabled by default in the Go tip. Some quick testing suggests that this is responsible for almost all of the speed advantage of Go tip over Go 1.8.1.
(If you want to test this, note that stage 3 and stage 4 will normally use this too, at least if you're testing by building a Go git version after this commit landed. I don't know of an easy way to disable concurrent compilation only in the bootstrap compiler.)
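(For what it's worth, my understanding is that the Go 1.9-era knob for this is the GO19CONCURRENTCOMPILATION environment variable; setting it to 0 disables concurrent backend compilation, although everywhere rather than only in the bootstrap compiler, and I haven't verified how much of make.bash actually honors it. As a sketch:)

```sh
# Compare a normal build against one with concurrent backend compilation
# disabled (my understanding of the Go 1.9-era environment knob; unverified
# for the make.bash stages themselves).
time ./make.bash
time env GO19CONCURRENTCOMPILATION=0 ./make.bash
```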
Sidebar: Typical real and user times, in seconds
Here is a little table of typical wall clock ('real') and user mode
times, as reported by
time, for building with various different
bootstrap compilers. In each table cell, the real time is first,
then the user time (which is almost always larger).
| bootstrap: | Go 1.4 | Go 1.8.1 | Go tip |
|------------|--------|----------|--------|
| stage 1 | 0.7 / 0.6 | 1.2 / 1.3 | 0.8 / 1.4 |
| stage 2 | 6.6 / 9.8 | 9.1 / 19.4 | 4.8 / 19.2 |
| stage 3 | 7.9 / 15.2 | 6.8 / 16.1 | 6.4 / 15.4 |
| stage 4 | 24.4 / 75.9 | 11.2 / 84.8 | 11.6 / 84.5 |
(The stage 4 numbers between Go 1.8.1 and Go tip are too close to call from run to run. Possibly the stage 3 numbers are as well and I'm basically fooling myself to see a difference.)
Disclaimer: These numbers are not gathered with anything approaching
statistical rigor, because I don't have that much energy and the build tools
(such as cmd/dist) don't make it particularly easy for an outsider to
get this sort of data.
For my own memory, if nothing else, all builds were done in
/tmp, which is a RAID-0 stripe of two 500 GB Seagate
Constellation ST9500620NS SATA drives. With 96 GB, I expect that
basically all static data was in kernel disk buffers in RAM all the
time, but some things may have been written to disk.