Where bootstrapping Go with a modern version of Go has gotten faster

May 12, 2017

Since Go 1.5, building Go from source requires an existing 'bootstrap' Go compiler. For at least a while, the fastest previous Go version to use for this was Go 1.4, the last version written in C and also the version that generally compiled Go source code the fastest. When I wrote up my process of building Go from source, I discovered that using Go 1.7.5 or Go 1.8.1 was actually now a bit faster than using Go 1.4. I mentioned this on Twitter and, because the general slowdown in how fast Go compiles code has been one of Dave Cheney's favorite issues, I tagged him in my tweet. Dave Cheney found that result surprising, so I decided to dig more into the details by adding some crude instrumentation to the process of building Go from source.
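
For concreteness, here is a minimal sketch of the kind of build involved. GOROOT_BOOTSTRAP and make.bash are the standard mechanism for this; the location of the unpacked bootstrap toolchain is just a hypothetical example of mine:

  # Build Go from a source checkout, pointing make.bash at an existing
  # Go toolchain to use as the bootstrap compiler.
  # /tmp/go1.4 is a hypothetical location for an unpacked bootstrap Go.
  cd go/src
  GOROOT_BOOTSTRAP=/tmp/go1.4 ./make.bash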

Building Go from source has four steps, and this is how I understand them:

  1. ##### Building Go bootstrap tool.

    This builds cmd/dist. It uses your bootstrap version of Go.

  2. ##### Building Go toolchain using <bootstrap go>.

    This builds a bunch of 'bootstrap/*' stuff with cmd/dist, again using your bootstrap Go. My understanding is that this is a minimal Go compiler, assembler, and linker that omits various things in order to guarantee that it can be compiled under Go 1.4.

  3. ##### Building go_bootstrap for host, linux/amd64.

    I believe that this builds the go tool itself and various associated bits and pieces using the bootstrap/* compiler and so on built in step 2. In particular, this does not appear to rebuild the step 2 compiler with itself.

    (There is code to do this in cmd/dist, but it is deliberately disabled.)

  4. ##### Building packages and commands for linux/amd64.

    This builds and rebuilds everything: the full Go compiler, toolchain, go program and its sub-programs, and the entire standard library. I believe it uses the go program from step 3 but the compiler, assembler, and linker from step 2.

If I'm understanding this correctly, this means that as late as step 4 you're still building Go code using a compiler compiled by your initial bootstrap compiler, such as Go 1.4. However, you're using the current Go compiler from stage 3 onwards, not the bootstrap compiler itself; the stage 2 code is the last thing compiled by your bootstrap compiler (and so the last place its compilation speed matters).

So now to timings. I tested building an almost-current version of Go tip (it identifies itself as '+e5bb5e3') using three different bootstrap Go versions: Go 1.4, Go 1.8.1, and Go tip (+482da51). I timed things on a quite powerful server with 96 GB of RAM, Xeon E5-2680 CPUs, and 32 (hyperthreaded) cores. On this server, bootstrapping with Go tip gives a total make.bash time of about 24 seconds, with Go 1.8.1 about 28.5 seconds, and with Go 1.4 almost 40 seconds. But a more interesting question is where the time is going and which bootstrap compiler wins where:

  • For stage 1, Go 1.4 is still the fastest and Go 1.8.1 the slowest of the three. However, this stage takes only a tiny amount of time.

  • For stage 2, Go tip is fastest, followed by Go 1.4, then Go 1.8.1. Go 1.4 uses by far the lowest 'user' time, so the other Go versions are covering up speed issues by using more CPUs.

  • For stage 3, Go tip is slightly faster than Go 1.8.1, and Go 1.4 is clearly third.

  • For stage 4, Go tip and Go 1.8.1 are tied and Go 1.4 is way behind, taking about twice as long (23 seconds versus 11.5 seconds).

My best guess at what is causing Go 1.4 to be slower here is that it simply produces less optimized code than Go 1.8.1 and Go tip. As far as I can see, even the stage 4 compilation is still done using a Go compiler, assembler, and linker that were compiled with the bootstrap compiler, so if the bootstrap compiler produces slow code, they will run slower (despite all three bootstrap compilers compiling the same Go code). This is most visible in stage 4, because stage 4 (re)builds by far the most Go code. Go 1.4's compilation speed no longer helps here because we're not compiling with Go 1.4 itself; we're compiling with the 1.4-built but current (and thus generally slower) Go compiler toolchain.

(I think this explains why stage 3 and stage 4 are so close between Go 1.8.1 and Go tip; there probably is far less difference in code optimization between the two than between either and Go 1.4.)

Based on this, I would expect Go build times to be most clearly improved by a more recent bootstrap compiler on platforms with relatively bad code optimization in Go 1.4. My impression is that ARM may be one such platform.

If you're wondering why Go tip is so much faster than Go 1.8.1 on stage 2, the answer is probably the recently landed changes for Go issue #15756, 'cmd/compile: parallelize compilation'. As of this commit, concurrent backend compilation is enabled by default in the Go tip. Some quick testing suggests that this is responsible for almost all of the speed advantage of Go tip over Go 1.8.1.

(If you want to test this, note that stage 3 and stage 4 will normally use this too, at least if you're testing by building a Go git version after this commit landed. I don't know of an easy way to disable concurrent compilation only in the bootstrap compiler.)

Sidebar: Typical real and user times, in seconds

Here is a little table of typical wall clock ('real') and user mode times, as reported by time, for building with various different bootstrap compilers. In each table cell, the real time is first, then the user time (which is almost always larger).

bootstrap:    Go 1.4         Go 1.8.1       Go tip
stage 1        0.7 /  0.6     1.2 /  1.3     0.8 /  1.4
stage 2        6.6 /  9.8     9.1 / 19.4     4.8 / 19.2
stage 3        7.9 / 15.2     6.8 / 16.1     6.4 / 15.4
stage 4       24.4 / 75.9    11.2 / 84.8    11.6 / 84.5

(The stage 4 numbers between Go 1.8.1 and Go tip are too close to call from run to run. Possibly the stage 3 numbers are as well and I'm basically fooling myself into seeing a difference.)
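
(Another way to read the table is as effective parallelism, that is, user time divided by real time. For stage 2 this works out to roughly 1.5 for Go 1.4, about 2.1 for Go 1.8.1, and about 4 for Go tip, which is the 'covering up speed issues by using more CPUs' effect from earlier.)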

Disclaimer: These numbers are not gathered with anything approaching statistical rigor, because I don't have that much energy and make.bash (and cmd/dist) don't make it particularly easy for an outsider to get this sort of data.
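
The overall wall clock numbers, at least, are easy enough to reproduce from the outside; here is a rough sketch of the sort of run this is, where the bootstrap toolchain locations are hypothetical paths of my own (the per-stage numbers above came from crude instrumentation instead):

  # Time a full make.bash with each bootstrap toolchain in turn.
  # The /tmp/go* paths are hypothetical locations for unpacked toolchains.
  cd go/src
  for boot in /tmp/go1.4 /tmp/go1.8.1 /tmp/go-tip; do
      echo "== bootstrapping with $boot"
      time env GOROOT_BOOTSTRAP=$boot ./make.bash
  done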

For my own memory, if nothing else, all builds were done with everything in /tmp, which is a RAID-0 stripe of two 500 GB Seagate Constellation ST9500620NS SATA drives. With 96 GB, I expect that basically all static data was in kernel disk buffers in RAM all the time, but some things may have been written to disk.
