Some notes on the structure of Go binaries (primarily for ELF)

September 14, 2019

I'll start with the background. I keep around a bunch of third party programs written in Go, and one of the things that I do periodically is rebuild them, possibly because I've updated some of them to their latest versions. When doing this, it's useful to have a way to report the package that a Go binary was built from, ideally a fast way. I have traditionally used binstale for this, but it's not fast. Recently I tried out gobin, which is fast and looked like it had great promise, except that I discovered it didn't report about all of my binaries. My attempts to fix that resulted in various adventures but only partial success.

All of the following is mostly for ELF format binaries, which is the binary format used on most Unixes (except MacOS). Much of the general information applies to other binary formats that Go supports, but the specifics will be different. For a general introduction to ELF, you can see eg here. Also, all of the following assumes that you haven't stripped the Go binaries, for example by building with '-w' or '-s'.

All Go programs have a .note.go.buildid ELF section that has the build ID (also). If you read the ELF sections of a binary and it doesn't have that, you can give up; either this isn't a Go binary or something deeply weird is going on.

Programs built as Go modules contain an embedded chunk of information about the modules used in building them, including the main program; this can be printed with 'go version -m <program>'. There is no official interface to extract this information from other binaries (inside a program you can use runtime/debug.ReadBuildInfo()), but it's currently stored in the binary's data section as a chunk of plain text. See version.go for how Go itself finds and extracts this information, which is probably going to be reasonably stable (so that newer versions of Go can still run 'go version -m <program>' against programs built with older versions of Go). If you can extract this information from a binary, it's authoritative, and it should always be present even if the binary has been stripped.

If you don't have module information (or don't want to copy version.go's code in order to extract it), the only approach I know to determine the package a binary was built from is to determine the full file path of the source code where main() is, and then reverse engineer that to create a package name (and possibly a module version). The general approach is:

  1. extract Go debug data from the binary and use debug/gosym to create a LineTable and a Table.
  2. look up the main.main function in the table to get its starting address, and then use Table.PCToLine() to get the file name for that starting address.
  3. convert the file name into a package name.

Binaries built from $GOPATH will have file names of the form $GOPATH/src/ If you take the directory name of this and take off the $GOPATH/src part, you have the package name this was built from. This includes module-aware builds done in $GOPATH. Binaries built directly from modules with 'go get' will have a file path of the form $GOPATH/pkg/mod/ To convert this to a module name, you have to take off '$GOPATH/pkg/mod/' and move the version to the end if it's not already there. For binaries built outside some $GOPATH, with either module-aware builds or plain builds, you are unfortunately on your own; there is no general way to turn their file names into package names.

(There are a number of hacks if the source is present on your local system; for example, you can try to find out what module or VCS repository it's part of if there's a go.mod or VCS control directory somewhere in its directory tree.)

However, to do this you must first extract the Go debug data from your ELF binary. For ordinary unstripped Go binaries, this debugging information is in the .gopclntab and .gosymtab ELF sections of the binary, and can be read out with debug/elf/File.Section() and Section.Data(). Unfortunately, Go binaries that use cgo do not have these Go ELF sections. As mentioned in Building a better Go linker:

For “cgo” binaries, which may make arbitrary use of C libraries, the Go linker links all of the Go code into a single native object file and then invokes the system linker to produce the final binary.

This linkage obliterates .gopclntab and .gosymtab as separate ELF sections. I believe that their data is still there in the final binary, but I don't know how to extract them. The Go debugger Delve doesn't even try; instead, it uses the general DWARF .debug_line section (or its compressed version), which seems to be more complicated to deal with. Delve has its DWARF code as sub-packages, so perhaps you could reuse them to read and process the DWARF debug line information to do the same thing (as far as I know the file name information is present there too).

Since I have and use several third party cgo-based programs, this is where I gave up. My hacked branch of the which package can deal with most things short of "cgo" binaries, but unfortunately that's not enough to make it useful for me.

(Since I spent some time working through all of this, I want to write it down before I forget it.)

PS: I suspect that this situation will never improve for non-module builds, since the Go developers want everyone to move away from them. For Go module builds, there may someday be a relatively official and supported API for extracting module information from existing binaries, either in the official Go packages or in one of the additional packages.

Written on 14 September 2019.
« Bidirectional NAT and split horizon DNS in our networking setup
The problem of 'triangular' Network Address Translation »

Page tools: View Source, Add Comment.
Login: Password:
Atom Syndication: Recent Comments.

Last modified: Sat Sep 14 18:35:21 2019
This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.