Wandering Thoughts archives


Converting a Go pointer to an integer doesn't quite do what it looks like

Over on r/golang, an interesting question was asked:

[Is it] possible to parse a struct or interface to get its pointer address as an integer? [...]

The practical answer today is yes, as noted in the answers to the question. You can convert any Go pointer to uintptr by going through unsafe.Pointer(), and then convert the uintptr into some more conventional integer type if you want. If you're going to convert to another integer type, you should probably use uint64 for safety, since that should hold any uintptr value on any current Go platform.

However, the theoretical answer is no, in that this conversion doesn't quite get you what you might think it does. What this conversion really gives you is the address that the addressable value had at the moment the conversion to uintptr was done. Go very carefully does not guarantee that this past address is the same as the current address, although it always will be today.

(I'm assuming here that there are other references to the addressable value that keep it from being garbage collected.)
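As a minimal sketch of the conversion described above (the helper name addrOf and the point type are made up for illustration, not taken from the r/golang thread), going through unsafe.Pointer looks something like this. The number it prints is only the address at the moment of the conversion.

```go
package main

import (
	"fmt"
	"unsafe"
)

// addrOf is a hypothetical helper: it returns the address that p held at
// the moment of the call, as a plain integer with no pointer semantics.
func addrOf[T any](p *T) uint64 {
	// Pointer -> unsafe.Pointer -> uintptr -> uint64.
	return uint64(uintptr(unsafe.Pointer(p)))
}

// point is just an example struct to take the address of.
type point struct{ x, y int }

func main() {
	pt := &point{1, 2}
	fmt.Printf("%#x\n", addrOf(pt))
}
```

The result is fine for printing or logging, but it should not be converted back into a pointer later.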

Go's current garbage collector is a non-compacting garbage collector, where once things are allocated somewhere in memory, they never move for as long as they're alive. Since a non-compacting garbage collector has stable memory addresses for things, converting an address to an integer gives you something that is always the integer value of the current address of that thing. However, there are also compacting garbage collectors, which move live things around during garbage collection for various reasons. In these garbage collectors, the memory address of things is not stable.

Go is deliberately specified so that you could implement it using a compacting GC, and at one point this was the long term plan. When it moved things as part of garbage collection, such a Go would update every actual pointer to them so that it held the new address. However, it would not magically update integer values derived from those pointers, whether they're uintptrs or some other integer type. In a compacting GC world, getting the uintptr of the address of something twice at different times could give you two different values. Each value was accurate at the moment you got it, but it's not guaranteed to be accurate one instant past that; a GC pass could happen at any time and thus the thing could be moved at any time.

Leaving the door open for a compacting GC is one of the reasons that the rules surrounding the use of unsafe.Pointer() and uintptr are so carefully and narrowly specified, as we've seen before. In fact the documentation points this out explicitly:

A uintptr is an integer, not a reference. Converting a Pointer to a uintptr creates an integer value with no pointer semantics. Even if a uintptr holds the address of some object, the garbage collector will not update that uintptr's value if the object moves, nor will that uintptr keep the object from being reclaimed.

(The emphasis is mine.)
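One consequence of those narrow rules, as I understand the unsafe documentation, is that pointer arithmetic has to convert to uintptr, do the arithmetic, and convert back within a single expression, so that no bare uintptr is left sitting in a variable. A small sketch, where the pair type and the secondField helper are invented for the example:

```go
package main

import (
	"fmt"
	"unsafe"
)

// pair is an example struct; we reach its second field through arithmetic.
type pair struct {
	a, b int64
}

// secondField returns a pointer to p.b computed from p's address.
// The conversion to uintptr, the addition, and the conversion back to
// unsafe.Pointer all happen in one expression, so the integer address is
// never stored anywhere it could go stale.
func secondField(p *pair) *int64 {
	return (*int64)(unsafe.Pointer(uintptr(unsafe.Pointer(p)) + unsafe.Offsetof(p.b)))
}

func main() {
	p := &pair{a: 1, b: 2}
	*secondField(p) = 7
	fmt.Println(p.b) // prints 7
}
```

Splitting this into 'u := uintptr(unsafe.Pointer(p))' followed by a later conversion back is exactly the pattern the rules are written to forbid.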

The Go garbage collector never moves things today, which leads to the practical answer for today of 'yes, you can do this'. But the theoretical answer is that the addresses of things could be constantly changing, and maybe someday in the future they sometimes will be.

Update: As pointed out in the r/golang comments on my entry, I'm wrong here. In Go today, stacks are movable as stack usage grows and shrinks, and you can take the address of a value that is on the stack and that subsequently gets moved with the stack.
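A rough sketch of how one might observe this, under the assumption that the local variable stays on the stack and that the recursion is deep enough to force the stack to grow (the function names are invented for the example). Nothing guarantees either outcome, so the program only reports whether the two integers match.

```go
package main

import (
	"fmt"
	"unsafe"
)

// grow recurses with a fairly large frame so that the goroutine's stack
// probably has to grow (and thus be moved) along the way.
//
//go:noinline
func grow(n int) byte {
	var pad [1024]byte
	pad[n%len(pad)] = byte(n)
	if n == 0 {
		return pad[0]
	}
	return grow(n-1) + pad[n%len(pad)]
}

// addrsAroundGrowth records the address of a local variable as an integer
// before and after the recursion. If x lives on the stack and the stack
// moved, the two integers will differ; the first one is then stale.
func addrsAroundGrowth() (before, after uintptr) {
	var x int
	before = uintptr(unsafe.Pointer(&x))
	grow(1000)
	after = uintptr(unsafe.Pointer(&x))
	return before, after
}

func main() {
	before, after := addrsAroundGrowth()
	fmt.Printf("before=%#x after=%#x same=%v\n", before, after, before == after)
}
```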

GoPointerToInteger written at 00:32:58


Some notes on the structure of Go binaries (primarily for ELF)

I'll start with the background. I keep around a bunch of third party programs written in Go, and one of the things that I do periodically is rebuild them, possibly because I've updated some of them to their latest versions. When doing this, it's useful to have a way to report the package that a Go binary was built from, ideally a fast way. I have traditionally used binstale for this, but it's not fast. Recently I tried out gobin, which is fast and looked like it had great promise, except that I discovered it didn't report about all of my binaries. My attempts to fix that resulted in various adventures but only partial success.

All of the following is mostly for ELF format binaries, which is the binary format used on most Unixes (except MacOS). Much of the general information applies to other binary formats that Go supports, but the specifics will be different. For a general introduction to ELF, you can see eg here. Also, all of the following assumes that you haven't stripped the Go binaries, for example by building with '-w' or '-s'.

All Go programs have a .note.go.buildid ELF section that has the build ID. If you read the ELF sections of a binary and it doesn't have that, you can give up; either this isn't a Go binary or something deeply weird is going on.

Programs built as Go modules contain an embedded chunk of information about the modules used in building them, including the main program; this can be printed with 'go version -m <program>'. There is no official interface to extract this information from other binaries (inside a program you can use runtime/debug.ReadBuildInfo()), but it's currently stored in the binary's data section as a chunk of plain text. See version.go for how Go itself finds and extracts this information, which is probably going to be reasonably stable (so that newer versions of Go can still run 'go version -m <program>' against programs built with older versions of Go). If you can extract this information from a binary, it's authoritative, and it should always be present even if the binary has been stripped.

If you don't have module information (or don't want to copy version.go's code in order to extract it), the only approach I know to determine the package a binary was built from is to determine the full file path of the source code where main() is, and then reverse engineer that to create a package name (and possibly a module version). The general approach is:

  1. extract Go debug data from the binary and use debug/gosym to create a LineTable and a Table.
  2. look up the main.main function in the table to get its starting address, and then use Table.PCToLine() to get the file name for that starting address.
  3. convert the file name into a package name.
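The first two steps can be sketched roughly as follows. This assumes an unstripped, non-cgo ELF binary where the debug data lives in the .gopclntab and .gosymtab sections; the function name mainFile is made up, and error handling is kept minimal.

```go
package main

import (
	"debug/elf"
	"debug/gosym"
	"fmt"
	"os"
)

// mainFile is a hypothetical helper that returns the source file path
// recorded for main.main in a Go ELF binary.
func mainFile(path string) (string, error) {
	f, err := elf.Open(path)
	if err != nil {
		return "", err
	}
	defer f.Close()

	pcln := f.Section(".gopclntab")
	text := f.Section(".text")
	if pcln == nil || text == nil {
		return "", fmt.Errorf("%s: no .gopclntab or .text section", path)
	}
	pclnData, err := pcln.Data()
	if err != nil {
		return "", err
	}
	// .gosymtab is often empty in modern binaries; empty data is fine here.
	var symData []byte
	if s := f.Section(".gosymtab"); s != nil {
		symData, _ = s.Data()
	}

	// Step 1: build the LineTable and the Table.
	tab, err := gosym.NewTable(symData, gosym.NewLineTable(pclnData, text.Addr))
	if err != nil {
		return "", err
	}
	// Step 2: find main.main and map its entry address to a file name.
	fn := tab.LookupFunc("main.main")
	if fn == nil {
		return "", fmt.Errorf("%s: main.main not found", path)
	}
	file, _, _ := tab.PCToLine(fn.Entry)
	return file, nil
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: mainfile <binary>")
		os.Exit(2)
	}
	file, err := mainFile(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(file)
}
```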

Binaries built from $GOPATH will have file names of the form $GOPATH/src/example.org/fred/cmd/barney/main.go. If you take the directory name of this and take off the $GOPATH/src part, you have the package name this was built from. This includes module-aware builds done in $GOPATH. Binaries built directly from modules with 'go get example.org/fred/cmd/barney@latest' will have a file path of the form $GOPATH/pkg/mod/example.org/fred@v.../cmd/barney/main.go. To convert this to a module name, you have to take off '$GOPATH/pkg/mod/' and move the version to the end if it's not already there. For binaries built outside some $GOPATH, with either module-aware builds or plain builds, you are unfortunately on your own; there is no general way to turn their file names into package names.
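A small sketch of that path conversion, covering only the two $GOPATH cases. The function name pkgFromPath is invented, paths are assumed to use forward slashes, and the sketch ignores the way the module cache escapes uppercase letters in paths.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// pkgFromPath is a hypothetical helper that turns the file path of main.go
// into a package name (with a trailing @version for module cache builds).
// It reports false for paths outside gopath, where there is no general rule.
func pkgFromPath(gopath, file string) (string, bool) {
	dir := path.Dir(file)
	src := gopath + "/src/"
	mod := gopath + "/pkg/mod/"
	switch {
	case strings.HasPrefix(dir, src):
		// $GOPATH/src/example.org/fred/cmd/barney -> example.org/fred/cmd/barney
		return strings.TrimPrefix(dir, src), true
	case strings.HasPrefix(dir, mod):
		// example.org/fred@v.../cmd/barney -> example.org/fred/cmd/barney@v...
		rest := strings.TrimPrefix(dir, mod)
		at := strings.Index(rest, "@")
		if at < 0 {
			return "", false
		}
		module, tail := rest[:at], rest[at+1:]
		version, sub := tail, ""
		if slash := strings.Index(tail, "/"); slash >= 0 {
			version, sub = tail[:slash], tail[slash:]
		}
		return module + sub + "@" + version, true
	}
	return "", false
}

func main() {
	// The paths here are made-up examples.
	pkg, ok := pkgFromPath("/home/u/go", "/home/u/go/pkg/mod/example.org/fred@v1.2.3/cmd/barney/main.go")
	fmt.Println(pkg, ok) // prints example.org/fred/cmd/barney@v1.2.3 true
}
```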

(There are a number of hacks if the source is present on your local system; for example, you can try to find out what module or VCS repository it's part of if there's a go.mod or VCS control directory somewhere in its directory tree.)

However, to do this you must first extract the Go debug data from your ELF binary. For ordinary unstripped Go binaries, this debugging information is in the .gopclntab and .gosymtab ELF sections of the binary, and can be read out with debug/elf/File.Section() and Section.Data(). Unfortunately, Go binaries that use cgo do not have these Go ELF sections. As mentioned in Building a better Go linker:

For “cgo” binaries, which may make arbitrary use of C libraries, the Go linker links all of the Go code into a single native object file and then invokes the system linker to produce the final binary.

This linkage obliterates .gopclntab and .gosymtab as separate ELF sections. I believe that their data is still there in the final binary, but I don't know how to extract them. The Go debugger Delve doesn't even try; instead, it uses the general DWARF .debug_line section (or its compressed version), which seems to be more complicated to deal with. Delve has its DWARF code as sub-packages, so perhaps you could reuse them to read and process the DWARF debug line information to do the same thing (as far as I know the file name information is present there too).
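For what it's worth, the standard library's debug/dwarf package may be enough for this particular lookup without pulling in Delve's code. The following is an untested-against-cgo sketch under that assumption: it looks for the DWARF subprogram entry named main.main, takes its low PC, and asks the compile unit's line table which file that address belongs to. The function name mainFileDWARF is invented.

```go
package main

import (
	"debug/dwarf"
	"debug/elf"
	"errors"
	"fmt"
	"os"
)

// mainFileDWARF is a hypothetical helper that finds the source file of
// main.main using DWARF line information instead of .gopclntab.
func mainFileDWARF(path string) (string, error) {
	f, err := elf.Open(path)
	if err != nil {
		return "", err
	}
	defer f.Close()

	d, err := f.DWARF()
	if err != nil {
		return "", err
	}
	r := d.Reader()
	var cu *dwarf.Entry // the most recent compile unit we have seen
	for {
		e, err := r.Next()
		if err != nil {
			return "", err
		}
		if e == nil {
			return "", errors.New("main.main not found in DWARF data")
		}
		switch e.Tag {
		case dwarf.TagCompileUnit:
			cu = e
		case dwarf.TagSubprogram:
			name, _ := e.Val(dwarf.AttrName).(string)
			pc, ok := e.Val(dwarf.AttrLowpc).(uint64)
			if name != "main.main" || !ok || cu == nil {
				r.SkipChildren()
				continue
			}
			lr, err := d.LineReader(cu)
			if err != nil {
				return "", err
			}
			if lr == nil {
				return "", errors.New("compile unit has no line table")
			}
			var le dwarf.LineEntry
			if err := lr.SeekPC(pc, &le); err != nil {
				return "", err
			}
			return le.File.Name, nil
		}
	}
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: mainfile-dwarf <binary>")
		os.Exit(2)
	}
	file, err := mainFileDWARF(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(file)
}
```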

Since I have and use several third party cgo-based programs, this is where I gave up. My hacked branch of the which package can deal with most things short of "cgo" binaries, but unfortunately that's not enough to make it useful for me.

(Since I spent some time working through all of this, I want to write it down before I forget it.)

PS: I suspect that this situation will never improve for non-module builds, since the Go developers want everyone to move away from them. For Go module builds, there may someday be a relatively official and supported API for extracting module information from existing binaries, either in the official Go packages or in one of the golang.org/x/ additional packages.

GoBinaryStructureNotes written at 18:35:21


Making your own changes to things that use Go modules

Suppose, not hypothetically, that you have found a useful Go program but when you test it you discover that it has a bug that's a problem for you, and that after you dig into the bug you discover that the problem is actually in a separate package that the program uses. You would like to try to diagnose and fix the bug, at least for your own uses, which requires hacking around in that second package.

In a non-module environment, how you do this is relatively straightforward, although not necessarily elegant. Since building programs just uses what's found in $GOPATH/src, you can cd directly into your local clone of the second package and start hacking away. If you need to make a pull request, you can create a branch, fork the repo on Github or whatever, add your new fork as an additional remote, and then push your branch to it. If you didn't want to contaminate your main $GOPATH with your changes to the upstream (since they'd be visible to everything you built that used that package), you could work in a separate directory hierarchy and set your $GOPATH when you were working on it.

If the program has been migrated to Go modules, things are not quite as straightforward. You probably don't have a clone of the second package in your $GOPATH, and even if you do, any changes to it will be ignored when you rebuild the program (if you do it in a module-aware way). Instead, you make local changes by using the 'replace' directive of the program's go.mod, and in some ways it's better than the non-module approach.

First you need local clones of both packages. These clones can be a direct clone of the upstream or they can be clones of Github (or Gitlab or etc) forks that you've made. Then, in the program's module, you want to change go.mod to point the second package to your local copy of its repo:

replace github.com/rjeczalik/which => /u/cks/src/scratch/which

You can edit this in directly (as I did when I was working on this) or you can use 'go mod edit -replace=github.com/rjeczalik/which=/u/cks/src/scratch/which'.

If the second package has not been migrated to Go modules, you need to create a go.mod in your local clone (the Go documentation will tell you this if you read all of it). Contrary to what I initially thought, this new go.mod does not need to have the module name of the package you're replacing, but it will probably be most convenient if it does claim to be, eg, github.com/rjeczalik/which, because this means that any commands or tests it has that import the module will use your hacks, instead of quietly building against the unchanged official version (again, assuming that you build them in a module-aware way).

(You don't need a replace line in the second package's go.mod; Go's module handling is smart enough to get this right.)

As an important note, as of Go 1.13 you must do 'go get' to build and install commands from inside this source tree even if it's under $GOPATH. If it's under $GOPATH and you do 'go get <blah>/cmd/gobin', Go does a non-module 'go get' even though the directory tree has a go.mod file and this will use the official version of the second package, not your replacement. This is documented but perhaps surprising.

When you're replacing with a local directory this way, you don't need to commit your changes in the VCS before building the program; in fact, I don't think you even need the directory tree to be a VCS repository. For better or worse, building the program will use the current state of your directory tree (well, both trees), whatever that is.

If you want to see what your module-based binaries were actually built with in order to verify that they're actually using your modified local version, the best tool for this is 'go version -m'. This will show you something like:

go/bin/gobin go1.13
  path github.com/rjeczalik/bin/cmd/gobin
  mod  github.com/rjeczalik/bin    (devel)
  dep  github.com/rjeczalik/which  v0.0.0-2014[...]
  =>    /u/cks/go/src/github.com/siebenmann/which

I believe that the '(devel)' appears if the binary was built directly from inside a source tree, and the '=>' is showing a 'replace' in action. If you build one of the second package's commands (from inside its source tree), 'go version -m' doesn't report the replacement, just that it's a '(devel)' of the module.

(Note that this output doesn't tell us anything about the version of the second package that was actually used to build the binary, except that it was the current state of the filesystem as of the build. The 'v0.0.0-2014[...]' version stamp is for the original version, not our replacement, and comes from the first package's go.mod.)

PS: If 'go version -m' merely reports the 'go1.13' bit, you managed to build the program in a non module-aware way.

Sidebar: Replacing with another repo instead of a directory tree

The syntax for this uses your alternate repository, and I believe it must have some form of version identifier. This version identifier can be a branch, or at least it can start out as a branch in your go.mod, so it looks like this:

replace github.com/rjeczalik/which => github.com/siebenmann/which reliable-find

After you run 'go build' or the like, the go command will quietly rewrite this to refer to the specific current commit on that branch. If you push up a new version of your changes, you need to re-edit your go.mod to say 'reliable-find' or 'master' or the like again.

Your upstream repository doesn't have to have a go.mod file, unlike the case with a local directory tree. If it does have a go.mod, I think that the claimed package name can be relatively liberal (for instance, I think it can be the module that you're replacing). However, some experimentation with sticking in random upstreams suggests that you want the final component of the module name to match (eg, '<something>/which' in my case).

GoHackingWithModules written at 20:49:02


A safety note about using (or having) go.mod inside $GOPATH in Go 1.13

One of the things in the Go 1.13 release notes is a little note about improved support for go.mod. This is worth quoting in more or less full:

The GO111MODULE environment variable continues to default to auto, but the auto setting now activates the module-aware mode of the go command whenever the current working directory contains, or is below a directory containing, a go.mod file — even if the current directory is within GOPATH/src.

The important safety note is that this potentially creates a confusing situation, and also it may be easy for other people to misunderstand what this actually says in the same way that I did.

Suppose that there is a Go program that is part of a module, example.org/fred/cmd/bar (with the module being example.org/fred). If you do 'go get example.org/fred/cmd/bar', you're fetching and building things in non-module mode, and you will wind up with a $GOPATH/src/example.org/fred VCS clone, which will have a go.mod file at its root, ie $GOPATH/src/example.org/fred/go.mod. Despite the fact that there is a go.mod file right there on disk, re-running 'go get example.org/fred/cmd/bar' while you're in (say) your home directory will not do a module-aware build. This is because, as the note says, module-aware builds only happen if your current directory or its parents contain a go.mod file, not just if there happens to be a go.mod file in the package (and module) tree being built. So the only way to do a proper module aware build is to actually be in the command's subdirectory:

cd $GOPATH/src/example.org/fred/cmd/bar
go get

(You can get very odd results if you cd to $GOPATH/src/example.org and then attempt to 'go get example.org/fred/cmd/bar'. The result is sort of module-aware but weird.)

This makes it rather more awkward to build or rebuild Go programs through scripts, especially if they involve various programs that introspect your existing Go binaries. It's also easy to slip up and de-modularize a Go binary; one absent-minded 'go get example.org/...' will do it.

In a way, Go modules don't exist on disk unless you're in their directory tree. If that tree is inside $GOPATH and you're not in it, you have a plain Go package, not a module.

(If the directory tree is outside $GOPATH, well, you're not doing much with it without cd'ing into it, at which point you have a module.)

The easiest way to see whether a binary was built module-aware or not is 'goversion -m PROGRAM'. If the program was built module-aware, you will get a list of all of the modules involved. If it wasn't, you'll just get a report of what Go version it was built with. Also, it turns out that you can build a program with modules without it having a go.mod:

GO111MODULE=on go get rsc.io/goversion@latest

The repository has tags but no go.mod. This also works on repositories with no tags at all. If the program uses outside packages, they too can be non-modular, and 'goversion -m PROGRAM' will (still) produce a report of what tags, dates, and hashes they were at.

Update: in Go 1.13, 'go version -m PROGRAM' also reports the module build information, with module hashes included as well.

This does mean that in theory you could switch over to building all third party Go programs you use this way. If the program hasn't converted to modules you get more or less the same results as today, and if the program has converted, you get their hopefully stable go.mod settings. You'd lose having a local copy of everything in your $GOPATH, though, which opens up some issues.

Go113AndGoModInGOPATH written at 23:55:53


Jumping backward and forward in GNU Emacs

In my recent entry on writing Go with Emacs's lsp-mode, I noted that lsp-mode or more accurately lsp-ui has a 'peek' feature that winds up letting you jump to a definition or a reference of a thing, but I didn't know how to jump back to where you were before. The straightforward but limited answer to my question is that jumping back from a LSP peek is done with the M-, keybinding (which is surprisingly awkward to write about in text). This is not a special LSP key binding and function; instead it is a standard binding that runs xref-pop-marker-stack, which is part of GNU Emacs' standard xref package. This M-, binding is right next to the standard M-. and M-? xref bindings for jumping to definitions and references. It also works with go-mode's godef-jump function and its C-c C-j key binding.

(Lsp-ui doesn't set up any bindings for its 'peek' functions, but if you like what the 'peek' feature does in general you probably want to bind them to M-. and M-? in the lsp-ui-mode-map keybindings so that they take over from the xref versions. The xref versions still work in lsp-mode, it's just that they aren't as spiffy. This is convenient because it means that the standard xref binding 'C-x 4 .' can be used to immediately jump to a definition in another Emacs-level 'window'.)

I call this the limited answer for a couple of reasons. First, this only works in one direction; once you've jumped back, there is no general way to go forward again. You get to remember yourself what you did to jump forward and then do it again, which is easy if you jumped to a definition but not so straightforward if you jumped to a reference. Second, this isn't a general feature; it's specific to the xref package and to things that deliberately go out of their way to hook into it, which includes lsp-ui and go-mode. Because Emacs is ultimately a big ball of mud, any particular 'jump to thing' operation from any particular package may or may not hook into the xref marker stack.

(A core Emacs concept is the mark, but core mark(s) are not directly tied to the xref marker stack. It's usually the case that things that use the xref marker stack will also push an entry onto the plain mark ring, but this is up to the whims of the package author. The plain mark ring is also context dependent on just what happened, with no universal 'jump back to where I was' operation. If you moved within a file you can return with C-u C-space, but if you moved to a different file you need to use C-x C-space instead. Using the wrong one gets bad results. M-, is universal in that it doesn't matter whether you moved within your current file or moved to another one, you always jump backward with the same key.)

The closest thing I've found in GNU Emacs to a browser style backwards and forwards navigation is a third party package called backward-forward (also gitlab). This specifically attempts to implement universal jumping in both directions, and it seems to work pretty well. Unfortunately its ring of navigation is global, not per (Emacs) window, but for my use this isn't fatal; I'm generally using Emacs within a single context anyway, rather than having several things at once the way I do in browsers.

Because I want browser style navigation, I've changed from the default backward-forward key bindings by removing its C-left and C-right bindings in favor of M-left and M-right (ie Alt-left and Alt-right, the standard browser key bindings for Back and Forward), and also added bindings for my mouse rocker buttons. How I have it set up so that it works on Fedora and Ubuntu 18.04 is as follows (using use-package, as everyone seems to these days):

(use-package backward-forward
  :demand t
  :config
  (backward-forward-mode t)
  :bind (:map backward-forward-mode-map
              ("<C-left>" . nil)
              ("<C-right>" . nil)
              ("<M-left>" . backward-forward-previous-location)
              ("<M-right>" . backward-forward-next-location)
              ("<mouse-8>" . backward-forward-previous-location)
              ("<mouse-9>" . backward-forward-next-location)
              )
  )

(The use-package :demand is necessary on Ubuntu 18.04 to get the key bindings to work. I don't know enough about Emacs to understand why.)

PS: Normal Emacs and Lisp people would probably stack those stray )'s at the end of the last real line. One of my peculiarities in ELisp is that I don't; I would rather see a clear signal of where blocks end, rather than lose track of them in a stack of ')))'. Perhaps I will change this in time.

(In credit where credit is due, George Hartzell pointed out xref-pop-marker-stack to me in email in response to my first entry, which later led to me finding backward-forward.)

EmacsBackForward written at 22:48:45


Go modules and the problem of noticing updates to dependencies

Now that Go 1.13 has been released, we're moving that much closer to a module-based Go world. I've become cautiously but broadly positive towards Go 1.13 and this shift (somewhat in contrast to what I expected earlier), and I'm probably going to switch over to Go 1.13 everywhere and move toward modules in my own work. Thinking about working in this environment has left me with some questions.

Let's suppose that you have some programs or code that uses third party packages, and these are generally stable programs that don't really need any development or change. In the Go module world, the version of those packages that you use is locked down by your go.mod file and won't change unless you manually update, even if new versions are released. In theory you can keep on using your current versions forever, but in practice as a matter of good maintenance hygiene you probably want to update every so often to pick up bug fixes, improvements, and perhaps security updates. As always, updating regularly also makes the changes smaller and easier to deal with if there are problems.

In the pre-module world, how I found out about such updates was that I ran Go-Package-Store every so often and looked at what it reported (I could also use gostatus). I also had (and have) tools like binstale and gobin, which I could use with scripting to basically 'go get -u' everything I currently had a Go binary for (which makes some of the problems from my old entry on using Go-Package-Store not applicable any more).

I'm not sure how to do this in a world of Go modules. Go-Package-Store works by scanning your $GOPATH, but with modules the only things there are (perhaps) your actual programs, not their dependencies. You can see updates for the dependencies of any particular program or module with 'go list -u -m all' (in a cryptic format; anything with a '[...]' after it has an update to that version available), but I don't think anyone has built anything to do a large scale scan, try to find out what the changes are, and show them to you.
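As a starting point for such a tool, here is a small sketch that picks the modules with pending updates out of 'go list -u -m all' output. It assumes the default output format of 'path version [newversion]' per line; the update type, the parseUpdates name, and the sample module paths are all invented for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// update describes one module that has a newer version available.
type update struct {
	Path, Current, Available string
}

// parseUpdates is a hypothetical helper that scans the text output of
// 'go list -u -m all' and returns the modules with a '[version]' marker.
func parseUpdates(out string) []update {
	var ups []update
	for _, line := range strings.Split(out, "\n") {
		f := strings.Fields(line)
		if len(f) < 3 {
			continue // main module or a module with no update
		}
		avail := f[2]
		if !strings.HasPrefix(avail, "[") || !strings.HasSuffix(avail, "]") {
			continue // third field is something else, such as '=>'
		}
		ups = append(ups, update{f[0], f[1], strings.Trim(avail, "[]")})
	}
	return ups
}

func main() {
	// Made-up sample output; in real use this would come from running
	// 'go list -u -m all' inside each program's module.
	sample := "example.org/me\n" +
		"example.org/a v1.0.0\n" +
		"example.org/b v1.2.0 [v1.3.1]\n"
	for _, u := range parseUpdates(sample) {
		fmt.Printf("%s %s -> %s\n", u.Path, u.Current, u.Available)
	}
}
```

This only tells you that updates exist; finding out what actually changed in them is the harder part.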

(The current module behavior of 'go get', 'go list', and company also seems surprising to me in some areas that complicate interpreting 'go list -u -m all' output, although perhaps it's working as intended.)

Relying on Go modules also brings up a related issue of what to do if the upstream source just goes away. In the pre-module world, you have a full VCS clone of the upstream in your $GOPATH/src, so you can turn around and re-publish it somewhere yourself (or someone else can and you know you can trust their version because it's the same as your local copy). In the module world you only have a snapshot of a specific version (or versions) in your $GOPATH/pkg/mod tree or in the Go module proxy you're using. Even if you vendor things as well, you're not going to have the transparent and full version history of the original package that you do today, and the lack of that history will make it harder to do various things to recover from a disappearing or abruptly changed package.

(I'm a cautious sysadmin who has been around for a while. I've seen all sorts of repositories just disappear one day for all sorts of different reasons.)

(Perhaps someday there will be a Go module proxy that deliberately makes a full VCS clone when you request a module.)

GoModuleNoticingUpdates written at 00:20:35

