Wandering Thoughts


Go is gaining the ability to trace init calls on program startup

Go packages can have init() initialization functions, which are called when a Go program starts as part of package initialization. One of the practical issues with init functions in Go so far is that their performance and even their existence are relatively opaque, so that it's hard to tell how much of an impact they have on the startup time of your programs.

The good news is that the Go team is moving to change this lack of visibility, as tracked through this issue and recently landed in the development version of Go (what will become Go 1.16) in this change. To quote the change:

runtime: implement GODEBUG=inittrace=1 support

Setting inittrace=1 causes the runtime to emit a single line to standard error for each package with init work, summarizing the execution time and memory allocation.

The emitted debug information for init functions can be used to find bottlenecks or regressions in Go startup performance.

Somewhat to my surprise, this starts acting early enough that it reports on the init functions even in the runtime package. For me, the consistent first two lines for program startup, present even with a program that does nothing, are:

init internal/bytealg @0 ms, 0 ms clock, 0 bytes, 0 allocs
init runtime @0.062 ms, 0.069 ms clock, 0 bytes, 0 allocs
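If you want to try this yourself, here is a minimal sketch of a test program. The squares table and its init function are hypothetical, invented only so that package main has some measurable init work to report.

```go
package main

import "fmt"

// squares is a hypothetical lookup table that exists only to give this
// package some init-time work (and allocation) to report on.
var squares map[int]int

// init runs before main as part of package initialization.
func init() {
	squares = make(map[int]int, 1000)
	for i := 0; i < 1000; i++ {
		squares[i] = i * i
	}
}

func main() {
	fmt.Println(len(squares), squares[12])
}
```

Built with a version of Go that has this change, running the binary with GODEBUG=inittrace=1 set in the environment should print the init lines to standard error before the program's own output. The times and allocation counts will vary from machine to machine.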

On the one hand, I think that making init functions more visible is a good thing in general, and will definitely encourage people to make them minimal. On the other hand, I wonder if people seeing a long list of init functions, even in typical programs, will lead to discouraging their use entirely even if the replacement isn't as good (for instance, doing the same work with sync.Once). It's certainly a bit startling to see how many init functions there are in typical Go programs.

(One rule of thumb is that you get what you measure, and reporting init functions is now implicitly measuring them.)

GoTracingInitCalls written at 00:24:46


Go packages can have more than one init() function

Go has some surprisingly complex rules for how packages are initialized, partly because package level variables can be initialized based on the value returned from function and method calls (and then other variables can be initialized from them). As part of package initialization, you can have an initialization function, called init(), that will be called.

Or at least that's what I would have told you before I actually had a reason to read that section of the Go language specification today. In fact, the specification is very clear that you can have more than one init() function in a single package:

Variables may also be initialized using functions named init declared in the package block, with no arguments and no result parameters.

func init() { … }

Multiple such functions may be defined per package, even within a single source file. [...]

(Emphasis mine. Package initialization then has details on what order these init functions are run in.)

At first this surprised me, but once I thought more it makes sense. On a practical engineering level, it means that you don't have to jam all initialization in a package into a single function in a single file that everyone has to touch; you can spread it out in small pieces wherever is logical and clear.
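As a small illustration, here is a sketch of a single file with a package-level variable initialized from a function call and two init functions. The names order, base and record are made up for this example.

```go
package main

import "fmt"

// order records initialization steps as they happen.
var order []string

// base is a package-level variable initialized from a function call;
// this happens before any init function runs.
var base = record("variable initializer")

// record notes a step and returns how many steps have happened so far.
func record(step string) int {
	order = append(order, step)
	return len(order)
}

// Two init functions in the same file; they run in source order.
func init() { record("first init") }

func init() { record("second init") }

func main() {
	fmt.Println(base, order)
}
```

Running this prints 1 followed by the three steps in the order variable initializer, first init, second init, which is the order the specification describes for a single file.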

(You do have to keep track of it all, and the order that functions in different files get run in depends on how they're built and linked. The Package initialization section has some suggestions about that down at the bottom, which you probably don't have to worry about if you build things with plain usage of go since it should do it right for you.)

Because I was curious, I scanned the Go source tree itself to see if anything used multiple init functions, especially in the same file. There is definitely a decent amount of usage of this within the same package, and even a few cases in the same file (for example, in cmd/go/main.go). Unsurprisingly, the runtime package is a big user of this, since it covers a lot of functionality; a lot of files in src/runtime have their own init functions to cover their specific concerns.

(However the champion user of init functions is cmd/compile/internal/ssa/gen.)

GoMultipleInitFunctions written at 00:07:32


Where (and how) you limit your concurrency in Go can matter

At the start of September, I wrote about how concurrency is still not easy even in Go, using a section of real code with a deadlock as the example. In that entry, I proposed three fixes to remove the deadlock. Since Hillel Wayne's Finding Goroutine Bugs with TLA+ has now formally demonstrated that all three of my proposed fixes work, I can talk about the practical differences between them.

For convenience, here's the original code from the first entry:

func FindAll() []P {
   pss, err := ps.Processes()
   found := make(chan P)
   limitCh := make(chan struct{}, concurrencyProcesses)

   for _, pr := range pss {
      // deadlocks here:
      limitCh <- struct{}{}
      pr := pr
      go func() {
         defer func() { <-limitCh }()
         [... get a P with some error checking ...]
         // and deadlocks here:
         found <- P
      }()
   }

   var results []P
   for p := range found {
      results = append(results, p)
   }
   return results
}

The buffered limitCh channel is used to implement a limited supply of tokens, to hold down the number of goroutines that are getting P's at once. The bug in this code is that the goroutines only receive from limitCh to release their token after sending their result to the unbuffered found channel, while the main code only starts receiving from found after running through the entire for loop, and the main code takes the token in the loop and blocks if no tokens are available. (For more, see the original entry.)

There are at least three fixes possible: the goroutines can send to limitCh instead of the main function doing it, the goroutines can receive from limitCh before sending to found, or the entire for loop can be in an additional goroutine so that it doesn't block the main function from starting to receive from found. All three of these fixes work, as proven by Hillel Wayne, but they have different effects on the number of goroutines that this code will run if pss is large and what the state of those goroutines is.

If our goal is to minimize resource usage, the worst fix is for goroutines to receive from limitCh before sending to found. This fix will cause almost all goroutines to stall in the send to found, because all but a few of them must be started and run almost to completion before the main code can finish the for loop and start receiving from found to unblock all of those sends and let the goroutines exit. These goroutines waiting to send are keeping their fully expanded goroutine stacks in use, along with possibly other resources that won't be released until they exit and things become unused so the garbage collector can collect them (or until additional defer statements release things).

The middling fix is for goroutines to send to limitCh instead of the for loop doing it. We will probably immediately create and start almost all of the full pss worth of goroutines, which could be bad if pss is very large, but at least they all block immediately with almost no resources used and with very small goroutine stacks. Still, this is a bunch of memory and a bunch of (Go) scheduler churn to start all of those goroutines only to have most of them immediately block sending to limitCh. There's also going to be a lot of contention on internal runtime locks associated with limitCh, since a lot of goroutines are queueing up on it.

The best fix for resource usage is to push the for loop into its own goroutine but to otherwise keep things the same. Because the for loop is still sending to limitCh before it creates a new goroutine, the number of simultaneous goroutines we ever have will generally be limited to around our desired concurrency level (there will be some extra that have received from limitCh but not yet finished completely exiting).
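A minimal runnable sketch of this last fix is below. It is not the gops code; findAllLoopInGoroutine, the int items, and the doubling that stands in for the real per-process work are all hypothetical, and the WaitGroup is my assumption about how you'd close found.

```go
package main

import (
	"fmt"
	"sync"
)

// findAllLoopInGoroutine is a hypothetical stand-in for FindAll. It
// processes items with roughly limit goroutines alive at any one time.
func findAllLoopInGoroutine(items []int, limit int) []int {
	found := make(chan int)
	limitCh := make(chan struct{}, limit)
	var wg sync.WaitGroup

	// The for loop runs in its own goroutine, so blocking on limitCh
	// here does not stop the code below from receiving from found.
	go func() {
		for _, it := range items {
			limitCh <- struct{}{} // take a token before starting a goroutine
			wg.Add(1)
			go func(it int) {
				defer wg.Done()
				defer func() { <-limitCh }() // release the token on exit
				found <- it * 2              // placeholder for the real work
			}(it)
		}
		// Once every worker has finished, let the receive loop end.
		wg.Wait()
		close(found)
	}()

	var results []int
	for r := range found {
		results = append(results, r)
	}
	return results
}

func main() {
	items := make([]int, 1000)
	for i := range items {
		items[i] = i
	}
	fmt.Println(len(findAllLoopInGoroutine(items, 4)))
}
```

The results come back in whatever order the goroutines finish, so anything that cares about ordering has to sort them afterward.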

It's likely that none of this matters if the for loop only has to deal with a few hundred entries, and that's probably the case for this code (at least most of the time). But it makes for a useful illustration. When you're writing code with enforced limited concurrency it's probably worthwhile to think about where you want to limit the concurrency and what effects that has on things overall. As we can see here, small implementation choices can have potentially large impacts.

(Also, sometimes I think too much about this sort of thing.)

GoConcurrencyLimitsWhere written at 00:57:04


Why I write recursive descent parsers (despite their issues)

Today I read Laurence Tratt's Which Parsing Approach? (via), which has a decent overview of how parsing computer languages (including little domain specific languages) is not quite the well solved problem we'd like it to be. As part of the article, Tratt discusses how recursive descent parsers have a number of issues in practice and recommends using other things, such as an LR parser generator.

I have a long standing interest in parsing, I'm reasonably well aware of the annoyances of recursive descent parsers (although some of the issues Tratt raised hadn't occurred to me before now), and I've been exposed to parser generators like Yacc. Despite that, my normal approach to parsing any new little language for real is to write a recursive descent parser in whatever language I'm using, and Tratt's article is not going to change that. My choice here is for entirely pragmatic reasons, because to me recursive descent parsers generally have two significant advantages over all other real parsers.

The first advantage is that almost always, a recursive descent parser is the only or at least easiest form of parser you can readily create using only the language's standard library and tooling. In particular, parsing LR, LALR, and similar formal grammars generally requires you to find, select, and install a parser generator tool (or more rarely, an additional package). Very few languages ship their standard environment with a parser generator (or a lexer, which is often required in some form by the parser).

(The closest I know of is C on Unix, where you will almost always find some version of lex and yacc. Not entirely coincidentally, I've used lex and yacc to write a parser in C, although a long time ago.)

By contrast, a recursive descent parser is just code in the language. You can obviously write that in any language, and you can build a little lexer to go along with it that's custom fitted to your particular recursive descent parser and your language's needs. This also leads to the second significant advantage, which is that if you write a recursive descent parser, you don't need to learn a new language, the language of the parser generator, and also learn how to hook that new language to the language of your program, and then debug the result. Your entire recursive descent parser (and your entire lexer) are written in one language, the language you're already working in.
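To make "just code in the language" concrete, here is a small sketch of a recursive descent parser (with its lexer folded in) for a toy arithmetic grammar, using only the Go standard library. The grammar and all of the names are invented for this example; it is not my standard process, just the general shape.

```go
package main

import (
	"fmt"
	"strconv"
)

// parser is the combined lexer and parser state for a toy grammar:
//
//	expr   := term { '+' term }
//	term   := factor { '*' factor }
//	factor := number | '(' expr ')'
type parser struct {
	src string
	pos int
}

// peek skips spaces and returns the next byte, or 0 at end of input.
func (p *parser) peek() byte {
	for p.pos < len(p.src) && p.src[p.pos] == ' ' {
		p.pos++
	}
	if p.pos < len(p.src) {
		return p.src[p.pos]
	}
	return 0
}

// expr parses and evaluates a sum of terms.
func (p *parser) expr() (int, error) {
	v, err := p.term()
	for err == nil && p.peek() == '+' {
		p.pos++
		var r int
		r, err = p.term()
		v += r
	}
	return v, err
}

// term parses and evaluates a product of factors.
func (p *parser) term() (int, error) {
	v, err := p.factor()
	for err == nil && p.peek() == '*' {
		p.pos++
		var r int
		r, err = p.factor()
		v *= r
	}
	return v, err
}

// factor parses a number or a parenthesized expression.
func (p *parser) factor() (int, error) {
	if p.peek() == '(' {
		p.pos++
		v, err := p.expr()
		if err != nil {
			return 0, err
		}
		if p.peek() != ')' {
			return 0, fmt.Errorf("offset %d: expected ')'", p.pos)
		}
		p.pos++
		return v, nil
	}
	start := p.pos
	for p.pos < len(p.src) && p.src[p.pos] >= '0' && p.src[p.pos] <= '9' {
		p.pos++
	}
	if start == p.pos {
		return 0, fmt.Errorf("offset %d: expected a number", p.pos)
	}
	return strconv.Atoi(p.src[start:p.pos])
}

// parse evaluates src, insisting that all of the input is consumed.
func parse(src string) (int, error) {
	p := &parser{src: src}
	v, err := p.expr()
	if err != nil {
		return 0, err
	}
	if p.peek() != 0 {
		return 0, fmt.Errorf("offset %d: unexpected %q", p.pos, p.src[p.pos])
	}
	return v, nil
}

func main() {
	fmt.Println(parse("1 + 2 * (3 + 4)"))
	fmt.Println(parse("2 * "))
}
```

Each grammar rule becomes one method, and each error message is produced at the point where the parser knows exactly what it was expecting, which is a large part of why decent error messages come fairly naturally with this approach.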

If I was routinely working in a language that had a well respected de facto standard parser generator and lexer, and regularly building parsers for little languages for my programs, it would probably be worth mastering these tools. The time and effort required to do so would be more than paid back in the end, and I would probably have a higher quality grammar too (Tratt points out how recursive descent parsers hide ambiguity, for example). But in practice I bounce back and forth between two languages right now (Go and Python, neither of which has such a standard parser ecology), and I don't need to write even a half-baked parser all that often. So writing another recursive descent parser using my standard process for this has been the easiest way to do it every time I needed one.

(I've developed a standard process for writing recursive descent parsers that makes the whole thing pretty mechanical, but that's a discussion for another entry or really a series of them.)

PS: I can't comment about how easy it is to generate good error messages in modern parser generators, because I haven't used any of them. My experience with my own recursive descent parsers is that it's generally straightforward to get decent error messages for the style of languages that I create, and usually simple to tweak the result to give clearer errors in some specific situations.

WhyRDParsersForMe written at 00:33:22


When the Go garbage collector will panic over bad pointer values

For some time, I've vaguely remembered that the Go garbage collector actually checked Go pointer values and would panic if it found that an alleged pointer (including unsafe.Pointer values) didn't point to a valid object. Since the garbage collector may interrupt you at almost random points, this would make it very dangerous to play around with improper unsafe.Pointer values. However, this was just a superstitious memory, so today I decided to find out what the situation is in current Go by reading the relevant runtime source code (for the development version of Go, which is just a bit more recent than Go 1.15 as I write this).

As described in Allocator Wrestling, Go allocates ordinary things (including goroutine stacks) from chunks of memory called spans that are themselves allocated as part of arenas. Arenas (and spans) represent address space that is used as part of the Go heap, but they may not currently have all of their memory allocated from the operating system. A Go program always has at least one arena created as part of its address space.

Based on reading the code, I believe that the Go garbage collector panics if it finds a Go pointer that points inside a created arena but is not within the bounds of a span that is currently in use (including spans used for stacks). The Go garbage collector completely skips checking pointers that don't fall within a created arena; the comment in the source code says '[t]his pointer may be to some mmap'd region, so we allow it', which might lead you to think that it's talking about your potential use of mmap(), but the Go runtime itself allocates a number of things outside of arenas in things it has mmap()'d and obviously the garbage collector can't panic over pointers to them.

The address space available on 64-bit machines is very large and many Go programs will use only a small portion of it for created arenas. The practical consequence of this is that many random 'pointer' values will not fall within the bounds of your program's arenas and so won't trigger garbage collector panics. You're probably more likely to produce these panics if you start with valid Go pointers and then manipulate them in sufficiently improper ways (but not so improperly that the pointer value flies off too far).

(So my superstitious belief has some grounding in reality but was probably way too broad. It's certainly not safe to put bad values in unsafe.Pointers, but in practice most bad values won't be helpfully diagnosed with panics from the garbage collector; instead you'll get other, much more mysterious issues when you try to use them for real.)

An additional issue is that spans are divided up into objects, not all of which are necessarily allocated at a given time. The current version of the garbage collector doesn't seem to attempt to verify that all pointers point to allocated objects inside spans, so I believe that if you're either lucky or very careful in your unsafe.Pointer manipulation, you can create a non-panicking pointer to a currently free object that will later be allocated and used by someone else.

(It's possible that such a pointer could cause garbage collector panics later on under some circumstances.)

The Go runtime also contains a much simpler pointer validity check (and panic) in the code that handles copying and adjusting goroutine stacks when they have to grow. This simply looks for alleged pointers that have a value that's 'too small' (but larger than 0), where too small is currently 4096. I believe that such bad pointers will pass the garbage collector's check, because they point well outside any created arena.
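To make the two checks concrete, here is a toy model of them as I understand them from reading the code. This is emphatically not the runtime's code; the span and arena types, the function names, and the addresses are all simplified inventions for illustration.

```go
package main

import "fmt"

// span and arena are simplified stand-ins, not the runtime's real types.
// Both cover the half-open address range [start, end).
type span struct {
	start, end uintptr
	inUse      bool
}

type arena struct {
	start, end uintptr
	spans      []span
}

// minLegalPointer is this sketch's name for the 'too small' threshold.
const minLegalPointer = 4096

// gcWouldPanic models the garbage collector's check: a pointer inside a
// created arena must also be inside a span that is currently in use.
func gcWouldPanic(p uintptr, arenas []arena) bool {
	for _, a := range arenas {
		if p < a.start || p >= a.end {
			continue
		}
		for _, s := range a.spans {
			if s.inUse && p >= s.start && p < s.end {
				return false
			}
		}
		return true // inside an arena, but not in any in-use span
	}
	return false // outside every arena: not checked at all
}

// stackCopyWouldPanic models the simpler check made when stacks are copied.
func stackCopyWouldPanic(p uintptr) bool {
	return p > 0 && p < minLegalPointer
}

func main() {
	arenas := []arena{{
		start: 0x10000, end: 0x20000,
		spans: []span{
			{start: 0x10000, end: 0x12000, inUse: true},
			{start: 0x12000, end: 0x14000, inUse: false},
		},
	}}
	fmt.Println(gcWouldPanic(0x11000, arenas)) // in an in-use span
	fmt.Println(gcWouldPanic(0x13000, arenas)) // in the arena, span not in use
	fmt.Println(gcWouldPanic(0x50000, arenas)) // outside the arena
	fmt.Println(stackCopyWouldPanic(100))      // small but non-zero
}
```

In this model only the second and fourth calls report a panic, which matches the point that most random bad values land outside any arena and so go undiagnosed.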

Both of these panics can be turned off with the same setting in $GODEBUG, as covered in the documentation for the runtime package. As you would expect, the setting you want is 'invalidptr=0'.

People who want to see the code for this should look in runtime/mbitmap.go's findObject(), runtime/mheap.go's spanOf(), and runtime/stack.go's adjustpointers().

GoGCBadPointerPanics written at 00:40:03


Even in Go, concurrency is still not easy (with an example)

Go is famous for making concurrency easy, through good language support for goroutines. Except what Go makes easy is only one level of concurrency, the nuts and bolts level of making your code do things concurrently and communicating back and forth through channels. Making it do the right things concurrently is still up to you, and unfortunately Go doesn't currently provide a lot of standard library support for correctly implemented standard concurrency patterns.

For example, one common need is for a limited amount of concurrency; you want to do several things at once, but only so many of them. At the moment this is up to you to implement on top of goroutines, channels, and things like the sync package. This is not as easy as it looks, and quite competent people can make mistakes here. As it happens, I have an example ready to hand today.

Gops is a convenient command to list (and diagnose) Go processes that are currently running on your system. Among other things, it'll tell you which version of Go they were compiled with, which is handy if you want to see if you have out of date binaries that should be rebuilt and redeployed. One of the things gops needs to do is look at all of the Go processes on your system, which it does concurrently. However, it doesn't want to look at too many processes at once, because that can cause problems with file descriptor limits. This is a classic case of limited concurrency.

Gops implements this at the moment with code in goprocess.FindAll() that looks like this, in somewhat sketched and reduced form:

func FindAll() []P {
   pss, err := ps.Processes()
   found := make(chan P)
   limitCh := make(chan struct{}, concurrencyProcesses)

   for _, pr := range pss {
      limitCh <- struct{}{}
      pr := pr
      go func() {
         defer func() { <-limitCh }()
         [... get a P with some error checking ...]
         found <- P
      }()
   }

   var results []P
   for p := range found {
      results = append(results, p)
   }
   return results
}

(In the real code there's a WaitGroup for coordination, and the found channel gets closed appropriately.)

How this works is clear, and is a standard pattern (covered in eg Go 101's Channel Use Cases). We use a buffered channel to provide a limited number of tokens; sending a value into the channel implicitly takes a token (and blocks if the token supply is exhausted), while receiving a value from it puts a token back in. We take a token before we start a new goroutine, and the goroutine releases the token when it's done.
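Stripped of the result channel, the token pattern on its own can be sketched like this. The runLimited function and its peak counter are hypothetical; the counter exists only so the example can show that the limit holds.

```go
package main

import (
	"fmt"
	"sync"
)

// runLimited calls work(i) for each i in [0, n) with at most limit calls
// running at once. It returns the peak number of simultaneous calls seen.
func runLimited(n, limit int, work func(int)) int {
	tokens := make(chan struct{}, limit)
	var wg sync.WaitGroup
	var mu sync.Mutex
	active, peak := 0, 0

	for i := 0; i < n; i++ {
		tokens <- struct{}{} // take a token; blocks if none are left
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			defer func() { <-tokens }() // put the token back when done

			mu.Lock()
			active++
			if active > peak {
				peak = active
			}
			mu.Unlock()

			work(i)

			mu.Lock()
			active--
			mu.Unlock()
		}(i)
	}
	wg.Wait()
	return peak
}

func main() {
	peak := runLimited(100, 4, func(int) {})
	fmt.Println("peak concurrency:", peak)
}
```

This version is safe because the goroutines never block on anything that the launching loop is responsible for servicing, so a token always comes back eventually.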

Except that this code has a bug if there are too many processes to examine. Even knowing that there is a bug in this code, it may not be obvious.

The bug is that the goroutines only receive from limitCh to release their token after sending their result to the unbuffered found channel, while the main code only starts receiving from found after running through the entire loop, and the main code takes the token in the loop and blocks if no tokens are available. So if you have too many processes to go through, you start N goroutines, they all block trying to write to found and don't receive from limitCh, and the main for loop blocks trying to send to limitCh and never reaches the point where it starts receiving from found.

At one level, this bug is a very fragile bug; it only exists because of multiple circumstances. If the goroutines took the token by sending to limitCh instead of the main for loop doing it, the bug would not exist; the main for loop would start them all, many would stop, and then it would go on to receive from found so that they could receive from limitCh and release their token so other goroutines would run. If the goroutines received from limitCh to release their token before sending to found, it wouldn't exist (but because of error handling, it's simpler and more reliable to do the receive in a defer). And if the entire for loop was in an additional goroutine, the main code would go on to receive from found and unblock completed goroutines to release their tokens, so the fact that the for loop was blocked waiting to send to limitCh wouldn't matter.
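As a sketch of the first of these alternatives, here is a version where each goroutine takes its own token. Again this is not the gops code; findAllSelfLimiting, the int items, and the doubling used as placeholder work are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// findAllSelfLimiting is a hypothetical variant of FindAll in which the
// goroutines, not the launching loop, take tokens from limitCh.
func findAllSelfLimiting(items []int, limit int) []int {
	found := make(chan int)
	limitCh := make(chan struct{}, limit)
	var wg sync.WaitGroup

	for _, it := range items {
		wg.Add(1)
		go func(it int) {
			defer wg.Done()
			limitCh <- struct{}{}        // the goroutine takes its own token
			defer func() { <-limitCh }() // and gives it back when it exits
			found <- it * 2              // placeholder for the real work
		}(it)
	}

	// The loop above never blocks, so we always get here and start
	// receiving from found, which lets blocked goroutines finish.
	go func() {
		wg.Wait()
		close(found)
	}()

	var results []int
	for r := range found {
		results = append(results, r)
	}
	return results
}

func main() {
	fmt.Println(len(findAllSelfLimiting([]int{1, 2, 3, 4, 5, 6, 7, 8}, 2)))
}
```

The cost is that all of the goroutines are created up front, with most of them immediately parked waiting to send to limitCh.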

At another level, this shows how concurrency is not as easy as it looks in Go. All you need is one mistake and things skid to a halt, and all of the code involved can look good to a casual examination. Getting concurrency correct is simply hard for people (we can debate about why, but I think that it is very clear).

(I'm sure that the people who wrote and approved the change that added this concurrency limiting code to gops were good programmers. A tricky case still tripped them up, passing all of their scrutiny. Even when I knew that there was a concurrency problem in the code and where it was (because my gops was hanging all of a sudden, and Delve told me where everything was stuck), it still took me some time to see what the exact problem was.)

GoConcurrencyStillNotEasy written at 23:57:46


An interesting mistake with Go's context package that I (sort of) made

Today, Dave Cheney did another Go pop quiz on Twitter, where he asked whether the following code printed -6, 0, '<nil>', or panicked:

package main

import (
    "context"
    "fmt"
)

func f(ctx context.Context) {
    context.WithValue(ctx, "foo", -6)
}

func main() {
    ctx := context.TODO()
    f(ctx)
    fmt.Println(ctx.Value("foo"))
}
I didn't answer this correctly because I focused my attention on the wrong thing.

What I focused on was the use of the "foo" string as the context key, partly because of my experience with languages like Python. To start with, the context package's documentation says:

The provided key must be comparable and should not be of type string or any other built-in type to avoid collisions between packages using context. Users of WithValue should define their own types for keys. [...]

A traditional problem in languages like Python is that two strings may compare the same without actually being the same thing, and some code really wants you to present it with the exact same thing. However, the context package doesn't require that you present it with the exact same key, just a key where the interface value of the key will compare the same.

(Because context compares interface values, both the value and the type must match; it's not enough for both values to have the same underlying concrete type, say string, and to compare identical. This is why defining your own string type is a reliable way around collisions between packages.)

So after I worked through all of this, I confidently answered that this code printed -6. The "foo" string that the value is set with is not necessarily the same "foo" string that it's retrieved with, but that doesn't matter. However, this is not the problem with the code. The actual problem is that context.WithValue() returns a new context with the value set, it doesn't change the context it's called on. Dave Cheney's code is written as if .WithValue() mutates the current context, as f() ignores that new context that .WithValue() provides and returns nothing to main(). Since the original context in main() is what .Value() is called on, it has no "foo" key and the result is actually '<nil>'.
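For contrast, here is a sketch of a version that behaves the way the quiz code appears to intend, and that also follows the documentation's advice about key types. The ctxKey type, the fooKey constant, and the withFoo and lookups functions are names made up for this example.

```go
package main

import (
	"context"
	"fmt"
)

// ctxKey is a hypothetical package-private key type, so keys from this
// package cannot collide with string keys from other packages.
type ctxKey string

const fooKey ctxKey = "foo"

// withFoo returns the new context that WithValue creates; the context
// passed in is left unchanged.
func withFoo(ctx context.Context) context.Context {
	return context.WithValue(ctx, fooKey, -6)
}

// lookups reports what Value returns for the typed key and for a plain
// string with the same text.
func lookups() (typed, plain interface{}) {
	ctx := withFoo(context.TODO())
	return ctx.Value(fooKey), ctx.Value("foo")
}

func main() {
	typed, plain := lookups()
	fmt.Println(typed) // prints -6
	fmt.Println(plain) // prints <nil>: a plain string is a different key
}
```

The second lookup fails because the key comparison is between interface values, so a string "foo" does not match a ctxKey "foo" even though the underlying text is the same.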

This problem with the code is actually a quite interesting mistake, because as far as I can tell right now none of the usual Go style checkers detect it. This code passes 'go vet', it produces no complaints from errcheck because we're not ignoring an error return value, and tools like golangci-lint only complain about the use of the built-in type string as the key in .WithValue(). Nothing seems to notice that we're ignoring the critical return value from .WithValue(), which turns it into more or less a no-op.

(Now that Dave Cheney has brought this to the surface, I suspect that someone will contribute a check for it to staticcheck, which already detects the 'using a built-in type as a key' issue.)

GoContextValueMistake written at 23:24:14


Go will inline functions across packages (under the right circumstances)

Like many languages, or more exactly many compilers for many languages, Go will inline one function into another under the right circumstances. For more on this in general (and examples), see Dave Cheney's Inlining optimisations in Go.

In many languages, only functions within the same source code file or compilation unit are candidates for inlining. Functions that are further away than that (and compiled separately), especially in completely separate packages or libraries, are not available for inlining for various reasons. As I found out recently, modern versions of Go don't work this way, especially with mid-stack inlining. If a function in a different package that you use is simple enough, the Go compiler will quietly inline it into your function.

With Go's mid-stack inlining, there are some very common functions from standard packages that are inlined (probably) in many people's code. One prominent example is fmt.Printf. The actual implementation of fmt.Printf is:

func Printf(format string, a ...interface{}) (n int, err error) {
   return Fprintf(os.Stdout, format, a...)
}

(You can see it in fmt/print.go.)

This is simple enough to be inlined, and so it generally is. If you write a little test program and build it with the necessary compiler flags (from Dave Cheney's Mid-stack inlining in Go), you can get a report on this:

$ go build -gcflags=-m=2 fred.go
./fred.go:4:14: inlining call to fmt.Printf [...]

(And if you check on Compiler Explorer (aka 'godbolt'), you can verify that the generated assembly matches this.)
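A little test program for this can be as small as the sketch below. This is not necessarily the fred.go I used, and the double function is a hypothetical addition so that there is a local inlining candidate as well as the cross-package one.

```go
package main

import "fmt"

// double is a hypothetical helper, small enough that the compiler will
// very likely consider it for inlining too.
func double(x int) int {
	return x * 2
}

func main() {
	fmt.Printf("%d\n", double(21))
}
```

Building this with the -gcflags=-m=2 option shown above should report the fmt.Printf call being inlined, along with whatever the compiler decides about double. The file positions in the report depend on how the source is laid out, so they won't necessarily match my output.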

PS: I don't know if this inlining extends to internal runtime functions that the compiler generates calls to for you, such as converting small integer values to interfaces, or if it only happens for calls that are in your source as you wrote it.

GoInlinesAcrossPackages written at 00:58:07


Go 1.15's interface optimization for small integers is invisible to Go programs

When I wrote about how Go 1.15 improved converting small integer values to interfaces, I said that Go pointing small integer interface values to its special static array of the first 256 integers was similar to what some dynamic languages do. For example, Python effectively interns a bunch of small integers. However, in one important respect what Go is doing is different from what Python and other languages are doing. In Go, this optimization is invisible to normal, proper Go programs, while the equivalent in other languages often is visible in some situations.

The reason this optimization is visible to programs in Python is that Python exposes the actual unique interned objects for small numbers to you. Since you get access to these objects, you can tell when two numbers from two completely different sources are actually the same object, and sometimes this matters. (And since the unique objects are directly exposed to you, they have to be made immutable.)

Go doesn't do this. Go works in values and values are always copied, including when you create interface values from concrete values (even if the concrete value is a pointer). How an interface value holds its copy of the concrete value is invisible to Go programs. When you create an interface value from a concrete value, the interface value takes a copy of the concrete value and stores it somehow. When you get the concrete value back using a type assertion or call a method on the concrete type through the interface value, Go makes a copy of the concrete value held by the interface value and gives it to you (or the method). You never get a reference to the interface value's copy of the concrete value.
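The copying can be seen directly in a small sketch. The point type and the copies function are invented for this example.

```go
package main

import "fmt"

type point struct{ x, y int }

// copies shows that an interface value holds its own copy of a concrete
// value, and that a type assertion hands back yet another copy.
func copies() (asserted, held point) {
	p := point{1, 2}
	var i interface{} = p // i stores its own copy of p
	p.x = 100             // changes p, not the copy inside i

	q := i.(point) // q is a copy of what i holds
	q.y = 200      // changes q, not the copy inside i

	return q, i.(point)
}

func main() {
	asserted, held := copies()
	fmt.Println(asserted, held) // prints {1 200} {1 2}
}
```

Neither assignment reaches the value stored in the interface, which is why where that value lives is something a normal program cannot observe.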

Mechanically, Go implements interface values using a pair of pointers (cf), which means that an interface value normally needs to allocate a place to put its copy of the concrete value (which it will then have a pointer to). But you never get access to the 'pointer to the concrete value' part of the interface value in normal Go and so you can never observe that for a small integer, it's pointing into a static array instead of into the heap. Since you can't see these pointers, you also can't see that two different interface values have pointers to the same entry in the static array.

(You can use the unsafe package to crack open the otherwise opaque interface value and pull out the pair of pointers. But then you're not using normal Go.)

Go115InterfaceSmallIntsII written at 01:15:45


How Go 1.15 improved converting small integer values to interfaces

In Go, interface values are famously implemented as a pair of pointers (see Russ Cox's Go Data Structures: Interfaces); a pointer to information about the type and a pointer to the value itself. This generally means that the value must be dynamically allocated in the heap, which means that it will contribute to the work that Go's garbage collection does.

The Go 1.15 release notes mention an intriguing improvement in the runtime section:

Converting a small integer value into an interface value no longer causes allocation.

When I saw that, I immediately wondered how it works, and especially if Go's runtime was now sometimes using the value pointer field in interface values to directly store the value. (There are a number of languages that do this, using various approaches like tag bits to tell values from real pointers.)

The answer turns out to be pretty straightforward, and is in Go CL 216401 (merged in this commit, which may be easier to read). The Go runtime has a special static array of the first 256 integers (0 to 255), and when it would normally have to allocate memory to store an integer on the heap as part of converting it to an interface, it first checks to see if it can just return a pointer to the appropriate element in the array instead. This kind of static allocation of frequently used values is common in languages with lots of dynamic allocation; Python does something similar for small integers, for example (which can sometimes surprise you).

(It turns out that Go previously had an optimization where if you were converting 0 to an interface value, it would return a pointer to a special static zero value. This new optimization for 0-255 replaces that.)
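The general technique can be modeled in ordinary Go, as in the sketch below. This is only an illustration of the idea and not the runtime's code; the smallInts array and the box function are hypothetical names.

```go
package main

import "fmt"

// smallInts is a stand-in for the runtime's static array of 0 to 255.
var smallInts [256]uint64

func init() {
	for i := range smallInts {
		smallInts[i] = uint64(i)
	}
}

// box returns a pointer to storage holding v. Small values share an entry
// in the static array; anything larger gets a fresh allocation.
func box(v uint64) *uint64 {
	if v < uint64(len(smallInts)) {
		return &smallInts[v] // no allocation needed
	}
	p := new(uint64)
	*p = v
	return p
}

func main() {
	fmt.Println(box(7) == box(7))       // prints true: same array entry
	fmt.Println(box(1000) == box(1000)) // prints false: two allocations
}
```

Sharing storage like this is only safe if nothing ever writes through the returned pointer, which holds for interface values because programs never get to see that pointer.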

There is one special trick that Go plays here. The actual array is an array of uint64, but it reuses the same array for smaller sized values as well. On little endian systems like x86, this is fine as it stands because a pointer to a 64-bit value is also a valid pointer to that value interpreted as 32 or 16 bits (or 8 bits). But on big endian systems this isn't the case, so if Go is running on a big endian machine it bumps up the pointer so that it works properly (making it point to either the last two bytes or the last four bytes of the 8-byte value).

(On a little endian machine, the pointer is to the low byte of the value and the remaining bytes are all zero so it doesn't matter how many more of them you look at. On a big endian machine, the pointer is to the high byte, but the low byte is the thing that matters.)
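The byte layout argument can be checked with encoding/binary, which lets one machine show both byte orders. The read16 function is a made-up name, and the offsets stand in for the pointer bumping; nothing here touches the runtime's actual array.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// read16 stores v as an 8-byte value in both byte orders and then reads a
// 16-bit value back from a given offset, as if through a pointer.
func read16(v uint64) (le, beBumped, beUnbumped uint16) {
	var lbuf, bbuf [8]byte
	binary.LittleEndian.PutUint64(lbuf[:], v)
	binary.BigEndian.PutUint64(bbuf[:], v)

	// Little endian: the low bytes come first, so offset 0 works as is.
	le = binary.LittleEndian.Uint16(lbuf[0:2])
	// Big endian: the low bytes come last, so bump the offset by 8-2 = 6.
	beBumped = binary.BigEndian.Uint16(bbuf[6:8])
	// Without the bump, big endian reads the (zero) high bytes instead.
	beUnbumped = binary.BigEndian.Uint16(bbuf[0:2])
	return le, beBumped, beUnbumped
}

func main() {
	fmt.Println(read16(200)) // prints 200 200 0
}
```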

As bonus trivia for this change, this new array of 0-255 uint64 values was then reused for avoiding allocating anything for one-byte strings in another change (this commit, CL 221979). Go previously had an array of bytes for this purpose, but why have two arrays? Big endian machines need the same sort of pointer bumping they did for small integers being converted to interface values, but little endian machines can once again use the pointers as is.

PS: There are runtime functions for converting 16, 32, and 64 bit values to interface values, in runtime/iface.go (they can be inlined in actual code), but I was puzzled because there is no runtime function for converting 8 bit values. It turns out that 8-bit values are directly handled by the compiler in walk.go, where it generates inline code that uses the staticuint64s array. This may be done directly in the compiler partly because it needs no fallback path for larger values, unlike the 16, 32, and 64 bit cases, since an 8 bit value will always be in staticuint64s.

Go115InterfaceSmallInts written at 00:39:20



This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.