Wandering Thoughts


Go will inline functions across packages (under the right circumstances)

Like many languages, or more exactly many compilers for many languages, Go will inline one function into another under the right circumstances. For more on this in general (and examples), see Dave Cheney's Inlining optimisations in Go.

In many languages, only functions within the same source code file or compilation unit are candidates for inlining. Functions that are further away than that (and compiled separately), especially in completely separate packages or libraries, are not available for inlining for various reasons. As I found out recently, modern versions of Go don't work this way, especially with mid-stack inlining. If a function in a different package that you use is simple enough, the Go compiler will quietly inline it into your function.

With Go's mid-stack inlining, there are some very common functions from standard packages that are inlined (probably) in many people's code. One prominent example is fmt.Printf. The actual implementation of fmt.Printf is:

func Printf(format string, a ...interface{}) (n int, err error) {
   return Fprintf(os.Stdout, format, a...)

(You can see it in fmt/print.go.)

This is simple enough to be inlined, and so it generally is. If you write a little test program and build it with the necessary compiler flags (from Dave Cheney's Mid-stack inlining in Go), you can get a report on this:

$ go build -gcflags=-m=2 fred.go
./fred.go:4:14: inlining call to fmt.Printf [...]

(And if you check on Compiler Explorer (aka 'godbolt'), you can verify that the generated assembly matches this.)

PS: I don't know if this inlining extends to internal runtime functions that the compiler generates call to for you, such as converting small integer values to interfaces, or if it only happens for calls that are in your source as you wrote it.

GoInlinesAcrossPackages written at 00:58:07; Add Comment


Go 1.15's interface optimization for small integers is invisible to Go programs

When I wrote about how Go 1.15 improved converting small integer values to interfaces, I said that Go pointing small integer interface values to its special static array of the first 256 integers was similar to what some dynamic languages do. For example, Python effectively interns a bunch of small integers. However, in one important respect what Go is doing is different from what Python and other languages are doing. In Go, this optimization is invisible to normal, proper Go programs, while the equivalent in other languages often is visible in some situations.

The reason this optimization is visible to programs in Python is that Python exposes the actual unique interned objects for small numbers to you. Since you get access to these objects, you can tell when two numbers from two completely different sources are actually the same object, and sometimes this matters. (And since the unique objects are directly exposed to you, they have to be made immutable.)

Go doesn't do this. Go works in values and values are always copied, including when you create interface values from concrete values (even if the concrete value is a pointer). How an interface value holds its copy of the concrete value is invisible to Go programs. When you create an interface value from a concrete value, the interface value takes a copy of the concrete value and stores it somehow. When you get the concrete value back using a type assertion or call a method on the concrete type through the interface value, Go makes a copy of the concrete value held by the interface value and gives it to you (or the method). You never get a reference to the interface value's copy of the concrete value.

Mechanically, Go implements interface values using a pair of pointers (cf), which means that an interface value normally needs to allocate a place to put its copy of the concrete value (which it will then have a pointer to). But you never get access to the 'pointer to the concrete value' part of the interface value in normal Go and so you can never observe that for a small integer, it's pointing into a static array instead of into the heap. Since you can't see these pointers, you also can't see that two different interface values have pointers to the same entry in the static array.

(You can use the unsafe package to crack open the otherwise opaque interface value and pull out the pair of pointers. But then you're not using normal Go.)

Go115InterfaceSmallIntsII written at 01:15:45; Add Comment


How Go 1.15 improved converting small integer values to interfaces

In Go, interface values are famously implemented as a pair of pointers (see Russ Cox's Go Data Structures: Interfaces); a pointer to information about the type and a pointer to the value itself. This generally means that the value must be dynamically allocated in the heap, which means that it will contribute to the work that Go's garbage collection does.

The Go 1.15 release notes mention an intriguing improvement in the runtime section:

Converting a small integer value into an interface value no longer causes allocation.

When I saw that, I immediately wondered how it works, and especially if Go's runtime was now sometimes using the value pointer field in interface values to directly store the value. (There are a number of languages that do this, using various approaches like tag bits to tell values from real pointers.)

The answer turns out to be pretty straightforward, and is in Go CL 216401 (merged in this commit, which may be easier to read). The Go runtime has a special static array of the first 256 integers (0 to 255), and when it would normally have to allocate memory to store an integer on the heap as part of converting it to an interface, it first checks to see if it can just return a pointer to the appropriate element in the array instead. This kind of static allocation of frequently used values is common in languages with lots of dynamic allocation; Python does something similar for small integers, for example (which can sometimes surprise you).

(It turns out that Go previously had an optimization where if you were converting 0 to an interface value, it would return a pointer to a special static zero value. This new optimization for 0-255 replaces that.)

There is one special trick that Go plays here. The actual array is an array of uint64, but it reuses the same array for smaller sized values as well. On little endian systems like x86, this is fine as it stands because a pointer to a 64-bit value is also a valid pointer to that value interpreted as 32 or 16 bits (or 8 bits). But on big endian systems this isn't the case, so if Go is running on a big endian machine it bumps up the pointer so that it works properly (making it point to either the last two bytes or the last four bytes of the 8-byte value).

(On a little endian machine, the pointer is to the low byte of the value and the remaining bytes are all zero so it doesn't matter how many more of them you look at. On a big endian machine, the pointer is to the high byte, but the low byte is the thing that matters.)

As bonus trivia for this change, this new array of 0-255 uint64 values was then reused for avoiding allocating anything for one-byte strings in another change (this commit, CL 221979). Go previously had an array of bytes for this purpose, but why have two arrays. Big endian machines need the same sort of pointer bumping they did for small integers being converted to interface values, but little endian machines can once again use the pointers as is.

PS: There are runtime functions for converting 16, 32, and 64 bit values to interface values, in runtime/iface.go (they can be inlined in actual code), but I was puzzled because there is no runtime function for converting 8 bit values. It turns out that 8-bit values are directly handled by the compiler in walk.go, where it generates inline code that uses the staticuint64s array. This may be done directly in the compiler partly because it needs no fallback path for larger values, unlike the 16, 32, and 64 bit cases, since an 8 bit value will always be in staticuint64s.

Go115InterfaceSmallInts written at 00:39:20; Add Comment


Getting my head around the choice between sleeping and 'tickers'

If you have a program that wants to do something periodically, say once every ten seconds, the obvious and simplest way is to just sleep for ten seconds after each time you do the thing. For example:

while True:

Some languages and libraries also offer an API that I've seen called a 'ticker'; Go, for example, has time.Ticker. A ticker signals or invokes you somehow every time interval, such as every ten seconds. With a ticker, you could write the above (in Go) as something like:

ticker := time.NewTicker(10*time.Second)
for {
  t := <- ticker.C

The big difference between the two is that a ticker is indifferent to how long it took you to do your work. If collecting and publishing device stats takes two seconds, the sleep approach means that you will do this once every twelve seconds (two seconds for the work followed by ten seconds of sleeping). The ticker approach will collect and publish once every ten seconds, because it signals you every ten seconds even if it took you two seconds to collect and publish device stats.

While this may make tickers sound great, this behavior has potential downsides. Which one you want depends partly on what your reasons are for sleeping or delaying. In particular, if you're sleeping to limit the load your actions create, then a ticker may not be what you want, precisely because it's indifferent to how long your work took. If your work takes nine seconds and you sleep afterward for ten, you're keeping things less than 50% busy. If your work takes nine seconds and you have a ten second ticker, you're keeping things 90% busy; one second after you finish your work the ticker will go off again and start it all over.

Tickers are great for things you want to happen every N seconds, even if they take a bunch of time (or a variable amount of time). Sleeping is great for things you want to happen no more often than once every N seconds. However, the difference between the two only matters if what you're doing does (or may) take an appreciable amount of time compared to the tick (or sleep) interval; if it isn't going to, you might as well use whichever API is more idiomatic and convenient, although tickers will probably make things more regular and precise.

(This set of thoughts was sparked by poking around in some Python code that used the sleep() idiom and then thinking about how it would probably use a ticker in Go. (Well, in Go it would have a better API for what it's doing, but apart from that.))

TickersVersusSleeping written at 23:24:41; Add Comment


Using Go build directives to optionally use new APIs in the standard library

I mentioned recently that new APIs in the Go standard library were relatively easy to optionally support, because such new APIs only appear in new Go releases and you can conditionally build files based on the Go release that's building your program. But that's a pretty abstract description, so let's make it concrete.

One of the long time limitations of the crypto/tls package was that it only gave you numbers for TLS cipher suites, not any sort of names, and for logging and reporting things to users you often wanted a name. Go's lack of names for its cipher suite numbers left you to roll your own conversion, generally with a big table of mappings, as I do (or did) in my call program. In Go 1.14, the Go authors fixed this by adding tls.CipherSuiteName(). If I was content to have the latest version of call only build on Go 1.14 or later, I could simply convert it to directly using tls.CipherSuiteName() in place of my current hand done solution. However, for various reasons I would like to keep call building on previous Go versions as well, which means that I need to use my old solution if the new API isn't available.

The first step is to pull out the small function that needs to access the API into a separate file, or create a function to do this if you don't have one already. In my case I might as well call my (new) function cipherSuiteName(). Then we create two versions of this function in two files, let's call them ciphername.go and ciphername_old.go. The second file has the current implementation, for older versions of Go, while the first version has the new implementation that calls tls.CipherSuiteName().

The first file can very simple if all you want to do is directly call the new API and accept its result. It needs a Go build directive to only build on Go 1.14 and later, and can look like this:

// There must be a blank line between the
// build directive and 'package main'.
// +build go1.14

package main

import "crypto/tls"

func cipherSuiteName(id uint16) string {
    return tls.CipherSuiteName(id)

The second file will have a more complicated implementation, which I'm leaving out here, and a Go build directive to not build on Go 1.14 (and later):

// For pre 1.14 Go versions without
// tls.CipherSuiteName().
// +build !go1.14

package main

import "fmt"

func cipherSuiteName(id uint16) string {

My view is that you should use the longer file name for the old implementation, because in the long run you're probably going to delete it (when you stop supporting old Go versions). Depending on what your function in ciphername.go does, you might want to keep it or simply switch over to a direct call to tls.CipherSuiteName().

Useful references for Go build directives are the go/build package documentation and Dave Cheney's How to use conditional compilation with the go build tool.

(This is the kind of entry that I write partly for my later use, because I'm sure I'm going to want to do this for other APIs in the future.)

GoBuildUsingNewAPIs written at 23:40:13; Add Comment


The impact on middleware of expanding APIs with Go's interface smuggling

Recently, the Go blog had Keeping your Modules Compatible which is about doing exactly that as you add features and want to expand your module's API. When the module's API involves interfaces, one of the approaches they suggested is what I've called interface smuggling and what other people have called interface upgrades. Let me quote the article:

When you run into a case where you want to add a method to an existing interface, you may be able to follow this strategy. Start by creating a new interface with your new method, or identify an existing interface with the new method. Next, identify the relevant functions that need to support it, type check for the second interface, and add code that uses it.

This is a quite popular approach, one used by many packages in Go's standard library and third party packages. However, it has a dark side, and that is its unfortunate effects on middleware.

The problem for middleware is best illustrated by the most common sort of middleware, which is things that interpose in the chain of HTTP handlers to modify the results. Much middleware wants to look at or act on some aspect of the HTTP reply, for example to gather metrics based on the result code, which means that it must modify and proxy the http.ResponseWriter passed to child http.Handlers. Over time the http package has acquired a whole collection of smuggled interfaces on ResponseWriters, such as http.CloseNotifier (which is deprecated), http.Flusher, http.Hijacker, and http.Pusher. In the future there will probably be more.

(In addition, the ResponseWriter may or may not support io.ReaderFrom.)

If you're a piece of middleware, the ResponseWriter you're passed may support some, many, or all of these additional APIs. However, Go provides you no good way to pass this support through your proxy ResponseWriter that you're going to pass to children Handlers. The Prometheus people try hard to do it anyway, and the result is rather messy and involves a combinatorial explosion of the possible combinations of APIs. As the case of io.ReaderFrom shows, these additional APIs don't even necessarily come from the http package. A smuggled interface from anywhere may need to be supported.

One answer to this is that you just don't support these additional APIs in your middleware, or you only support a few of them. The problem with this is that the ResponseWriter and the client code that people are trying to use your middleware with well have been developed, tested, and normally used in an environment where these expanded APIs are used, not cut off. As we all know, if you don't test it it doesn't work. Your middleware may be the first code to try to pass the next hop a ResponseWriter with a genuinely narrow API, because such narrow APIs may mostly come from middleware. And of course if there are any bugs in the result, people will blame your middleware.

None of this is insurmountable. But beyond the problems and the hassles, it means that expanding your API with interface smuggling is decidedly not transparent if people use middleware with it. And as a practical matter, some amount of the time your new API will not be usable until middleware is expanded to cope with it (if it ever is).

Another problem is that this expansion of middleware to cope with your new API can't happen until your new API itself is pervasive. Go currently provides no support for conditional building based on the version of other packages or the state of their API, so middleware can't include any use of your new API interfaces until it doesn't have to build against versions of your package that predate them.

(People can work around this for HTTP middleware because they can make files build only on specific minimum versions of Go. Your package doesn't have this magical power; it's something available only for new APIs in the Go standard library.)

Because nothing is new under the sun, this problem was noticed back in 2014's Interface Upgrades in Go, which is one of the earliest places to call this pattern an 'interface upgrade'. The article notes the proxy problem and ends with a call to use interface upgrades sparingly. This is good advice in my opinion, but is very much at odds with the idea of routinely using interface upgrades to expand your API.


GoMiddlewareVsInterfaceSmuggling written at 22:22:14; Add Comment


"Interface smuggling", a Go design pattern for expanding APIs

Interfaces are one of the big ways of creating and defining APIs in Go. Go famously encourages these interfaces to be very minimal; the widely used and implemented io.Reader and io.Writer are each one method. Minimal APIs such as this have the advantage that almost anything can implement them, which means that Go code that accepts an io.Reader or io.Writer can work transparently with a huge range of data sources and destinations.

However, this very simplicity and generality means that these APIs are not necessarily the most efficient way to perform operations. For example, if you want to copy from an io.Reader to an io.Writer, such as io.Copy() does, using only the basic API means that you have to perform intermediate data shuffling when in many cases either the source could directly write to the destination or the destination could directly read from the source. Go's solution to this is what I will call interface smuggling.

In interface smuggling, the actual implementation is augmented with additional well known APIs, such as io.ReaderFrom and io.WriterTo. Functions that want to work more efficiently when possible, such as io.Copy(), attempt to convert the io.Reader or io.Writer they obtained to the relevant API and then use it if the conversion succeeded:

if wt, ok := src.(WriterTo); ok {
   return wt.WriteTo(dst)
if rt, ok := dst.(ReaderFrom); ok {
   return rt.ReadFrom(src)
[... do copy ourselves ...]

I call this interface smuggling because we are effectively smuggling a different, more powerful, and broader API through a limited one. In the case of types supporting io.WriterTo and io.ReaderFrom, io.Copy completely bypasses the nominal API; the .Read() and .Write() methods are never actually used, at least directly by io.Copy (they may be used by the specific implementations of .WriteTo() or .ReadFrom(), or more interface smuggling may take place).

(Go code also sometimes peeks at the concrete types of interface API arguments. This is how under the right circumstances, io.Copy will wind up using the Linux splice(2) or sendfile(2) system calls.)

There is also interface smuggling that expands the API, as seen in things like io.ReadCloser and io.ReadWriteSeeker. If you have a basic io.Reader, you can try to convert it to see if it actually supports these expanded APIs, and then use them if it does.

PS: There's probably a canonical Go term for doing this as part of your API design, either to begin with or as part of expanding it while retaining backward compatibility. If so, feel free to let me know in the comments.

GoInterfaceSmuggling written at 23:37:47; Add Comment


A Go lesson learned: sometimes I don't want to use goroutines if possible

We have a heavily NFS based server environment here, with multiple NFS servers and an IMAP server that accesses all mailboxes over NFS. That IMAP server has had ongoing issues with elevated load averages, and what at least seems to be IMAP slowness. However, our current metrics leave a lot of uncertainties about the effects of all of this, because we basically only have a little bit of performance data for a few IMAP operations. One thing I'd like to do is gather some very basic Unix level NFS performance data from our IMAP server and from some other machines, to see if I can see anything.

One very simple metric is how long it takes to read a little file from every NFS filesystem we have mounted on a machine. As it happens, we already have the little files (they're used for another system management purpose), so all I need is a program to open and read each one while timing how long it takes. There's an obvious issue with doing this sequentially, which is that if there's a single slow filesystem, it could delay everything else.

The obvious answer here was Go, goroutines, and some form of goroutine pool. Because the goroutines just do IO (and they're only being used to avoid one bit of IO delaying another separate bit), the natural size of the goroutine pool is fairly large, say 50 to 100 goroutines (we have a lot of NFS filesystems). This is quite easy and obvious to implement in Go, so I put together a little Go program for it and watched the numbers it generated as they jumped around.

Then, out of reflexive caution, I tried running the same program with a goroutine pool size of 1, which more or less forced serial execution (the pool goroutine infrastructure was still active but there was only one worker goroutine doing all the reading). To my surprise the 'time to read a file' number for all filesystems was visibly and decidedly lower. I could run the program side by side with the two different goroutine pool sizes and see this clearly.

Some thinking gave me a possible reason why this is so. My core code does essentially the following (minus error checking):

start := time.Now()
file, err := os.Open(target)
n, err := file.Read(buffer)
duration := time.Now().Sub(start)

This sequence makes two system calls and each system call is a potential goroutine preemption point. If a goroutine gets preempted during either system call, it can only record the finishing time once it's scheduled again (and finishes the read, if it was preempted in the open). If there are 50 or more goroutines all doing this, some of them could well be preempted and then not scheduled for some time, and that scheduling delay will show up in the final duration. When there aren't multiple goroutines active, there should be very little scheduling delay and the recorded durations (especially the longest durations) will be lower. And the ideal situation for essentially no goroutine contention is of course one goroutine.

(Technically this makes two more system calls to get the time at the start and the end of the sequence, but on modern systems, especially Linux, these don't take long enough to trigger Go's system call preemption and probably don't even enter the kernel itself.)

Because I still worry about individual slow filesystems slowing everything down (or stalls on some filesystems), my solution was a more complicated work pool approach that starts additional worker goroutines only when all of the current ones seem to have stalled for too long. If all goes well (and it generally does in my testing), this runs with only one goroutine.

(My current code has the drawback that once the goroutine worker pool expands, all of them stay active, which means that enough slow filesystems early on in the checks could get me back to the thundering herd problem. I'm still thinking about that issue.)

GoWhenNotManyGoroutines written at 23:52:25; Add Comment


In Go, the compiler needs to know the types of things when copying values

If Go implements generics (which seems likely to happen someday), there will be a lot of interesting generic functions that don't need to do much more with the specific types they're instantiated with other than copy values around (and often allocate slices and their backing arrays). The classical map, filter, and reduce trio all don't need to do anything more themselves than copy values, and the same is true for things like making a slice of the keys of a map. If the compiler is interested in generating only a single set of code for these generic functions (similar to how maps are implemented), one interesting question is how much it needs to know about the values being copied around here. In particular, does the Go runtime need to know their type, or is it enough to know how big they are and then just copy the relevant number of bytes with a memory move?

Unfortunately for us, the answer is that the Go runtime needs to know the types of the values it's copying, even if it's only copying them to already allocated memory. We can discover why by looking at the source for runtime.typedmemmove. The short version of why is that if the runtime is currently doing a garbage collection, all Go code needs to take special precautions when updating pointers. This includes when copying values that are either pointers or composite types that include pointers. When doing a bulk memory move, the runtime needs to know where the pointers are in the destination, and that requires knowing the type of the destination.

(For more, see runtime.bulkBarrierPreWrite.)

The Go runtime also needs to know the type of things when allocating memory for them (such as when creating or expanding a slice). This is because all newly allocated objects must have runtime information set up so that the Go garbage collector knows where any pointers in them are (among other things, this is why unsafe type conversions are still garbage collection safe). Setting up this information requires knowing the type of what is being allocated, because this information on pointers is in the type information.

The exception for both copying values and allocating new memory for them is that if the type contains no pointers, I believe that all the Go runtime needs to know is that and what alignment is required for values. A generic function could thus potentially be compiled into a general version for pointer containing types and a narrower version that only worked for non-pointer types. In practice you would probably just compile a version that was passed a pointer to the type information, because the type information gives you size, alignment, and pointer information all in one place with only a single argument.

(You'll notice that runtime.typedmemmove is just a memmove() if the type doesn't contain any pointers, although some of this is hidden in runtime.cgoCheckMemmove.)

GoValueCopyIsTyped written at 21:59:54; Add Comment


People's efficiency expectations for generics in 'Go 2' and patterns of use

The Go authors have recently come out with a blog entry on the next steps for generics and a new draft design based on interfaces instead of contracts. As it always is, one of the concerns raised in the draft is about the efficiency tradeoffs of any implementation.

Roughly speaking, there are two ways to implement generics. One is to generate fully specialized implementations every time a generic function or type is specialized to a concrete set of types; another is to compile only a single implementation and quietly pass hidden parameters to the implementation so that it can do its work, similar to how interfaces are implemented in Go (and also maps; most of the code of Go maps is generic, not specialized for each type of map). Fully specialized implementations are as fast as the compiler can make them with full knowledge of the types and functions involved, but they take longer to compile and result in more code in the final binary.

In thinking about this, it strikes me that there are two usage patterns (or at least extremes of usage patterns) for generics, based on what code people often write in Go today. I will call these type safe interfaces and single implementations respectively. The type safe interfaces usage pattern would use generics to implement a type safe version of what code is already doing with interface{} or somewhat more restrictive interfaces today. The proposal itself talks about Go using generics to implement type safe versions of things like container/list and sync.Map (here).

The single implementations usage pattern would use generics to condense what today is multiple copies of essentially the same code, specialized to various types, into a single block of code using generic types. These are the people who are tired of writing yet another copy of the function to generate a slice of all of the keys of a map of some new type. Their existing code could in theory be written once using interface{} and a lot of typecasts, but in practice repetition is better than all of the typecasts required (and the resulting possibility of runtime panics), especially since the underlying code is often reasonably simple and short.

People in the type safe interfaces usage pattern probably don't mind the potential speed overheads of a single unspecialized implementation, because they are already paying that penalty today. This does imply that such a generics implementation shouldn't perform worse than the interface based equivalent. People in the single implementations usage pattern are replacing hand specialized Go code with a generics implementation so they can write it only once. Some of them won't be willing to do this if there's a significant performance penalty as a result of such a conversion, and in general these people are willing to pay the compile time and space penalty for specialized implementations because they're already doing so today with their hand specialized code.

(Hopefully the Go compiler can find clever ways to often make the extra cost of unspecialized code very low, similar to how it implements maps efficiently.)

Go2GenericsExpectedEfficiency written at 00:25:27; Add Comment

(Previous 10 or go back to June 2020 at 2020/06/14)

Page tools: See As Normal.
Login: Password:
Atom Syndication: Recent Pages, Recent Comments.

This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.