Go interfaces and automatically generated functions

June 16, 2017

I recently read Golang Internals Part 1: Autogenerated functions (and how to get rid of them) (via, and also), which recounts how Minio noticed an autogenerated function in their stack traces that was making an on-stack copy of a structure before calling a method, and worked out how to eliminate this function. Unfortunately, Minio's diagnosis of why this autogenerated function exists at all is not correct (although their solution is the right one). This matters partly because the reason why this autogenerated function exists exposes a real issue you may want to think about in your Go API design.

Let's start at the basics, which in this case is Go methods. Methods have a receiver, and this receiver can either be a value or a pointer. Your choice here of whether your methods have value receivers or pointer receivers matters for your API; see, for example, this article (via). Types also have a method set, which is simply all of the methods that they have. However, there is a special rule for the method sets of pointer types:

The method set of the corresponding pointer type *T is the set of all methods declared with receiver *T or T (that is, it also contains the method set of T).

(The corollary of this is that every regular type T implicitly creates a pointer type *T with all of T's methods, even if you never mention *T in your code or explicitly define any methods for it.)

It's easy to see how this works. Given a *T, you can always call a method on T by simply dereferencing your *T to get a T value, which means that it's trivial to write out a bunch of *T methods that just call the corresponding T methods:

func (p *T) Something(...) (...) {
  v := *p
  return v.Something(...)

Rather than require you to go through the effort of hand-writing all of these methods for all of your *T types, Go auto-generates them for you as necessary. This is exactly the autogenerated function that Minio saw in their stack traces; the underlying real method was cmd.retryStorage.ListDir() (which has a value receiver) and the autogenerated function was cmd.(*retryStorage).ListDir() (which has a pointer receiver, and which did the same dereferencing as our Something example).

But, you might ask, where does the *retryStorage pointer come from? The answer is that it comes from using interface types and values instead of concrete types and values. Here is the relevant bits of the cleanupDir() function that was one step up Minio's stack trace:

func cleanupDir(storage StorageAPI, volume, dirPath string) error {
     entries, err := storage.ListDir(volume, entryPath)

We're making a ListDir() method call on storage, which is of type StorageAPI. This is an interface type, and therefor storage is an interface value. As Russ Cox has covered in his famous article Go Data Structures: Interfaces, interface values are effectively two-pointer structures:

Interface values are represented as a two-word pair giving a pointer to information about the type stored in the interface and a pointer to the associated data.

When we create a StorageAPI interface value from an underlying retryStorage object, the interface value contains a pointer to the object, not the object itself. When we call a function that takes such an interface value as one of its arguments, we wind up passing it a *retryStorage pointer (among other things). As a result, when we call cleanupDir(), we're effectively creating a situation in the code like this:

type magicI struct {
  tab *_typeDef
  ptr *retryStorage

func cleanupDir(storage magicI, ...) error {
    // we're trying to call (*retryStorage).ListDir()
    // since what we have is a pointer, not a value.
    entries, err := storage.ptr.ListDir(...)

Since there is no explicit pointer receiver method (*retryStorage).ListDir() but there is a value receiver method retryStorage.ListDir(), Go calls the autogenerated (*retryStorage).ListDir() method for us (well, for Minio).

This points out an important general rule: calling value receiver methods through interfaces always creates extra copies of your values. Interface values are fundamentally pointers, while your value receiver methods require values; ergo every call requires Go to create a new copy of the value, call your method with it, and then throw the value away. There is no way to avoid this as long as you use value receiver methods and call them through interface values; it's a fundamental requirement of Go.

The conclusion for API design is clear but not necessarily elegant. If your type's methods will always or almost always be called through interface values, you might want to consider using pointer receiver methods instead of value receiver methods even if it's a bit unnatural. Using pointer receiver methods avoids both making a new copy of the value and doing an additional call through the autogenerated conversion method; you go straight to your actual method with no overhead. For obvious reasons, the larger your values are (in terms of the storage they require), the more this matters, because Go has to copy more and more bytes around to create that throwaway value for the method call.

(Of course if you have large types you probably don't want value receiver methods in the first place, regardless of whether or not they wind up being called through interface values. Value receiver methods are best for values that only take up modest amounts of storage, or at least that can be copied around that way.)

Sidebar: How Go passes arguments to functions at the assembly level

In some languages and runtime environments, if you call a function that takes a sufficiently large value as an argument (for example, a large structure), the argument is secretly passed by providing the called function with a pointer to data stored elsewhere instead of writing however many bytes into the stack. Large return values may similarly be returned indirectly (often into a caller-prepared area). At least today, Go is not such a language. All arguments are passed completely on the stack, even if they are large.

This means that Go must always dereference *T pointers into on-stack copies of the value in order to call value receiver T methods. Those T methods fundamentally require their arguments to be on the stack, nowhere else, and this includes the receiver itself (which is passed as a more or less hidden first argument, and things get complicated here).

Written on 16 June 2017.
« The (current) state of Firefox Nightly and old extensions
One reason you have a mysterious Unix file called 2 (or 1) »

Page tools: View Source, Add Comment.
Login: Password:
Atom Syndication: Recent Comments.

Last modified: Fri Jun 16 23:48:43 2017
This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.