Sorting out slice mutability in Go

August 13, 2017

I've recently been writing some Go code that heavily uses and mutates slices, mostly through append(), often slices passed as arguments to functions that then returned what were theoretically different, mutated slices. This left me uncertain that both I and my code were doing the right thing or if I was creating hidden bear traps for myself.

So let's start out with some completely artificial example code:

func test1(h string, hl []string) []string {
  return append(hl, h)
}

// Add h at position p, moving the current
// element there to the end.
func test2(h string, p int, hl []string) []string {
  t := append(hl, hl[p])
  t[p] = h
  return t
}

Suppose you call both of these functions with a []string slice that you want to use later in its current state, ie you want it to not be changed by either call. Is it the case that this will be true for either or both functions?

The answer turns out to be no. Both functions can mutate the original slice in visible ways. Yes, even test1() can have this effect, and here's a demonstration:

func main() {
  t := []string{"a", "b", "c",}
  t2 := test1("d", t)
  t3 := test1("fred", t2)
  _ = test1("barney", t2)
  fmt.Printf("t3[4] is %s\n", t3[4])
}

Although you would expect that t3[4] is "fred", because that's what we appended to t2 to create t3, it is actually "barney" (on the Go Playground, at least, and also on my 64-bit x86 Linux machine).

In Go, slices are a data structure built on top of arrays, as covered in Go Slice: usage and internals. When you generate a slice from an explicit array, the slice is backed by and uses that array. When you work purely with slices (including using append()), as we are here, the resulting slices are backed by anonymous arrays; these anonymous arrays are where the actual data involved in your slice is stored. These anonymous arrays may be shared between slices, and when you copy a slice (for example by calling a function that takes a slice as its argument), you do not create a new copy of the anonymous array that backs it.

Slices have a current length and a maximum capacity that they can grow to. If you call append() on a slice with no capacity left (where len(slice) is cap(slice)), append() has to create a new backing array for the slice and copy all the current elements over into it. However, if you call append() on a slice that has remaining capacity, append() simply uses a bit of the remaining capacity in the underlying backing array; you get a new slice from append(), but it is still using the same backing array. If this backing array is also used by another slice that you care about, problems can ensue.

With test2(), we have a relatively straightforward and obvious case. If append() doesn't have to create a new backing array, we'll mutate the existing one by changing the string at position p. Writing to an existing element of a slice is a clear warning sign here, and it's not too hard to look out for this in your code (and in functions that you call, such as sort.Strings).

With test1() things are more subtle. What is going on here is that when append() doesn't have to increase a slice's capacity, it ends up writing the new element to the original backing array. Our program arranges for t2's anonymous backing array to have spare capacity, so both the second and the third calls to test1() wind up writing to <anonymous-array>[4] and "fred" turns into "barney". This is alarming (at least to me), because I normally think of pure append() calls as being safe; this demonstrates that they are not.

To guard against this, you must always force the creation of a new backing array. The straightforward way to do this is:

func test1(h string, hl []string) []string {
  t := append([]string{}, hl...)
  return append(t, h)
}

(You can reduce this to a one-liner if you want to.)

A version that might be slightly more efficient would explicitly make() a new slice with an extra element's worth of capacity, then copy() the old slice to it, then finally append() or add the new value.

(Whether this version is actually more efficient depends on whether you're going to use plain append() to add even more elements to the new slice later on.)

All of this is a little bit counter-intuitive to me. I could work out what was going on and why it has to be this way, but until I started to really think about it, I thought that test1() was safe. And it sometimes is, which makes things tricky; if t2 had no extra capacity, t3 would have allocated a new backing array and everything would have been fine. When slices backed by anonymous arrays have extra capacity is an implementation detail and depends on both the exact numbers involved and the exact path of slice growth.

(The test1() case is also tricky because the mutation is not visible in the original t2 slice. In test2(), at least the original is clearly altered. In a test1() case, the two append()s to the slice might be quite separated in the code, and the damage is only visible if and when you look at the first new slice.)

PS: This implies that calling append() on the same slice in two different goroutines creates a potential data race, at least if you ever read the newly appended element.


Comments on this page:

By Jeremy at 2017-08-13 07:20:22:

I think it's not so much that append is unsafe, but that its syntax encourages developing the wrong mental model about what it does. For example, if the syntax were t.append(x) or append(&t, x) but with the same copy-only-if-needed behaviour, it would be much more obvious what it's doing.

I'd also suggest that t2 := append(t1, x) is an anti-pattern that should be avoided. In fact, using the := operator with append is a code smell on its own. (Appending to an empty slice to force a copy would be an exception of course). The above syntax suggestions, where append doesn't have a return value, actually prevent this sort of usage.

I'm slightly disappointed that neither go vet or golint seem to complain about your examples.

But once your mental model lines up with the implementation, your concerns mostly go away. If you never assume that copies of the slice are made, you just do it explicitly when you actually mean for it to happen (which would also reduce unnecessary copying). You don't have to worry about multiple goroutines because you already know that mutating anything concurrently is bad.

Your post might make a productive addition to https://github.com/golang/go/wiki/ExperienceReports#slices

By cks at 2017-08-14 10:53:47:

One problem with 't2 := append(...)' being an anti-pattern to look for is that the slice appending is not infrequently hiding behind a type or even an interface. Consider a version of test1() that is:

type hostset []string
func (h hostset) add(host string) hostset {
  return append(h, host)
}

We're starting to abstract things and thus to hide the append(). If the operation is, say, set union, things get more abstracted but you can still get the same problem with hostsets that are just the right size.

By Random at 2017-08-14 12:21:47:

Your functional programming nut friend shrieks in horror at this.

Random

By Jeremy at 2017-08-15 01:51:17:

Yes, adding functions to the mix obscures things a bit, but I would suggest that your example is also something of an anti-pattern. If you write a function (or interface) that takes a slice and returns a slice of the same type, there is ambiguity and you must document whether the slice you return is the same logical one or a new one.

This isn't unique to go slices, it applies to every mutable data structure in languages that allow side-effects.

you can still get the same problem with hostsets that are just the right size

It shouldn't matter whether append copies or modifies in place. If you intend to create a new logical slice, do it every time by copying explicitly. If you don't intend to create a new logical slice, never keep a reference to the old version. The vast majority of the time, you don't intend to create a new logical slice, which is probably why it's not the language default to make a copy.

I'm not here to defend go's implementation. It sucks that the syntax makes it easy to accidentally keep incoherent references to slices. There is a constant overhead to slice handling to make sure it is done correctly. But it's not unsafe if you don't try to pretend it's lisp.

By cks at 2017-08-15 10:25:01:

My off the cuff view is that a Go API that takes and returns slices should normally never return the same logical slice. If you want to manipulate the same logical slice and you need to use append(), your API should work on pointers to slices (and probably be methods on a type that is 'pointer to slice' under the hood, so that this is all opaque to callers). Documenting things and then relying on the caller to always do the right thing works in theory, but it's a foot-gun and I think Go API design should try to avoid them.

(The sole exception is an API where efficiency is so important as to be the overriding consideration.)

I wrote my entry because I didn't even realize this was an issue until I started thinking about it and experimenting. I suspect that I'm not alone in this in the Go programming community, and that there are plenty of people out there with loaded APIs pointing at their feet.

Written on 13 August 2017.
« Notes on cgroups and systemd's interaction with them as of Ubuntu 16.04
Chrome extensions are becoming a reason not to use Chrome »

Page tools: View Source, View Normal, Add Comment.
Search:
Login: Password:
Atom Syndication: Recent Comments.

Last modified: Sun Aug 13 01:31:31 2017
This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.