Wandering Thoughts archives

2017-08-13

Sorting out slice mutability in Go

I've recently been writing some Go code that heavily uses and mutates slices, mostly through append(), often slices passed as arguments to functions that then returned what were theoretically different, mutated slices. This left me uncertain that both I and my code were doing the right thing or if I was creating hidden bear traps for myself.

So let's start out with some completely artificial example code:

func test1(h string, hl []string) []string {
  return append(hl, h)
}

// Add h at position p, moving the current
// element there to the end.
func test2(h string, p int, hl []string) []string {
  t := append(hl, hl[p])
  t[p] = h
  return t
}

Suppose you call both of these functions with a []string slice that you want to use later in its current state, ie you want it to not be changed by either call. Is it the case that this will be true for either or both functions?

The answer turns out to be no. Both functions can mutate the original slice in visible ways. Yes, even test1() can have this effect, and here's a demonstration:

func main() {
  t := []string{"a", "b", "c",}
  t2 := test1("d", t)
  t3 := test1("fred", t2)
  _ = test1("barney", t2)
  fmt.Printf("t3[4] is %s\n", t3[4])
}

Although you would expect that t3[4] is "fred", because that's what we appended to t2 to create t3, it is actually "barney" (on the Go Playground, at least, and also on my 64-bit x86 Linux machine).

In Go, slices are a data structure built on top of arrays, as covered in Go Slice: usage and internals. When you generate a slice from an explicit array, the slice is backed by and uses that array. When you work purely with slices (including using append()), as we are here, the resulting slices are backed by anonymous arrays; these anonymous arrays are where the actual data involved in your slice is stored. These anonymous arrays may be shared between slices, and when you copy a slice (for example by calling a function that takes a slice as its argument), you do not create a new copy of the anonymous array that backs it.

Slices have a current length and a maximum capacity that they can grow to. If you call append() on a slice with no capacity left (where len(slice) is cap(slice)), append() has to create a new backing array for the slice and copy all the current elements over into it. However, if you call append() on a slice that has remaining capacity, append() simply uses a bit of the remaining capacity in the underlying backing array; you get a new slice from append(), but it is still using the same backing array. If this backing array is also used by another slice that you care about, problems can ensue.

With test2(), we have a relatively straightforward and obvious case. If append() doesn't have to create a new backing array, we'll mutate the existing one by changing the string at position p. Writing to an existing element of a slice is a clear warning sign here, and it's not too hard to look out for this in your code (and in functions that you call, such as sort.Strings).

With test1() things are more subtle. What is going on here is that when append() doesn't have to increase a slice's capacity, it ends up writing the new element to the original backing array. Our program arranges for t2's anonymous backing array to have spare capacity, so both the second and the third calls to test1() wind up writing to <anonymous-array>[4] and "fred" turns into "barney". This is alarming (at least to me), because I normally think of pure append() calls as being safe; this demonstrates that they are not.

To guard against this, you must always force the creation of a new backing array. The straightforward way to do this is:

func test1(h string, hl []string) []string {
  t := append([]string{}, hl...)
  return append(t, h)
}

(You can reduce this to a one-liner if you want to.)

A version that might be slightly more efficient would explicitly make() a new slice with an extra element's worth of capacity, then copy() the old slice to it, then finally append() or add the new value.

(Whether this version is actually more efficient depends on whether you're going to use plain append() to add even more elements to the new slice later on.)

All of this is a little bit counter-intuitive to me. I could work out what was going on and why it has to be this way, but until I started to really think about it, I thought that test1() was safe. And it sometimes is, which makes things tricky; if t2 had no extra capacity, t3 would have allocated a new backing array and everything would have been fine. When slices backed by anonymous arrays have extra capacity is an implementation detail and depends on both the exact numbers involved and the exact path of slice growth.

(The test1() case is also tricky because the mutation is not visible in the original t2 slice. In test2(), at least the original is clearly altered. In a test1() case, the two append()s to the slice might be quite separated in the code, and the damage is only visible if and when you look at the first new slice.)

PS: This implies that calling append() on the same slice in two different goroutines creates a potential data race, at least if you ever read the newly appended element.

GoSliceMutability written at 01:31:31; Add Comment


Page tools: See As Normal.
Search:
Login: Password:
Atom Syndication: Recent Pages, Recent Comments.

This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.