Sorting out slice mutability in Go
I've recently been writing some Go code that heavily uses and mutates
slices, mostly through append()
, often slices passed as arguments to
functions that then returned what were theoretically different, mutated
slices. This left me uncertain that both I and my code were doing the
right thing or if I was creating hidden bear traps for myself.
So let's start out with some completely artificial example code:
func test1(h string, hl []string) []string { return append(hl, h) } // Add h at position p, moving the current // element there to the end. func test2(h string, p int, hl []string) []string { t := append(hl, hl[p]) t[p] = h return t }
Suppose you call both of these functions with a []string
slice
that you want to use later in its current state, ie you want it
to not be changed by either call. Is it the case that this will
be true for either or both functions?
The answer turns out to be no. Both functions can mutate the
original slice in visible ways. Yes, even test1()
can have
this effect, and here's a demonstration:
func main() { t := []string{"a", "b", "c",} t2 := test1("d", t) t3 := test1("fred", t2) _ = test1("barney", t2) fmt.Printf("t3[4] is %s\n", t3[4]) }
Although you would expect that t3[4]
is "fred"
, because that's
what we appended to t2
to create t3
, it is actually "barney"
(on the Go Playground, at least, and
also on my 64-bit x86 Linux machine).
In Go, slices are a data structure built on top of arrays, as covered
in Go Slice: usage and internals. When you
generate a slice from an explicit array, the slice is backed by and
uses that array. When you work purely with slices (including using
append()
), as we are here, the resulting slices are backed by
anonymous arrays; these anonymous arrays are where the actual data
involved in your slice is stored. These anonymous arrays may be
shared between slices, and when you copy a slice (for example by
calling a function that takes a slice as its argument), you do not
create a new copy of the anonymous array that backs it.
Slices have a current length and a maximum capacity that they can
grow to. If you call append()
on a slice with no capacity left
(where len(slice)
is cap(slice)
), append()
has to create a
new backing array for the slice and copy all the current elements
over into it. However, if you call append()
on a slice that has
remaining capacity, append()
simply uses a bit of the remaining
capacity in the underlying backing array; you get a new slice from
append()
, but it is still using the same backing array. If this
backing array is also used by another slice that you care about,
problems can ensue.
With test2()
, we have a relatively straightforward and obvious
case. If append()
doesn't have to create a new backing array,
we'll mutate the existing one by changing the string at position
p
. Writing to an existing element of a slice is a clear warning
sign here, and it's not too hard to look out for this in your code
(and in functions that you call, such as sort.Strings
).
With test1()
things are more subtle. What is going on here is
that when append()
doesn't have to increase a slice's capacity,
it ends up writing the new element to the original backing array.
Our program arranges for t2
's anonymous backing array to have
spare capacity, so both the second and the third calls to test1()
wind up writing to <anonymous-array>[4]
and "fred"
turns into
"barney"
. This is alarming (at least to me), because I normally
think of pure append()
calls as being safe; this demonstrates
that they are not.
To guard against this, you must always force the creation of a new backing array. The straightforward way to do this is:
func test1(h string, hl []string) []string { t := append([]string{}, hl...) return append(t, h) }
(You can reduce this to a one-liner if you want to.)
A version that might be slightly more efficient would explicitly
make()
a new slice with an extra element's worth of capacity,
then copy()
the old slice to it, then finally append()
or add
the new value.
(Whether this version is actually more efficient depends on whether
you're going to use plain append()
to add even more elements to
the new slice later on.)
All of this is a little bit counter-intuitive to me. I could work
out what was going on and why it has to be this way, but until I
started to really think about it, I thought that test1()
was safe.
And it sometimes is, which makes things tricky; if t2
had no extra
capacity, t3
would have allocated a new backing array and everything
would have been fine. When slices backed by anonymous arrays have
extra capacity is an implementation detail and depends on both the
exact numbers involved and the exact path of slice growth.
(The test1()
case is also tricky because the mutation is not visible
in the original t2
slice. In test2()
, at least the original is
clearly altered. In a test1()
case, the two append()
s to the slice
might be quite separated in the code, and the damage is only visible if
and when you look at the first new slice.)
PS: This implies that calling append()
on the same slice in two
different goroutines creates a potential data race, at least if you
ever read the newly appended element.
|
|