How Go maps store their values (and keys)
Recently on Twitter I mentioned my entry on why it matters that map values are unaddressable, and got a thought provoking reply from Sean Barrett:
Knowing none of the details & not being a go programmer, I would have guessed that map values aren't addressable because they're in a dynamically-sized hash table so they need to get relocated behind the user's back; getting the address of a value slot would break that.
This made me realize that I actually had no idea how Go maps store their values (and their keys). The answer turned out to be mostly straightforward, because there is a big comment about it in src/runtime/map.go:
// A map is just a hash table. The data is arranged // into an array of buckets. Each bucket contains up to // 8 key/elem pairs. The low-order bits of the hash are // used to select a bucket. Each bucket contains a few // high-order bits of each hash to distinguish the entries // within a single bucket. // // If more than 8 keys hash to a bucket, we chain on // extra buckets.
If Go had generics (it will someday), a Go map bucket could be represented as the following (assuming I'm understanding the code correctly):
type mapbucket[K comparable, V any] struct { [... some internal stuff ...] // the actual keys and values keys [8]K vals [8]V // points to the next overflow bucket overflow *mapbucket[K, V] }
(A given hash table bucket can have an arbitrary number of overflow buckets chained off it, but Go expands the hash table whenever the total number of overflow buckets gets big enough.)
Putting the keys and values into separate arrays allows the Go
runtime to eliminate padding that would be necessary for some maps,
such as a map[int64]int8
(where each single byte of value would
have to be padded to 8-byte boundaries so the 64-bit integer keys
would stay aligned).
I believe that overflow buckets are just allocated on their own,
as stand-alone objects. However, non-overflow buckets are in a big
array, as mentioned in the comment above. This is pointed to by the
Go runtime hmap
struct:
// A header for a Go map. type hmap struct { [...] buckets unsafe.Pointer // array of 2^B Buckets. may be nil if count==0. [...] }
Now I can talk about the answer to Sean Barrett's guess. In theory Go could allow you to have a pointer to a map value without anything breaking (and array elements are addressable, so there is no language problem with it). However, allowing this and having people use it would have significantly undesirable consequences for memory usage and would also not quite do what people want.
The problem comes about because in Go, interior pointers keep the whole object alive. Here, the object is at least a map bucket of eight keys and eight values (instead of just one value). If the map bucket is an overflow bucket, it's not part of any other object (although it points to anything else in its chain). However, if the map bucket isn't an overflow bucket, it's part of a big array of them. Go giving you a single pointer to a single value in a non-overflow would most likely keep all keys and values in that iteration of the map alive.
This leads to how such pointers wouldn't do what you want. Here's how map.go describes growing the hash tables that are maps:
// When the hashtable grows, we allocate a new array // of buckets twice as big. Buckets are incrementally // copied from the old bucket array to the new bucket array.
When Go copies an old bucket from the old bucket array to the new bucket array, this is a value copy, the same as any other value copy. Naturally, pointers to something inside the old value would not be updated to point to the same thing inside the new value, any more than they would be with any other sort of value copy. This means that your pointer to a map value would unpredictably stop referring to the live version of that value in the current map as the map grew.
|
|