The Go language's problem on 32-bit machines
Recently (for my value of recently) there was something of a commotion, with people declaring that Go wasn't usable in production on 32-bit systems because its garbage collection was broken and would eat all of your memory. Naturally I was interested in this and spent some time digging into the reports and trying to understand the situation. Today I'm going to try to write down as much as I know about what's going on in order to get it straight in my head, which is going to involve a trip into the fun land of garbage collection.
To simplify a bit, the purpose of garbage collection is to automatically free up memory that's no longer used. The GC technique everyone starts with is reference counting, but since it has various problems (including never freeing circular references) most people soon upgrade to more complex schemes based on inverting the problem: rather than noticing when something stops being used, the garbage collection system periodically finds all of the memory that's still actively used and then frees everything else. This is 'tracing garbage collection' (and garbage collectors), so called because the garbage collector 'traces' all live objects.
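To make the tracing idea concrete, here is a minimal sketch of a mark phase over a toy object graph (the index-based 'heap' representation is invented for illustration, not how any real GC stores objects). Note that the unreachable cycle at the end is kept forever by reference counting but is correctly identified as garbage by tracing:

```go
package main

import "fmt"

// object is a toy heap object that refers to other objects by their
// index in the heap slice. This representation is made up for the sketch.
type object struct {
	refs []int // indices of objects this one refers to
}

// trace marks every object reachable from the roots. Everything left
// unmarked is garbage; a real collector's sweep phase would free it.
func trace(heap []object, roots []int) []bool {
	marked := make([]bool, len(heap))
	stack := append([]int(nil), roots...)
	for len(stack) > 0 {
		i := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		if marked[i] {
			continue
		}
		marked[i] = true
		stack = append(stack, heap[i].refs...)
	}
	return marked
}

func main() {
	heap := []object{
		{refs: []int{1}}, // 0: reachable from the root
		{refs: []int{2}}, // 1: reachable via 0
		{refs: nil},      // 2: a leaf, still reachable
		{refs: []int{3}}, // 3: an unreachable self-cycle; tracing frees it
	}
	fmt.Println(trace(heap, []int{0})) // [true true true false]
}
```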
One deep but unsexy problem in garbage collection is how your GC system knows what fields in your objects refer to other objects and what fields are just primitive types like numbers, memory buffers, strings, or the like, and how it does this efficiently. This can be a particular issue for a system language, where you probably want to have structures and objects that are as simple and dense as possible, with as little overhead as possible from type annotations, inefficient 'boxed' representations, and so on. One solution is to maintain a separate bitmap of what words in an allocated memory area are actually pointers (which the GC can then scan efficiently, and which can be set by the runtime when an object is allocated). Another solution is what gets called 'conservative garbage collection'. The fundamental idea is that in conservative GC, we are willing to over-estimate references (and thus wind up not freeing some unused memory); rather than insisting on knowing about references, the GC system simply scans through allocated memory looking for anything that might be a pointer to an allocated object. If it finds one, it conservatively declares that the object is still alive and traces things from there.
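The core of a conservative scan can be sketched in a few lines. This is only an illustration of the idea, not Go's actual implementation; the heap bounds and word values here are all made up:

```go
package main

import "fmt"

// Pretend heap bounds for the sketch (chosen to fit a 32-bit address
// space); a real runtime knows the actual span of its allocations.
const (
	heapStart uintptr = 0x40000000
	heapEnd   uintptr = 0x80000000
)

// conservativeScan returns the indices of words whose values fall inside
// the heap. It cannot tell a real pointer from an integer that merely
// looks like one, so both are reported as live references.
func conservativeScan(words []uintptr) []int {
	var live []int
	for i, w := range words {
		if w >= heapStart && w < heapEnd {
			live = append(live, i)
		}
	}
	return live
}

func main() {
	words := []uintptr{
		0x12345,    // a small integer: clearly not a heap pointer
		0x40000040, // a genuine pointer into the heap
		0x5abcde00, // an integer that happens to look like one
	}
	fmt.Println(conservativeScan(words)) // [1 2]: the false hit is kept too
}
```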
Go was initially designed as a system language, although it's no longer described as one. As such, one of the tradeoffs the language designers made is that Go more or less uses conservative garbage collection, as far as I understand, at least for objects (or memory areas) that may contain pointers; some static data that's known to be pointer-free may be skipped by the conservative GC. Although there's said to be the start of a more efficient word-bitmap implementation for Go objects, it's not currently usable by the GC (and may not be fully live).
(As far as I can tell from commentary, Go's garbage collector only scans Go's own memory areas; it doesn't make any attempt to scan memory used by outside libraries or code to find references to Go objects. Runtime code that passes a pointer to a Go object to an outside function is apparently required to keep the object alive inside Go, for example by hooking it into a global variable.)
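A sketch of the 'hook it into a global variable' pattern might look like the following. The names (`keepAlive`, `passToOutside`, and so on) are all hypothetical, and the actual external call is elided; the point is only that parking the object in a global keeps it visible to Go's GC while only outside code holds it:

```go
package main

import "fmt"

// Buffer stands in for some Go object whose address is handed to
// non-Go code.
type Buffer struct{ data [4096]byte }

// keepAlive is a made-up global registry. As long as an object is in
// here, Go's GC always finds a reference to it and won't free it.
var keepAlive = make(map[*Buffer]struct{})

// passToOutside pins b and then (hypothetically) hands it to an
// external library that Go's GC knows nothing about.
func passToOutside(b *Buffer) {
	keepAlive[b] = struct{}{} // pin before the outside code sees it
	// ... pass &b.data[0] to the outside library here ...
}

// releaseFromOutside unpins b once the outside code is done with it,
// making it collectable again.
func releaseFromOutside(b *Buffer) {
	delete(keepAlive, b)
}

func main() {
	b := &Buffer{}
	passToOutside(b)
	fmt.Println(len(keepAlive)) // 1: b is pinned
	releaseFromOutside(b)
	fmt.Println(len(keepAlive)) // 0: b can be collected
}
```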
The problem with conservative GC is that it over-estimates memory still in use because it finds false 'references', values that look like pointers to allocated objects but aren't actually pointers. There are a number of factors that can make conservative GC worse:
- the more of your address space is in use for language objects, the more random values can look like references to them. If half of the address space is your objects, half of all properly aligned N-bit patterns look like pointers to your objects (where N is the size of a pointer).
- the smaller the address space is in general, the more of it you're going to fill up with your objects for the same amount of memory use. Two GB of objects is half of the 32-bit address space but a tiny fraction of the 64-bit address space.
- the larger your individual objects are, the more memory a single 'reference' somewhere inside one will prevent from being freed.
- similarly, the more other objects a single object refers to, the more memory will be held down by a single spurious reference to the top object.
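The first two factors reduce to a bit of arithmetic: the chance that a uniformly random, aligned machine word lands inside your heap is just heap size over address-space size. (Real programs' word values aren't uniformly random, so this is only a rough intuition, and the 47-bit user address space assumed for the 64-bit case is an illustrative guess.)

```go
package main

import "fmt"

// falseHitChance is the fraction of uniformly random machine words that
// happen to fall inside the heap: heap size over address-space size.
func falseHitChance(heapBytes, addrSpaceBytes float64) float64 {
	return heapBytes / addrSpaceBytes
}

func main() {
	const twoGB = 2.0 * (1 << 30)
	// On 32-bit, 2 GB of objects means half of all word values look
	// like pointers into your heap.
	fmt.Println(falseHitChance(twoGB, 1<<32)) // 0.5
	// On 64-bit (assuming a 47-bit user address space) the chance
	// of an accidental hit is minuscule.
	fmt.Println(falseHitChance(twoGB, 1<<47))
}
```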
Many of these factors are apparently quite bad for 32-bit Go programs that use a significant amount of memory, especially ones with large objects or with objects that the garbage collector treats conservatively. They are drastically reduced on 64-bit machines, where you would generally have to be unlucky for the conservative GC to accidentally hold a significant amount of memory busy. However, the problem can still happen with 64-bit Go; it's just less likely.
(The general reference for this is Go language issue 909.)
At this point I have no articulate personal reactions to all of this. As a pragmatic matter I'm not exactly writing Go programs right now for various reasons (although I keep vaguely wanting to because I like Go in the abstract), so if I'm being honest it's all kind of theoretical.
(My problem with Go in practice is partly that I have nothing to really use it on. I need to find a project that calls out for it instead of anything else.)
Sidebar: the 32-bit Windows issue
There's also an issue on Windows machines due to memory fragmentation (via Hacker News). When it starts, the Go runtime tries to allocate a contiguous 512 Mbyte region of virtual address space. Sometimes on Windows machines enough DLLs have been loaded at enough scattered addresses by this point that there isn't such a contiguous chunk of address space left, the allocation fails, and the Go runtime immediately exits with an error.
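A toy model shows how scattered DLLs can defeat the reservation even with most of the address space free. The region layouts below are invented for illustration, not real Windows address maps:

```go
package main

import (
	"fmt"
	"sort"
)

// region is an occupied chunk of address space, such as a loaded DLL.
type region struct{ start, end uint64 }

// hasContiguousGap reports whether at least 'need' bytes of contiguous
// free address space remain in a space of 'space' bytes: a toy model of
// the runtime's startup reservation.
func hasContiguousGap(occupied []region, space, need uint64) bool {
	sort.Slice(occupied, func(i, j int) bool {
		return occupied[i].start < occupied[j].start
	})
	var prevEnd uint64
	for _, r := range occupied {
		if r.start > prevEnd && r.start-prevEnd >= need {
			return true
		}
		if r.end > prevEnd {
			prevEnd = r.end
		}
	}
	return space-prevEnd >= need
}

func main() {
	const MB uint64 = 1 << 20
	// Seven small DLLs spaced roughly every 600 MB occupy only 700 MB
	// in total, yet leave no 512 MB run anywhere in a 4 GB space.
	scattered := []region{
		{0, 100 * MB}, {600 * MB, 700 * MB}, {1200 * MB, 1300 * MB},
		{1800 * MB, 1900 * MB}, {2400 * MB, 2500 * MB},
		{3000 * MB, 3100 * MB}, {3600 * MB, 3700 * MB},
	}
	fmt.Println(hasContiguousGap(scattered, 4096*MB, 512*MB)) // false

	// The same amount of occupied space packed low leaves plenty of room.
	packed := []region{{0, 700 * MB}}
	fmt.Println(hasContiguousGap(packed, 4096*MB, 512*MB)) // true
}
```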
(In theory this sort of address space fragmentation could happen on any 32-bit OS, but apparently Windows is uniquely susceptible for various reasons.)