Why your 64-bit Go programs may have a huge virtual size

December 15, 2014

For various reasons, I build (and rebuild) my copy of the core Go system from the latest development source on a regular basis, and periodically rebuild the Go programs I use from that build. Recently I was looking at the memory use of one of my programs with ps and noticed that it had an absolutely huge virtual size (Linux ps's VSZ field) of around 138 GB, although it had only a moderate resident set size. This nearly gave me a heart attack, since a huge virtual size with a relatively tiny resident set size is one classical sign of a memory leak.

(Programs built with earlier versions of Go tended to have much more modest virtual sizes, on the order of 32 MB to 128 MB depending on how long the program had been running.)

Fortunately this was not a memory leak. In fact, experimentation soon demonstrated that even a basic 'hello world' program had that huge a virtual size. Inspection of the process's /proc/<pid>/smaps file showed that basically all of the virtual space used was coming from two inaccessible mappings, one roughly 8 GB long and one roughly 128 GB long. These mappings had no access permissions (they disallowed reading, writing, and executing), so all they did was reserve address space (without ever using any actual RAM). A lot of address space.

It turns out that this is how Go's current low-level memory management likes to work on 64-bit systems. Simplified somewhat, Go does low-level allocations in 8 KB pages taken from a (theoretically) contiguous arena; which pages are free versus allocated is stored in a giant bitmap. On 64-bit machines, Go simply pre-reserves the entire memory address space for both the bitmap and the arena itself. As the runtime and your Go code start to actually use memory, pieces of the arena bitmap and the memory arena will be changed from simple address space reservations into memory that is actually backed by RAM and being used for something.

(Mechanically, the bitmap and arena are initially mmap()'d with PROT_NONE. As memory is used, it is remapped with PROT_READ|PROT_WRITE. I'm not confident that I understand what happens when it's freed up, so I'm not going to say anything there.)

All of this is the case for the current post-Go 1.4 development version of Go. Go 1.4 and earlier behave differently, with much lower virtual sizes for running 64-bit programs, although in reading the Go 1.4 source code I'm not sure I understand why.

As far as I can tell, one of the interesting consequences of this is that 64-bit Go programs can use at most 128 GB of memory for most of their allocations (perhaps all of them that go through the runtime, I'm not sure).

For more details on this, see the comments in src/runtime/malloc2.go and in mallocinit() in src/runtime/malloc1.go.

I have to say that this turned out to be more interesting and educational than I initially expected, even if it means that watching ps is no longer a good way to detect memory leaks in your Go programs (mind you, I'm not sure it ever was). As a result, the best way to check this sort of memory usage is probably some combination of runtime.ReadMemStats() (perhaps exposed through net/http/pprof) and Linux's smem program or the like to obtain detailed information on meaningful memory address space usage.
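For the runtime side of that, a minimal sketch of pulling a few numbers out of runtime.ReadMemStats() might look like the following. The helper name memNumbers and the choice of which fields to show are just for illustration; the fields themselves are part of runtime.MemStats.

```go
package main

import (
	"fmt"
	"runtime"
)

// memNumbers (hypothetical helper) returns how much memory the runtime has
// obtained from the OS in total and how much of the heap is in use.
func memNumbers() (sys, heapInuse uint64) {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return m.Sys, m.HeapInuse
}

func main() {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)

	// These count memory the runtime is actually using or holding,
	// not the size of its address space reservations.
	fmt.Printf("Sys:          %d KB\n", m.Sys/1024)
	fmt.Printf("HeapSys:      %d KB\n", m.HeapSys/1024)
	fmt.Printf("HeapInuse:    %d KB\n", m.HeapInuse/1024)
	fmt.Printf("HeapIdle:     %d KB\n", m.HeapIdle/1024)
	fmt.Printf("HeapReleased: %d KB\n", m.HeapReleased/1024)
}
```

Sampling these periodically and watching for steady growth is likely to be a better leak signal than anything based on VSZ.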

PS: Unixes are generally smart enough to understand that PROT_NONE mappings will never use up any memory and so shouldn't count against things like system memory overcommit limits. However, they generally will count against a per-process limit on total address space, which likely means that you can't really use such limits and run post-1.4 Go programs. Since total address space limits are rarely used, this is unlikely to be an issue.
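If you want to know whether such a limit applies to a process, one way is to ask for RLIMIT_AS. This is a small Linux-oriented sketch; the helper name addressSpaceLimit is invented, and the 'unlimited' constant assumes that Linux reports no limit as an all-ones value.

```go
package main

import (
	"fmt"
	"syscall"
)

// unlimited is an assumption about Linux: "no limit" (RLIM_INFINITY)
// shows up as all bits set in the uint64 limit field.
const unlimited = ^uint64(0)

// addressSpaceLimit (hypothetical helper) returns the soft limit on the
// total address space of the current process.
func addressSpaceLimit() (uint64, error) {
	var rl syscall.Rlimit
	if err := syscall.Getrlimit(syscall.RLIMIT_AS, &rl); err != nil {
		return 0, err
	}
	return rl.Cur, nil
}

func main() {
	limit, err := addressSpaceLimit()
	if err != nil {
		fmt.Println("getrlimit failed:", err)
		return
	}
	if limit == unlimited {
		fmt.Println("no address space limit is set")
		return
	}
	fmt.Printf("address space limited to %d MB\n", limit>>20)
}
```

A limit smaller than the runtime's reservations (for example one set with 'ulimit -v' in the shell) would presumably stop the program during startup rather than produce useful output here.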

Sidebar: How this works on 32-bit systems

The full story is in the mallocinit() comment. The short version is that the runtime reserves a bitmap large enough to handle 2 GB of memory (which 'only' takes 256 MB) but initially reserves only 512 MB of address space for the arena itself, out of the 2 GB it could theoretically use. If the runtime later needs more memory, it asks the OS for another block of address space and hopes that it is in the remaining 1.5 GB of address space that the bitmap covers. Under many circumstances the odds are good that the runtime will get what it needs.
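The sizes mentioned in this entry (256 MB to cover 2 GB on 32-bit, and the roughly 8 GB mapping next to the 128 GB one on 64-bit) are both consistent with a bitmap that costs four bits per pointer-sized word. That ratio is an assumption inferred from the numbers, not something taken from the runtime source here, but the arithmetic is easy to check:

```go
package main

import "fmt"

// Assumption: the bitmap spends 4 bits on every pointer-sized word
// of the arena, so one bitmap byte describes two words.
const bitsPerWord = 4
const wordsPerBitmapByte = 8 / bitsPerWord

// bitmapBytes (hypothetical helper) returns the bitmap size needed to
// cover arenaBytes of memory on a machine with ptrSize-byte pointers.
func bitmapBytes(arenaBytes, ptrSize uint64) uint64 {
	return arenaBytes / (ptrSize * wordsPerBitmapByte)
}

func main() {
	// 64-bit: 128 GB arena, 8-byte words.
	fmt.Println(bitmapBytes(128<<30, 8)>>20, "MB") // 8192 MB, i.e. 8 GB
	// 32-bit: 2 GB of coverage, 4-byte words.
	fmt.Println(bitmapBytes(2<<30, 4)>>20, "MB") // 256 MB
}
```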
