Why your 64-bit Go programs may have a huge virtual size
For various reasons, I build (and rebuild) my copy of the core Go
system from the latest development source on a regular basis, and
periodically rebuild the Go programs I use from that build. Recently
I was looking at the memory use of one of my programs with ps and
noticed that it had an absolutely huge virtual size (Linux ps's VSZ field)
of around 138 GB, although it had only a moderate resident set size.
This nearly gave me a heart attack, since a huge virtual size with
a relatively tiny resident set size is one classical sign of a
memory leak.
(Builds with earlier versions of Go tended to have much more modest virtual sizes, on the order of 32 MB to 128 MB depending on how long the program had been running.)
Fortunately this was not a memory leak. In fact, experimentation
soon demonstrated that even a basic 'hello world' program had that
huge a virtual size. Inspection of the process's /proc/<pid>/smaps
file showed that basically all of the
virtual space used was coming from two inaccessible mappings, one
roughly 8 GB long and one roughly 128 GB. These mappings had no
access permissions (they disallowed reading, writing, and executing)
so all they did was reserve address space (without ever using any
actual RAM). A lot of address space.
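If you want to check this for your own process, the following is a minimal sketch that adds up the inaccessible mappings. It reads /proc/self/maps rather than smaps (the mapping header lines have the same layout), and the helper name reservedBytes is my own invention for illustration, not anything from the Go runtime.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// reservedBytes sums the sizes of all mappings that have no read, write,
// or execute permission in text formatted like /proc/<pid>/maps.
// (Hypothetical helper, for illustration only.)
func reservedBytes(maps string) (uint64, error) {
	var total uint64
	sc := bufio.NewScanner(strings.NewReader(maps))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		// Field 0 is "start-end" in hex, field 1 is the permissions, e.g. "---p".
		if len(fields) < 2 || !strings.HasPrefix(fields[1], "---") {
			continue
		}
		bounds := strings.SplitN(fields[0], "-", 2)
		if len(bounds) != 2 {
			continue
		}
		start, err := strconv.ParseUint(bounds[0], 16, 64)
		if err != nil {
			return 0, err
		}
		end, err := strconv.ParseUint(bounds[1], 16, 64)
		if err != nil {
			return 0, err
		}
		total += end - start
	}
	return total, sc.Err()
}

func main() {
	data, err := os.ReadFile("/proc/self/maps")
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot read maps:", err)
		return
	}
	n, err := reservedBytes(string(data))
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot parse maps:", err)
		return
	}
	fmt.Printf("inaccessible mappings: %d MB\n", n>>20)
}
```

What this reports will depend on the Go version that built the program, so treat the number as a diagnostic and not as something to compare against the figures above.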
It turns out that this is how Go's current low-level memory management likes to work on 64-bit systems. Simplified somewhat, Go does low-level allocations in 8 KB pages taken from a (theoretically) contiguous arena; which pages are free versus allocated is stored in a giant bitmap. On 64-bit machines, Go simply pre-reserves the entire memory address space for both the bitmap and the arena itself. As the runtime and your Go code start to actually use memory, pieces of the arena bitmap and the memory arena will be changed from simple address space reservations into memory that is actually backed by RAM and being used for something.
(Mechanically, the bitmap and arena are initially mmap()'d with
PROT_NONE. As memory is used, it is remapped with
PROT_READ|PROT_WRITE. I'm not confident that I understand what
happens when it's freed up, so I'm not going to say anything there.)
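The reserve-then-use pattern can be sketched from ordinary Go code on Linux. This is not the runtime's own code: the function name reserveThenCommit and the sizes are assumptions of mine, and I use mprotect() as a stand-in for whatever remapping the runtime really does.

```go
package main

import (
	"fmt"
	"syscall"
)

// reserveThenCommit reserves `reserve` bytes of address space with PROT_NONE,
// then makes the first `commit` bytes readable and writable and touches them.
// (Hypothetical helper; the runtime's real mechanism may differ.)
func reserveThenCommit(reserve, commit int) error {
	if commit <= 0 || commit > reserve {
		return fmt.Errorf("commit size %d out of range", commit)
	}
	// Anonymous private mapping with no access rights: address space only.
	mem, err := syscall.Mmap(-1, 0, reserve,
		syscall.PROT_NONE, syscall.MAP_ANON|syscall.MAP_PRIVATE)
	if err != nil {
		return fmt.Errorf("reserve: %w", err)
	}
	defer syscall.Munmap(mem)

	// Turn the start of the reservation into usable memory.
	if err := syscall.Mprotect(mem[:commit],
		syscall.PROT_READ|syscall.PROT_WRITE); err != nil {
		return fmt.Errorf("commit: %w", err)
	}
	// Touching the pages is what actually makes them resident.
	mem[0] = 1
	mem[commit-1] = 1
	return nil
}

func main() {
	// Reserve 1 GB but only use 1 MB of it.
	if err := reserveThenCommit(1<<30, 1<<20); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("reserved 1 GB, committed 1 MB")
}
```

While such a program is holding the reservation, ps should show the full reserved size in VSZ but only the touched pages in RSS.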
All of this is the case for the current post-Go 1.4 development version of Go. Go 1.4 and earlier behave differently, with much lower virtual sizes for running 64-bit programs, although in reading the Go 1.4 source code I'm not sure I understand why.
As far as I can tell, one of the interesting consequences of this is that 64-bit Go programs can use at most 128 GB of memory for most of their allocations (perhaps all of them that go through the runtime, I'm not sure).
For more details on this, see the comments in src/runtime/malloc2.go
and in mallocinit() in src/runtime/malloc1.go.
I have to say that this turned out to be more interesting and
educational than I initially expected, even if it means that watching
ps is no longer a good way to detect memory leaks in your Go
programs (mind you, I'm not sure it ever was). As a result, the
best way to check this sort of memory usage is probably some
combination of runtime.ReadMemStats() (perhaps exposed through
net/http/pprof) and Linux's smem program or the like to obtain
detailed information on meaningful memory address space usage.
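As a rough illustration of the runtime.ReadMemStats() side of this, here is a minimal sketch that prints a few of the counters. Which fields are worth watching is my own guess, and the snapshot helper is a made-up name.

```go
package main

import (
	"fmt"
	"runtime"
)

// snapshot returns the runtime's current memory statistics.
// (Hypothetical convenience wrapper around runtime.ReadMemStats.)
func snapshot() runtime.MemStats {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return m
}

func main() {
	m := snapshot()
	// HeapAlloc: bytes of allocated heap objects.
	fmt.Printf("HeapAlloc: %d KB\n", m.HeapAlloc/1024)
	// HeapSys: bytes of heap memory obtained from the OS.
	fmt.Printf("HeapSys:   %d KB\n", m.HeapSys/1024)
	// Sys: total bytes obtained from the OS, as the runtime accounts for it.
	fmt.Printf("Sys:       %d KB\n", m.Sys/1024)
}
```

A leak should show up as HeapAlloc growing steadily across snapshots taken over time, whatever VSZ says.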
PS: Unixes are generally smart enough to understand that PROT_NONE
mappings will never use up any memory and so shouldn't count against
things like system memory overcommit limits. However, they generally
will count against a per-process limit on total address space, which
likely means that you can't really use such limits and run post-1.4
Go programs. Since total address space limits are rarely used, this
is unlikely to be an issue.
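To see whether your process is running under such a limit, you can ask for RLIMIT_AS. This is a small sketch and addressSpaceLimit is a name I made up. I am assuming that 'unlimited' is reported as the all-ones value, which is what Linux does.

```go
package main

import (
	"fmt"
	"syscall"
)

// addressSpaceLimit returns the soft limit on total address space and
// whether that limit is effectively unlimited.
// (Hypothetical helper; assumes unlimited is reported as all ones.)
func addressSpaceLimit() (limit uint64, unlimited bool, err error) {
	var rl syscall.Rlimit
	if err := syscall.Getrlimit(syscall.RLIMIT_AS, &rl); err != nil {
		return 0, false, err
	}
	return rl.Cur, rl.Cur == ^uint64(0), nil
}

func main() {
	limit, unlimited, err := addressSpaceLimit()
	if err != nil {
		fmt.Println("getrlimit failed:", err)
		return
	}
	if unlimited {
		fmt.Println("no address space limit")
		return
	}
	fmt.Printf("address space limit: %d MB\n", limit>>20)
}
```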
Sidebar: How this works on 32-bit systems
The full story is in the mallocinit() comment. The short version
is that the runtime reserves a large enough bitmap to handle 2 GB
of memory (which 'only' takes 256 MB) but only reserves 512 MB of
arena address space out of the 2 GB it could theoretically use. If the
runtime later needs more memory, it asks the OS for another block
of address space and hopes that it is in the remaining 1.5 GB of
address space that the bitmap covers. Under many circumstances the
odds are good that the runtime will get what it needs.
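The sizes quoted in this entry hang together if you assume that the bookkeeping costs 4 bits per pointer-sized word of memory covered. That figure is an assumption on my part, picked because it reproduces both the roughly 8 GB mapping on 64-bit and the 256 MB figure here; the comments in the malloc source are the real authority. A minimal sketch of the arithmetic:

```go
package main

import "fmt"

// bitmapBytes returns how many bytes of bitmap are needed to cover
// arenaBytes of memory at bitsPerWord bits for each word of wordBytes bytes.
// (The 4 bits per word used below is an assumption, not taken from the text.)
func bitmapBytes(arenaBytes, wordBytes, bitsPerWord uint64) uint64 {
	return arenaBytes / wordBytes * bitsPerWord / 8
}

func main() {
	const mb = 1 << 20
	const gb = 1 << 30
	// 64-bit: 128 GB arena, 8-byte words.
	fmt.Println(bitmapBytes(128*gb, 8, 4)/gb, "GB") // prints "8 GB"
	// 32-bit: 2 GB of coverage, 4-byte words.
	fmt.Println(bitmapBytes(2*gb, 4, 4)/mb, "MB") // prints "256 MB"
}
```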