A deep dive into the OS memory use of a simple Go program
One of the enduring mysteries of actually using Go programs is understanding how much OS-level memory they use, as opposed to the various Go-level memory metrics exposed by runtime.MemStats. OS level memory use matters because it influences things like how much real memory your program needs and how likely it is to be killed by the OS in a low-memory situation, but there has always been a disconnect between OS level information and Go level information. After researching enough to write about how Go doesn't free heap memory back to the OS, I got sufficiently curious to really dig down into the details of a very simple program and now I'm going to go through them. All of this is for Go 1.11; other Go versions have had different behavior.
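To see the two views side by side, here's a minimal sketch (my own illustration, not something from this entry's test program) that prints a few runtime.MemStats figures next to the kernel's numbers from /proc/self/status on Linux:

package main

import (
	"bufio"
	"fmt"
	"os"
	"runtime"
	"strings"
)

func main() {
	// Go's own accounting of memory obtained from the OS.
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	fmt.Printf("Go: Sys %dK (HeapSys %dK, GCSys %dK, OtherSys %dK)\n",
		m.Sys/1024, m.HeapSys/1024, m.GCSys/1024, m.OtherSys/1024)

	// The kernel's view of the same process (Linux-specific).
	f, err := os.Open("/proc/self/status")
	if err != nil {
		return
	}
	defer f.Close()
	sc := bufio.NewScanner(f)
	for sc.Scan() {
		line := sc.Text()
		if strings.HasPrefix(line, "VmSize:") || strings.HasPrefix(line, "VmRSS:") {
			fmt.Println("OS:", line)
		}
	}
}

The two sets of numbers won't line up exactly, which is the disconnect this entry is about.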
Our very simple program is going to do nothing except sit there so that we can examine its memory use:
package main

func main() {
	var i uint64
	for {
		i++
	}
}
(It turns out that we could use time.Sleep() to pause without dragging in extra complications, because it's actually handled directly in the runtime, despite it nominally being in the time package.)
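For illustration, such a time.Sleep() based variant might look like this (my sketch, not the program examined in this entry):

package main

import "time"

func main() {
	// time.Sleep is handled directly in the runtime despite nominally
	// living in the time package, so this stays a very simple program.
	for {
		time.Sleep(time.Hour)
	}
}

The rest of this entry sticks with the busy-loop version.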
This simple-looking program already has a complicated runtime environment, with several system goroutines operating behind the scenes. It also has more memory use than you probably expect. Here's what its memory map looks like on my 64-bit Linux machine:
0000000000400000    316K r-x-- memdemo
000000000044f000    432K r---- memdemo
00000000004bb000     12K rw--- memdemo
00000000004be000    124K rw---   [ bss ]
000000c000000000  65536K rw---   [ anon ]
00007efdfc10c000  35264K rw---   [ anon ]
00007ffc088f1000    136K rw---   [ stack ]
00007ffc08933000     12K r----   [ vvar ]
00007ffc08936000      8K r-x--   [ vdso ]
ffffffffff600000      4K r-x--   [ vsyscall ]
 total           101844K
The vvar, vdso, and vsyscall mappings come from the Linux kernel; the '[ stack ]' mapping is the standard process stack created by the Linux kernel, and the first four mappings are all from the program itself (the actual compiled machine code, the read-only data, plain data, and then the zero'd data respectively). Go itself has allocated the two '[ anon ]' mappings in the middle, which are most of the program's memory use; we have one 64 MB mapping at 0x00c000000000 and one 34.4 MB mapping at 0x7efdfc10c000.
(The addresses for some of these mappings will vary from run to run.)
As described in Allocator Wrestling, Go allocates heap memory (including the memory for goroutine stacks) in chunks of memory called spans that come from arenas. Arenas are 64 MB in size and are allocated at fixed locations; on 64-bit Linux, they start at 0x00c000000000. So this is our 64 MB mapping; it is the program's first arena, the only one necessary, and it handles all normal Go memory allocation.
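One quick way to convince yourself of this is to print the address of something the program allocates on the heap; this little sketch is mine, but on 64-bit Linux with Go 1.11 the address it prints should fall inside that first arena:

package main

import "fmt"

func main() {
	// A 1 MB allocation is comfortably large enough to come from the heap.
	b := make([]byte, 1<<20)
	// Typically prints an address starting with 0xc000... on Go 1.11
	// on 64-bit Linux, i.e. inside the first 64 MB arena.
	fmt.Printf("%p\n", &b[0])
}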
If we run our program under strace -e trace=%memory, we'll discover that the remaining mysterious mapping actually comes from a number of separate mmap() calls that the Linux kernel has merged together into one memory area. Here is the trace for our program:
mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7efdfe33c000
mmap(0xc000000000, 67108864, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xc000000000
mmap(0xc000000000, 67108864, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xc000000000
mmap(NULL, 33554432, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7efdfc33c000
mmap(NULL, 2162688, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7efdfc12c000
mmap(NULL, 65536, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7efdfc11c000
mmap(NULL, 65536, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7efdfc10c000
So we have, in order, a 256 KB allocation, the 64 MB arena allocated at its fixed address, a 32 MB allocation, a slightly over 2 MB allocation, and two 64 KB allocations. Everything except the arena allocation is allocated at successively lower addresses next to each other and gets merged together into the single mapping starting at 0x7efdfc10c000. All of these allocations are internal allocations from the Go runtime, and I'm going to run through them in order.
The initial 256 KB allocation is for the first chunk of the Go runtime's area for persistent allocations. These are runtime things that will never be freed up and which can be (and are) allocated outside of the regular heap arenas. Various things are allocated in persistent allocations, and the persistent allocator mostly works in 256 KB chunks that it gets from the OS. Our first mmap() is thus the runtime starting to allocate from this area, which causes the allocator to get its first chunk from the OS. The memory for these persistent allocator chunks is mostly recorded in runtime.MemStats.OtherSys, although it's not the only thing that falls into that category and some persistent allocations are in different categories.
The 32 MB allocation immediately after our first arena is for the heap allocator's "L2" arena map. As the comments in runtime/malloc.go note, most 64-bit architectures (including Linux) have only a single large L2 arena map, which has to be allocated when the first arena is allocated. The next allocation, which is 2112 KB (2 MB plus 64 KB), turns out to be for the heapArena structure for our newly allocated arena. It has two fields; the .bitmap field is 2 MB in size, and the .spans field is 64 KB (8192 8-byte pointers). This explains the odd size requested.
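If I have the Go 1.11 constants right (64 MB arenas, 8 KB pages, and two bitmap bits per 8-byte word), the arithmetic works out; the following is my back-of-the-envelope check, not code from the runtime:

package main

import "fmt"

func main() {
	const arenaSize = 64 << 20 // 64 MB arena
	const pageSize = 8 << 10   // 8 KB pages
	// .bitmap: two bits of metadata per 8-byte word of arena, in bytes.
	bitmap := arenaSize / 8 * 2 / 8
	// .spans: one 8-byte *mspan pointer per page of arena.
	spans := arenaSize / pageSize * 8
	fmt.Println(bitmap, spans, bitmap+spans) // 2097152 65536 2162688
}

The total matches the 2162688 bytes requested in the strace output above.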
(If I'm reading the code correctly, the L2 arena map isn't accounted for in any runtime.MemStats value; this may be a bug. The heapArena structure is accounted for in runtime.MemStats.GCSys.)
The final two 64 KB allocations are for the initial version of a data structure used to keep track of all spans (set up in recordspan()) and for a data structure (gcBits) that is used in garbage collection (set up in newArenaMayUnlock()). The span tracking structure is accounted for in runtime.MemStats.OtherSys, while the gcBits storage is in runtime.MemStats.GCSys.
As your program uses more memory, I believe that in general you can expect more arenas to be allocated from the OS, and with each arena you'll also get another heapArena structure. I believe that the L2 arena map is only allocated once on 64-bit Unix. You will probably periodically have larger span data structures and more gcBits structures allocated, and you will definitely periodically have new 256 KB chunks allocated for persistent allocations.
(There are probably other sources of allocations from the OS in the Go runtime. Interested parties can search through the source code for calls to sysAlloc(), persistentalloc(), and so on. In the end everything apart from arenas comes from sysAlloc(), but there are often layers of indirection.)
PS: If you want to track down this sort of thing yourself, the easiest way to do it is to run your test program under gdb, set a breakpoint on runtime.sysAlloc, and then use where every time the breakpoint is hit. On most Unixes, this is the only low level runtime function that allocates floating anonymous memory with mmap(); you can see this in, for example, the Linux version of low level memory allocation.
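A session for our program might look roughly like this (my sketch of the commands, not a captured transcript):

$ gdb ./memdemo
(gdb) break runtime.sysAlloc
(gdb) run
(gdb) where
(gdb) continue

You repeat where and continue each time the breakpoint is hit, and the backtraces tell you which part of the runtime asked for the memory.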