2014-12-22
Why Go's big virtual size for 64-bit programs makes sense
In reaction to my entry on why your 64-bit Go programs are going to have a huge virtual size, sgoody on Hacker News asked why Go does this. There are two answers, depending on which 'this' you mean.
The reason that Go allocates all of this virtual address space is to
keep other code from accidentally occupying some of it. That other
code might be C runtime libraries that Go code winds up using, Go
code (yours or from packages) that calls mmap() directly or
indirectly, or someday dynamically loaded Go code. If the Go runtime
didn't explicitly fence off the address range it wants to use, other
code could camp on bits of it and reduce the amount of memory the
runtime can allocate. This is essentially standard practice; if you're
going to want some chunk of address space later, you might as well
fence it off now.
The reason to use this scheme for low level memory allocation is likely that it's simple, and simple is generally fast for memory allocators. Being fast is good here not just for the obvious reason, but also because this is a central low-level allocator and Go is a concurrent environment. You're probably going to have to take locks to use a central allocator, so the shorter a lock is held, the better. A fast central allocator is one that's unlikely to become a bottleneck and point of locking contention.
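To make the locking point concrete, here's a minimal sketch of the sort of simple allocator where the critical section under the lock is only a few instructions. This is purely illustrative (a bump allocator, not how Go's actual allocator is structured):

```go
package main

import (
	"fmt"
	"sync"
)

// bumpArena hands out slices of a fixed buffer by advancing a cursor.
// The work done under the lock is tiny, so lock contention stays low.
type bumpArena struct {
	mu   sync.Mutex
	buf  []byte
	next int
}

// alloc returns n bytes from the arena, or nil if it's exhausted.
func (a *bumpArena) alloc(n int) []byte {
	a.mu.Lock()
	defer a.mu.Unlock()
	if a.next+n > len(a.buf) {
		return nil // out of arena space
	}
	p := a.buf[a.next : a.next+n]
	a.next += n
	return p
}

func main() {
	a := &bumpArena{buf: make([]byte, 1 << 16)}
	b := a.alloc(8192) // one 8 KB 'page'
	fmt.Println(len(b), a.next) // 8192 8192
}
```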
There are a number of reasons for Go not to just call the C runtime's
malloc() to get memory (either at a low level or at a high one).
Malloc is a very general allocator and as a result it may do all sorts
of things that you don't need and don't want. It's also probably
going to perform worse than a tuned custom allocator; in fact having
your own allocator is extremely common in language runtimes, even
for things like Python. Using it also means that you depend on the C
runtime library, which is a problem when cross-compiling Go programs.
(Current versions of Go are also trying to do as much in Go code as possible instead of in C, partly because it makes the life of the garbage collector simpler.)
2014-12-15
Why your 64-bit Go programs may have a huge virtual size
For various reasons, I build (and rebuild) my copy of the core Go
system from the latest development source on a regular basis, and
periodically rebuild the Go programs I use from that build. Recently
I was looking at the memory use of one of my programs with ps and noticed that
it had an absolutely huge virtual size (Linux ps's VSZ field)
of around 138 GB, although it had only a moderate resident set size.
This nearly gave me a heart attack, since a huge virtual size with
a relatively tiny resident set size is one classical sign of a
memory leak.
(Builds with earlier versions of Go tended to have much more modest virtual sizes, on the order of 32 MB to 128 MB depending on how long the program had been running.)
Fortunately this was not a memory leak. In fact, experimentation
soon demonstrated that even a basic 'hello world' program had that
huge a virtual size. Inspection of the process's /proc/<pid>/smaps
file (cf) showed that basically all of the
virtual space used was coming from two inaccessible mappings, one
roughly 8 GB long and one roughly 128 GB. These mappings had no
access permissions (they disallowed reading, writing, and executing)
so all they did was reserve address space (without ever using any
actual RAM). A lot of address space.
It turns out that this is how Go's current low-level memory management likes to work on 64-bit systems. Simplified somewhat, Go does low level allocations in 8 KB pages taken from a (theoretically) contiguous arena; which pages are free versus allocated is stored in a giant bitmap. On 64-bit machines, Go simply pre-reserves the entire memory address space for both the bitmaps and the arena itself. As the runtime and your Go code start to actually use memory, pieces of the arena bitmap and the memory arena will be changed from simple address space reservations into memory that is actually backed by RAM and being used for something.
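As a sanity check on the sizes involved, a little arithmetic reproduces the figures. The constants here (a 128 GB arena, and 4 bits of bitmap metadata per 8-byte word, making the bitmap 1/16th the arena's size) are my reading of the development runtime and may not match your version:

```go
package main

import "fmt"

func main() {
	// Assumed constants from the development runtime: a 128 GB arena
	// plus a bitmap of 4 bits per 8-byte word (1/16th of the arena).
	const arena int64 = 128 << 30
	const bitmap int64 = arena / 16
	// The two inaccessible mappings ps sees, and their total.
	fmt.Println(bitmap>>30, "GB bitmap,", (arena+bitmap)>>30, "GB total")
	// Prints: 8 GB bitmap, 136 GB total
}
```

That 136 GB total is right in line with the roughly 138 GB virtual size observed with ps.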
(Mechanically, the bitmap and arena are initially mmap()'d with
PROT_NONE. As memory is used, it is remapped with
PROT_READ|PROT_WRITE. I'm not confident that I understand what
happens when it's freed up, so I'm not going to say anything there.)
All of this is the case for the current post-1.4 development version of Go. Go 1.4 and earlier behave differently, with much lower virtual sizes for running 64-bit programs, although in reading the Go 1.4 source code I'm not sure I understand why.
As far as I can tell, one of the interesting consequences of this is that 64-bit Go programs can use at most 128 GB of memory for most of their allocations (perhaps all of them that go through the runtime, I'm not sure).
For more details on this, see the comments in src/runtime/malloc2.go
and in mallocinit() in src/runtime/malloc1.go.
I have to say that this turned out to be more interesting and
educational than I initially expected, even if it means that watching
ps is no longer a good way to detect memory leaks in your Go
programs (mind you, I'm not sure it ever was). As a result, the
best way to check this sort of memory usage is probably some
combination of runtime.ReadMemStats() (perhaps exposed through
net/http/pprof) and Linux's
smem program or the like to obtain detailed information on
meaningful memory address space usage.
PS: Unixes are generally smart enough to understand that PROT_NONE
mappings will never use up any memory and so shouldn't count against
things like system memory overcommit limits. However, they generally
will count against a per-process limit on total address space, which
likely means that you can't use such limits with post-1.4 Go programs.
Since total address space limits are rarely used, this is unlikely to
be an issue in practice.
Sidebar: How this works on 32-bit systems
The full story is in the mallocinit() comment. The short version
is that the runtime reserves a bitmap large enough to handle a 2 GB
arena (which 'only' takes 256 MB) but reserves just 512 MB of the
2 GB of arena address space it could theoretically use. If the
runtime later needs more memory, it asks the OS for another block
of address space and hopes that it lands in the remaining 1.5 GB
that the bitmap covers. Under many circumstances the odds are good
that the runtime will get what it needs.
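The same sort of arithmetic works out here too, again assuming my reading of the constants (on 32 bits, 4 bits of bitmap per 4-byte word, so the bitmap is 1/8th of the arena it describes):

```go
package main

import "fmt"

func main() {
	// Assumed 32-bit constants: the bitmap covers the full possible
	// 2 GB arena, while only 512 MB of arena is reserved up front.
	const arena = 2 << 30
	const bitmap = arena / 8 // 4 bits per 4-byte word
	const reserved = 512 << 20
	fmt.Println(bitmap>>20, "MB bitmap,", reserved>>20, "MB arena reserved")
	// Prints: 256 MB bitmap, 512 MB arena reserved
}
```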
2014-12-09
Why I do unit tests from inside my modules, not outside them
In reading about how to do unit testing, one of the divisions I've run into is between people who believe that you should unit test your code strictly through its external API boundaries and people who will unit test code 'inside' the module itself, taking advantage of internal features and so on. The usual arguments I've seen for doing unit tests from outside the module are that your API working is what people really care about and this avoids coupling your tests too closely to your implementation, so that you don't have the friction of needing to revise tests if you revise the internals. I don't follow this view; I write my unit tests inside my modules, although of course I test the public API as much as possible.
The primary reason why I want to test from the inside is that this gives me much richer and more direct access to the internal operation of my code. To me, a good set of unit tests involves strongly testing hypotheses about how the code behaves. It is not enough to show that it works for some cases and then call it a day; I want to also poke the dark corners and the error cases. The problem with going through the public API for this is that it is an indirect way of testing things down in the depths of my code. In order to reach down far enough, I must put together a carefully contrived scenario that I know reaches through the public API to the actual code I want to test (and in the specific way I want to test it). This is extra work, it's often hard work that requires extremely artificial setups, and it still leaves my tests closely coupled to the actual implementation of my module code. Forcing myself to work through the API alone is basically testing theater.
(It's also somewhat dangerous because the coupling of my tests to the module's implementation is now far less obvious. If I change the module implementation without changing the tests, the tests may well still keep passing but they'll no longer be testing what I think they are. Oops.)
Testing from inside the module avoids all of this. I can directly test that internal components of my code work correctly without having to contrive peculiar and fragile scenarios that reach them through the public API. Direct testing of components also lets me immediately zero in on the problem if one of them fails a test, instead of forcing me to work backwards from a cascade of high level API test failures to find the common factor and realize that oh, yeah, a low level routine probably isn't working right. If I change the implementation and my tests break, that's okay; in a way I want them to break so that I can revise them to test what's important about the new implementation.
(I also believe that directly testing internal components is likely to lead to cleaner module code due to needing less magic testing interfaces exposed or semi-exposed in my APIs. If this leads to dirtier testing code, that's fine with me. I strongly believe that my module's public API should not have anything that is primarily there to let me test the code.)
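As a compressed illustration of the access difference, here everything is collapsed into one runnable file. In a real Go module, clamp would be an unexported helper in the package, an internal test file (package color) could call it directly, and an external test file (package color_test) could only reach it through the exported Brightness. All the names here are made up for the example:

```go
package main

import "fmt"

// clamp is the kind of unexported helper an internal test can
// exercise directly, edge cases and all.
func clamp(v int) int {
	if v < 0 {
		return 0
	}
	if v > 255 {
		return 255
	}
	return v
}

// Brightness is the public API built on clamp; an external test must
// contrive inputs that push clamp into its corners through here.
func Brightness(v int) int { return clamp(v) }

func main() {
	// An internal test can hit clamp's edge cases directly:
	fmt.Println(clamp(-5), clamp(300)) // 0 255
	// An external test only sees the public surface:
	fmt.Println(Brightness(300)) // 255
}
```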