
2014-12-22

Why Go's big virtual size for 64-bit programs makes sense

In reaction to my entry on why your 64-bit Go programs are going to have a huge virtual size, sgoody on Hacker News asked why Go does this. There are two answers, depending on what 'this' you're talking about.

The reason that Go allocates all of this virtual memory address space is that it keeps other code from accidentally occupying some of it. This might be C runtime libraries that Go code winds up using, or Go code (yours or from packages) that calls mmap() directly or indirectly, and someday it will also be dynamically loaded Go code. If the Go runtime didn't explicitly fence off the address range it wanted to use, other code camping on bits of that range could at least reduce the amount of memory Go is able to allocate. This is essentially standard practice; if you're going to want some chunk of address space later, you might as well fence it off now.

The reason to use this scheme for low level memory allocation is likely because it's simple, and simple is generally fast for memory allocators. Being fast is good here not just for the obvious reason, but also because this is a central low-level allocator and Go is a concurrent environment. You're probably going to have to take locks to use a central allocator, so the shorter the locks are held for the better. A fast central allocator is one that's unlikely to become a bottleneck and point of locking contention.
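To make the 'simple is fast' point concrete, here's a minimal sketch of a lock-protected bump allocator handing out pages from a pre-reserved arena. This is my own illustration with made-up names, not the runtime's actual allocator:

    package arena

    import "sync"

    const pageSize = 8 << 10 // 8 KB pages, as in Go's arena

    // Allocator hands out pages from a contiguous reserved address range.
    type Allocator struct {
        mu   sync.Mutex
        next uintptr // first unused byte of the reserved arena
        end  uintptr // end of the reserved address space
    }

    // AllocPages returns the address of n contiguous pages. The critical
    // section is just a comparison and an addition, so the lock is held
    // very briefly and the allocator is unlikely to become a point of
    // contention.
    func (a *Allocator) AllocPages(n int) (uintptr, bool) {
        size := uintptr(n) * pageSize
        a.mu.Lock()
        defer a.mu.Unlock()
        if a.end-a.next < size {
            return 0, false // reserved arena exhausted
        }
        p := a.next
        a.next += size
        return p, true
    }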

There are a number of reasons for Go not to just call the C runtime's malloc() to get memory (either at a low level or at a high one). Malloc is a very general allocator and as a result it may do all sorts of things that you don't need and don't want. It's also probably going to perform worse than a tuned custom allocator; in fact having your own allocator is extremely common in language runtimes, even for things like Python. Using it also means that you depend on the C runtime library, which is a problem when cross-compiling Go programs.

(Current versions of Go are also trying to do as much in Go code as possible instead of in C, partly because it makes the life of the garbage collector simpler.)

GoBigVirtualSizeWhy written at 00:38:35

2014-12-15

Why your 64-bit Go programs may have a huge virtual size

For various reasons, I build (and rebuild) my copy of the core Go system from the latest development source on a regular basis, and periodically rebuild the Go programs I use from that build. Recently I was looking at the memory use of one of my programs with ps and noticed that it had an absolutely huge virtual size (Linux ps's VSZ field) of around 138 GB, although it had only a moderate resident set size. This nearly gave me a heart attack, since a huge virtual size with a relatively tiny resident set size is one classical sign of a memory leak.

(Builds with earlier versions of Go tended to have much more modest virtual sizes, on the order of 32 MB to 128 MB depending on how long the program had been running.)

Fortunately this was not a memory leak. In fact, experimentation soon demonstrated that even a basic 'hello world' program had that huge a virtual size. Inspection of the process's /proc/<pid>/smaps file (cf) showed that basically all of the virtual space used was coming from two inaccessible mappings, one roughly 8 GB long and one roughly 128 GB. These mappings had no access permissions (they disallowed reading, writing, and executing) so all they did was reserve address space (without ever using any actual RAM). A lot of address space.

It turns out that this is how Go's current low-level memory management likes to work on 64-bit systems. Simplified somewhat, Go does low-level allocations in 8 KB pages taken from a (theoretically) contiguous arena; which pages are free versus allocated is tracked in a giant bitmap. On 64-bit machines, Go simply pre-reserves all of the address space for both the bitmap and the arena itself. As the runtime and your Go code start to actually use memory, pieces of the arena bitmap and the memory arena are changed from simple address space reservations into memory that is actually backed by RAM and being used for something.
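As a back-of-the-envelope check, the two mapping sizes I saw line up with this scheme if the bitmap uses 4 bits per 8-byte arena word, which is my reading of the runtime's malloc comments (treat that ratio as an assumption):

    package main

    import "fmt"

    func main() {
        const arena int64 = 128 << 30 // the 128 GB arena reservation
        words := arena / 8            // 8-byte words in the arena
        bitmap := words * 4 / 8       // 4 bits of bitmap per word, in bytes
        fmt.Printf("arena: %d GB, bitmap: %d GB\n", arena>>30, bitmap>>30)
        // prints: arena: 128 GB, bitmap: 8 GB
    }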

(Mechanically, the bitmap and arena are initially mmap()'d with PROT_NONE. As memory is used, it is remapped with PROT_READ|PROT_WRITE. I'm not confident that I understand what happens when it's freed up, so I'm not going to say anything there.)
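Here's a small standalone Linux demonstration of this reserve-then-commit pattern, using the syscall package directly. This is just my illustration of the mechanics, not the runtime's actual code:

    package main

    import (
        "fmt"
        "syscall"
    )

    func main() {
        const reserve = 1 << 30 // reserve 1 GB of address space for the demo

        // PROT_NONE reserves address space only; the mapping can't be
        // read, written, or executed, so no RAM is committed to it.
        mem, err := syscall.Mmap(-1, 0, reserve,
            syscall.PROT_NONE, syscall.MAP_ANON|syscall.MAP_PRIVATE)
        if err != nil {
            panic(err)
        }

        // Later, 'commit' the first 8 KB by remapping it read-write.
        // Only now can touching this page consume actual memory.
        err = syscall.Mprotect(mem[:8192], syscall.PROT_READ|syscall.PROT_WRITE)
        if err != nil {
            panic(err)
        }
        mem[0] = 1 // this page is backed by real RAM on first touch

        fmt.Println("reserved 1 GB, committed the first 8 KB")
    }

If you make such a program pause and look at its /proc/<pid>/smaps, you'll see the same sort of inaccessible mapping that the Go runtime creates, just smaller.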

All of this is the case for the current post-Go 1.4 development version of Go. Go 1.4 and earlier behave differently, with much lower virtual sizes for running 64-bit programs, although in reading the Go 1.4 source code I'm not sure I understand why.

As far as I can tell, one of the interesting consequences of this is that 64-bit Go programs can use at most 128 GB of memory for most of their allocations (perhaps for all allocations that go through the runtime; I'm not sure).

For more details on this, see the comments in src/runtime/malloc2.go and in mallocinit() in src/runtime/malloc1.go.

I have to say that this turned out to be more interesting and educational than I initially expected, even if it means that watching ps is no longer a good way to detect memory leaks in your Go programs (mind you, I'm not sure it ever was). The best way to check this sort of memory usage is probably some combination of runtime.ReadMemStats() (perhaps exposed through net/http/pprof) and a tool like Linux's smem to get detailed information on meaningful memory address space usage.
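For the runtime.ReadMemStats() side of that, a minimal check looks something like this; Sys is roughly how much memory the runtime has actually obtained from the operating system, which is far more meaningful than the raw virtual size:

    package main

    import (
        "fmt"
        "runtime"
    )

    func main() {
        var m runtime.MemStats
        runtime.ReadMemStats(&m)
        // HeapAlloc is live heap memory; Sys is the total bytes obtained
        // from the OS for the heap, stacks, and runtime data structures.
        fmt.Printf("heap in use: %d KB, obtained from OS: %d KB\n",
            m.HeapAlloc>>10, m.Sys>>10)
    }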

PS: Unixes are generally smart enough to understand that PROT_NONE mappings will never use up any memory and so shouldn't count against things like system memory overcommit limits. However, they generally will count against a per-process limit on total address space, which likely means that you can't really use such limits and run post-1.4 Go programs. Since total address space limits are rarely used, this is unlikely to be an issue in practice.

Sidebar: How this works on 32-bit systems

The full story is in the mallocinit() comment. The short version is that the runtime fully reserves an arena bitmap large enough to cover 2 GB of memory (which 'only' takes 256 MB), but reserves just 512 MB of arena address space out of the 2 GB it could theoretically use. If the runtime later needs more memory, it asks the OS for another block of address space and hopes that it lands in the remaining 1.5 GB of address space that the bitmap covers. Under many circumstances the odds are good that the runtime will get what it needs.

GoBigVirtualSize written at 01:17:05

2014-12-09

Why I do unit tests from inside my modules, not outside them

In reading about how to do unit testing, one of the divisions I've run into is between people who believe that you should unit test your code strictly through its external API boundaries and people who will unit test code 'inside' the module itself, taking advantage of internal features and so on. The usual arguments I've seen for testing only from outside the module are that what people really care about is your API working, and that this avoids coupling your tests too closely to your implementation, so you don't have the friction of needing to revise tests when you revise the internals. I don't subscribe to this view; I write my unit tests inside my modules, although of course I test the public API as much as possible.

The primary reason I want to test from the inside is that this gives me much richer and more direct access to the internal operation of my code. To me, a good set of unit tests involves strongly testing hypotheses about how the code behaves. It is not enough to show that it works for some cases and then call it a day; I also want to poke into the dark corners and the error cases. The problem with going through the public API for this is that it is an indirect way of testing things down in the depths of my code. In order to reach down far enough, I must put together a carefully contrived scenario that I know will reach through the public API to the actual code I want to test (and in the specific way I want to test it). This is extra work, it's often hard and requires extremely artificial setups, and it still leaves my tests closely coupled to the actual implementation of my module code. Forcing myself to work through the API alone is basically testing theater.

(It's also somewhat dangerous because the coupling of my tests to the module's implementation is now far less obvious. If I change the module implementation without changing the tests, the tests may well still keep passing but they'll no longer be testing what I think they are. Oops.)

Testing from inside the module avoids all of this. I can directly test that internal components of my code work correctly without having to contrive peculiar and fragile scenarios that reach them through the public API. Direct testing of components also lets me immediately zero in on the problem if one of them fails a test, instead of forcing me to work backwards from a cascade of high-level API test failures to find the common factor and realize that oh, yeah, a low-level routine probably isn't working right. If I change the implementation and my tests break, that's okay; in a way I want them to break so that I can revise them to test what's important about the new implementation.

(I also believe that directly testing internal components is likely to lead to cleaner module code, due to needing fewer magic testing interfaces exposed or semi-exposed in my APIs. If this leads to dirtier testing code, that's fine with me. I strongly believe that my module's public API should not have anything in it that is primarily there to let me test the code.)
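As a concrete Go illustration (with entirely hypothetical names), testing from the inside is as simple as declaring the test file to be in the same package, which gives it direct access to unexported helpers:

    // parse.go
    package config

    import "strings"

    // splitKeyValue is an unexported helper used by the public API.
    func splitKeyValue(line string) (key, value string, ok bool) {
        i := strings.IndexByte(line, '=')
        if i < 0 {
            return "", "", false
        }
        return strings.TrimSpace(line[:i]), strings.TrimSpace(line[i+1:]), true
    }

    // parse_test.go
    // Declaring 'package config' (not 'package config_test') puts the
    // test inside the module, so it can call splitKeyValue directly.
    package config

    import "testing"

    func TestSplitKeyValueNoSeparator(t *testing.T) {
        // Poke an error case directly instead of contriving an input
        // that reaches this code path through the public API.
        if _, _, ok := splitKeyValue("no separator here"); ok {
            t.Fatal("expected ok to be false for a line without '='")
        }
    }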

WhyInsideUnitTests written at 00:35:03

