2017-04-30
Some more feelings on nondeterministic garbage collection
A while back I wrote an entry about the problem with nondeterministic garbage collection, more or less as part of my views at the time on PyPy. In that entry I was fairly down on nondeterministic GC. I still feel more or less that way about PyPy's garbage collection. Yet at the same time I use and like Go (and I did back then), which very definitely has nondeterministic garbage collection, and I don't find it to be a problem or something that annoys me. When I was revisiting this recently, I found myself wondering what the difference is. Is it just that I like Go enough that I'm unconsciously forgiving it this?
I don't think it's that simple. Instead I think it comes down to what I could call the culture of the language, but which is better described as 'how people write code in practice'. CPython has always had a deterministic garbage collector with prompt garbage collection, and as a result people wrote plenty of code that assumes that behavior and will do various degrees of unfortunate things if it's run in an environment, like PyPy, that violates that assumption. In practice Python programmers have developed and routinely use plenty of idioms that more or less assume deterministic GC (a classic one is writing data = open(fname).read() and counting on the file being closed promptly when the temporary file object's reference count drops to zero); this code may be 'incorrect' in some sense, but it's also common and normal.
(It is correct code for CPython in practice, in that it works and is efficient to write and so on.)
By contrast, Go had nondeterministic GC from the beginning and people have been coding with that in mind from the start. One partial consequence of this is that Go APIs are often carefully designed so that you can mostly avoid allocations if you want to go to the effort, with caller-supplied reusable buffers and so on. Writing such code is even pretty natural and obvious in Go, in a way that it isn't in Python. I'm pretty sure that Go's features, APIs, and coding style have all been shaped by it having nondeterministic GC, in ways that haven't happened for Python because CPython had deterministic GC.
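To make the buffer-reuse idiom concrete, here is a minimal sketch of what such an API can look like; the type and function names here are invented for illustration, but the shape is the same one you see in the standard library, where io.Reader has the caller supply the byte slice that Read fills and the strconv.Append* functions append into a caller-owned buffer:

```go
package main

import "fmt"

// Record is a stand-in for whatever we're formatting or decoding.
type Record struct {
	Name string
}

// AppendName appends the record's name to buf and returns the
// (possibly grown) slice. The caller owns buf and can reuse it
// across calls, so in the steady state no new allocations happen.
func (r *Record) AppendName(buf []byte) []byte {
	return append(buf, r.Name...)
}

func main() {
	records := []Record{{"alpha"}, {"beta"}, {"gamma"}}

	// One buffer, reused for every record; it only gets reallocated
	// if it has to grow. The natural Python version of this loop
	// would build a brand new string object every time around.
	buf := make([]byte, 0, 64)
	for i := range records {
		buf = buf[:0]
		buf = records[i].AppendName(buf)
		fmt.Printf("%s\n", buf)
	}
}
```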
I also suspect that nondeterministic GC simply works better in a language that's explicitly designed to create less memory and object churn. Go has any number of language and compiler features that are partly designed to reduce memory pressure, things like unboxed array members, unboxed variables in general, and escape analysis (to enable cheap stack allocation of values).
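As a small illustrative sketch of what I mean (the types here are made up for the example), a slice of structs stores its elements inline in one contiguous allocation rather than as an array of pointers to separately allocated objects the way a Python list does, and escape analysis lets the compiler keep short-lived values on the stack instead of the garbage-collected heap:

```go
package main

import "fmt"

// Point is an ordinary value type; a []Point stores the Points
// themselves inline, not pointers to heap-allocated objects.
type Point struct {
	X, Y float64
}

// midpoint works entirely with values. Its local p doesn't escape,
// so the compiler is free to keep it on the stack rather than
// allocating it on the heap for the GC to track.
func midpoint(a, b Point) Point {
	p := Point{X: (a.X + b.X) / 2, Y: (a.Y + b.Y) / 2}
	return p
}

func main() {
	// One allocation for the whole slice, however many Points it holds.
	pts := make([]Point, 0, 1000)
	for i := 0; i < 1000; i++ {
		pts = append(pts, Point{X: float64(i), Y: float64(i * 2)})
	}
	fmt.Println(midpoint(pts[0], pts[len(pts)-1]))
}
```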
(Static typing helps here too, but that's something that has reasons well beyond reducing memory pressure.)
PS: I don't have any directly comparable programs, but in operation this Go program seems to have about the same memory usage as this Python program, based on RSS. They aren't seeing the same load and don't quite do the same thing, but they're as close as I can get unless I get very energetic and rewrite DWiki in Go.
Do we want to standardize the size of our root filesystems on servers?
We install many of our Linux servers with mirrored system disks, and at the moment our standard partitioning is to have a 1 GB swap partition and then give the rest of the space to the root filesystem. In light of the complexity of shrinking even a RAID-1 swap partition, whose contents I could casually destroy, an obvious question came to me: did we want to switch to making our root filesystems a standard size, say 80 GB, with the rest of the disk space left unused?
The argument for doing this is that it makes replacing dead system disks much less of a hassle than it is today, because almost any SATA disk we have lying around would do. Today, if a system disk breaks we need to find a disk of the same size or larger to replace it with, and we may not have a same-sized disk (we recycle random disks a lot), so we may wind up with weird mismatched disks with odd partitioning. An 80 GB root filesystem is good enough for basically any of our Linux servers; even with lots of packages and so on installed, they just don't need much space (we don't seem to have any that are using over about 45 GB of space, and that's including a bunch of saved syslog logs and so on).
The main argument against doing this is that this hasn't been a problem so far and there are some potential uses for having lots of spare space in the root filesystem. I admit that this may not sound too persuasive now that I write it down, but honestly 'this is not a real problem for us' is a valid argument. If we were going to pick a standard root filesystem size we'd have to figure out what it should be, monitor the growth of our as-installed root filesystems over time (and over new Ubuntu versions), maybe reconsider the size every so often, and so on. We'd probably want to actually calculate what minimum disk size we're going to get in the future and set the root filesystem size based on that, which implies doing some research and discussion. All of this adds up to kind of a hassle (and having people spend time on this does cost money, at least theoretically).
Given that it's not impossible to shrink an extN filesystem if we have to and that we usually default to using the smallest size of disks in our collection for new system disks, leaving our practices as they are is both pretty safe and what I expect we'll do.
(We also seem to only rarely lose (mirrored) system disks, even when they're relatively old disks. That may change in the future, or maybe not, as we theoretically migrate to SSDs for system disks. Our practical migration is, well, not that far along, for reasons beyond the scope of this entry.)