Committed address space versus active anonymous pages in Linux: a mystery
In Linux, there are at least two things that can happen when your system runs out of memory (or at least the kernel thinks it has): the kernel can activate the Out-of-Memory killer, killing one or more processes but leaving the rest alone, or it can start denying new allocation requests, which causes a random assortment of programs to start failing. As I found out recently, systems with strict overcommit on can still trigger the OOM killer, depending on your settings for how much memory the system uses (see here). Normally, systems with strict overcommit turned off don't get themselves into situations where they're so out of memory that they start denying allocation requests.
Starting early this morning, some of our compute servers have periodically been reporting 'out of memory, cannot allocate/fork/etc' sorts of errors. There are two things that make this unusual. The first is that these are single-user compute servers, where we turn strict overcommit off; as a result, I would expect them to trigger the OOM killer but never actually run out of memory and start refusing allocations. The second is that according to all of the data I have, these machines have only modest and flat use of committed address space, which is my usual proxy for 'how much memory programs have allocated'.
(The kernel tracks committed address space even when strict overcommit is off, and while it doesn't necessarily represent how much memory programs actually need, it should normally be an upper bound on how much they can use. In fact until today I would have asserted that it definitely was.)
These machines have 96 GB of RAM, and during an incident I can see the committed address space be constant at 3.7 GB while /proc/meminfo's MemAvailable declines to 0 and its Active and Active(anon) numbers climb up to 90 GB or so. I find this quite mysterious, because as far as I understand Linux memory accounting, it should be impossible to have anonymous pages that are not part of the committed address space. You get anonymous pages by operations such as a MAP_ANONYMOUS mmap(), and those are exactly the operations that the kernel is supposed to carefully account for in working out Committed_AS, for obvious reasons.
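To make the comparison above concrete, here is a minimal Python sketch that reads /proc/meminfo and reports how far anonymous memory exceeds Committed_AS. The function names (parse_meminfo, anon_excess_kb) are made up for this example, and treating Active(anon) plus Inactive(anon) as 'anonymous memory' is a simplifying assumption, since those counters also cover things like tmpfs pages.

```python
#!/usr/bin/env python3
"""Compare Committed_AS with the anonymous page counters in /proc/meminfo."""


def parse_meminfo(text):
    """Return {field name: size in kB} for lines of the form 'Name:  123 kB'.

    Lines without a kB unit (such as the HugePages_* counts) are skipped.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and len(parts) == 2 and parts[1] == "kB" and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def anon_excess_kb(fields):
    """kB by which anonymous pages (active + inactive) exceed Committed_AS.

    Assumption: this sum is only a rough stand-in for 'memory programs have
    allocated'. A negative result means committed address space covers it.
    """
    anon = fields.get("Active(anon)", 0) + fields.get("Inactive(anon)", 0)
    return anon - fields["Committed_AS"]


if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as f:
            info = parse_meminfo(f.read())
    except OSError as e:
        print(f"cannot read /proc/meminfo: {e}")
    else:
        if "Committed_AS" in info:
            print(f"Committed_AS:  {info['Committed_AS']} kB")
            print(f"Active(anon):  {info.get('Active(anon)', 0)} kB")
            print(f"anon excess:   {anon_excess_kb(info)} kB")
        else:
            print("no Committed_AS field found")
```

On a machine behaving the way I expect, the reported excess should be negative; during one of these incidents it would be large and positive.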
Inspecting /proc/<pid>/smaps and other data for the sole gigantic Python process currently running on such a machine says that it has a resident set size of 91 GB, a significant number of 'rw-' anonymous mappings (roughly 96 GB worth, mostly in 64 MB mappings), and on hand inspection, a surprising number of those mappings have a VmFlags: field that does not have the ac flag that apparently is associated with an 'accountable area' (per the proc(5) manpage and other documentation). I don't know if not having an ac flag causes an anonymous mapping to not count against committed address space, but it seems plausible, or at least the best theory I currently have.
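The hand inspection of smaps could be automated along these lines. This is only a sketch: the helper names (parse_smaps, unaccounted_anon) are hypothetical, it assumes the smaps layout described in proc(5) (one header line per mapping followed by 'Key: value' lines, including VmFlags), and it treats any mapping with inode 0 as anonymous.

```python
#!/usr/bin/env python3
"""List writable anonymous mappings in smaps whose VmFlags lack 'ac'."""
import re
import sys

# Mapping header: "start-end perms offset dev inode [pathname]"
HEADER = re.compile(
    r"^([0-9a-f]+)-([0-9a-f]+) (\S{4}) [0-9a-f]+ \S+ (\d+)\s*(.*)$"
)


def parse_smaps(text):
    """Return one dict per mapping with its range, perms, Rss and VmFlags."""
    maps = []
    cur = None
    for line in text.splitlines():
        m = HEADER.match(line)
        if m:
            start, end, perms, inode, path = m.groups()
            cur = {
                "start": int(start, 16), "end": int(end, 16),
                "perms": perms, "inode": int(inode), "path": path,
                "rss_kb": 0, "flags": set(),
            }
            maps.append(cur)
        elif cur is not None:
            key, _, value = line.partition(":")
            if key == "Rss":
                cur["rss_kb"] = int(value.split()[0])
            elif key == "VmFlags":
                cur["flags"] = set(value.split())
    return maps


def unaccounted_anon(maps):
    """Mappings that are 'rw-', have inode 0 (assumed anonymous), and no 'ac'."""
    return [
        m for m in maps
        if m["perms"].startswith("rw") and m["inode"] == 0
        and "ac" not in m["flags"]
    ]


def main(pid="self"):
    try:
        with open(f"/proc/{pid}/smaps") as f:
            maps = parse_smaps(f.read())
    except OSError as e:
        print(f"cannot read smaps for {pid}: {e}")
        return
    bad = unaccounted_anon(maps)
    size_kb = sum((m["end"] - m["start"]) // 1024 for m in bad)
    rss_kb = sum(m["rss_kb"] for m in bad)
    print(f"{len(bad)} rw anonymous mappings without 'ac': "
          f"{size_kb} kB mapped, {rss_kb} kB resident")


if __name__ == "__main__":
    main(sys.argv[1] if len(sys.argv) > 1 else "self")
```

Run with a PID as the only argument (reading another user's smaps generally needs root); with no argument it inspects itself. If the theory holds, the resident total reported here should roughly track the gap between Active(anon) and Committed_AS.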
(It would help if I could create such mappings myself to test what happens to the committed address space and so on, but so far I have only a vague theory that perhaps they can be produced through use of mremap() with MAP_PRIVATE and MREMAP_MAYMOVE on a MAP_SHARED region. This is where I need to write a C test program, because sadly I don't think I can do this through something like Python. Python can do a lot of direct OS syscall testing, but playing around with memory remapping is asking a bit much of it.)