Committed address space versus active anonymous pages in Linux: a mystery

May 12, 2019

In Linux, there are at least two things that can happen when your system runs out of memory (or at least the kernel thinks it has): the kernel can activate the Out-of-Memory killer, killing one or more processes but leaving the rest alone, or it can start denying new allocation requests, which causes a random assortment of programs to start failing. As I found out recently, systems with strict overcommit on can still trigger the OOM killer, depending on your settings for how much memory the system uses (see here). Normally systems with strict overcommit turned off don't get themselves into situations where they're so out of memory that they start denying allocation requests.
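As a quick way to see which regime a machine is in, here is a minimal sketch that reads the two standard sysctl files /proc/sys/vm/overcommit_memory and /proc/sys/vm/overcommit_ratio. The function name and the wording of the descriptions are mine, and the sketch ignores vm.overcommit_kbytes, which can replace the ratio on systems that set it.

```python
from pathlib import Path

# Meanings of vm.overcommit_memory as documented for the Linux kernel.
OVERCOMMIT_MODES = {
    0: "heuristic overcommit (the default)",
    1: "always overcommit",
    2: "strict overcommit (allocations past CommitLimit are refused)",
}


def describe_overcommit(mode_text, ratio_text):
    """Turn the raw contents of the two sysctl files into one summary line."""
    mode = int(mode_text.strip())
    ratio = int(ratio_text.strip())
    desc = OVERCOMMIT_MODES.get(mode, "unknown mode")
    if mode == 2:
        # The ratio only feeds into CommitLimit, which matters in strict mode.
        desc += f", vm.overcommit_ratio={ratio}%"
    return f"vm.overcommit_memory={mode}: {desc}"


if __name__ == "__main__":
    vm = Path("/proc/sys/vm")
    if vm.is_dir():  # only present on Linux
        print(describe_overcommit((vm / "overcommit_memory").read_text(),
                                  (vm / "overcommit_ratio").read_text()))
```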

Starting early this morning, some of our compute servers have periodically been reporting 'out of memory, cannot allocate/fork/etc' sorts of errors. There are two things that make this unusual. The first is that these are single-user compute servers, where we turn strict overcommit off; as a result, I would expect them to trigger the OOM killer but never actually run out of memory and start refusing allocations. The second is that according to all of the data I have, these machines have only modest and flat use of committed address space, which is my usual proxy for 'how much memory programs have allocated'.

(The kernel tracks committed address space even when strict overcommit is off, and while it doesn't necessarily represent how much memory programs actually need, it should normally be an upper bound on how much they can use. In fact until today I would have asserted that it definitely was.)

These machines have 96 GB of RAM, and during an incident I can see the committed address space be constant at 3.7 GB while /proc/meminfo's MemAvailable declines to 0 and its Active and Active(anon) numbers climb up to 90 GB or so. I find this quite mysterious, because as far as I understand Linux memory accounting, it should be impossible to have anonymous pages that are not part of the committed address space. You get anonymous pages by operations such as a MAP_ANONYMOUS mmap(), and those are exactly the operations that the kernel is supposed to carefully account for in working out Committed_AS, for obvious reasons.
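The comparison described above can be sketched as a small /proc/meminfo parser. This is only an illustration: the function names are hypothetical, and the only field names it relies on are the standard Committed_AS and Active(anon) entries.

```python
import os


def parse_meminfo(text):
    """Return {field: value} for /proc/meminfo-style text.

    Values are in kB, except for count-style fields such as HugePages_Total.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def anon_beyond_commit(fields):
    """kB by which Active(anon) exceeds Committed_AS (0 if it does not)."""
    return max(0, fields["Active(anon)"] - fields["Committed_AS"])


if __name__ == "__main__":
    if os.path.exists("/proc/meminfo"):  # Linux only
        with open("/proc/meminfo") as f:
            info = parse_meminfo(f.read())
        for key in ("MemAvailable", "Active(anon)", "Committed_AS"):
            print(f"{key:>14}: {info.get(key, 0)} kB")
        print("anon beyond commit:", anon_beyond_commit(info), "kB")
```

If Committed_AS really were an upper bound on anonymous memory, the last line would print 0; during an incident like the one described here it would instead be a large number.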

Inspecting /proc/<pid>/smaps and other data for the sole gigantic Python process currently running on such a machine says that it has a resident set size of 91 GB and a significant number of 'rw-' anonymous mappings (roughly 96 GB worth, mostly in 64 MB mappings). On hand inspection, a surprising number of those mappings have a VmFlags: field that lacks the ac flag, which is apparently associated with an 'accountable area' (per the proc(5) manpage and other documentation). I don't know if not having an ac flag causes an anonymous mapping to not count against committed address space, but it seems plausible, or at least the best theory I currently have.
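The hand inspection could be automated along the following lines. This is a rough sketch, not the procedure actually used here: it treats any 'rw-' mapping with no pathname as anonymous (so it skips [heap] and [stack]), it only counts mappings that have a VmFlags: line, and the function name and the 'ac'/'no_ac' bucket names are made up for the example.

```python
import os
import re

# Header line of an smaps entry: address range, perms, offset, dev, inode, path.
HEADER = re.compile(r"^[0-9a-f]+-[0-9a-f]+\s+(\S{4})\s+\S+\s+\S+\s+\S+\s*(.*)$")


def summarize_anon_rw(smaps_text):
    """Total Size and Rss (kB) of unnamed rw- mappings, split by the 'ac' flag."""
    totals = {
        "ac": {"count": 0, "Size": 0, "Rss": 0},
        "no_ac": {"count": 0, "Size": 0, "Rss": 0},
    }
    current = None  # Size/Rss of the mapping being read, if it is of interest
    for line in smaps_text.splitlines():
        m = HEADER.match(line)
        if m:
            perms, path = m.group(1), m.group(2)
            wanted = perms.startswith("rw-") and path == ""
            current = {"Size": 0, "Rss": 0} if wanted else None
            continue
        if current is None:
            continue
        key, _, rest = line.partition(":")
        if key in ("Size", "Rss"):
            current[key] = int(rest.split()[0])
        elif key == "VmFlags":
            bucket = totals["ac" if "ac" in rest.split() else "no_ac"]
            bucket["count"] += 1
            bucket["Size"] += current["Size"]
            bucket["Rss"] += current["Rss"]
            current = None
    return totals


if __name__ == "__main__":
    # Substitute /proc/<pid>/smaps for the process of interest.
    path = "/proc/self/smaps"
    if os.path.exists(path):
        with open(path) as f:
            for name, t in summarize_anon_rw(f.read()).items():
                print(f"{name:>6}: {t['count']} mappings, "
                      f"Size {t['Size']} kB, Rss {t['Rss']} kB")
```

A large Rss total in the 'no_ac' bucket next to a small Committed_AS would be consistent with the theory, although it would not prove that the missing flag is the cause.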

(It would help if I could create such mappings myself to test what happens to the committed address space and so on, but so far I have only a vague theory that perhaps they can be produced through use of mremap() with MAP_PRIVATE and MREMAP_MAYMOVE on a MAP_SHARED region. This is where I need to write a C test program, because sadly I don't think I can do this through something like Python. Python can do a lot of direct OS syscall testing, but playing around with memory remapping is asking a bit much of it.)
