How Linux handles virtual memory overcommit
Following up on yesterday's entry on the general background of system virtual memory limits, here's how
Linux deals with this issue. In its traditional way, Linux gives you
three options for what happens when a process tries to allocate some
more memory, controlled by the value of the vm.overcommit_memory
sysctl:
- the kernel gives you the memory unless it thinks you would clearly
overcommit the system (mode 0, the default, 'heuristic overcommit').
- the kernel always gives you the memory (mode 1, 'always overcommit').
- the kernel refuses to give you more memory if it would take the committed address space over the commit limit (mode 2, what I call 'strict overcommit').
(Disclaimer: all of this assumes a relatively recent 2.6 kernel.)
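As a small illustration of checking which mode a machine is in, here is a minimal Python sketch. It assumes the sysctl is exposed in the usual place, /proc/sys/vm/overcommit_memory; the function names are made up for this example, and the labels are simply the ones used in the list above.

```python
from pathlib import Path

# Labels follow the list above; 'strict overcommit' is this entry's own
# name for mode 2, not an official kernel term.
OVERCOMMIT_MODES = {
    0: "heuristic overcommit",
    1: "always overcommit",
    2: "strict overcommit",
}


def describe_overcommit_mode(raw: str) -> str:
    """Turn the raw sysctl text (a digit, usually with a trailing newline) into a label."""
    mode = int(raw.strip())
    return OVERCOMMIT_MODES.get(mode, f"unknown mode {mode}")


def current_overcommit_mode(path: str = "/proc/sys/vm/overcommit_memory") -> str:
    """Read the live setting; this only works on a Linux system with /proc mounted."""
    return describe_overcommit_mode(Path(path).read_text())
```

On a Linux machine, print(current_overcommit_mode()) reports the active mode; describe_overcommit_mode() can be tried anywhere by handing it a string.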
The kernel's commit limit is your swap space plus some percentage of
real memory. You set the percentage with the vm.overcommit_ratio
sysctl, which lets you deal with both complications of a simple commit
limit of swap space plus real memory. (The percentage can be more than
100, for a situation where you have lots of programs that don't use much
of their allocated space.)
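To make the arithmetic concrete, here is a rough Python sketch of the commit limit as described above: swap space plus a percentage of real memory. The function name and the example sizes are invented for illustration, and the sketch ignores any further adjustments the kernel itself may make.

```python
def commit_limit_kb(swap_kb: int, ram_kb: int, overcommit_ratio: int) -> int:
    """Approximate commit limit: swap plus overcommit_ratio percent of RAM.

    All sizes are in kB. overcommit_ratio is the percentage set through
    vm.overcommit_ratio and may be larger than 100.
    """
    return swap_kb + ram_kb * overcommit_ratio // 100


GB = 1024 * 1024  # kB per GB

# Hypothetical machine: 8 GB of RAM and 2 GB of swap.
half = commit_limit_kb(swap_kb=2 * GB, ram_kb=8 * GB, overcommit_ratio=50)
generous = commit_limit_kb(swap_kb=2 * GB, ram_kb=8 * GB, overcommit_ratio=150)
print(half // GB, generous // GB)  # prints "6 14"
```

With a ratio of 50 this machine can commit 6 GB in total; pushing the ratio to 150 raises that to 14 GB, well past what the machine physically has.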
Whether or not it is enforcing it, the kernel always tracks the amount
of committed address space and reports it as Committed_AS in
/proc/meminfo, along with CommitLimit, the current commit limit.
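If you want to watch these two numbers, something like the following Python sketch will do. It assumes the usual /proc/meminfo layout of 'Name: value kB' lines; the function names and the sample figures are made up for this example.

```python
from typing import Dict


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo-style text into a {field name: number} mapping.

    Most values are in kB; the unit suffix, when present, is ignored.
    """
    fields: Dict[str, int] = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            fields[name.strip()] = int(parts[0])
    return fields


def commit_headroom_kb(fields: Dict[str, int]) -> int:
    """How far committed address space is below the commit limit (negative if over)."""
    return fields["CommitLimit"] - fields["Committed_AS"]


# Invented sample figures, not taken from a real machine.
SAMPLE = """\
MemTotal:       65842716 kB
CommitLimit:    36020344 kB
Committed_AS:    5123456 kB
"""

print(commit_headroom_kb(parse_meminfo(SAMPLE)))  # prints 30896888
```

On a real system you would feed it open("/proc/meminfo").read() instead of the sample text. In modes 0 and 1 the headroom can go negative, since the limit is reported but not enforced.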
For both heuristic overcommit and strict overcommit, the kernel reserves a certain amount of memory for root. In heuristic mode, this is 1/32nd of free RAM; in strict overcommit mode it is 1/32nd of the percent of real memory that you set. This is hard-coded and not tunable, and I can't say I was entirely pleased to discover that our 64 GB compute server is reserving around 2 GB for root.
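The size of the root reserve is easy to work out by hand, but here is the calculation as a short Python sketch anyway. The function names are invented, and the 64 GB example assumes an overcommit ratio of 100, which is what makes the result come out at the 2 GB mentioned above.

```python
def root_reserve_strict_kb(ram_kb: int, overcommit_ratio: int) -> int:
    """Strict mode: 1/32nd of the percentage of real memory that you set."""
    return ram_kb * overcommit_ratio // 100 // 32


def root_reserve_heuristic_kb(free_ram_kb: int) -> int:
    """Heuristic mode: 1/32nd of whatever RAM is currently free."""
    return free_ram_kb // 32


GB = 1024 * 1024  # kB per GB

# A 64 GB machine with an assumed overcommit ratio of 100.
print(root_reserve_strict_kb(64 * GB, 100) // GB)  # prints 2
```

Note that in heuristic mode the reserve shrinks as free RAM shrinks, while in strict mode it is a fixed amount for a given ratio.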
If you want the gory details, see the __vm_enough_memory function in
mm/mmap.c in the kernel source, and also
Documentation/vm/overcommit-accounting, which sort of documents the
sysctl settings.
Sidebar: How heuristic overcommit works
Heuristic overcommit attempts to work out how much memory the system could give you if it reclaimed all the memory it could and no other process used more RAM than it currently does; if you are asking for more than this, your allocation is refused. Specifically, the theoretical 'free memory' number is calculated by adding up free swap space, free RAM (less 1/32nd if you are not root), and all space used by the unified buffer cache and kernel data that is labeled as reclaimable (less some reserved pages).
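The calculation above can be sketched in Python roughly as follows. This is a loose model of the description, not a translation of the kernel code: the function and parameter names are invented, and exactly where the reserved pages get subtracted is an assumption on my part (here they come off the grand total).

```python
def heuristic_available_kb(
    free_swap_kb: int,
    free_ram_kb: int,
    buffer_cache_kb: int,
    reclaimable_kernel_kb: int,
    reserved_kb: int = 0,
    is_root: bool = False,
) -> int:
    """Estimate the theoretical 'free memory' that heuristic overcommit checks against."""
    # Non-root processes lose 1/32nd of free RAM to the root reserve.
    usable_free_ram = free_ram_kb if is_root else free_ram_kb - free_ram_kb // 32
    total = free_swap_kb + usable_free_ram + buffer_cache_kb + reclaimable_kernel_kb
    # Assumption: reserved pages are taken off the overall total, floored at zero.
    return max(total - reserved_kb, 0)


def heuristic_allows(request_kb: int, **memory_state: int) -> bool:
    """An allocation is refused if it asks for more than the estimate."""
    return request_kb <= heuristic_available_kb(**memory_state)
```

For example, with 1000 kB of free swap, 3200 kB of free RAM, 500 kB of buffer cache, 300 kB of reclaimable kernel data and 100 kB reserved, a non-root process gets an estimate of 4800 kB, while root gets 4900 kB.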