2007-11-07

Understanding the virtual memory overcommit issue

First, a definition: the committed address space is the total amount of virtual memory that the kernel might have to supply real memory pages for, either in swap space or in RAM. In other words, this is how much memory the kernel has committed to supplying to programs if they all decide to touch all of the memory they've requested from the kernel.

(This is less than the total amount of virtual memory used in the system, since some things, like program code and memory mapped files, don't need swap space.)
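
To make this concrete: on Linux, for example, the kernel reports its current committed address space as Committed_AS in /proc/meminfo, next to CommitLimit, the ceiling it enforces when strict accounting is turned on. Here is a minimal C sketch that prints both; it assumes a Linux-style /proc, and other Unixes expose (or hide) this number differently.

    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        /* /proc/meminfo is Linux-specific. */
        FILE *f = fopen("/proc/meminfo", "r");
        char line[256];

        if (f == NULL) {
            perror("fopen /proc/meminfo");
            return 1;
        }
        while (fgets(line, sizeof(line), f)) {
            /* Committed_AS: the current committed address space.
               CommitLimit: the ceiling used under strict accounting. */
            if (strncmp(line, "Committed_AS", 12) == 0 ||
                strncmp(line, "CommitLimit", 11) == 0)
                fputs(line, stdout);
        }
        fclose(f);
        return 0;
    }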

In the old days, how much committed address space a Unix kernel would give out was simple but limited: the amount of swap space you had. When people started moving beyond this, they ran into two issues:

  • the kernel needs some amount of memory for itself in order to operate.

  • programs do not necessarily use all of the memory that they've requested from the kernel, especially when the request is sort of implicit (such as when a process fork()s); see the sketch just after this list.
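
The second issue is easy to see with an explicit request: a program can ask for a lot of address space and then barely touch it, so reserving full backing for every request up front would waste real memory and swap. A small C sketch of that pattern (the 1 GiB figure is just an arbitrary illustration):

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(void)
    {
        /* Ask the kernel for 1 GiB of address space... */
        size_t len = (size_t)1 << 30;
        char *p = malloc(len);

        if (p == NULL) {
            fprintf(stderr, "malloc of 1 GiB refused\n");
            return 1;
        }
        /* ...but only ever touch the first 4 KiB.  With overcommit, the
           untouched pages never need RAM or swap behind them. */
        memset(p, 0, 4096);
        printf("requested %zu bytes, touched 4096\n", len);
        free(p);
        return 0;
    }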

If we could ignore both issues, the committed address space the kernel should give out would be simple: the sum of physical memory plus swap space. Since we can't, the question is how much we should adjust that number for each issue. Unfortunately both issues are unpredictable; they depend on what you're doing with your system and on how cautious you need to be about never hitting a situation where the kernel has overcommitted memory. So there is no universal answer, only heuristics and tuning knobs, and the various Unixes have wound up making different choices.
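
As one concrete illustration, Linux exposes its policy through the vm.overcommit_memory sysctl (0 for heuristic overcommit, 1 for always overcommitting, 2 for strict accounting) plus vm.overcommit_ratio, which controls how much of RAM counts toward CommitLimit in strict mode; other Unixes make the equivalent choices through their own mechanisms. A small C sketch that reads the current Linux settings:

    #include <stdio.h>

    /* Print one of Linux's overcommit policy knobs from /proc/sys/vm. */
    static void show(const char *path)
    {
        FILE *f = fopen(path, "r");
        char buf[64];

        if (f != NULL) {
            if (fgets(buf, sizeof(buf), f))
                printf("%s: %s", path, buf);
            fclose(f);
        }
    }

    int main(void)
    {
        /* 0 = heuristic, 1 = always overcommit, 2 = strict accounting. */
        show("/proc/sys/vm/overcommit_memory");
        /* Percentage of RAM counted toward CommitLimit in strict mode. */
        show("/proc/sys/vm/overcommit_ratio");
        return 0;
    }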

Note that these are choices. While people sometimes argue back and forth about them, the overall problem is a hard one and there is no universal right answer for what committed address space limit to use and how to behave in the face of overcommit.

Sidebar: the results of running out

If the kernel runs into its limit on committed address space, it starts giving errors when asked to do operations that require more, so programs stop being able to do things like malloc() memory or fork() or start new processes with big writeable data areas. If the kernel discovers that it has overcommitted itself, it is generally forced to start killing processes when they try to use pages of memory that the kernel can't actually supply at the moment.
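
From a program's point of view the first failure mode is at least visible and handleable: the allocation call simply fails. A C sketch of what hitting the limit looks like under strict accounting (on Linux with glibc, a failed malloc() sets errno to ENOMEM; under the more permissive policies this loop may instead run until the address space itself is exhausted):

    #include <errno.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(void)
    {
        size_t chunk = (size_t)64 << 20;   /* ask in 64 MiB pieces */
        size_t total = 0;
        void *p;

        /* Keep requesting address space without touching it.  Under strict
           accounting the kernel eventually refuses; malloc() returns NULL
           and the program gets a clean error it can cope with. */
        while ((p = malloc(chunk)) != NULL)
            total += chunk;
        printf("refused after %zu MiB committed: %s\n",
               total >> 20, strerror(errno));
        return 0;
    }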

(Sometimes the kernel winds up in a worse situation, if for example it needs memory for its own use but can't get it. This can lock up an entire machine instead of just killing processes.)

Programmers and system administrators generally prefer the former to the latter; it is a lot easier to cope with malloc() failing than with random processes getting abruptly killed. At the same time, they want failures to happen only when the system is genuinely out of memory, not when the kernel is just being conservative.

unix/MemoryOvercommit written at 23:23:15
