2007-10-23
How we sized the overcommit ratio
When we set up strict overcommit mode, we had to pick an overcommit ratio or, to look at it the other way, how much total address space commitment to allow. Because we first did this for compute servers, we sized things so that an active process would be able to use more or less all of the machine's physical memory, with some extra on top to account for system processes that would get pushed into swap.
The logic is relatively straightforward:
- on dedicated compute servers with large amounts of RAM, we can
assume that must-have kernel memory is a negligible amount of
real memory.
- the last thing we want on a compute server is swap thrashing because that will kill performance for
both the job and the system; we would rather have jobs fail
outright.
- we have to assume that processes that ask for a lot of memory will use it; it is the only safe assumption.
- we further assume that there is no such thing as an idle job;
if a job exists, it is running (and thus using its memory,
and it will thrash the machine if it winds up in swap).
- there will be some amount of ssh daemons, shells, and so on, but they will not use much memory.
Hence our target total address space commitment is the amount of RAM on the server plus a gigabyte or two to account for both the kernel's memory needs and the idle extra processes that will get shoved off to swap. Allowing more than physical memory does open up the possibility of going into swap thrashing, but it seems better to err on the liberal side just to make sure that people can extract every usable byte of RAM if they want to. (I am pretty sure that our users do not want us to save them from themselves quite that badly.)
(Unfortunately this requires fiddling the overcommit ratio on each machine to make the numbers come out right for its specific amount of RAM. I wish you could specify the total address space commitment as 'real memory plus <X>', where <X> might be negative.)
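As a rough sketch of that per-machine fiddling: in strict overcommit mode the Linux kernel computes the commit limit as roughly swap space plus the overcommit ratio's percentage of RAM, so the ratio for a 'real memory plus <X>' target can be worked out backwards. The function names and the example sizes here are made up for illustration, and the arithmetic ignores details such as memory reserved for huge pages, so treat the result as a starting point to check against what the machine actually reports.

```python
def overcommit_ratio(ram_kb, swap_kb, extra_kb):
    """Integer overcommit ratio aiming for a commit limit of RAM + extra.

    Assumes the commit limit is swap + RAM * ratio / 100 and rounds the
    ratio down, so the resulting limit never exceeds the target.
    extra_kb may be negative. (Hypothetical helper, not a real tool.)
    """
    target_kb = ram_kb + extra_kb
    ratio = (target_kb - swap_kb) * 100 // ram_kb
    # A negative ratio makes no sense; it means swap alone already
    # exceeds the target commit limit.
    return max(ratio, 0)


def commit_limit_kb(ram_kb, swap_kb, ratio):
    """Approximate commit limit that a given ratio would produce."""
    return swap_kb + ram_kb * ratio // 100


if __name__ == "__main__":
    GB = 1024 * 1024  # kilobytes per gigabyte
    # Illustrative machine: 16 GB RAM, 4 GB swap, target of RAM + 2 GB.
    ram, swap, extra = 16 * GB, 4 * GB, 2 * GB
    ratio = overcommit_ratio(ram, swap, extra)
    print("ratio:", ratio)
    print("commit limit (KB):", commit_limit_kb(ram, swap, ratio))
    print("target (KB):", ram + extra)
```

Because swap counts toward the limit, a machine with more swap than the extra allowance needs a ratio below 100, and since the ratio is an integer percentage, the achievable limit on a large-memory machine can only approximate the target.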
Our concerns with strict overcommit on our login servers come up precisely where these assumptions start breaking down, and now that I've written them out explicitly I can easily see that. We're probably okay on kernel memory usage, but some of the others are clearly off (eg, that all memory consuming processes are active at once).