Limiting a process's memory usage on Linux

September 13, 2007

Due to recent events I have become interested in this issue, so I have been poking around and doing some experiments. Unfortunately, while Linux has a bewildering variety of memory-related per-process resource limits that you can set, most of them don't work or don't do you any good.

What you have, in theory and practice:

  • ulimit -m, the maximum RSS, doesn't do anything; the kernel maintains the number but never seems to use it for anything.

  • ulimit -d, the maximum data segment size, is effectively useless, since it only affects memory that the program obtains through brk(2)/sbrk(2). These days that isn't used very much; GNU libc does most of its memory allocation with mmap(), especially for big blocks of memory.

  • ulimit -v, the maximum size of the address space, works but affects all mmap()s, even of things that will never require swap space, such as mmap()ing a big file.
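To see the ulimit -v (RLIMIT_AS) behaviour concretely, here is a small sketch using Python's resource module; the 1 GB cap and the 2 GB allocation are arbitrary illustrative values, not anything we actually use:

```python
import resource
import subprocess
import sys

LIMIT = 1 << 30  # a 1 GB address-space cap; arbitrary illustrative value

def cap_address_space():
    # Apply ulimit -v's limit (RLIMIT_AS) in the child before it execs.
    resource.setrlimit(resource.RLIMIT_AS, (LIMIT, LIMIT))

# The child tries to allocate 2 GB of anonymous memory; with the
# address-space cap in place the allocation fails and the child dies
# with a MemoryError instead of eating the machine's memory.
child = subprocess.run(
    [sys.executable, "-c", "x = bytearray(2 * 1024 * 1024 * 1024)"],
    preexec_fn=cap_address_space,
)
print("child exit status:", child.returncode)
```

The same cap would also stop the child from mmap()ing a 2 GB file, which is exactly the problem with RLIMIT_AS as a limit.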

What I really want is something that can effectively limit a process's 'committed address space' (to use the term that /proc/meminfo and the kernel documentation on swap overcommit use). I don't care if a process wants to mmap() a 50 gigabyte file, but I care a lot if it wants 50G of anonymous, unbacked address space, because the latter is what will drive the system into out-of-memory.
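The system-wide version of this number is the Committed_AS line in /proc/meminfo; reading it is trivial (and Linux-specific, of course):

```python
def committed_as_kb():
    """Return the system-wide committed address space, in kB, as
    reported by the Committed_AS line of /proc/meminfo."""
    with open("/proc/meminfo") as f:
        for line in f:
            if line.startswith("Committed_AS:"):
                return int(line.split()[1])  # the field is in kB
    raise RuntimeError("no Committed_AS line in /proc/meminfo")

print("Committed_AS:", committed_as_kb(), "kB")
```

It's the per-process equivalent of this figure that has no corresponding rlimit.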

Unfortunately I can imagine entirely legitimate reasons to want to mmap() huge files (especially huge sparse files) on a 64-bit machine, so any limit on the total process address space on our compute servers will have to be a soft limit.
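setrlimit() already supports soft limits in exactly this sense: you can impose only the soft RLIMIT_AS value and leave the hard limit alone, so a process with a legitimate need can raise its own limit back up. A sketch (the 2 GB figure is again an arbitrary illustrative value):

```python
import resource

SOFT = 2 << 30  # a 2 GB soft cap; arbitrary illustrative value
_, hard = resource.getrlimit(resource.RLIMIT_AS)

# Impose the soft cap, but leave the hard limit untouched.  Guard
# against a pre-existing hard limit that is lower than our cap.
soft_cap = SOFT if hard == resource.RLIM_INFINITY else min(SOFT, hard)
resource.setrlimit(resource.RLIMIT_AS, (soft_cap, hard))

# A process that knows it legitimately needs a huge address space
# can raise its own soft limit again, up to the hard limit.
resource.setrlimit(resource.RLIMIT_AS, (hard, hard))
```

This makes a soft address-space limit a default that well-behaved programs inherit, rather than an absolute ceiling.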

Since the Linux kernel already tracks committed address space information for the whole system, it's possible that it would not be too much work to extend it to a per-process limit. (The likely fly in the ointment is that memory regions can be shared between processes, which complicates the accounting and raises questions about what you do when a process modifies a virtual memory region in a way that is legal for it but pushes another process sharing the VMA over its limit.)


Comments on this page:

From 65.172.155.230 at 2007-09-14 01:45:15:

So I think what you want is a PSS limit, from: http://lwn.net/Articles/230975/ ... which, if someone gets PSS in, should be easier than RSS ... maybe. :).

Generally everyone seems to just go for ulimit -v, and not care about the mmap() large file users ... but I'm guessing you know that.

Another thing I'd been thinking about, which might help you, is having some kind of "nice daemon" ... this could monitor arbitrary conditions of processes on a box and then renice/iorenice/taskset or even SIGSTOP them for short amounts of time. I have more than a slight suspicion that the only reason no one has written one already is the lack of a decent/fast API for reading /proc values. Does that sound interesting?
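[A crude sketch of the scanning half of such a daemon, reading /proc directly for lack of a better API; the 4 GB threshold and the function name are made up for illustration:]

```python
import os

def oversized_processes(threshold_kb):
    """Scan /proc and return (pid, VmSize-in-kB) pairs for every
    process whose virtual size exceeds the given threshold."""
    hits = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/status") as f:
                for line in f:
                    # Kernel threads have no VmSize: line and are skipped.
                    if line.startswith("VmSize:"):
                        kb = int(line.split()[1])
                        if kb > threshold_kb:
                            hits.append((int(entry), kb))
                        break
        except OSError:
            continue  # the process exited, or we can't read its status
    return hits

# A real daemon would loop over this and then renice, ionice, or
# briefly SIGSTOP the offenders (os.kill with signal.SIGSTOP).
for pid, kb in oversized_processes(4 * 1024 * 1024):  # a 4 GB threshold
    print(pid, kb)
```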

By cks at 2007-09-14 23:20:05:

Given the discussion of the expense of calculating PSS and USS in the kernel, it doesn't seem like a real (in-kernel) PSS or USS limit would be feasible or supported by the kernel developers.

The problem I see with a daemon is reacting fast enough to be useful in our environment. In a compute server environment like ours, I expect that most bad processes go bad very fast, rather than growing slowly over a long time. (They may even start out bad because they are simply being asked to work with a dataset that is too big, so they OOM the machine as they try to start up.)

Unfortunately gathering actual information about this is hard, since the kernel doesn't log enough of the right sort of information.

