What do we mean when we talk about something's memory usage?

January 14, 2012

Robert Haas writes:

It should be possible for a reasonably intelligent human being (in which category I place myself) to answer simple questions about system memory usage, such as "How much memory is my database using?" or "How much memory is my web server using?" relatively simply.

One of the problems is that the question being asked here is not well defined. There are several things that this question could mean (talking only about a single process in order to simplify life):

  1. how much virtual memory the process has asked the operating system for.
  2. how much virtual memory the process has actually touched (read or written); processes often ask for more memory than they ever use.

    (Often this isn't deliberate; for example, a low level memory allocator may ask for extra space because it anticipates future requests that turn out to never get made.)

  3. how much RAM would be freed up if this process didn't exist, which is theoretically the same as how much extra RAM this process uses as it is running.
  4. how much RAM the process would require if it were the only thing running on the system (or at least if it shared nothing with any other process on the system).

  5. what the process's 'fair share' of all of the RAM in use on the system is, where some attempt is made to assign a portion of the cost of RAM that's being shared between several processes to each process.

    (As hinted by the two previous questions, this 'fair share' idea is somewhat artificial; it's quite possible that all of the shared RAM would still be used and needed even if this process didn't exist.)

  6. how much memory would be required if the operating system had to make good on all of its various promises of memory to the process (including for all of the copy-on-write memory that the process could theoretically write to), what we can call committed address space.

    (This is generally unrealistically pessimistic, but not always.)

(Each of these questions is useful and interesting in certain situations.)
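On Linux, several of these questions have rough counterparts in the per-mapping fields of /proc/<pid>/smaps. What follows is a minimal sketch, not a definitive tool. The function name summarize_smaps, the keys it returns, and the pairing of fields with questions are assumptions made for illustration, and the sample text is invented rather than taken from a real process.

```python
from collections import Counter

def summarize_smaps(text):
    """Sum the per-mapping 'Field:  N kB' lines of smaps-style text."""
    totals = Counter()
    for line in text.splitlines():
        parts = line.split()
        # Field lines look like 'Rss:   12 kB'. Mapping header lines and
        # lines such as 'VmFlags: rd ex' do not have this shape and are skipped.
        if (len(parts) == 3 and parts[0].endswith(":")
                and parts[1].isdigit() and parts[2] == "kB"):
            totals[parts[0][:-1]] += int(parts[1])
    return {
        "virtual_kb": totals["Size"],      # roughly question 1
        "resident_kb": totals["Rss"],      # RAM in use right now, shared or not
        "private_kb": totals["Private_Clean"] + totals["Private_Dirty"],  # roughly question 3
        "proportional_kb": totals["Pss"],  # roughly question 5
    }

# Invented sample: one shared, file-backed mapping and one private anonymous one.
SAMPLE = """\
00400000-00452000 r-xp 00000000 08:02 173521 /usr/bin/example
Size:                328 kB
Rss:                 200 kB
Pss:                  50 kB
Shared_Clean:        200 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
7f0000000000-7f0000100000 rw-p 00000000 00:00 0
Size:               1024 kB
Rss:                 300 kB
Pss:                 300 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:       300 kB
"""

summary = summarize_smaps(SAMPLE)
```

On a Linux machine the same function can be fed open("/proc/self/smaps").read(). Question 6 has no counterpart here, since smaps describes what is mapped and resident, not what the kernel has promised.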

What makes most of these questions difficult and complicated is memory that's shared between processes. If there were no memory sharing (or only negligible memory sharing) then several of the questions would collapse together and it would be easy for the operating system to give useful answers to most of them. Unfortunately for Robert Haas, modern Unix systems and modern applications share significant amounts of memory in many circumstances.

(To be fair, properly accounting for shared memory usage has bedeviled Unix from the moment people implemented copy on write for fork().)

There are theoretically straightforward extensions of all of these questions to groups of processes. For things like 'how much RAM would be freed up if they all exited', you have to work out what RAM or virtual memory is shared only among the processes in the group versus what RAM is also (partially) shared with outside processes. RAM used only within the group gets entirely charged to the group; RAM also shared outside the group may need to be handled in various ways depending on the specific question you're asking.
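The group version of the accounting can be sketched with a toy model. Everything here is hypothetical: page_users stands in for per-page information that a real system may not readily expose, and the page names and pids are invented. The sketch charges pages used only inside the group entirely to the group, and splits the cost of pages that are also used outside it in proportion to the number of users, which is only one of the possible ways of handling them.

```python
from fractions import Fraction

def group_usage(page_users, group):
    """page_users maps a page id to the set of pids that have it in RAM."""
    only_group = 0            # pages that would be freed if the whole group exited
    also_outside = 0          # pages the group uses but that would stay in RAM
    fair_share = Fraction(0)  # the group's proportional share, in pages
    for users in page_users.values():
        inside = users & group
        if not inside:
            continue          # the group does not use this page at all
        if users <= group:
            only_group += 1
        else:
            also_outside += 1
        fair_share += Fraction(len(inside), len(users))
    return only_group, also_outside, fair_share

# Invented data: pids 10-12 are the database, pids 20-22 are something else.
PAGE_USERS = {
    "db-shm-0": {10, 11, 12},
    "db-shm-1": {10, 11},
    "libc-0": {10, 11, 12, 20, 21, 22},
    "libc-1": {20, 21},
    "heap-10": {10},
}
DB_GROUP = {10, 11, 12}

result = group_usage(PAGE_USERS, DB_GROUP)
# result == (3, 1, Fraction(7, 2))
```

Three pages would be freed if the whole group exited, one page would stay in RAM because outside processes also use it, and the proportional share comes to three and a half pages.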

(Correctly and usefully grouping processes together is also often not a completely trivial issue. What processes should be considered to be 'your web server' or 'your database server' is often something that's obvious to an experienced human but not necessarily something that's clear to a computer in any useful way. Even when you can come up with an acceptable mechanical definition of a group, groups can easily overlap or be supersets of each other; consider the groups of 'all processes executing this binary' and 'all processes descending from pid <X>'.)
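The overlap between mechanically defined groups is easy to see with a made-up process table. This is only an illustrative sketch; the table, the pids, the binary paths, and the helper names descendants and running are all hypothetical.

```python
def descendants(procs, root):
    """Return root plus every pid whose chain of parents leads to root."""
    children = {}
    for pid, (ppid, _exe) in procs.items():
        children.setdefault(ppid, []).append(pid)
    found, todo = set(), [root]
    while todo:
        pid = todo.pop()
        if pid not in found:
            found.add(pid)
            todo.extend(children.get(pid, []))
    return found

def running(procs, exe):
    """Return every pid that is executing the given binary."""
    return {pid for pid, (_ppid, name) in procs.items() if name == exe}

# Invented process table: pid -> (parent pid, binary).
PROCS = {
    1: (0, "/sbin/init"),
    100: (1, "/usr/sbin/httpd"),
    101: (100, "/usr/sbin/httpd"),
    102: (100, "/usr/bin/php-cgi"),
    200: (1, "/usr/sbin/httpd"),   # a second, unrelated httpd instance
}

by_binary = running(PROCS, "/usr/sbin/httpd")   # {100, 101, 200}
by_ancestry = descendants(PROCS, 100)           # {100, 101, 102}
overlap = by_binary & by_ancestry               # {100, 101}
```

Neither group contains the other here, so 'the web server' comes out differently depending on which definition is picked.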

By the way: you may need very low-level access to page table information in order to get correct answers to these questions for groups of processes. If the system provides information on which processes have which memory areas mapped, it's relatively easy to detect entire memory areas that are only shared within a group (e.g., all of the database server processes are the only users of a common shared memory segment). But to detect the case where some area of a broadly-shared object is only used by your group of processes, you need detailed per-page information.

(For instance, your database server processes might be the only users of a set of functions and data tables in the base C++ support library, although lots of other processes also have the library mapped.)
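The difference between per-area and per-page information can be shown with another toy model. The data structures below are assumptions made for illustration (region_mappers records which pids map a whole area, page_users records which pids actually use each page of it), and the names, pids, and page numbers are invented.

```python
def exclusive_regions(region_mappers, group):
    """Areas mapped by at least one process, all of them inside the group."""
    return {name for name, pids in region_mappers.items() if pids and pids <= group}

def exclusive_pages(page_users, group):
    """Pages used by at least one process, all of them inside the group."""
    return {page for page, pids in page_users.items() if pids and pids <= group}

# Invented data: pids 10 and 11 are the database, pids 20 and 21 are not.
REGION_MAPPERS = {
    "db-shm": {10, 11},
    "libstdc++": {10, 11, 20, 21},
}
LIB_PAGE_USERS = {
    ("db-shm", 0): {10, 11},
    ("libstdc++", 0): {10, 11, 20, 21},
    ("libstdc++", 1): {10, 11},   # functions only the database happens to use
    ("libstdc++", 2): {20},
}
DB_GROUP = {10, 11}

by_area = exclusive_regions(REGION_MAPPERS, DB_GROUP)
by_page = exclusive_pages(LIB_PAGE_USERS, DB_GROUP)
# by_area == {"db-shm"}
# by_page == {("db-shm", 0), ("libstdc++", 1)}
```

Area-level information finds only the shared memory segment. The library page that is used by nothing outside the group shows up only when per-page information is available.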
