Understanding the basic shape of Unix virtual memory management

January 15, 2012

Although their implementations vary in detail, every modern Unix system (ie, every one with mmap()) has some basic constraints that shape the general outline of its virtual memory system. While I'm talking about virtual memory statistics, it's worth running down this basic shape and what creates it.

(I say this partly because I spent part of writing yesterday's entry sorting bits of this out in my own head.)

The fundamental mmap() operation is to glue some resource (such as a portion of a file) into the process's memory address space. Multiple processes can each map the same resource into their memory space; when they do, all processes need to share the same pages of physical RAM in order to keep everyone's view of the resource in sync (in fact you want this to happen for write() based IO as well). This implies that a process's address space is effectively composited together from a bunch of entities representing different memory areas.
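
As a concrete (and hedged) sketch, here is roughly what the process side of this looks like; any process that maps the same file with MAP_SHARED winds up sharing the physical pages that back it. The file name is just an example and error handling is abbreviated:

    /* Map a file MAP_SHARED; every process mapping it this way shares
       the same physical pages of the file's data. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("/etc/services", O_RDONLY);
        if (fd < 0) { perror("open"); return 1; }

        struct stat st;
        if (fstat(fd, &st) < 0) { perror("fstat"); return 1; }

        /* MAP_SHARED: these pages are the shared view of the file's
           contents, not a private duplicate. */
        char *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);
        if (p == MAP_FAILED) { perror("mmap"); return 1; }

        printf("first byte: %c\n", p[0]);
        munmap(p, (size_t)st.st_size);
        close(fd);
        return 0;
    }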

However, this compositing needs to involve a layer of indirection. Processes don't all map a resource at the same address (for example, a shared library may be mapped at many different addresses), plus processes can overlay private changes on top of shared resources (eg, copy on write for various things). This implies a two-step mapping: a process has 'virtual memory areas' holding information used to build its own page table (and to track private versions of pages), and these then point to a shared data structure that keeps track of the actual resource, what pages it has in RAM, and so on.

(If we construe 'process' broadly, processes sometimes share VMAs; threads traditionally run in a shared address space, for example.)
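
To make the two-step mapping concrete, here is a deliberately simplified and entirely hypothetical sketch of the data structures involved; real kernels (Linux's vm_area_struct and address_space, for instance) are far more elaborate, and all of the names here are invented for illustration:

    #include <stddef.h>
    #include <stdint.h>
    #include <sys/types.h>

    /* Shared, per-resource state: one of these exists per file or
       anonymous memory object, no matter how many processes map it. */
    struct resource_object {
        void  *backing;        /* eg the inode behind a mapped file */
        void **pages_in_ram;   /* resource offset -> physical page, for
                                  pages of the resource currently in RAM */
        size_t npages;
    };

    /* Per-process view: one mapped range in one address space. */
    struct vm_area {
        uintptr_t               start, end;  /* where it sits in this process */
        off_t                   offset;      /* offset into the resource */
        int                     prot;        /* PROT_READ/WRITE/EXEC */
        int                     is_private;  /* copy-on-write overlay? */
        struct resource_object *resource;    /* the shared side */
        struct vm_area         *next;        /* next area in this process */
    };

The important part is the split: a vm_area belongs to one process (or one shared address space) and feeds its page table, while the resource_object is shared by everyone mapping the resource and tracks which of its pages are in RAM.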

In theory, Unix systems could have what I will call coherent page table mappings of resources, where if a single process brought a page of a shared resource into RAM, all processes using that resource would get a page table entry for that page. In practice this would involve a lot of page table changes for processes that don't care about the pages in question and may never refer to them, so I think that basically all Unix systems have incoherent mappings; a page of a resource may be in RAM and be mapped by other processes using it without this process having a PTE for it. When your process tries to refer to that page it will take a soft page fault to establish the PTE it needs and then immediately go on.

(The traditional distinction between soft page faults and hard page faults is that a hard page fault needs to fetch things from disk while a soft page fault just updates things in memory.)
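
One way to watch soft page faults happen is to look at your process's minor fault count (via getrusage()) before and after first touching a page of a shared mapping. A rough sketch, which assumes the file's pages are probably already in the page cache (otherwise the first touch shows up as a major fault instead) and which omits most error handling:

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/resource.h>
    #include <sys/stat.h>
    #include <unistd.h>

    static long minor_faults(void)
    {
        struct rusage ru;
        getrusage(RUSAGE_SELF, &ru);
        return ru.ru_minflt;
    }

    int main(void)
    {
        int fd = open("/etc/services", O_RDONLY);
        struct stat st;
        if (fd < 0 || fstat(fd, &st) < 0)
            return 1;
        volatile char *p = mmap(NULL, (size_t)st.st_size, PROT_READ,
                                MAP_SHARED, fd, 0);
        if (p == MAP_FAILED)
            return 1;

        long before = minor_faults();
        char c = p[0];      /* first touch: no PTE yet, so a soft fault */
        long after = minor_faults();

        printf("read %c; minor faults went from %ld to %ld\n", c, before, after);
        munmap((void *)p, (size_t)st.st_size);
        close(fd);
        return 0;
    }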

In turn this means that a page of a shared resource can be taken away from your process in one of two ways; call these 'local' and 'global'. A 'local' removal just removes your PTE for the page, but it stays mapped by any other processes that are using it. A 'global' removal makes the page unreferenced by anyone by removing all PTEs in all processes that were (still) using it. Normally you hope to get rid of pages through a series of local removals that leave the page unmapped by anyone, at which point the global removal is free because no process's page tables need changing.
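
As an aside, a process can do something like a local removal to itself. On Linux, for instance, madvise() with MADV_DONTNEED on a MAP_SHARED file mapping throws away this process's PTEs for the range while the pages themselves stay in the page cache for other mappers; touching the range again just soft-faults the PTEs back. Other Unixes have different semantics for this, so treat the following as an illustration rather than a portable guarantee:

    #include <stddef.h>
    #include <sys/mman.h>

    /* 'addr' and 'length' are assumed to describe a page-aligned,
       MAP_SHARED file mapping set up elsewhere. */
    void drop_my_view(void *addr, size_t length)
    {
        /* Forget our page table entries for this range; the shared
           pages themselves stay in RAM for everyone else. */
        madvise(addr, length, MADV_DONTNEED);
    }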

This obviously complicates efforts to answer questions about memory usage when they involve shared resources. A shared resource will be using some number of pages of RAM, but not all of those pages will be mapped in any particular process or group of processes. In fact it may not even be possible to determine how many active pages a particular resource has, because your system may only give you a 'per-process' view of shared resources.
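
On Linux you can get a partial resource-level view with mincore(), which (at least traditionally) reports whether the pages backing a mapping are resident in RAM, ie in the page cache for a file mapping, rather than whether this process has PTEs for them. The details (and even the type of the result vector) vary between Unixes, so this is a Linux-flavoured sketch:

    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        const char *file = argc > 1 ? argv[1] : "/etc/services";
        int fd = open(file, O_RDONLY);
        struct stat st;
        if (fd < 0 || fstat(fd, &st) < 0 || st.st_size == 0)
            return 1;

        void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);
        if (p == MAP_FAILED)
            return 1;

        long pagesize = sysconf(_SC_PAGESIZE);
        size_t npages = ((size_t)st.st_size + pagesize - 1) / pagesize;
        unsigned char *vec = malloc(npages);

        size_t resident = 0;
        if (vec != NULL && mincore(p, (size_t)st.st_size, vec) == 0) {
            for (size_t i = 0; i < npages; i++)
                if (vec[i] & 1)
                    resident++;
        }
        printf("%s: %zu of %zu pages resident\n", file, resident, npages);
        return 0;
    }

Note that this still only tells you about the range this particular process has mapped; it doesn't directly answer how much RAM the resource as a whole is using.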
