How the Linux kernel divides up your RAM

June 15, 2012

The Linux kernel doesn't consider all of your physical RAM to be one great big undifferentiated pool of memory. Instead, it divides it up into a number of different memory regions (at least for kernel purposes), which it calls 'zones' (to simplify slightly). What memory regions there are depends on whether your machine is 32-bit or 64-bit and also how complicated it is.

The zones are:

  • DMA is the low 16 MBytes of memory. At this point it exists for historical reasons; once upon what is now a long time ago, there was hardware that could only do DMA into this area of physical memory.

  • DMA32 exists only in 64-bit Linux; it is the low 4 GBytes of memory, more or less. It exists because the transition to large memory 64-bit machines has created a class of hardware that can only do DMA to the low 4 GBytes of memory.

    (This is where people mutter about everything old being new again.)

  • Normal is different on 32-bit and 64-bit machines. On 64-bit machines, it is all RAM from 4GB or so on upwards. On 32-bit machines it is all RAM from 16 MB to 896 MB for complex and somewhat historical reasons.

    Note that this implies that machines with a 64-bit kernel can have very small amounts of Normal memory unless they have significantly more than 4GB of RAM. For example, a 2 GB machine running a 64-bit kernel will have no Normal memory at all while a 4 GB machine will have only a tiny amount of it.

  • HighMem exists only on 32-bit Linux; it is all RAM above 896 MB, including RAM above 4 GB on sufficiently large machines.
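To make the boundaries above concrete, here is a minimal sketch that maps a physical address to a zone name. The function name `zone_for_address` is invented for illustration, and the boundaries are just the round numbers from the list; real machines have memory holes and other wrinkles that this ignores.

```python
MIB = 1024 * 1024
GIB = 1024 * MIB

def zone_for_address(phys_addr, is_64bit):
    """Return the zone a physical address would fall in (simplified)."""
    # The low 16 MB is the DMA zone on both 32-bit and 64-bit kernels.
    if phys_addr < 16 * MIB:
        return "DMA"
    if is_64bit:
        # 64-bit: DMA32 runs up to 4 GB, Normal is everything above that.
        return "DMA32" if phys_addr < 4 * GIB else "Normal"
    # 32-bit: Normal runs up to 896 MB, HighMem is everything above that.
    return "Normal" if phys_addr < 896 * MIB else "HighMem"
```

With these boundaries, every address on a 2 GB machine with a 64-bit kernel lands in either DMA or DMA32, which is why such a machine has no Normal memory.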

Normally allocations can come from a more restrictive zone than you asked for if that's where the free memory is. For example, if you ask for Normal memory on a 64-bit machine and there isn't any but there's lots of DMA32, you'll get DMA32. It's just that the kernel prefers to preserve DMA32 for things that have asked for it specifically.
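The fallback behaviour can be sketched as a walk down an ordered list of zones. This is only a toy model: `pick_zone` and `FALLBACK_64BIT` are made-up names, and the real kernel also takes watermarks and per-zone reserves into account, which this leaves out entirely.

```python
# Zones for a 64-bit kernel, from least to most restrictive.
FALLBACK_64BIT = ["Normal", "DMA32", "DMA"]

def pick_zone(requested, free_pages, order=FALLBACK_64BIT):
    """Return the first zone with free pages, starting at the requested one.

    free_pages maps zone name -> number of free pages (hypothetical input).
    An allocation never falls back to a less restrictive zone than it
    asked for, so the search only moves rightwards in the list.
    """
    start = order.index(requested)
    for zone in order[start:]:
        if free_pages.get(zone, 0) > 0:
            return zone
    return None  # nothing suitable is free
```

So a request for Normal memory can be satisfied from DMA32, but a request for DMA32 can never be satisfied from Normal, no matter how much Normal memory is free.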

Now, this is a slight simplification; actually, zones and memory are attached to a 'node'. Ordinary machines have only a single node, node 0, but sufficiently large servers can have multiple nodes (we have one with eight). Nodes are how Linux represents NUMA architectures. To simplify, each CPU is also associated with a node and the kernel will try to allocate memory for a process running on a CPU from that node's RAM, because it is considered 'closest' to that CPU.

I believe that the special zones (DMA, DMA32, and on 32-bit machines, Normal) will only be present on one node, generally node 0. All other nodes will generally have only Normal (on 64-bit kernels) or HighMem (on 32-bit kernels) memory.

You can see a bunch of information about your system's nodes, zones, and the state of their memory in /proc/pagetypeinfo, /proc/zoneinfo, /proc/<pid>/numa_maps, and /proc/buddyinfo. The last of these deserves an explanation of its own.

The kernel's basic unit of allocatable memory is the 4 KB page (many stats are reported by page count, instead of memory size in KB). The kernel also keeps track of larger contiguous blocks of pages because sometimes kernel code wants, say, a contiguous 64 KB block of memory. /proc/buddyinfo shows you how many such free chunks there are for each allocation 'order'. A chunk of a given order is 2^order pages, i.e. order 0 is a single page, order 1 is 2 pages (8 KB), order 2 is 4 pages (16 KB), and so on.
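The order arithmetic is simple enough to write down directly. This is a small sketch assuming 4 KB pages; the helper names `chunk_bytes` and `order_for_size` are mine, not anything from the kernel.

```python
PAGE_SIZE = 4096  # assumes 4 KB pages, as in the text

def chunk_bytes(order, page_size=PAGE_SIZE):
    """Size in bytes of one chunk of the given order (2^order pages)."""
    return page_size << order

def order_for_size(nbytes, page_size=PAGE_SIZE):
    """Smallest order whose chunks are at least nbytes long."""
    order = 0
    while chunk_bytes(order, page_size) < nbytes:
        order += 1
    return order
```

Under these assumptions, the contiguous 64 KB block mentioned above is an order 4 allocation (16 pages).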

Suppose /proc/buddyinfo reports, for example:

Node 0, zone DMA32 7 20 2 4 6 4 3 4 6 5 369

This means that in the DMA32 zone on this machine there are currently 7 free single 4 KB pages, 20 two-page (8 KB) chunks, 2 four-page (16 KB) chunks, and so on, all the way up to 369 1024-page (4 MB) chunks. Since the kernel will split larger chunks to get smaller ones if it needs to, the DMA32 zone on this machine is in pretty good shape despite seeming to have few order 0 pages available.
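Adding up what a buddyinfo line represents is a handy sanity check. The following is a rough sketch that parses one line in the layout shown above and totals the free pages; the function names are invented, and it assumes the fields are whitespace-separated with the per-order counts starting at the fifth field.

```python
PAGE_SIZE = 4096  # assumes 4 KB pages, as in the text

def parse_buddyinfo_line(line):
    """Split one /proc/buddyinfo line into (node, zone, per-order counts)."""
    fields = line.split()
    # Expected fields: "Node", "0,", "zone", "DMA32", then one count per order.
    node = int(fields[1].rstrip(","))
    zone = fields[3]
    counts = [int(f) for f in fields[4:]]
    return node, zone, counts

def free_pages(counts):
    """Total free pages: each chunk of order n holds 2^n pages."""
    return sum(count << order for order, count in enumerate(counts))

line = "Node 0, zone DMA32 7 20 2 4 6 4 3 4 6 5 369"
node, zone, counts = parse_buddyinfo_line(line)
print(node, zone, free_pages(counts))  # prints: 0 DMA32 382967
```

For the example line this comes to 382967 free pages, or roughly 1496 MB, almost all of it sitting in the 369 order 10 chunks.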

(This is also what /proc/pagetypeinfo means by 'order' in its output.)

In fact, having a disproportionate number of order 0 pages free is generally a danger sign, since order 0 pages exist only when the kernel can't merge them together to form higher-order free chunks. Lots of order 0 pages thus mean lots of fragmentation, where the kernel can't even find two adjacent aligned pages to merge into an 8 KB order 1 chunk.
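One crude way to put a number on this is the share of free pages that are stuck at order 0. To be clear, this ratio is my own rough heuristic for illustration, not a metric the kernel reports, and `order0_fraction` is a made-up name.

```python
def order0_fraction(counts):
    """Fraction of all free pages that are lone order 0 pages.

    counts is the list of per-order free chunk counts from one
    /proc/buddyinfo line, with order 0 first.
    """
    total_pages = sum(count << order for order, count in enumerate(counts))
    if total_pages == 0:
        return 0.0  # nothing free at all, so nothing to measure
    return counts[0] / total_pages

healthy = [7, 20, 2, 4, 6, 4, 3, 4, 6, 5, 369]   # the example line above
fragmented = [5000, 12, 1, 0, 0, 0, 0, 0, 0, 0, 0]  # invented numbers
print(order0_fraction(healthy) < 0.001, order0_fraction(fragmented) > 0.9)
```

A value close to 1 means nearly all free memory is in isolated single pages, which is the fragmented situation described above; the example DMA32 zone is close to 0.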

(See also the official documentation for various things in /proc.)


Last modified: Fri Jun 15 00:08:17 2012