The kernel memory addressing problem

July 25, 2012

One of the engineering issues in writing an operating system kernel is how your kernel gets access to physical memory. This requires some explanation, since on the surface you might think that this is easy; after all, the kernel runs with full access to the machine so how could it have problems accessing memory?

The simple answer is that today's kernels almost always run with virtual memory, not just for user processes but for themselves as well. Although they run with full privileges, kernels still have a virtual address space and (kernel) page tables that map the virtual addresses that the kernel uses into real physical addresses (both for RAM and for memory-mapped devices). This may be required by the CPU (once you turn on virtual memory it may always use page tables) and even if it's not, it's almost always more convenient (for example, the kernel code doesn't have to be relocated depending on what physical address it was loaded at). Once the kernel is using virtual addresses, the question becomes how to map physical memory to (or into) the kernel's virtual address space.
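To make the page table idea concrete, here is a minimal sketch of a toy single-level page table in C. Everything in it is an assumption made for illustration: the names (map_page, translate, PTE_PRESENT), the 4 KB page size, and the tiny 16-page address space. Real CPUs use multi-level tables in hardware-defined formats, so treat this as a model of the idea rather than how any particular kernel does it.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT  12
#define PAGE_SIZE   (1ULL << PAGE_SHIFT)
#define NPAGES      16          /* toy virtual address space: 16 pages */
#define PTE_PRESENT 1ULL        /* hypothetical 'this entry is valid' flag */

/* One entry per virtual page: physical frame address ORed with flags. */
static uint64_t page_table[NPAGES];

/* Install a mapping from one virtual page to one physical frame. */
void map_page(uint64_t virt, uint64_t phys)
{
    assert(virt % PAGE_SIZE == 0 && phys % PAGE_SIZE == 0);
    assert((virt >> PAGE_SHIFT) < NPAGES);
    page_table[virt >> PAGE_SHIFT] = phys | PTE_PRESENT;
}

/* Translate a virtual address. Returns 0 and stores the physical address
   in *phys_out on success, or -1 if nothing is mapped there (which would
   be a page fault on real hardware). */
int translate(uint64_t virt, uint64_t *phys_out)
{
    uint64_t vpn = virt >> PAGE_SHIFT;

    if (vpn >= NPAGES || !(page_table[vpn] & PTE_PRESENT))
        return -1;
    /* Frame address from the entry, offset within the page from virt. */
    *phys_out = (page_table[vpn] & ~(PAGE_SIZE - 1)) | (virt & (PAGE_SIZE - 1));
    return 0;
}
```

After map_page(0x3000, 0xfee00000), translating 0x3abc yields 0xfee00abc, while translating anything in an unmapped page fails. The same table works equally well for RAM and for a memory-mapped device; the kernel just has to put the right physical address in the entry.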

I am not going to try to provide a complete inventory of the different techniques that have historically been used, but there are two general extremes. The easiest situation is if your kernel address space is large enough to include all of the physical memory and address space of the machine with room left over. This allows you to simply establish a direct linear mapping for all of physical memory and often it will let you use huge pages (page table entries that map large amounts of contiguous physical memory in a single entry).

(You need extra room because you need some amount of virtual address space for things like the kernel code and data itself, since you want these to be at a constant spot.)
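With a direct linear mapping, converting between physical and kernel virtual addresses is just adding or subtracting a constant. A minimal sketch in C, assuming a hypothetical base address and a hypothetical 1 TB limit on physical address space (DIRECT_MAP_BASE, DIRECT_MAP_SIZE and the function names are all made up for this example):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: all physical addresses appear in kernel virtual
   address space starting at DIRECT_MAP_BASE. */
#define DIRECT_MAP_BASE 0xffff800000000000ULL
#define DIRECT_MAP_SIZE (1ULL << 40)    /* assumed 1 TB of physical space */

typedef uint64_t phys_addr_t;
typedef uint64_t virt_addr_t;

/* Physical to kernel virtual: add the fixed offset. */
static inline virt_addr_t phys_to_virt(phys_addr_t pa)
{
    assert(pa < DIRECT_MAP_SIZE);
    return DIRECT_MAP_BASE + pa;
}

/* Kernel virtual (inside the direct map) back to physical. */
static inline phys_addr_t virt_to_phys(virt_addr_t va)
{
    assert(va >= DIRECT_MAP_BASE && va - DIRECT_MAP_BASE < DIRECT_MAP_SIZE);
    return va - DIRECT_MAP_BASE;
}
```

The attraction is that there is nothing to set up or tear down per access and nothing to leak. The kernel code and data live somewhere else in the virtual address space, outside the direct map region.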

The polar opposite of this is to explicitly map chunks of physical memory into the kernel address space as you need them and then unmap them afterwards. This generally creates a kernel interface that looks something like mmap(), because this is basically what you're doing. The obvious drawback of this approach is that kernel code has to explicitly manage these mappings, especially removing them when it doesn't need them any more (otherwise you 'leak' kernel address space). However, if you don't have enough (kernel) address space you don't really have any choice.
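As a rough illustration of what such an interface can look like, here is a sketch in C of a small fixed 'window' of kernel virtual address space with explicit map and unmap calls. The names (window_map, window_unmap, window_phys), the window address, and the deliberately tiny slot count are all hypothetical, and a real kernel would update page table entries and flush TLBs where this only records what is mapped where.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE    4096ULL
#define WINDOW_BASE  0xf0000000ULL  /* hypothetical start of the window */
#define WINDOW_SLOTS 4              /* tiny on purpose, so it can run out */

/* Bookkeeping for each page-sized slot in the window. */
static struct {
    int      in_use;
    uint64_t phys;
} slots[WINDOW_SLOTS];

/* Map one physical page into the window. Returns its kernel virtual
   address, or 0 if every slot is taken. */
uint64_t window_map(uint64_t phys)
{
    assert(phys % PAGE_SIZE == 0);
    for (size_t i = 0; i < WINDOW_SLOTS; i++) {
        if (!slots[i].in_use) {
            slots[i].in_use = 1;
            slots[i].phys = phys;
            return WINDOW_BASE + i * PAGE_SIZE;
        }
    }
    return 0;
}

/* Which slot does this window address fall in? */
static size_t slot_index(uint64_t virt)
{
    assert(virt >= WINDOW_BASE);
    size_t i = (size_t)((virt - WINDOW_BASE) / PAGE_SIZE);
    assert(i < WINDOW_SLOTS && slots[i].in_use);
    return i;
}

/* Physical address behind a currently mapped window address. */
uint64_t window_phys(uint64_t virt)
{
    return slots[slot_index(virt)].phys + (virt - WINDOW_BASE) % PAGE_SIZE;
}

/* Release a mapping. Forgetting to call this leaks the slot. */
void window_unmap(uint64_t virt)
{
    slots[slot_index(virt)].in_use = 0;
}
```

With only four slots, a fifth window_map() fails until some caller remembers to call window_unmap(). That is the 'leak' problem in miniature: every mapping that is never released permanently shrinks what everyone else can use.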

There are a number of things that make an explicit mapping approach less painful:

  • when the kernel is getting (and releasing) memory for its own use, you generally need a memory allocator anyways. Such an allocator is a natural central place to establish and release mappings, hiding this entire issue from all of its callers.

  • device drivers, which need to map physical memory in order to get access to memory-mapped devices, are generally long-lived; they can often establish the mapping when they're loaded or the device is started and then hold it until the device is closed down.

  • if what the kernel is really doing is accessing the memory of a process, you need to take special steps anyways (to map from the process's virtual address space to physical memory, to ensure that the access is legal, and perhaps to page things back in or allocate memory). Managing a kernel mapping for the eventual page of physical RAM is in many ways the least of the work involved.

A common element in all of these cases is that the kernel often wants to do additional bookkeeping and checking while it's setting up these mappings. For example, you might want to prevent two device drivers from claiming that they both own the same chunk of physical memory.
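The device driver check could be sketched along these lines in C. This is an assumption-laden illustration, not any real kernel's interface: claim_region(), the fixed-size table, and the owner strings are all invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_CLAIMS 16   /* arbitrary small limit for the sketch */

struct claim {
    uint64_t    start;  /* first physical address claimed */
    uint64_t    len;    /* length in bytes */
    const char *owner;  /* who claimed it, for diagnostics */
};

static struct claim claims[MAX_CLAIMS];
static size_t nclaims;

/* Record that 'owner' wants [start, start+len). Returns 0 on success,
   or -1 if the range is empty, wraps around, overlaps an existing
   claim, or the table is full. */
int claim_region(uint64_t start, uint64_t len, const char *owner)
{
    if (len == 0 || start + len < start)
        return -1;
    for (size_t i = 0; i < nclaims; i++) {
        /* Half-open ranges overlap iff each starts before the other ends. */
        if (start < claims[i].start + claims[i].len &&
            claims[i].start < start + len)
            return -1;
    }
    if (nclaims == MAX_CLAIMS)
        return -1;
    claims[nclaims++] = (struct claim){ start, len, owner };
    return 0;
}
```

If the mapping call refuses to map anything that hasn't been successfully claimed first, a second driver that tries to grab an overlapping range gets an error at setup time instead of both drivers quietly scribbling on the same device registers.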

(Some people would even argue that directly mapping all of physical memory by default is an invitation for kernel programmers to write sloppy code that skips these sorts of necessary steps and thus bypasses important safety checks. This is probably especially so for device drivers, which stereotypically are often written by people who are not expert kernel programmers.)

PS: I suspect that there have been CPUs with instructions that let you explicitly use physical addresses and bypass virtual address translation. I don't know if any current CPUs work that way; it seems at least a little bit at odds with current CPU design trends.

Last modified: Wed Jul 25 23:21:15 2012