Why user programs mapping page zero is such bad news on x86 hardware

August 18, 2009

The Linux kernel recently had a significant security issue or two where the root cause was that user programs could map memory at page zero, and this led to kernel-level exploits. If you went through the same sort of undergrad OS course that I did, you might be wondering how on earth a user process memory mapping issue leads to a kernel exploit; after all, as all of those little box diagrams tell us, the user program address space is one thing and the kernel address space is an entirely different thing.

That's the nice theoretical view as presented in undergrad OS courses. The messy reality of actual hardware is that on 32-bit x86 machines, accessing a completely separate address space is really expensive (I remember figures of a 10% to 20% overall performance hit, depending on what your programs do). The result is that no common operating system puts its kernel in a completely separate address space on x86 machines; instead pretty much everyone (not just Linux) embeds the kernel in every user process's address space and relies on page protections to keep it inaccessible to user code.

(There actually have been Linux patches that change this, such as the '4G/4G' split.)

When the system switches into kernel mode the kernel's pages become accessible. But this is not a switch between address spaces, it's extra permissions, so the current user process's pages stay visible and accessible, although properly written kernel code doesn't ever directly touch them.

Now we get to the problem. Page zero is where NULL pointers point; if the kernel dereferences a NULL pointer in some way, it will try to access something in page zero or shortly above it. Thus if a user program can map a page at page zero and then persuade the kernel to dereference a NULL pointer, this shared and accessible address space means that the kernel is directly getting data from the user program's page without realizing it, and the user program is in control of the result of the NULL dereference. In the most dangerous case, the kernel is dereferencing a function pointer that it will go and jump to; as it happens, an x86 CPU is perfectly happy to jump to a user page and run code there while in kernel mode.

(This is instant game over if it happens, since the kernel is now running arbitrary attack code of the program's choice.)
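To make the dangerous case concrete, here is a minimal sketch of the pattern in ordinary user-space C. This is not actual kernel code; `struct dev_ops`, `struct device`, `sample_poll`, and the two dispatch functions are all names invented for illustration.

```c
#include <stddef.h>

/* Hypothetical operations table, loosely in the style of kernel code. */
struct dev_ops {
    int (*poll)(void);
};

/* Hypothetical object whose ops pointer may be NULL if setup was incomplete. */
struct device {
    const struct dev_ops *ops;
};

/* A stand-in handler so the sketch can be exercised. */
int sample_poll(void)
{
    return 42;
}

/*
 * The buggy pattern: nothing checks dev->ops. If it is NULL, the function
 * pointer is loaded from an address in page zero. With a user page mapped
 * there, the value loaded (and then jumped to) is whatever the user
 * program put in that page.
 */
int dispatch_unchecked(const struct device *dev)
{
    return dev->ops->poll();
}

/* The defensive version: refuse to follow a NULL pointer at any level. */
int dispatch_checked(const struct device *dev)
{
    if (dev == NULL || dev->ops == NULL || dev->ops->poll == NULL)
        return -1;
    return dev->ops->poll();
}
```

With a properly set up `struct device`, both functions behave identically. Only `dispatch_checked` is safe to call when `ops` is NULL, and the hard part in a real kernel is making sure that every such path has the check.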

This is not just a Linux problem; this is an issue for pretty much any x86 operating system that can ever be coaxed into dereferencing NULL pointers in kernel mode. Either you need very good, very foolproof protection against NULL pointer dereferences (and one of the Linux bugs recently showed how hard this is), or you need to make absolutely sure that a user program cannot map page zero, ever.
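If you want to see which side of this your own system is on, a small probe along the following lines should do. It is a sketch that assumes a POSIX-style `mmap()` with `MAP_FIXED` and `MAP_ANONYMOUS`; `try_map_page_zero` is a name made up here. On Linux systems where the `vm.mmap_min_addr` sysctl is non-zero, I would expect an unprivileged attempt to fail, typically with `EPERM`.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Try to map one anonymous page at address 0.
 * Returns 0 if the mapping succeeded (it is unmapped again right away),
 * or the errno value from mmap() if the system refused.
 */
int try_map_page_zero(void)
{
    long page_size = sysconf(_SC_PAGESIZE);
    if (page_size <= 0)
        return EINVAL;

    /* MAP_FIXED asks for exactly this address instead of treating it as a hint. */
    void *p = mmap((void *)0, (size_t)page_size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);

    /* MAP_FAILED is (void *)-1, so a successful mapping at 0 is distinguishable. */
    if (p == MAP_FAILED)
        return errno;

    munmap(p, (size_t)page_size);
    return 0;
}
```

A return value of 0 means user programs can map page zero on that machine, which is the situation you do not want; any non-zero value means the system refused the request.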

(For safety you should also forbid low memory close to page zero, in case you ever dereference a NULL pointer with a relatively large offset.)
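How large can such an offset get? The address touched by a NULL structure pointer is simply the offset of the field being accessed, so any field that sits behind a big array lands well past page zero. A small sketch, with a made-up `struct big_object` and an assumed 4096-byte page size (the usual x86 value):

```c
#include <stddef.h>

/* Assumed page size; 4096 bytes is the usual x86 value. */
#define PAGE_SIZE_ASSUMED 4096

/* Hypothetical structure with a large buffer ahead of a function pointer. */
struct big_object {
    char buffer[8192];          /* two assumed pages of data */
    int (*callback)(void);      /* the field that gets dereferenced */
};

/* Address that reading `callback` through a NULL struct pointer would touch. */
size_t null_deref_address(void)
{
    return offsetof(struct big_object, callback);
}

/* Which page that address falls in, counting page zero as 0. */
size_t null_deref_page(void)
{
    return null_deref_address() / PAGE_SIZE_ASSUMED;
}
```

Here `null_deref_address()` is 8192 and `null_deref_page()` is 2, so forbidding only page zero would not cover a NULL dereference of `callback`.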

Comments on this page:

From at 2009-08-18 14:36:42:

4:4 split is more like 4% for most uses. I don't think I ever saw even a microbenchmark topping 7%. We only dropped it because we generally aim not to carry any non-upstream patches.

Written on 18 August 2009.

Last modified: Tue Aug 18 00:34:39 2009