Exploring munmap() on page zero and on unmapped address space

May 14, 2020

Over in the Fediverse, I ran across an interesting question on munmap():

what does `munmap` on Linux do when address is set to 0? Somehow this succeeds on Linux but fails on FreeBSD. I'm assuming the semantics are different but cannot find any reference regarding to such behavior.

(There's also this additional note, and the short version of the answer is here.)

When I saw this, I was actually surprised that munmap() on Linux succeeded, because I expected it to fail on any address range that wasn't currently mapped in your process, and page zero is definitely not mapped on Linux (or anywhere sane). So let's go to the SUS specification for munmap(), where we can read in part:

The munmap() function shall fail if:

[EINVAL]
Addresses in the range [addr,addr+len) are outside the valid range for the address space of a process.

(Similar wording appears in the FreeBSD munmap() manpage.)

When I first read this wording, I assumed that this meant the current address range of the process. This is incorrect in practice on Linux and FreeBSD, and I think in theory as well (since POSIX/SUS talks about 'of a process', not 'of this process'). On both of those Unixes, you can munmap() at least some unused address space, as we can demonstrate with a little test program that mmap()s something, munmap()s it, and then munmap()s it again.

The difference between Linux and FreeBSD is in what they consider to be 'outside the valid range for the address space of a process'. FreeBSD evidently considers page zero (and probably low memory in general) to always be outside this range, and thus munmap() fails. Linux does not; while it doesn't normally let you mmap() memory in that area, for good reasons, it is not intrinsically outside the address space. If I'm reading the Linux kernel code correctly, no low address range is ever considered invalid, only address ranges that cross above the top of user space.
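
A minimal sketch of how one might probe this directly, assuming a POSIX system and a normal process with nothing mapped at page zero. The helper name probe_munmap_zero is made up for illustration; it just asks the kernel to unmap the first page and reports what came back:

```c
#include <sys/mman.h>
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: try to munmap() the first page of the address
   space. Returns 0 if the kernel accepted the call, or the errno value
   if it rejected it. Assumes nothing is actually mapped at page zero. */
int probe_munmap_zero(void)
{
  long pagesize = sysconf(_SC_PAGESIZE);

  if (pagesize <= 0)
    return EINVAL;
  if (munmap((void *)0, (size_t)pagesize) < 0)
    return errno;
  return 0;
}
```

Going by the reports above, I would expect this to return 0 on Linux and an error (presumably EINVAL) on FreeBSD in its default configuration.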

(I took a brief look at the relevant FreeBSD code in vm_mmap.c, and I think that it rejects any munmap() that extends below or above the range of address space that the process currently has mapped. This is actually more restrictive than I expected.)

In ultimately unsurprising news, OpenBSD takes a somewhat different interpretation, one that's more in line with how I expected munmap() to behave. The OpenBSD munmap() manpage says:

[EINVAL]
The addr and len parameters specify a region that would extend beyond the end of the address space, or some part of the region being unmapped is not part of the currently valid address space.

OpenBSD requires you to only munmap() things that are actually mapped and disallows trying to unmap random sections of your potential address space, even if they fall between the bottom and top of your address space usage (where FreeBSD would allow it). Whether this is completely POSIX compliant is an interesting but irrelevant question, since I doubt the OpenBSD people would change this (and I don't think they should).

One of the interesting things I've learned from looking into this is that Linux, FreeBSD, and OpenBSD each have a somewhat different interpretation of what POSIX permits (assuming I'm understanding the FreeBSD kernel code correctly). The Linux interpretation is most clearly permitted, since it allows munmap() on anything that might potentially be mappable under some circumstances. OpenBSD, if it cares, would likely say that the 'valid range for the address space of a process' is what it currently has mapped and so their behavior is POSIX/SUS compliant, but this clearly pushes the interpretation in an unusual direction compared to a narrow, specification-style reading of the wording (although it is the behavior I expected). FreeBSD sort of splits the difference, possibly for implementation reasons.

PS: The Linux munmap() manpage doesn't even talk about 'the valid address space of a (or the) process' as a reason for munmap() to fail; it only talks abstractly about the kernel not liking addr or len.

Sidebar: The little test program

Here's the test program I used.

#include <sys/mman.h>
#include <stdio.h>
#include <errno.h>
#include <string.h>

#define MAPLEN  (128*1024)

int main(void)
{
  void *mp;

  puts("Starting mmap and double munmap test.");
  /* Get an anonymous mapping at an address of the kernel's choosing. */
  mp = mmap(0, MAPLEN, PROT_READ, MAP_ANON|MAP_SHARED, -1, 0);
  if (mp == MAP_FAILED) {
    printf("mmap error: %s\n", strerror(errno));
    return 1;
  }
  /* First unmap: the range is mapped, so this should succeed. */
  if (munmap(mp, MAPLEN) < 0) {
    printf("munmap error on first unmap: %s\n", strerror(errno));
    return 1;
  }
  /* Second unmap: the same range is now unmapped address space. */
  if (munmap(mp, MAPLEN) < 0) {
    printf("munmap error on second unmap: %s\n", strerror(errno));
    return 1;
  }
  puts("All calls succeeded without errors, can munmap() unmapped areas.");
  return 0;
}

I think that it's theoretically possible for something like this program to fail on FreeBSD, if our mmap() established a new top or bottom of the process's address space. In practice it's likely that we will mmap() into a hole between the bottom of the address space (with the program text) and the top of the address space (probably with the stack).


Comments on this page:

All three are POSIX compliant. You missed the final line of the Description:

The behavior of this function is unspecified if the mapping was not established by a call to mmap() .

This unspecified behavior allows:

  • munmap() failing if addr did not come from mmap()
  • munmap() failing if addr is not a currently mapped address, regardless of source
  • munmap() succeeding if addr is not a currently mapped address (no-op)
  • possibly other valid interpretations

By cks at 2020-05-16 19:04:49:

I think that the POSIX specification is more tangled than it looks. The description clearly anticipates munmap() being used on unmapped regions, because it specifically says in the first paragraph of the description:

If there are no mappings in the specified address range, then munmap() has no effect.

As I read it, the unspecified behavior is if there is an actual mapping already established in the section of address space you're unmapping, not if the address space is entirely empty. Unmapping page zero is generally a case where there is no existing mapping and so this rule for unspecified behavior would not apply.

(You would get unspecified behavior if you tried to unmap, say, the memory used by a C array defined in the program, or memory that you obtained through malloc().)

By Konstantin Belousov at 2020-05-17 09:41:44:

On FreeBSD, each address space has min and max valid addresses. Any of the mmap(2) family of functions is only allowed to operate on ranges that fall within [min, max).

By default on amd64, min = 4k and max = 0x0000800000000000, so munmap(0, 4096) fails because it does not fit into the range. The lowest page was excluded from the default user range because on machines where the kernel is co-located in the same address space as the user part, i.e. most modern architectures, a NULL dereference in the kernel accesses whatever malicious userspace has mmapped at 0. Recent Intel CPUs have hardware mitigations against this sort of issue, SMEP and SMAP.

There is a sysctl, security.bsd.map_at_zero, which lowers min from 4k to 0. It is needed to run very old a.out binaries, for which FreeBSD still provides binary compatibility.
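
To make the range check described in this comment concrete, here is a rough model of it. This is only an illustration using the amd64 defaults quoted above, not FreeBSD's kernel code; the function and macro names are hypothetical:

```c
#include <stdint.h>

/* Default amd64 limits as quoted in the comment above. */
#define MODEL_MIN 4096ULL
#define MODEL_MAX 0x0000800000000000ULL

/* Illustrative model only, not FreeBSD's actual code: is the range
   [addr, addr+len) entirely inside [min, max)? Returns 1 if so and
   0 otherwise. */
int range_in_bounds(uint64_t addr, uint64_t len, uint64_t min, uint64_t max)
{
  if (addr < min || addr > max)
    return 0;
  /* Compare against the remaining room to avoid overflow in addr + len. */
  return len <= max - addr;
}
```

With the defaults, range_in_bounds(0, 4096, MODEL_MIN, MODEL_MAX) is 0, matching the failing munmap(0, 4096). Passing 0 as min, as security.bsd.map_at_zero would arrange, makes the same range acceptable.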

Written on 14 May 2020.

Last modified: Thu May 14 23:46:42 2020