Reading the POSIX standard for Unix functions is not straightforward

May 18, 2020

I recently wrote about exploring munmap() on page zero, and in the process looked at the POSIX specification for munmap(). One of my discoveries about the practical behavior of Unixes here is that OpenBSD specifically disallows using munmap() on address space that isn't currently mapped (see munmap(2)). In my entry, I said that it wasn't clear whether POSIX strictly authorized this behavior, although you could put forward an interpretation where it was okay.

In a comment, Jakob Kaivo put forward the view that POSIX permitted this and any other behavior when munmap() is applied to unmapped address space, because of a sentence at the end of the Description:

The behavior of this function is unspecified if the mapping was not established by a call to mmap().

At first reading this seems clear. But wait, it's time to get confused. Earlier in the same description of munmap()'s behavior, POSIX clearly says that it can be used if there is no mapping:

[...] If there are no mappings in the specified address range, then munmap() has no effect.

(Note that 'has no effect' is different from 'unspecified'.)

POSIX doesn't require that this not raise an error, but you can read its description of when you can get EINVAL to require that you don't (some of the time). Assuming addr is aligned and len is not zero, you get EINVAL if some of the address space you're unmapping is 'outside the valid range for the address space of a process', and perhaps implicitly not otherwise. And then you have the question of what POSIX intended here by saying 'the address space of a process' instead of 'the address space of the (current) process'.

One of the things we can see here is that it's hard for non-specialists to truly read and understand the POSIX standards. Both Jakob Kaivo and I are at least reasonably competent C and Unix programmers and we've both attempted to read a reasonably straightforward POSIX specification of a single function, yet we've wound up somewhere between disagreeing and being uncertain about what it allows.

This is a useful lesson for me to remember any time I'm tempted to appeal to a POSIX standard for how something should work. POSIX standards are written in specification language, and if they're not completely clear I should be cautious about how correct I am. Probably I should be cautious even if they seem perfectly clear.

(And anyway, the actual behavior of current Unixes matters more than what POSIX says. A POSIX specification is merely a potential lower bound on behavior, especially future behavior. If a Unix does something today and that something is required by POSIX, the odds are good that it will keep doing that in the future.)

PS: My interpretation of 'unspecified' versus 'has no effect' here is that POSIX is saying that it's unspecified what happens if you munmap() legitimate address space that wasn't obtained through your own mmap(). For instance, if you munmap() part of something that you got with malloc(), anything goes as far as POSIX is concerned. It might work and not produce future problems, it might have no effect, it might kill your program immediately, and it might cause your program to malfunction or blow up in the future.


Comments on this page:

This problem in particular is further complicated by the fact that the POSIX definition of Address Space is necessarily vague, since it is highly architecture-dependent.

Is it ever possible to reference address 0 from a process? If so, is it mapped by mmap(), mapped by some other means, or not mapped? Can an address that could potentially be mapped, but isn't currently, be part of a process's address space?

All these questions, and more, have to be answered before a system can begin interpreting the standard. There's a whole lot of turtles, even in something seemingly straightforward. Standards are hard. I can't count the number of hours I've spent reading POSIX, and I still get tripped up.

Written on 18 May 2020.


Last modified: Mon May 18 22:49:57 2020