Modern Linux kernel memory allocation rules for higher-order page requests
Back in 2012 I wrote an entry on why our Ubuntu 10.04 server had a page allocation failure, despite apparently having a bunch of memory free. The answer boiled down to the the NFS code wanting to allocate a higher-order request of 64 Kb of (physically contiguous) memory and the kernel having some rather complicated and confusing rules for when this was permitted when memory was reasonably fragmented and low(-ish).
That was four and a half years ago, back in the days of kernel 3.5.
Four years is a long time for the kernel. Today the kernel people are working on 4.11 and, unsurprisingly,
things have changed around a bit in this area of code. The function
involved is still called
__zone_watermark_ok() in mm/page_alloc.c,
but it is much simpler today. As far as I can tell from the code, the
new general approach is nicely described by the function's current
Return true if free base pages are above 'mark'. For high-order checks it will return true of the order-0 watermark is reached and there is at least one free page of a suitable size. Checking now avoids taking the zone lock to check in the allocation paths if no pages are free.
The 'order-0' watermark is the overall lowmem watermark (which I
low: from my old entry).
This bounds all requests for obvious reasons; as the code says in
a comment, if a request for a single page is not something that can
go ahead, requests for more than one page certainly can't. Requests
for order-0 pages merely have to pass this watermark; if they do,
they get a page.
Requests for higher-order pages have to pass an obvious additional check, which is that there has to be a chunk of at least the required order that's still free. If you ask for a 64 Kb contiguous chunk, your request can't be satisfied unless there's at least one chunk of size 64 Kb or bigger left, but it's satisfied if there's even a single such chunk. Unlike in the past, as far as I can tell requests for higher-order pages can now consume all of those pages, possibly leaving only fragmented order-0 4 Kb pages free in the zone. There is no longer any attempt to have a (different) low water mark for higher-order allocations.
This change happened in late 2015, in commit 97a16fc82a; as far as I can tell it comes after kernel 4.3 and before kernel 4.4-rc1. I believe it's one commit in a series by Mel Gorman that reworks various aspects of kernel memory management in this area. His commit message has an interesting discussion of the history of high-order watermarks and why they're apparently not necessary any more.
(Certainly I'm happy to have this odd kernel memory allocation failure mode eliminated.)
Sidebar: Which distributions have this change
Ubuntu 16.04 LTS uses kernel '4.4.0' (plus many Ubuntu patches); it has this change, although with some Ubuntu modifications from the stock 4.4.0 code. Ubuntu 14.04 LTS has kernel 3.13.0 so it shouldn't have this change.
CentOS 7 is using a kernel labeled '3.10.0'. Unsurprisingly, it does not have this change and so should have the old behavior, although Red Hat has been known to patch their kernels so much that I can't be completely sure that they haven't done something here.
Debian Stable has kernel 3.16.39, and thus should also be using the old code and the old behavior. Debian Testing ('stretch') has kernel 4.9.13, so it should have this change and so the next Debian stable release will include it.