A realization about whether I can contribute to Python development
I semi-recently read Hynek Schlawack's 'My road to the Python commit bit', which includes an encouraging call for readers to get involved in Python development. Reading the article briefly left me all fired up to start doing this, but shortly afterwards cold reality came crashing down on me as I realized that despite any enthusiasm I have, I can't really get involved in Python development in any useful way.
The problem is a simple one: I don't use Python 3 now and I'm rather unlikely to use it any time soon. At the same time, the Python developers have made it very clear that Python 2 is a dead end that is not being developed further and that Python 3 is the future; in fact, for Python development, Python 3 is the present. The conclusion is clear: if you want to contribute to Python development in any meaningful way, you need to be using and working with Python 3. Since I'm only working with Python 2, my ability to contribute to Python development is thus minimal.
(The counterargument is that it's still useful to triage bugs for Python 2, because some of them might get fixed in Python 2.7 point releases. The problem with this is that it doesn't change the fact that Python 2 is a dead end, and working on dead ends can easily be described as 'demotivational'. This is especially so when the result of triaging a real bug may just be 'sorry, that's not severe enough to be fixed in Python 2'.)
This kind of makes me sad. Regardless of how crazy it would be for me to respond to Hynek's call (I am overcommitted as it is), knowing that I can't really do anything even if I wanted to is a little bit depressing. Partly it's depressing because it once again shows me how the world of Python development is pulling further and further away from the world that I operate in.
(Of course the real solution to this is to start working with Python 3. But that's hard for me for various reasons, including that a lot of the stuff that I work with is still Python 2 only and will probably never change. It's relatively rare that I start a totally green-field Python program.)
Why our server had its page allocation failure
In the previous entry I went through the kernel messages printed when one of our Linux servers had a page allocation failure. Now it's time to explain why our server had that failure despite what looks like plenty of memory. To refresh, the kernel was trying to allocate a 64 Kbyte ('order 4') chunk of contiguous memory in the Normal zone. There was no such chunk in the server's small Normal zone, but there were several such chunks free in the DMA32 zone (and also larger chunks free in DMA32 that could have been split).
First off, we can rule out allocation from the small DMA zone entirely. As far as I can tell, general memory will almost never be allocated from the DMA zone because most of it is effectively reserved for allocations that have to have DMA memory. This is not a big loss since it's only 16 MBytes of RAM. What matters in our case is the state of the DMA32 zone, and in particular two bits of its state:
Node 0 DMA32 free:12560kB min:7076kB low:8844kB high:10612kB [...]
Node 0 DMA32: 2360*4kB 80*8kB 21*16kB 7*32kB 4*64kB 1*128kB 2*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 12560kB
'min:' is the minimum low water mark for most non-urgent memory allocation requests, and the second line reports on how many chunks of each allocation size (or order) are free.
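To make the per-order line concrete, here is a minimal Python sketch that parses it and adds up the free memory at each order. The names `FREE_LIST` and `parse_free_list` are made up for illustration, and the sketch assumes the chunks are always listed from order 0 upward, as they are in this report.

```python
import re

# The per-order free list for DMA32, copied from the kernel report.
FREE_LIST = ("2360*4kB 80*8kB 21*16kB 7*32kB 4*64kB 1*128kB 2*256kB "
             "0*512kB 1*1024kB 0*2048kB 0*4096kB")


def parse_free_list(text):
    """Return free Kbytes per order, with index 0 being order 0."""
    free_kb = []
    for count, size_kb in re.findall(r"(\d+)\*(\d+)kB", text):
        free_kb.append(int(count) * int(size_kb))
    return free_kb


free_by_order = parse_free_list(FREE_LIST)
total_free_kb = sum(free_by_order)

print(free_by_order)
print(total_free_kb)
```

The total comes out to 12560 Kb, matching the 'free:' figure, and 9440 Kb of that sits in order 0 (single 4 Kb page) chunks.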
On first glance it looks like everything should be fine here, because there is a free 64 Kb chunk and the zone has more free memory than any of the low water marks (especially min), but it turns out that the kernel does something tricky for higher-order allocations (allocations that aren't for a single 4 Kb page). To simplify and generalize, the kernel has decided that when checking free memory limits, if you're asking for a higher-order page the free memory in lower-order pages shouldn't count towards the amount of memory it considers free (presumably because such 'free memory' can't satisfy your request). At the same time it has to reduce the minimum amount of free memory required to avoid absurd results.
(One little thing is that the kernel check is made based on what the free memory would be after your allocation is made. This probably won't matter for small requests but might matter if you ask for an order 10 allocation of four megabytes.)
So for higher-order allocations only memory available at that order and higher counts, and the starting low water mark is divided by two for every order above 0, ie for an order 4 request like ours the water marks wind up divided by 16 (which conveniently is 2^order). In theory in our situation this means that the kernel would consider there to be 1920 Kb free in DMA32 (well, 1856 Kb after we take off our allocation) instead of 12560 Kb and the minimum low water mark would be 442 Kb instead of 7076 Kb. This still looks like our allocation request should pass muster.
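The arithmetic for this single-computation version can be sketched in a few lines of Python. This is only an illustration of the rule as described here, not kernel code: `direct_check` is a hypothetical name, the sketch works in Kbytes where the kernel works in pages, and it ignores any other adjustments the real check makes.

```python
# Free Kbytes per order in DMA32 (order 0 first), from the kernel report.
FREE_KB_BY_ORDER = [9440, 640, 336, 224, 256, 128, 512, 0, 1024, 0, 0]
MIN_WATERMARK_KB = 7076  # the 'min:' value for DMA32
PAGE_KB = 4


def direct_check(order, free_kb_by_order, min_kb):
    """One-shot check: count only free memory at this order and higher."""
    request_kb = PAGE_KB << order
    # Free memory that would remain after the allocation is made.
    usable_kb = sum(free_kb_by_order[order:]) - request_kb
    # The water mark is halved once for every order above 0.
    needed_kb = min_kb // (2 ** order)
    return usable_kb, needed_kb, usable_kb > needed_kb


usable_kb, needed_kb, ok = direct_check(4, FREE_KB_BY_ORDER, MIN_WATERMARK_KB)
print(usable_kb, needed_kb, ok)
```

For our order 4 request this gives 1856 Kb usable against a 442 Kb requirement, so under this version of the rule the allocation would be allowed.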
However, the kernel doesn't actually implement the check this way in a single computation. Instead it does it iteratively, using a loop that is done for each order up to (but not including) the order of your request. In pseudo-code:
for each order starting with 0 up to (our order - 1):
    free memory -= free memory for the current order
    minimum memory = minimum memory / 2
    if free memory <= minimum memory:
        return failure
The problem is that this iterative approach causes an early failure if a significant amount of the free memory in a zone is in very low order pages, because you can lose a lot of free memory while the current minimum memory requirement only drops by a bit (well, by half). In our situation, much of the free memory in DMA32 is in order 0 pages so the first pass through the loop gives us a new free memory of 3056 Kbytes (12560 Kb minus 9440 Kb of order-0 pages and our 64 Kb request) but a minimum memory requirement of 3538 Kb (the initial 7076 Kb divided by two) and the code immediately declares that there is not enough memory in this zone.
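Here is a small Python simulation of that loop using the DMA32 numbers from the report. It follows the pseudo-code rather than the real kernel source, so treat it as a sketch: `iterative_check` is a hypothetical name, everything is in Kbytes where the kernel works in pages, and the request is subtracted from free memory up front as described earlier.

```python
# Free Kbytes per order in DMA32 (order 0 first), from the kernel report.
FREE_KB_BY_ORDER = [9440, 640, 336, 224, 256, 128, 512, 0, 1024, 0, 0]
MIN_WATERMARK_KB = 7076  # the 'min:' value for DMA32
PAGE_KB = 4


def iterative_check(order, free_kb_by_order, min_kb):
    """Simulate the loop; return (ok, order reached, free_kb, min_kb)."""
    # Start from total free memory as it would be after the allocation.
    free_kb = sum(free_kb_by_order) - (PAGE_KB << order)
    for cur in range(order):
        # Chunks at this order are too small to satisfy the request.
        free_kb -= free_kb_by_order[cur]
        # The requirement only drops by half on each pass.
        min_kb //= 2
        if free_kb <= min_kb:
            return False, cur, free_kb, min_kb
    return True, order, free_kb, min_kb


ok, failed_at, free_kb, min_kb = iterative_check(
    4, FREE_KB_BY_ORDER, MIN_WATERMARK_KB)
print(ok, failed_at, free_kb, min_kb)
```

With these inputs the loop fails on its very first pass (order 0), with 3056 Kb left against a 3538 Kb requirement, even though the one-shot computation would have passed.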
(People who want to read the gory details can see them in zone_watermark_ok() in mm/page_alloc.c in the Ubuntu 10.04 kernel source; the function has been renamed in current 3.5-rcN kernels.)
I'm reluctant to declare this behavior a bug; the kernel memory people may well consider it working as designed that a zone with a disproportionate amount of its free memory in low-order pages is very reluctant to allocate higher-order chunks, even more reluctant than you might think. However I do think that the current code is at least very unclear as to whether this is intentional (or simply an accident of the current implementation) and what the actual logic is.
(I personally would prefer the direct computation logic. As it stands, you have to know and then explain (and simulate) the actual kernel code in order to understand why this allocation failed; there is no simple-to-express general rule.)