I've had bad luck with transparent hugepages on my Linux machines

January 31, 2023

Normally, pages of virtual memory are a relatively small size, such as 4 Kbytes. Hugepages are a CPU and Linux kernel feature which allows programs to selectively have much larger pages, which generally improves their performance. Transparent hugepage support is an additional Linux kernel feature where programs can be more or less transparently set up with hugepages if it looks like this will be useful for them. This sounds good but generally I haven't had the best of luck with them:

It appears to have been '0' days since Linux kernel (transparent) hugepages have dragged one of my systems into the mud for mysterious reasons. Is my memory too fragmented? Who knows, all I can really do is turn hugepages off.

(Yes they have some performance benefit when they work, but they're having a major performance issue now.)

This time around, the symptom was that Go's self-tests were timing out while I was trying to build it (or in some runs, the build itself would stall). While this was going on, top said that the 'khugepaged' kernel daemon process was constantly running (on a single CPU).

(I'm fairly sure I've seen this sort of 'khugepaged at 100% and things stalling' behavior before, partly because when I saw top I immediately assumed THP were the problem, but I can't remember details.)

One of the issues that can cause problems with hugepages is that to have huge pages, you need huge areas of contiguous RAM. These aren't always available, and not having them is one of the reasons for kernel page allocation failures. To get these areas of contiguous RAM, the modern Linux kernel uses (potentially) proactive compaction, which is normally visible as the 'kcompactd0' kernel daemon. Once you have aligned contiguous RAM that's suitable for use as huge pages, the kernel needs to turn runs of ordinary sized pages into hugepages. This is the job of khugepaged; to quote:

Unless THP is completely disabled, there is [a] khugepaged daemon that scans memory and collapses sequences of basic pages into huge pages.
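Khugepaged exposes a few counters and tunables in sysfs, which give a rough idea of how much scanning and collapsing it's doing. Here is a minimal sketch, assuming the usual sysfs directory; the helper name 'khugepaged_stats' is made up for illustration, and files that don't exist on a given kernel are simply skipped:

```shell
# khugepaged_stats [DIR]: print a few khugepaged counters and tunables as name=value.
# DIR defaults to the usual sysfs location; unreadable or missing files are skipped.
khugepaged_stats() {
    ks_dir=${1:-/sys/kernel/mm/transparent_hugepage/khugepaged}
    for ks_name in pages_collapsed full_scans pages_to_scan scan_sleep_millisecs; do
        if [ -r "$ks_dir/$ks_name" ]; then
            printf '%s=%s\n' "$ks_name" "$(cat "$ks_dir/$ks_name")"
        fi
    done
}

# Example use (run it twice, a few seconds apart, and compare):
#   khugepaged_stats
```

If 'pages_collapsed' and 'full_scans' barely move between two runs while khugepaged is burning CPU, that at least suggests it's not making much progress.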

In the normal default kernel settings, this only happens for processes that use the madvise(2) system call to tell the kernel that a mmap()'d area of theirs is suitable for this. Go can do this under some circumstances, although I'm not sure what they are exactly (the direct code that does it is deep inside the Go runtime).
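One way to see whether a particular process has actually ended up with transparent hugepages is to add up the 'AnonHugePages' lines in its /proc/<pid>/smaps. This is only a sketch with a made-up helper name ('anon_hugepages_kb'); it shows what the process currently has, not whether it asked for it through madvise():

```shell
# anon_hugepages_kb FILE: sum the AnonHugePages fields (in kB) of a smaps-format file.
# Prints 0 if the file has no such lines.
anon_hugepages_kb() {
    awk '/^AnonHugePages:/ { total += $2 } END { print total + 0 }' "$1"
}

# Example use, for the current shell (other users' processes need root):
#   anon_hugepages_kb /proc/$$/smaps
```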

If you look around the Internet, there are plenty of reports of khugepaged using all of a CPU, often with responsiveness problems to go along with it. Sometimes this stops if people quit and restart some application; at other times, people resort to disabling transparent hugepages or rebooting their systems. No one seems to have identified what causes the khugepaged CPU usage or the system slowness (presumably the two are related, perhaps through lock contention or memory thrashing).
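When this happens, one cheap thing to capture is the kernel's THP and compaction counters from /proc/vmstat, taken twice a little while apart. This is a sketch for gathering data, not a diagnosis; the helper name 'thp_vmstat' is made up, and the exact set of counters varies between kernel versions:

```shell
# thp_vmstat [FILE]: print the THP and compaction counters from a vmstat-format file.
# FILE defaults to /proc/vmstat.
thp_vmstat() {
    grep -E '^(thp_|compact_)' "${1:-/proc/vmstat}"
}

# Example use: snapshot, wait, snapshot again, and see which counters moved.
#   thp_vmstat > /tmp/thp.before; sleep 10; thp_vmstat > /tmp/thp.after
#   diff /tmp/thp.before /tmp/thp.after
```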

Disabling THP is done through sysfs:

echo never >/sys/kernel/mm/transparent_hugepage/enabled
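Reading the file back shows what's currently in effect; the kernel lists all of the choices and puts the active one in square brackets, for example 'always [madvise] never'. Here is a small sketch for pulling out just the active choice, using a made-up helper name ('thp_active'):

```shell
# thp_active FILE: print the bracketed (active) choice from a THP sysfs file.
# The file's content looks like: always [madvise] never
thp_active() {
    sed -n 's/.*\[\([^]]*\)].*/\1/p' "$1"
}

# Example use:
#   thp_active /sys/kernel/mm/transparent_hugepage/enabled
```

Other files in that directory that use the same bracketed format can be read the same way.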

The next time around I may try to limit THP's 'defragmentation' efforts:

echo never >/sys/kernel/mm/transparent_hugepage/defrag

(These days the normal setting for both of these is 'madvise'.)

If I'm understanding the documentation correctly, setting 'defrag' to 'never' means the kernel will only use a hugepage if one is available at the time that the program calls madvise(); it won't try to get one later and swap it in.

(Looking at the documentation makes me wonder if Go and khugepaged were both fighting back and forth trying to obtain hugepages when Go made a madvise() call to enable hugepages for some regions.)

I believe I've only really noticed this behavior on my desktops, which are unusual in that I use ZFS on Linux on them. ZFS has its own memory handling (the 'ARC'), and historically has had some odd and uncomfortable interaction with the normal Linux kernel memory system. Still, it doesn't seem to be just me who has khugepaged problems.

(I don't think we've seen these issues on our ZFS fileservers, but then we don't run anything else on the fileservers. They sit there handling NFS in the kernel and that's about it. Well, there is one exception these days in our IMAP server, but I'm not sure it runs anything that uses madvise() to try to use hugepages.)

Comments on this page:

In the normal default kernel settings, this only happens for processes that use the madvise(2) system call to tell the kernel that a mmap()'d area of theirs is suitable for this.

This conflicts with madvise(2):

Most common kernels configurations provide MADV_HUGEPAGE-style behavior by default, and thus MADV_HUGEPAGE is normally not necessary. It is mostly intended for embedded systems, where MADV_HUGEPAGE-style behavior may not be enabled by default in the kernel.

FWIW, at least RHEL 6, 7 and 8 do enable transparent hugepages for all programs, i.e. systemwide such that /sys/kernel/mm/transparent_hugepage/enabled reads 'always' there, by default.

Whereas on Fedora it's opt-in, i.e. there /sys/kernel/mm/transparent_hugepage/enabled is set to 'madvise', by default. However, activating the `hpc-compute` tuned profile there changes it to 'always', as well.

According to internet searches the Ubuntu situation is: YMMV

See also: https://unix.stackexchange.com/questions/495816/which-distributions-enable-transparent-huge-pages-for-all-applications

Written on 31 January 2023.


Last modified: Tue Jan 31 23:04:27 2023