Wandering Thoughts archives


What has to happen with Unix virtual memory when you have no swap space

Recently, Artem S. Tashkinov wrote on the Linux kernel mailing list about a Linux problem under memory pressure (via, and threaded here). The specific reproduction instructions involved having low RAM, turning off swap space, and then putting the system under load, and when that happened (emphasis mine):

Once you hit a situation when opening a new tab requires more RAM than is currently available, the system will stall hard. You will barely be able to move the mouse pointer. Your disk LED will be flashing incessantly (I'm not entirely sure why). [...]

I'm afraid I have bad news for the people snickering at Linux here; if you're running without swap space, you can probably get any Unix to behave this way under memory pressure. If you can't on your particular Unix, I'd actually say that your Unix is probably not letting you get full use out of your RAM.

To simplify a bit, we can divide pages of user memory up into anonymous pages and file-backed pages. File-backed pages are what they sound like; they come from some specific file on the filesystem that they can be written out to (if they're dirty) or read back in from. Anonymous pages are not backed by a file, so the only place they can be written out to and read back in from is swap space. Anonymous pages mostly come from dynamic memory allocations and from modifying the program's global variables and data; file backed pages come mostly from mapping files into memory with mmap() and also, crucially, from the code and read-only data of the program.

(A file backed page can turn into an anonymous page under some circumstances.)

Under normal circumstances, when you have swap space and your system is under memory pressure a Unix kernel will balance evicting anonymous pages out to swap space and evicting file-backed pages back to their source file. However, when you have no swap space, the kernel cannot evict anonymous pages any more; they're stuck in RAM because there's nowhere else to put them. All the kernel can do to reclaim memory is to evict whatever file-backed pages there are, even if these pages are going to be needed again very soon and will just have to be read back in from the filesystem. If RAM keeps getting allocated for anonymous pages, there is less and less RAM left to hold whatever collection of file-backed pages your system needs to do anything useful and your system will spend more and more time thrashing around reading file-backed pages back in (with your disk LED blinking all of the time). Since one of the sources of file-backed pages is the executable code of all of your programs (and most of the shared libraries they use), it's quite possible to get into a situation where your programs can barely run without taking a page fault for another page of code.

(This frantic eviction of file-backed pages can happen even if you have anonymous pages that are being used only very infrequently and so would normally be immediately pushed out to swap space. With no swap space, anonymous pages are stuck in RAM no matter how infrequently they're touched; the only anonymous pages that can be discarded are ones that have never been written to and so are guaranteed to be all zero.)

In the old days, this usually was not very much of an issue because system RAM was generally large compared to the size of programs and thus the amount of file-backed pages that were likely to be in memory. That's no longer the case today; modern large programs such as Firefox and its shared libraries can have significant amounts of file-backed code and data pages (in addition to their often large use of dynamically allocated memory, ie anonymous pages).

In theory, this thrashing can happen in any Unix. To prevent it, your Unix has to decide to deliberately not allow you to allocate more anonymous pages after a certain point, even though it could evict file-backed pages to make room for them. Deciding when to cut your anonymous page allocations off is necessarily a heuristic, and so any Unix that tries to do it is sooner or later going to prevent you from using some of your RAM.

(This is different than the usual issue with overcommitting virtual memory address space because you're not asking for more memory than could theoretically be satisfied. The kernel has to guess how much file-backed memory programs will need in order to perform decently, and it has to do so at the time when you try to allocate anonymous memory since it can't take the memory back later.)

unix/NoSwapConsequence written at 22:26:29; Add Comment

Rewriting my iptables rules using ipsets

On Mastodon, I was tempted:

My home and office workstation have complicated networking, but their firewall rules are actually relatively simple. Maybe it's time to switch them over from annoying iptables to the new shiny nftables stuff, which might at least be more readable (and involve less repetition).

Feedback convinced me to not go that far. Instead, today I rewrote my iptables rules in terms of ipsets (with multiple set matches), which eliminated a great deal of their prior annoyance (although not all of it).

My workstation firewall rules did not previously use ipsets because I first wrote them before ipsets were a thing; in fact, they date from the days of ipchains and Linux 2.2. In the pre-ipset world, this meant a separate iptables rule for each combination of source IP, destination port, and protocol that I wanted to block (or allow). On my office workstation, this wound up with over 180 INPUT table rules (most of them generated automatically).

Contrary to what I asserted a few years ago, most of the actual firewall rules being expressed by all of these iptables rules are pretty straightforward. Once I simplified things a bit, there are some ports that only my local machine can access, some ports that only 'friendly' machines can access, and some machines I don't like that should be blocked from a large collection of ports, even ones that are normally generally accessible. This has an obvious translation to ipset based rules, especially if I don't try to be too clever, and the result is a lot fewer rules that are a lot easier to look over. There's still some annoying repetition because I want to match both the TCP and UDP versions of most ports, but I can live with that.

(Enough of the ports that I want to block access to come in both TCP and UDP versions that it's not worth making a finer distinction. That would lead to more ipsets, which is more annoying in practice.)

When I did the rewrite, I did simplify some of the fine distinctions I had previously made between various ports and various machines. I also dropped some things that were obsolete, both in terms of ports that I was blocking and things like preventing unencrypted GRE traffic, since I no longer use IPsec. I could have done this sort of reform without a rewrite, but I had nothing to push me to do it until now and it wouldn't have been as much of a win. The actual rewrite was a pretty quick process and the resulting shell script is what I consider to be straightforward.

(The new rules also have some improvements; for example, I now have some IPv6 blocks on my home machine. Since I already had an ipset of ports, I could say 'block incoming IPv6 traffic from my external interface to these ports' in a single ip6tables rule.)

As far as I'm concerned, so far the three big wins of the rewrite are that 'iptables -nL INPUT' no longer scrolls excessively, I'm no longer dependent on my ancient automation to generate iptables rules, and I've wound up writing a shell script to totally clear out all of my iptables rules and tables (because I kept wanting to re-run my setup script as I changed it). That my ancient automation silently broke for a while (again) and left my office workstation without most of its blocks since late March is one thing that pushed me into making this change now.

(Late March is when I updated to my first Fedora 5.x kernel, and guess what my ancient automation threw up its hands at. If you're curious why, it was (still) looking at the kernel version to decide whether to use ipchains or iptables.)

linux/IptablesRewriteUsingIpset written at 01:16:36; Add Comment

Page tools: See As Normal.
Login: Password:
Atom Syndication: Recent Pages, Recent Comments.

This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.