How much swap space we're using across our servers (in October 2022)

October 15, 2022

One of the reactions I've seen to us moving away from swap partitions (to swap files) on our Linux servers is that some people run without any swap space at all. Through a chain of thought I wound up wondering how much swap space we're actually using (as opposed to how much we have configured). Fortunately, our metrics system makes it relatively straightforward to answer questions like that, or at least to come up with some numbers.
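As a rough illustration of the difference between swap configured and swap actually in use, here is a minimal sketch that reads both from text in the format of Linux's /proc/meminfo, which reports SwapTotal and SwapFree in kB. The function name and the sample values are made up for illustration; this is not how our metrics system collects the numbers.

```python
def swap_usage_kib(meminfo_text):
    """Return (total, free, used) swap in KiB from /proc/meminfo-style text."""
    fields = {}
    for line in meminfo_text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip anything that is not a "Name: value" line
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])
    total = fields["SwapTotal"]  # swap configured
    free = fields["SwapFree"]    # swap not currently in use
    return total, free, total - free


# Made-up sample in the /proc/meminfo layout, not taken from a real server.
SAMPLE = """\
MemTotal:       16384000 kB
SwapTotal:       1048576 kB
SwapFree:         917504 kB
"""

total, free, used = swap_usage_kib(SAMPLE)
print(f"configured: {total} KiB, free: {free} KiB, used: {used} KiB")
```

On a real machine you would pass in the contents of /proc/meminfo instead of SAMPLE.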

Because our servers have different amounts of swap configured, I'm going to look at both how much swap space has been left free and how much swap space has been used. The simpler number is the amount of remaining (free) swap space. Over the past 30 days, three of our compute servers used all of their swap space, our primary login server ran down to only 13.5 MBytes free, and our test virtualization server got as low as 214 MBytes free. Everything else always had at least 512 MBytes free. A potentially more interesting number is the average amount of free swap space over the last 30 days, which will factor out short-term spikes in swap space usage. Here, nothing had less than 550 MBytes of swap free, even the compute servers. Looking at the standard deviation of free swap over time suggests that many of our servers don't vary much in their swap usage.

(Our servers only rarely get rebooted, so these numbers are pretty much the long term state for many of these machines.)
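The three numbers used above (minimum, average, and standard deviation of free swap over a time window) can be sketched as follows. This assumes the samples have already been pulled out of a metrics system into plain lists; the host names and values are invented, and our actual queries may well look different.

```python
from statistics import mean, pstdev


def summarize_free_swap(samples_by_host):
    """Summarize free-swap samples (in MBytes) per host over some window."""
    summary = {}
    for host, samples in samples_by_host.items():
        if not samples:
            continue  # no data for this host in the window
        summary[host] = {
            "min": min(samples),       # worst case, catches short spikes
            "mean": mean(samples),     # smooths out short spikes
            "stdev": pstdev(samples),  # how much free swap moved around
        }
    return summary


# Hypothetical data: one host that never changes, one with a single spike
# that used up all of its swap.
EXAMPLE = {
    "steady": [900.0, 900.0, 900.0, 900.0],
    "spiky": [1000.0, 1000.0, 0.0, 1000.0],
}

for host, stats in summarize_free_swap(EXAMPLE).items():
    print(host, stats)
```

The "spiky" host shows why both numbers are worth looking at: its minimum is zero, but its average still looks comfortable.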

Looking at the maximum amount of swap space used over the past 30 days, only four servers used 1 GB or more of swap space. Only eight used more than 512 MB, and only eleven used 128 MB or more. Since we typically configure at least 1 GB of swap space, this suggests that we're seriously overconfiguring most machines. In terms of average swap space used, only one server averaged over 512 MB and only five averaged over 128 MB (and one of those was a compute server).
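Counting how many servers reached each level of swap usage is a simple tally over each host's maximum. A minimal sketch, with invented host names and values, and using "at or above" for every threshold for simplicity (the counts above mix "or more" and "more than"):

```python
def count_hosts_at_or_above(max_used_mb_by_host, thresholds_mb):
    """For each threshold, count hosts whose peak swap use reached it."""
    return {
        threshold: sum(
            1 for used in max_used_mb_by_host.values() if used >= threshold
        )
        for threshold in thresholds_mb
    }


# Hypothetical peak swap usage per host over 30 days, in MBytes.
EXAMPLE_MAX_USED = {"a": 2048, "b": 600, "c": 130, "d": 10}

print(count_hosts_at_or_above(EXAMPLE_MAX_USED, [1024, 512, 128]))
```

Each host is counted under every threshold it reaches, so the counts grow as the threshold drops, the same way the four, eight, and eleven figures do.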

These aren't really the results that I expected when I started looking. Before I looked, I would have expected more swap space usage from rarely used pages of data being slowly aged out in favour of more productive uses of RAM, such as cached files. But apparently modern Linux kernels mostly don't do this.

(Despite what I've found out here, we probably won't reduce our default swap size below 1 GB. 1 GB of disk space isn't much these days and it's a cheap insurance policy.)

PS: The servers in question are mostly Ubuntu 22.04, with a moderate number of 20.04 and 18.04 machines and a few CentOS 7 ones.

Comments on this page:

By Arnaud Gomes at 2022-10-16 05:41:38:

How does this correlate with the amount of physical RAM?

These days we are mostly in the "no configured swap space" camp. Most of our servers have at least 128 GB RAM, with most of the newest in the half-terabyte range, so a few GB more would make very little practical difference.

We used to run our older servers with swap, sometimes with swap spaces in the tens of gigabytes range; the performance penalty was so big that in many cases it turned out to be better to disable swap altogether and let the OOM killer trigger sooner rather than later.

Of course this depends a lot on your workload.

   -- A

By orev at 2022-10-16 13:58:51:

As these swap posts have been making some rounds on Reddit and Hacker News, I’m shocked at how much the commenters overwhelmingly don’t seem to understand what swap space is used for (including the one other comment currently on here). The assumption being made is that swap sits there doing nothing, until some program tries to use more RAM than exists, and the overflow gets dumped to swap. While that is part of why swap exists, it’s not the only reason.

Why you need swap can best be described as opportunity cost. You have this very fast and expensive resource (RAM), and when you run a program (let’s assume a statically linked one), it loads all its code and some data into RAM. Inevitably, there will be some parts of the code and data that aren’t used frequently or maybe not used at all.

With swap, the OS can identify those inactive RAM pages and move them to disk so the RAM can be used for something else (like disk cache). Without swap, the OS can do nothing about it and you end up with all this inactive junk sitting there.

It seems like the movement to “no swap” has just as many people mindlessly repeating the same dogma as when it was previously said that one had to use a fixed formula of always using 2x RAM for swap. As with most other aspects of computing, you really need to know what you’re doing before making blanket statements about any particular thing.

By cks at 2022-10-17 09:29:40:

There isn't a strong correlation between the maximum amount of swap used and the amount of memory in a machine. Many of our compute servers have 128 GB, but some have more and the GPU based compute servers have less (generally 32 GB); meanwhile the primary login server has 512 GB. The four machines using 1 GB or more of swap in the past 30 days were two of the 128 GB compute servers, the primary login server, and a GPU compute server. Meanwhile, machines with 8 GB and 16 GB of RAM used less or basically no swap space.

Because we try to put RAM in machines that need it and I'd expect a correlation between using a lot of RAM and using swap space, I'd expect that we'd see more swap space usage on machines with high RAM as a general rule. But I'm not sure that's actually happening, at least in the past 30 days (and there are also issues like SLURM resource limits, which try to clamp memory usage of submitted jobs so they don't exceed the machine's RAM).

Written on 15 October 2022.

Last modified: Sat Oct 15 21:41:24 2022