I don't know how much memory our Prometheus setup needs

May 22, 2021

In a comment on my entry on the size of our Prometheus setup, Jr asked a quite relevant question:

Hi Chris. How much memory is this setup consuming?

That's a good question, but I don't have any good answers, even though I have lots of metrics. Probably the best indication of how much memory any given Prometheus-related daemon needs is the process_resident_memory_bytes metric, which is automatically exported by any Go program that uses the standard Prometheus client library.
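
As an illustration of where this number comes from, here is a minimal Python sketch that pulls process_resident_memory_bytes out of a daemon's /metrics output in the Prometheus text exposition format. It's only a sketch under stated assumptions: the function name metric_values is hypothetical, the parser assumes label values contain no spaces, and the sample scrape text is made up rather than taken from our setup.

```python
def metric_values(exposition: str, name: str) -> list[float]:
    """Return every sample value for metric `name` in Prometheus text format.

    Hypothetical helper and a deliberately simple parser: it skips comment
    lines (# HELP, # TYPE), ignores an optional trailing timestamp, and
    assumes label values contain no spaces.
    """
    values = []
    for line in exposition.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # The series part is the metric name plus an optional {labels} block.
        series, _, rest = line.partition(" ")
        metric = series.split("{", 1)[0]
        if metric == name and rest:
            # The first field after the series is the value; a timestamp may follow.
            values.append(float(rest.split()[0]))
    return values


# Made-up scrape output, for illustration only.
SAMPLE = """\
# HELP process_resident_memory_bytes Resident memory size in bytes.
# TYPE process_resident_memory_bytes gauge
process_resident_memory_bytes 4.294967296e+09
go_memstats_alloc_bytes 3.1e+09
"""

rss = metric_values(SAMPLE, "process_resident_memory_bytes")
print(rss[0] / 2**30)  # prints 4.0 (GiB)
```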

By that measure, our Prometheus daemon typically runs with between 3 GB and 5 GB of resident RAM, with occasional spikes up to most of the memory on the machine (currently 32 GB). Pushgateway has a resident set size of around 100 MBytes, Grafana around 80 MBytes, and things like the host agent top out at around 60 MBytes. Since we generate plenty of metrics from scripts and programs run from cron, these numbers underestimate how much memory (and CPU) we use to generate host-related metrics, but the difference is probably not drastic.
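
Turning a pile of RSS samples into a summary like "typically 3 to 5 GB, with occasional spikes" is a judgment call. One hypothetical way to do it is to take the interquartile range as the typical band and the maximum as the peak. The Python sketch below assumes its input is shaped like the "values" list of a single series from the Prometheus HTTP API's /api/v1/query_range response (pairs of a Unix timestamp and a string value); the function name summarize_rss and the sample numbers are made up for illustration.

```python
import statistics

GIB = 2**30


def summarize_rss(values):
    """Summarize (timestamp, "value") pairs into typical and peak RSS in GiB.

    Hypothetical helper. `values` is assumed to look like the "values" list
    of one series in a query_range response. Assumes at least two samples.
    """
    rss = sorted(float(v) for _, v in values)
    # quantiles(n=4) returns the three quartile cut points: Q1, median, Q3.
    q1, median, q3 = statistics.quantiles(rss, n=4)
    return {
        "typical_low_gib": q1 / GIB,   # bottom of the interquartile band
        "typical_high_gib": q3 / GIB,  # top of the interquartile band
        "median_gib": median / GIB,
        "peak_gib": rss[-1] / GIB,     # the largest spike in the window
    }


# Made-up hourly samples in GiB, including one spike from a big query.
samples = [(1621641600 + 3600 * i, str(g * GIB))
           for i, g in enumerate([3, 4, 4, 4, 5, 5, 4, 3, 28])]
print(summarize_rss(samples))
```

Using quartiles rather than the mean keeps a single giant query from dragging the "typical" band upward, while the peak still records that the spike happened.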

(We still have some servers with as little as 2 GB of RAM, and the host agent runs fine on them along with the other things they do.)

Another indication of how much memory we need is that our Prometheus host started out with only 16 GB of RAM, until we ran into operational problems with big PromQL queries. We haven't had similar problems since raising it to 32 GB, but then I haven't been trying as many relatively crazy queries. I think Prometheus has also improved its memory use since then, so it doesn't need as much RAM, but I haven't checked. Based on my experience, I wouldn't want to run Prometheus on a machine with less than 32 GB of RAM if I expected to throw random and possibly large PromQL queries at it. Maybe it would work, maybe not, but I don't see any reason to find out the hard way.

(I think you could run Prometheus with less RAM if you had more control over the queries. Our Prometheus resident RAM spikes are pretty infrequent and clearly don't come from any normal usage, including looking at our dashboards over normal time ranges. Our Prometheus server host doesn't do anything other than run Prometheus, Alertmanager, and so on, so there's nothing else competing for RAM.)

However, I'm not convinced that either of these measures provides a good view of actual memory use. Go programs also export information about Go-level memory usage, including go_memstats_alloc_bytes and go_memstats_sys_bytes (which has a complicated definition; roughly, it's the total bytes of memory the Go runtime has obtained from the operating system). The good news for us is that RSS doesn't seem to be too far removed from Go's allocated bytes. RSS is higher, especially for Prometheus, but the gap is relatively consistent.
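
One way to check whether "RSS is higher but relatively consistent" holds is to compute the RSS to allocated-bytes ratio at each scrape and see how far the ratios wander from their median. This Python sketch assumes you already have process_resident_memory_bytes and go_memstats_alloc_bytes values taken at the same scrape times (for example, from two range queries over the same start, end, and step); the helper names and the 25% tolerance are hypothetical choices, and the numbers in the usage line are made up.

```python
import statistics


def rss_alloc_ratios(rss_samples, alloc_samples):
    """Per-scrape ratio of process_resident_memory_bytes to go_memstats_alloc_bytes.

    Hypothetical helper. Both arguments are assumed to be equal-length lists
    of values taken at the same scrape times.
    """
    if len(rss_samples) != len(alloc_samples):
        raise ValueError("series must be aligned sample-for-sample")
    return [r / a for r, a in zip(rss_samples, alloc_samples)]


def is_consistent(ratios, max_rel_spread=0.25):
    """True if every ratio lies within max_rel_spread of the median ratio.

    The default 25% tolerance is an arbitrary, hypothetical threshold.
    """
    mid = statistics.median(ratios)
    return all(abs(r - mid) / mid <= max_rel_spread for r in ratios)


# Made-up values in GiB for four scrapes.
ratios = rss_alloc_ratios([4.0, 4.4, 5.0, 3.6], [2.5, 2.75, 3.2, 2.25])
print(ratios, is_consistent(ratios))
```

A stable ratio doesn't tell you which number is the "real" memory use, but it does mean trends in one metric are a fair stand-in for trends in the other.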

Overall I wouldn't want to run our Prometheus server host with less than 32 GB, but I feel happy putting the host agent on machines with pretty much any amount of RAM that we consider viable today. Our next iteration of the Prometheus server host will probably have 64 GB of RAM, just because it provides us more room for occasional giant queries (and more disk cache outside of them).

Written on 22 May 2021.

Last modified: Sat May 22 00:22:52 2021