You can sensibly move or copy Prometheus's database with rsync

July 21, 2022

Recently we upgraded our Prometheus server from Ubuntu 18.04 to 22.04, moved it to new hardware, and migrated it from using a mirrored pair of 4 TB HDDs to a new mirrored pair of 20 TB HDDs. One of our goals in this migration was for the server to be down for as short a time as possible, because when the server is down we're not collecting metrics (including ones that we alert on, such as the temperature of the department's machine rooms).

When I was initially planning these migrations out, I assumed that I would have to migrate the data to new HDDs separately from the OS and hardware upgrade. The process would be to install the new OS on the new server, take the old and the new servers down to move the 4 TB data disks to the new server, bring it up, then replace the 4 TB HDDs with 20 TB HDDs one by one (resyncing the mirror each time, so two resyncs). But in the end we were able to do the entire migration all at once; we installed the new server with the new 20 TB HDDs, then copied the Prometheus time series database over with rsync in several incremental steps. The final rsync incremental copy (done with Prometheus down to freeze the TSDB) was quite fast, certainly faster than swapping the 4 TB HDDs would have been.

(The old server didn't have removable drive bays, and in any case even different generations of servers from the same vendor often have different drive holders, requiring you to remount the drives as part of the migration.)

Normally, I wouldn't expect an incremental rsync of a database to work very well. Databases are famous for changing things all over their storage files in normal operation, while rsync works best when as many things as possible are unchanged (ideally, files that haven't even been touched, so that it can skip them based purely on modification time and file size). However, the Prometheus TSDB turns out to have an unusual structure on disk that leaves many of its files untouched once they get old enough. When I looked at our TSDB directories, I noticed that many of them seemed to be relatively inactive, and the replies to my Twitter thread were very informative. For example, Frederic Branczyk's:
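To make the "skip them based purely on modification time and file size" point concrete, here is a minimal Python sketch of the idea behind rsync's default "quick check": a file is treated as unchanged, and skipped, if the destination already has a file at the same path with the same size and modification time. This is a simplified model, not rsync's actual code. It ignores options such as --checksum and --size-only, and the function name is hypothetical.

```python
import os
from pathlib import Path


def files_needing_transfer(src: Path, dst: Path) -> list[str]:
    """Return relative paths under src that a size+mtime quick check
    would consider changed or missing in dst (simplified model)."""
    changed = []
    for dirpath, _dirnames, filenames in os.walk(src):
        for name in filenames:
            s = Path(dirpath) / name
            rel = s.relative_to(src)
            st = s.stat()
            try:
                dt = (dst / rel).stat()
            except FileNotFoundError:
                # Missing at the destination: must be copied.
                changed.append(str(rel))
                continue
            # Compare size and whole-second modification times.
            same = (st.st_size == dt.st_size
                    and st.st_mtime_ns // 1_000_000_000
                    == dt.st_mtime_ns // 1_000_000_000)
            if not same:
                changed.append(str(rel))
    return sorted(changed)
```

Under this model, an old TSDB block directory that was already copied, and hasn't been rewritten since, contributes nothing to later passes. That is why repeated incremental rsyncs converge quickly.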

[TSDB] Compaction compacts to max 31 days or 1/10th of retention depending on what’s lower, so from there on blocks will never change unless you use the delete API. If you can turn off the server and copy it’s definitely safe to do.

This is actually covered in the Prometheus documentation on local storage, which would have saved me some worry if I'd read it first.
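The compaction limit in the quote comes down to a one-line rule. The local storage documentation puts it as blocks spanning up to 10% of the retention time or 31 days, whichever is smaller. Here is a minimal Python sketch of just that arithmetic, assuming retention is expressed in days. The function name is hypothetical and is not part of Prometheus.

```python
def max_block_span_days(retention_days: float) -> float:
    """Largest time span a fully compacted block covers: the smaller of
    31 days and one tenth of the retention period."""
    return min(31.0, retention_days / 10)


# Prometheus's default 15-day retention: blocks stop growing at 1.5 days.
print(max_block_span_days(15))    # 1.5
# A multi-year retention hits the 31-day cap.
print(max_block_span_days(1095))  # 31.0
```

Once a block reaches this span it is no longer rewritten by compaction. Unless you use the delete API, as the quote notes, it stays byte-for-byte stable between rsync passes.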

(Prometheus snapshots aren't viable for us because they create a copy of your entire TSDB. We were migrating from 4 TB to 20 TB HDDs precisely because our TSDB was almost filling up the 4 TB mirror, so making a copy was impossible, as well as likely to be very time consuming.)

Although I haven't tested it yet, I believe that one implication of this is that you can make partial backups of the Prometheus TSDB. Backing up our entire Prometheus TSDB is impractical (in the future it would require another pair of 20 TB HDDs), but we can likely back up only the most recent year or two or three (however much fits on the available backup HDD space) by copying files and chunk directories that aren't too old (or in general that are within your desired date range). Unfortunately rsync doesn't have a straightforward option for 'skip all files older than ...', so we'll have to build something by hand.
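Since rsync can't filter by age directly, one way to build the "something by hand" is to pick whole block directories by their time range. Each block directory has a meta.json with a maxTime field in Unix milliseconds. This is an untested Python sketch, assuming that block directories have 26-character ULID names and that everything else in the TSDB directory (wal, chunks_head, temporary compaction directories) can be skipped. The script and function names are hypothetical.

```python
import json
import sys
import time
from pathlib import Path
from typing import Optional


def cutoff_ms(days: float, now_s: Optional[float] = None) -> int:
    """Unix time in milliseconds `days` days before now_s (default: now)."""
    if now_s is None:
        now_s = time.time()
    return int((now_s - days * 86400) * 1000)


def recent_blocks(tsdb_dir: Path, since_ms: int) -> list[str]:
    """Names of block directories whose data reaches since_ms or later.

    Assumption: blocks are 26-character ULID directories containing a
    meta.json with a maxTime field; all other entries are skipped.
    """
    selected = []
    for entry in sorted(tsdb_dir.iterdir()):
        meta = entry / "meta.json"
        if len(entry.name) != 26 or not meta.is_file():
            continue
        with meta.open() as f:
            max_time = json.load(f)["maxTime"]
        if max_time >= since_ms:
            selected.append(entry.name)
    return selected


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python3 recent_blocks.py /path/to/tsdb 730
    for name in recent_blocks(Path(sys.argv[1]), cutoff_ms(float(sys.argv[2]))):
        print(name)
```

The printed names could then be handed to rsync's --files-from option. With --files-from, -a does not imply -r, so you would need to add -r explicitly to get each block directory's contents.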

When doing this, I believe you need to copy an entire chunk directory as a whole, not take bits out of it. In any case, the file modification times of everything in sufficiently old chunk directories all seem to be the same, and to be the end of the time period, presumably because that's when it was all given its final compaction. If you need to know the time range covered by a given directory, it's in the meta.json file as minTime and maxTime, both of which are Unix timestamps in milliseconds.
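Turning minTime and maxTime into readable dates only takes a few lines. This is a minimal Python sketch that assumes nothing about meta.json beyond those two fields. The function name is hypothetical.

```python
import json
from datetime import datetime, timedelta, timezone
from pathlib import Path

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def block_time_range(block_dir: Path) -> tuple[datetime, datetime]:
    """Read minTime and maxTime (Unix milliseconds) from a block's
    meta.json and return them as timezone-aware UTC datetimes."""
    with (block_dir / "meta.json").open() as f:
        meta = json.load(f)
    # timedelta(milliseconds=...) keeps millisecond precision exactly,
    # avoiding float rounding from dividing by 1000.
    start = EPOCH + timedelta(milliseconds=meta["minTime"])
    end = EPOCH + timedelta(milliseconds=meta["maxTime"])
    return start, end
```

For example, a minTime of 1656633600000 comes out as 2022-07-01 00:00:00 UTC.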

(Initially we thought we'd be okay with starting our metrics data over from scratch if something terrible happened to our TSDB. However, we're using our metrics more and more to answer historical usage questions about the past year or two, so it would be at least irritating and maybe painful to lose that data. We have plenty of 2 TB HDDs and even some used 4 TB HDDs, and 4 TB HDDs will hold several years of our data; our current setup has been running on them since late 2018 and only just filled them up, although our rate of disk space usage per day has been going up.)

Last modified: Thu Jul 21 22:02:31 2022