Our problem with (Amanda) backups of many files, especially incrementals

August 29, 2018

Our fileserver-based filesystems have a varying number of inodes in use on them, ranging from not very many (often on filesystems with not a lot of space used) to over 5.7 million. Generally our Amanda backups have no problems handling the filesystems with not too many inodes used, even when they're quite full, but the filesystems with a lot of inodes used seem to periodically give our backups a certain amount of heartburn. This seems to be especially likely if we're doing incremental backups instead of full ones.

(We have some filesystems with 450 GB of space used in only a few hundred inodes. The filesystems with millions of inodes used tend to have a space used to inodes used ratio from around 60 KB per inode up to 200 KB or so, so they're also generally quite full, but clearly being full by itself doesn't hurt us.)

Our Amanda backups use GNU Tar to actually read the filesystem and generate the backup stream. GNU Tar works through the filesystem and thus the general Unix POSIX filesystem interface, like most backup systems, and thus necessarily has some general challenges when dealing with a lot of files, especially during incremental backups.

When you work through the filesystem, you can only back up files by opening them and you can only check if a file needs to be included in an incremental backup by stat()ing it to get its modification time and change time. Both of these activities require the Unix kernel and filesystem to have access to the file's inode; if you have a filesystem with a lot of inodes, this will generally mean reading it off the disk. On HDs, this is reasonably likely to be a seek-limited activity, although fortunately it clearly requires less than one seek per inode.

Reading files is broadly synchronous but in practice the kernel will start doing readahead for you almost immediately. Doing stat()s is equally synchronous, and then things get a bit complicated. Stat() probably doesn't have any real readahead most of the time (for ZFS there's some hand waving here because in ZFS inodes are more or less stored in files), but you also get 'over-reading' where more data than you immediately need is read into the kernel's cache, so some number of inodes around the one you wanted will be available in RAM without needing further disk fetches. Still, during incremental backups of a filesystem with a lot of files where only a few of them have changed, you're likely to spend a lot of time stat()ing files that are unchanged, one after another, with only a few switches to read()ing files. On full backups, GNU Tar is probably switching back and forth between stat() and read() as it backs up each file in turn.

(On a pragmatic level it's clear that we have more problems with incrementals than with full backups.)

I suspect that you could speed up this process somewhat by doing several stat()s in parallel (using multiple threads), but I doubt that GNU Tar is ever going to do that. Traditionally you could also often get a speedup by sorting things into order by inode number, but this may or may not work on ZFS (and GNU Tar may already be doing it). You might also get a benefit by reading in several tiny files at once in parallel, but for big files you probably might as well read them one at a time and enjoy the readahead.

I'm hoping that all of this will be much less of a concern and a problem when we move from our current fileservers to our new ones, which have local SSDs and so are going to be much less affected by a seek-heavy worklog (among other performance shifts). However this is an assumption; we might find that there are bottlenecks in surprising places in the whole chain of software and hardware involved here.

(I have been tempted to take a ZFS copy of one of our problem filesystems, put it on a test new fileserver, and see how backing it up goes. But for various reasons I haven't gone through with that yet.)

PS: Now you know why I've recently been so interested in knowing where in a directory hierarchy there were a ton of files (cf).

Written on 29 August 2018.
« How I recently used vendoring in Go
Making Ubuntu bug reports seems to be useless (or pointless) »

Page tools: View Source, Add Comment.
Login: Password:
Atom Syndication: Recent Comments.

Last modified: Wed Aug 29 23:07:03 2018
This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.