Our problem with (Amanda) backups of many files, especially incrementals
Our fileserver-based filesystems have a varying number of inodes in use on them, ranging from not very many (often on filesystems with not a lot of space used) to over 5.7 million. Generally our Amanda backups have no problems handling the filesystems with not too many inodes used, even when they're quite full, but the filesystems with a lot of inodes used seem to periodically give our backups a certain amount of heartburn. This seems to be especially likely if we're doing incremental backups instead of full ones.
(We have some filesystems with 450 GB of space used in only a few hundred inodes. The filesystems with millions of inodes used tend to have a space used to inodes used ratio from around 60 KB per inode up to 200 KB or so, so they're also generally quite full, but clearly being full by itself doesn't hurt us.)
Our Amanda backups use GNU Tar to actually read the filesystem and generate the backup stream. Like most backup systems, GNU Tar works through the filesystem and so through the general Unix POSIX filesystem interface, which means it necessarily has some general challenges when dealing with a lot of files, especially during incremental backups.
When you work through the filesystem, you can only back up files
by opening them and you can only check if a file needs to be included
in an incremental backup by
stat()ing it to get its modification
time and change time. Both of these
activities require the Unix kernel and filesystem to have access
to the file's inode; if you have a filesystem with a lot of inodes,
this will generally mean reading it off the disk. On HDs, this is
reasonably likely to be a seek-limited activity, although fortunately
it clearly requires less than one seek per inode.
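To make the per-file cost concrete, here is a minimal Python sketch of the sort of check an incremental backup has to make when it works through the filesystem. This is only an illustration, not how GNU Tar is actually implemented; the function name `changed_since` and the idea of a single Unix-timestamp cutoff are my assumptions.

```python
import os

def changed_since(root, cutoff):
    """Yield paths under root whose mtime or ctime is newer than cutoff.

    cutoff is a Unix timestamp in seconds (a hypothetical 'time of the
    last backup'). Every file costs one lstat(), and each lstat() needs
    the kernel to have that file's inode in memory.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue  # the file vanished or can't be examined; skip it
            # Modification time covers content changes; change time also
            # covers things like permission changes and renames.
            if st.st_mtime > cutoff or st.st_ctime > cutoff:
                yield path
```

Even if only a handful of paths come out of this, the loop has still had to `lstat()` every single file in the tree to find them.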
Reading files is broadly synchronous
but in practice the kernel will start doing readahead for you almost
immediately. Doing stat()s is equally synchronous, and then things
get a bit complicated. Stat() probably doesn't have any real readahead
most of the time (for ZFS there's some hand waving here because
in ZFS inodes are more or less stored in files), but you also get 'over-reading'
where more data than you immediately need is read into the kernel's
cache, so some number of inodes around the one you wanted will be
available in RAM without needing further disk fetches. Still, during
incremental backups of a filesystem with a lot of files where only
a few of them have changed, you're likely to spend a lot of time
stat()ing files that are unchanged, one after another, with only
a few switches to
read()ing files. On full backups, GNU Tar is
probably switching back and forth between stat()ing and read()ing as
it backs up each file in turn.
(On a pragmatic level it's clear that we have more problems with incrementals than with full backups.)
I suspect that you could speed up this process somewhat by doing
stat()s in parallel (using multiple threads), but I doubt
that GNU Tar is ever going to do that. Traditionally you could also
often get a speedup by sorting things into order by inode number,
but this may or may not work on ZFS (and GNU Tar may already be
doing it). You might also get a benefit by reading in several tiny
files at once in parallel, but for big files you probably might as
well read them one at a time and enjoy the readahead.
I'm hoping that all of this will be much less of a concern and a problem when we move from our current fileservers to our new ones, which have local SSDs and so are going to be much less affected by a seek-heavy workload (among other performance shifts). However, this is an assumption; we might find that there are bottlenecks in surprising places in the whole chain of software and hardware involved here.
(I have been tempted to take a ZFS copy of one of our problem filesystems, put it on a test new fileserver, and see how backing it up goes. But for various reasons I haven't gone through with that yet.)
PS: Now you know why I've recently been so interested in knowing where in a directory hierarchy there were a ton of files (cf).