2018-08-29
Our problem with (Amanda) backups of many files, especially incrementals
Our fileserver-based filesystems have a varying number of inodes in use on them, ranging from not very many (often on filesystems with not a lot of space used) to over 5.7 million. Generally our Amanda backups have no problems handling the filesystems with not too many inodes used, even when they're quite full, but the filesystems with a lot of inodes used seem to periodically give our backups a certain amount of heartburn. This seems to be especially likely if we're doing incremental backups instead of full ones.
(We have some filesystems with 450 GB of space used in only a few hundred inodes. The filesystems with millions of inodes used tend to have a space used to inodes used ratio from around 60 KB per inode up to 200 KB or so, so they're also generally quite full, but clearly being full by itself doesn't hurt us.)
Our Amanda backups use GNU Tar to actually read the filesystem and generate the backup stream. Like most backup systems, GNU Tar works through the filesystem via the general Unix POSIX filesystem interface, and thus it necessarily has some general challenges when dealing with a lot of files, especially during incremental backups.
When you work through the filesystem, you can only back up files by opening them, and you can only check if a file needs to be included in an incremental backup by stat()ing it to get its modification time and change time. Both of these activities require the Unix kernel and filesystem to have access to the file's inode; if you have a filesystem with a lot of inodes, this will generally mean reading it off the disk. On HDs, this is reasonably likely to be a seek-limited activity, although fortunately it clearly requires less than one seek per inode.
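To make the per-file work concrete, here's a quick Go sketch of the kind of check an incremental backup has to make for every single file, using nothing but stat() information. The cutoff time and the use of Lstat() are my own illustration; this is not how GNU Tar actually organizes its incremental decisions.

    // A sketch of the per-file incremental backup decision: one stat()
    // per file, then a comparison of mtime and ctime against a cutoff.
    package main

    import (
        "fmt"
        "os"
        "syscall"
        "time"
    )

    // needsBackup reports whether path has been modified or changed since
    // the cutoff time. Every call costs (at least) one stat().
    func needsBackup(path string, cutoff time.Time) (bool, error) {
        fi, err := os.Lstat(path)
        if err != nil {
            return false, err
        }
        // The modification time is available portably.
        if fi.ModTime().After(cutoff) {
            return true, nil
        }
        // The change time (ctime) requires the Unix-specific stat data;
        // the field is called Ctim on Linux (and some other Unixes), but
        // the name varies by platform.
        if st, ok := fi.Sys().(*syscall.Stat_t); ok {
            ctime := time.Unix(int64(st.Ctim.Sec), int64(st.Ctim.Nsec))
            if ctime.After(cutoff) {
                return true, nil
            }
        }
        return false, nil
    }

    func main() {
        cutoff := time.Now().Add(-24 * time.Hour) // eg 'changed in the last day'
        for _, path := range os.Args[1:] {
            need, err := needsBackup(path, cutoff)
            if err != nil {
                fmt.Fprintln(os.Stderr, err)
                continue
            }
            fmt.Println(path, need)
        }
    }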
Reading files is broadly synchronous, but in practice the kernel will start doing readahead for you almost immediately. Doing stat()s is equally synchronous, and then things get a bit complicated. Stat() probably doesn't have any real readahead most of the time (for ZFS there's some hand waving here, because in ZFS inodes are more or less stored in files), but you also get 'over-reading', where more data than you immediately need is read into the kernel's cache, so some number of inodes around the one you wanted will be available in RAM without needing further disk fetches. Still, during incremental backups of a filesystem with a lot of files where only a few of them have changed, you're likely to spend a lot of time stat()ing files that are unchanged, one after another, with only a few switches to read()ing files. On full backups, GNU Tar is probably switching back and forth between stat() and read() as it backs up each file in turn.
(On a pragmatic level it's clear that we have more problems with incrementals than with full backups.)
I suspect that you could speed up this process somewhat by doing several stat()s in parallel (using multiple threads), but I doubt that GNU Tar is ever going to do that. Traditionally you could also often get a speedup by sorting things into order by inode number, but this may or may not work on ZFS (and GNU Tar may already be doing it). You might also get a benefit from reading several tiny files at once in parallel, but for big files you might as well read them one at a time and enjoy the readahead.
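As an illustration of the parallel stat() idea (and only that), here's a minimal Go sketch that spreads os.Lstat() calls across a small pool of goroutines. The worker count and the overall structure are arbitrary choices of mine, not anything that GNU Tar or Amanda actually does.

    // A sketch of stat()ing many files in parallel with a pool of goroutines,
    // so that several inode fetches can be outstanding at once.
    package main

    import (
        "fmt"
        "os"
        "sync"
    )

    func main() {
        paths := make(chan string)
        var wg sync.WaitGroup

        // A handful of workers is probably enough to keep a disk's queue
        // busy; the right number is something you'd have to measure.
        for i := 0; i < 8; i++ {
            wg.Add(1)
            go func() {
                defer wg.Done()
                for p := range paths {
                    fi, err := os.Lstat(p)
                    if err != nil {
                        fmt.Fprintln(os.Stderr, err)
                        continue
                    }
                    fmt.Println(p, fi.ModTime().Format("2006-01-02 15:04:05"))
                }
            }()
        }

        for _, p := range os.Args[1:] {
            paths <- p
        }
        close(paths)
        wg.Wait()
    }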
I'm hoping that all of this will be much less of a concern and a problem when we move from our current fileservers to our new ones, which have local SSDs and so are going to be much less affected by a seek-heavy workload (among other performance shifts). However, this is an assumption; we might find that there are bottlenecks in surprising places in the whole chain of software and hardware involved here.
(I have been tempted to take a ZFS copy of one of our problem filesystems, put it on a test new fileserver, and see how backing it up goes. But for various reasons I haven't gone through with that yet.)
PS: Now you know why I've recently been so interested in knowing where in a directory hierarchy there were a ton of files (cf).
How I recently used vendoring in Go
Go 1.11 comes with experimental support for modules, which are more or less Russ Cox's 'vgo' proposal. Initial versions of this proposal were strongly against Go's current feature of vendor directories and wanted to completely replace them. Later versions seem to have toned that down, but my impression is that the Go people still don't like vendoring very much. I've written before about my sysadmin's perspective on vendoring and vgo, where I wanted something that encapsulated all of the build dependencies in a single directory tree that I could treat as a self-contained artifact and that didn't require additional daemons or complicated configuration to use. However, I recently used vendoring for another case, one where I don't think Go's current module support would have worked as well.
For reasons beyond the scope of this entry, I wanted a program that counted up how many files (well, inodes) were used in a directory hierarchy, broken down by sub-hierarchy; this is essentially the file count version of my standard du-based space usage breakdown (where I now use this more convenient version). Since this basically has to manipulate strings in a big map, writing it in Go was a natural decision. Go has filepath.Walk in the standard library, but for extra speed and efficiency I turned to godirwalk. Everything worked great right up until I tried to cross-compile my new program for Solaris (okay, Illumos, but it's the same thing as far as Go is concerned) so I could run it on our fileservers.
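As an illustration of the general idea (and emphatically not the actual dircount code), a stripped-down version of this sort of per-directory file counting can be done with just filepath.Walk and a map; like du does for space, this sketch charges every inode to all of its ancestor directories under the root.

    // A sketch of counting inodes per directory sub-hierarchy with the
    // standard library's filepath.Walk.
    package main

    import (
        "fmt"
        "os"
        "path/filepath"
    )

    func main() {
        root := "."
        if len(os.Args) > 1 {
            root = filepath.Clean(os.Args[1])
        }
        counts := make(map[string]int64)

        err := filepath.Walk(root, func(path string, info os.FileInfo, err error) error {
            if err != nil {
                fmt.Fprintln(os.Stderr, err)
                return nil // note the error but keep walking
            }
            if path == root {
                return nil
            }
            // Walk up from the entry's directory to the root, charging
            // this inode to every directory along the way.
            for d := filepath.Dir(path); ; d = filepath.Dir(d) {
                counts[d]++
                if d == root {
                    break
                }
            }
            return nil
        })
        if err != nil {
            fmt.Fprintln(os.Stderr, err)
        }

        // The output is in no particular order; a more polished version
        // would sort it.
        for dir, n := range counts {
            fmt.Printf("%8d %s\n", n, dir)
        }
    }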
That cross-compile attempt is when I found out that godirwalk doesn't support Solaris, ultimately because Solaris doesn't support file type information in directory entries.
I was able to hack around this with some effort, but the result is a private, modified version of godirwalk that's only ever going to be used by my dircount program, and then only for as long as I care about running dircount on OmniOS (when I stop caring about that, dircount can use the official version). I definitely don't want this to be the apparent official version of godirwalk in my $GOPATH/src hierarchy, and this is not really something that Go modules can solve easily. Traditional Go vendoring solves it neatly and directly; I just put my hacked-up version of godirwalk in vendor/, where it will be automatically used by dircount and not touched by anything else (well, provided that I build dircount in my $GOPATH). When or if I don't want to build with my hacked godirwalk, I can temporarily rename the vendor directory and run 'go build'.
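Concretely, the layout is something like the following; the location of dircount under $GOPATH/src is a made-up illustration, but vendor/<import path> is the real mechanism.

    $GOPATH/src/
        github.com/karrick/godirwalk/           # the official version, untouched
        example.org/dircount/                   # wherever dircount actually lives
            dircount.go
            vendor/
                github.com/karrick/godirwalk/   # my hacked copy, seen only by dircount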
(According to the current documentation, the closest I could come with modules is to replace the official godirwalk with my own version, which I would have to set up in some tree somewhere. This replacement would be permanent until I edited go.mod; I couldn't switch back and forth easily.)
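For the record, my understanding is that the modules version of this would be a replace directive in go.mod, something like the sketch below; the version and the local path are placeholders, and I believe the replacement directory would also have to be set up as a module with its own go.mod.

    module dircount

    // The version is a placeholder for whatever 'go get' would record.
    require github.com/karrick/godirwalk v1.0.0

    // Redirect the import to a locally hacked copy; the path is made up.
    replace github.com/karrick/godirwalk => ../godirwalk-hacked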
This isn't a use I'd initially thought of for vendoring, but in retrospect it's an obvious one. Vendoring makes convenient private copies of packages; normally you use this to freeze package versions, but you can just as well use this to apply and freeze your own modifications. Probably I'll run into other cases of this in the future.
(I will elide a discussion of whether this sort of local change to upstream packages is a good idea, or whether you should really rename them into a new package name, in which case Go modules are forcing you to do the right thing.)