Wandering Thoughts archives


Our problem with (Amanda) backups of many files, especially incrementals

Our fileserver-based filesystems have a varying number of inodes in use on them, ranging from not very many (often on filesystems with not a lot of space used) to over 5.7 million. Generally our Amanda backups have no problems handling the filesystems with not too many inodes used, even when they're quite full, but the filesystems with a lot of inodes used seem to periodically give our backups a certain amount of heartburn. This seems to be especially likely if we're doing incremental backups instead of full ones.

(We have some filesystems with 450 GB of space used in only a few hundred inodes. The filesystems with millions of inodes used tend to have a space used to inodes used ratio from around 60 KB per inode up to 200 KB or so, so they're also generally quite full, but clearly being full by itself doesn't hurt us.)

Our Amanda backups use GNU Tar to actually read the filesystem and generate the backup stream. Like most backup systems, GNU Tar works through the filesystem using the general Unix (POSIX) filesystem interface, and so it necessarily has some general challenges when dealing with a lot of files, especially during incremental backups.

When you work through the filesystem, you can only back up files by opening them and you can only check if a file needs to be included in an incremental backup by stat()ing it to get its modification time and change time. Both of these activities require the Unix kernel and filesystem to have access to the file's inode; if you have a filesystem with a lot of inodes, this will generally mean reading it off the disk. On HDs, this is reasonably likely to be a seek-limited activity, although fortunately it clearly requires less than one seek per inode.

Reading files is broadly synchronous but in practice the kernel will start doing readahead for you almost immediately. Doing stat()s is equally synchronous, and then things get a bit complicated. Stat() probably doesn't have any real readahead most of the time (for ZFS there's some hand waving here because in ZFS inodes are more or less stored in files), but you also get 'over-reading' where more data than you immediately need is read into the kernel's cache, so some number of inodes around the one you wanted will be available in RAM without needing further disk fetches. Still, during incremental backups of a filesystem with a lot of files where only a few of them have changed, you're likely to spend a lot of time stat()ing files that are unchanged, one after another, with only a few switches to read()ing files. On full backups, GNU Tar is probably switching back and forth between stat() and read() as it backs up each file in turn.

(On a pragmatic level it's clear that we have more problems with incrementals than with full backups.)

I suspect that you could speed up this process somewhat by doing several stat()s in parallel (using multiple threads), but I doubt that GNU Tar is ever going to do that. Traditionally you could also often get a speedup by sorting things into order by inode number, but this may or may not work on ZFS (and GNU Tar may already be doing it). You might also get a benefit by reading in several tiny files at once in parallel, but for big files you might as well read them one at a time and enjoy the readahead.

I'm hoping that all of this will be much less of a concern and a problem when we move from our current fileservers to our new ones, which have local SSDs and so are going to be much less affected by a seek-heavy workload (among other performance shifts). However, this is an assumption; we might find that there are bottlenecks in surprising places in the whole chain of software and hardware involved here.

(I have been tempted to take a ZFS copy of one of our problem filesystems, put it on a test new fileserver, and see how backing it up goes. But for various reasons I haven't gone through with that yet.)

PS: Now you know why I've recently been so interested in knowing where in a directory hierarchy there were a ton of files (cf).

sysadmin/ManyFilesBackupProblem written at 23:07:03

How I recently used vendoring in Go

Go 1.11 comes with experimental support for modules, which are more or less Russ Cox's 'vgo' proposal. Initial versions of this proposal were strongly against Go's current feature of vendor directories and wanted to completely replace them. Later versions seem to have toned that down, but my impression is that the Go people still don't like vendoring very much. I've written before about my sysadmin's perspective on vendoring and vgo, where I wanted something that encapsulated all of the build dependencies in a single directory tree that I could treat as a self-contained artifact and that didn't require additional daemons or complicated configuration to use. However, I recently used vendoring for another case, one where I don't think Go's current module support would have worked as well.

For reasons beyond the scope of this entry, I wanted a program that counted up how many files (well, inodes) were used in a directory hierarchy, broken down by the sub-hierarchy; this is essentially the file count version of my standard du based space usage breakdown (where I now use this more convenient version). Since this basically has to manipulate strings in a big map, writing it in Go was a natural decision. Go has filepath.Walk in the standard library, but for extra speed and efficiency I turned to godirwalk. Everything worked great right up until I tried to cross-compile my new program for Solaris (okay, Illumos, but it's the same thing as far as Go is concerned) so I could run it on our fileservers. That's when I found out that godirwalk doesn't support Solaris, ultimately for the reason that Solaris doesn't support file type information in directory entries.

I was able to hack around this with some effort, but the result is a private, modified version of godirwalk that's only ever going to be used by my dircount program, and then only for as long as I care about running dircount on OmniOS (when I stop caring about that, dircount can use the official version). I definitely don't want this to be the apparent official version of godirwalk in my $GOPATH/src hierarchy, and this is not really something that Go modules can solve easily. Traditional Go vendoring solves it neatly and directly; I just put my hacked up version of godirwalk in vendor/, where it will be automatically used by dircount and not touched by anything else (well, provided that I build dircount in my $GOPATH). When or if I don't want to build with my hacked godirwalk, I can rename the vendor directory temporarily and run 'go build'.

(According to the current documentation, the closest I could come with modules is to replace the official godirwalk with my own version that I would have to set up in some tree somewhere. This replacement would be permanent until I edited go.mod; I couldn't switch back and forth easily.)

This isn't a use I'd initially thought of for vendoring, but in retrospect it's an obvious one. Vendoring makes convenient private copies of packages; normally you use this to freeze package versions, but you can just as well use this to apply and freeze your own modifications. Probably I'll run into other cases of this in the future.

(I will elide a discussion of whether this sort of local change to upstream packages is a good idea or whether you should really rename them into a new package name (and thus Go modules are forcing you to do the right thing).)

programming/GoVendoringUsage written at 00:16:31
