The Amanda backup system completely reads tar archives on restores

November 12, 2023

Amanda is the backup system that we use, and have used for years. Strictly speaking, you could say that Amanda is a backup scheduling and storage management system, in that the actual backups are made by existing tools such as tar. This puts it on one side of the divide between backup systems over how much they know about what they're storing. In practice this usually doesn't matter; you do backups through Amanda and normally you do restores through Amanda as well, with Amanda automatically running the underlying tools for you with the right arguments.

(Although sometimes you wind up having to look at tar and the other underlying programs because of bugs.)

We have a few giant filesystems that we back up, such as our /var/mail filesystem. Backups of our /var/mail are no problem, especially since it's now on its own server, but recently we tried restoring an inbox from our Amanda backups for the first time in a while and found that it took much longer than we expected. Part of this was that even after the particular inbox we wanted had been extracted by amrecover, amrecover (and the backend Amanda daemons) kept running. What was going on is straightforward: Amanda reads all the way through your (tar) archive on any and all restores, even if you've already extracted what you're looking for. This means that the overall restore process doesn't end (and release the resources it's using) until the full archive is read, which may take a while if your archive is for a 1 TiB filesystem.
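As a toy model of this behavior (a sketch, not Amanda's or GNU tar's actual code), here is a streaming reader that extracts one wanted member but, like tar on a selective restore, keeps reading until the archive ends. It uses Python's standard `tarfile` module; the helper names `make_archive` and `restore_like_tar` and the member names are made up for illustration.

```python
import io
import tarfile

def make_archive(names_and_data):
    """Build an in-memory, uncompressed tar archive from (name, bytes) pairs."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in names_and_data:
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()

def restore_like_tar(archive, wanted):
    """Stream through the whole archive, as tar does on a selective restore.

    Returns (extracted data, number of members read after the wanted one).
    """
    found = None
    read_after = 0
    # "r|" opens the archive as a forward-only stream, the way tar reads a tape
    # or a pipe from a backup server.
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r|") as tar:
        for member in tar:
            if found is not None:
                read_after += 1  # work done after we already have our file
            if member.name == wanted:
                found = tar.extractfile(member).read()
    return found, read_after
```

With five members and the wanted one second, the reader still goes through the three members after it; in a real restore those trailing members can be most of a 1 TiB archive.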

Update: This only happens for some forms of Amanda tar-based backups, such as compressed tar dumps. With at least some uncompressed tar dumps (and storage media), Amanda will stop after it's restored your files and may be able to do this quite efficiently.
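One plausible reason uncompressed archives can be cheaper, offered as an assumption rather than a description of what Amanda actually does: every tar member starts with a 512-byte header that records the member's size, so a reader on seekable storage can hop from header to header and never read the file contents it doesn't want. A compressed stream gives no such shortcut, since you have to decompress everything before the point you want. The sketch below walks plain ustar headers by hand; `find_member_by_seeking` and `build_ustar` are hypothetical helpers, and the parser deliberately ignores GNU long names, pax headers, and the ustar prefix field.

```python
import io
import tarfile

BLOCK = 512  # tar headers and member data are laid out in 512-byte blocks

def build_ustar(members):
    """Build an in-memory ustar archive from (name, bytes) pairs."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name, data in members:
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()

def find_member_by_seeking(f, wanted):
    """Walk tar headers on a seekable file, skipping member data with seek().

    Returns (data or None, bytes actually read from f).
    Assumes plain ustar members with names of at most 100 bytes.
    """
    bytes_read = 0
    while True:
        header = f.read(BLOCK)
        bytes_read += len(header)
        if len(header) < BLOCK or header == b"\0" * BLOCK:
            return None, bytes_read  # hit the end-of-archive marker
        name = header[0:100].rstrip(b"\0").decode()
        # The size field is 12 bytes of octal digits, NUL- or space-terminated.
        size = int(header[124:136].rstrip(b"\0 ").decode() or "0", 8)
        padded = (size + BLOCK - 1) // BLOCK * BLOCK
        if name == wanted:
            data = f.read(size)
            bytes_read += len(data)
            return data, bytes_read
        f.seek(padded, io.SEEK_CUR)  # skip this member's data without reading it
```

For four 10,000-byte members, finding the last one reads four headers plus that member's data (12,048 bytes) out of a roughly 50 KB archive. The trick depends on being able to seek, which is exactly what a compressed stream takes away.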

This isn't an Amanda bug. Instead it's more or less a limitation of Amanda's approach to backups, specifically of delegating the actual backup and restore process to other programs. Since Amanda has to delegate the restore process to tar, it has no idea of when tar is 'done'; all it can do is run tar until tar exits. GNU Tar itself has no feature for 'exit when you've restored everything listed on the command line as a selective restore', and it might sometimes be difficult for even tar to know when it was done, depending on what you ask for (and how the tar archive is structured). Plus, if what you want is located toward the end of the tar archive, Amanda and tar have no choice but to read all the way through the archive to it, because neither of them knows where anything is located in the archive.
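One concrete way an archive's structure can make 'done' hard to know: a tar archive can hold several entries with the same name (for example when files are added with `tar -r`), and on extraction later entries overwrite earlier ones. Python's `tarfile` documents the same rule for `getmember()`, treating the last occurrence as the most up to date. This sketch (hypothetical helper names, toy data) compares a reader that stops at its first match with one that reads the whole archive.

```python
import io
import tarfile

def build(entries):
    """Build an in-memory tar archive; entries may repeat a name."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in entries:
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()

def first_match(archive, wanted):
    """What a reader that exits at its first match would restore."""
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r|") as tar:
        for member in tar:
            if member.name == wanted:
                return tar.extractfile(member).read()
    return None

def last_match(archive, wanted):
    """What a full pass restores: each later copy overwrites the earlier one."""
    result = None
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r|") as tar:
        for member in tar:
            if member.name == wanted:
                result = tar.extractfile(member).read()
    return result
```

Given entries `inbox` (old), `other`, `inbox` (new), the early-exit reader restores the stale copy, while the full pass restores the new one. Without an index, the only way to be sure there is no later copy is to read to the end.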

(All of this is part of why some backup tools use their own custom backup formats. A backup system with a custom format can potentially jump to exactly the things you want to restore, pull them out, and know that it's done and can stop work now.)
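To illustrate the idea, here is a deliberately tiny made-up format (not the format of any real backup tool): file data is written back to back, followed by a JSON index of offsets and lengths, followed by an 8-byte footer pointing at the index. A restore reads the footer, loads the index, seeks straight to the one item it wants, and then knows it is finished. `write_indexed` and `read_one` are hypothetical names.

```python
import io
import json
import struct

def write_indexed(items):
    """Write a toy archive: item data, then a JSON index, then an 8-byte footer."""
    buf = io.BytesIO()
    index = {}
    for name, data in items:
        index[name] = (buf.tell(), len(data))  # where this item starts, and its size
        buf.write(data)
    index_offset = buf.tell()
    buf.write(json.dumps(index).encode())
    buf.write(struct.pack("<Q", index_offset))  # little-endian offset of the index
    return buf.getvalue()

def read_one(f, wanted):
    """Restore a single item by seeking; returns (data or None, bytes read)."""
    footer_pos = f.seek(-8, io.SEEK_END)
    (index_offset,) = struct.unpack("<Q", f.read(8))
    f.seek(index_offset)
    raw_index = f.read(footer_pos - index_offset)
    index = json.loads(raw_index)
    bytes_read = 8 + len(raw_index)
    if wanted not in index:
        return None, bytes_read
    offset, length = index[wanted]
    f.seek(offset)
    data = f.read(length)
    return data, bytes_read + len(data)
```

Restoring the last of three 1,000-byte items reads the footer, the index, and that one item, skipping the other 2,000 bytes entirely. The cost is that every writer and reader has to understand the custom format, which is what Amanda avoids by delegating to tar.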

One of the corollaries of this is that if you want fast recoveries with Amanda, or in general to speed up your recoveries, one of the things you need to look at is how fast you can read through your backup archives and how to speed that up. We've wound up doing some work on that recently (as a result of this slow recovery experience), but that's for another entry.
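Since a full restore costs roughly one complete read of the archive, a rough first estimate is archive size divided by sequential read throughput. The sketch below measures throughput by reading a file in 1 MiB chunks and turns that into a full-scan time. The function names are hypothetical, and the 200 MB/s figure in the example is an arbitrary assumption, not a measurement of our systems.

```python
import time

def read_throughput(path, chunk_size=1 << 20):
    """Read a file sequentially in chunks and return bytes per second.

    A file that is already in the OS page cache will read much faster
    than the underlying storage, so use a file that isn't cached.
    """
    total = 0
    start = time.monotonic()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            total += len(chunk)
    elapsed = time.monotonic() - start
    return total / elapsed if elapsed > 0 else float("inf")

def full_scan_seconds(archive_bytes, bytes_per_second):
    """Time to read an entire archive at a given sustained rate."""
    return archive_bytes / bytes_per_second
```

For a 1 TiB archive (2**40 bytes) read at an assumed 200 MB/s, `full_scan_seconds(2**40, 200e6)` is about 5,498 seconds, or a little over an hour and a half, no matter how small the file you actually wanted.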

Written on 12 November 2023.
Last modified: Sun Nov 12 21:56:14 2023