What I want out of backups for my home machine (in the abstract)

January 13, 2016

Over time I've come to realize that part of my problem with home backups is that I have somewhat contradictory desires for them. Today I feel like writing those desires down, partly to get everything straight in my own head.

What I want is all of:

  • multiple backups going back in time, not just one backup that is 'the current state'. Being able to go back in time is often handy (and it's reassuring).

    (This can be done just by having several independent disks, each with a separate backup, and in some ways that's the best option. But I don't want to buy that many disks and disk enclosures for my home backups.)

  • that these multiple backups be independent, so that damage to a single backup still leaves me able to recover all the files from another one (assuming the files were present in both).

    (This implies that I don't want deduplication or a full plus incrementals approach, because either means that a single point of damage affects a whole series of backups.)

  • these backups should be stored in some reasonably compressed form so they use up relatively little space. This matters because keeping multiple independent copies means that my raw (uncompressed) backups would take much more space than the live data.

  • the backups should normally be offline. Online backups are only one accident or slip-up away from disappearing.

  • making backups should not kill the performance of my machine, because otherwise that's a disincentive to actually make them.

  • the process of creating the current backup should be something that can be done incrementally, so that I can start it, decide that it's having too much of an impact or is taking too long, and stop it again without throwing away the progress to date.

  • backups should be done in a way that captures even awkward, hard to capture things like holes in files, special ACLs, and so on. I consider traditional dump to be ideal for this, although dump is less than ideal if I'm backing up a live filesystem.

If I ignore the last issue and 'backups should be aggressively compressed', it sounds like what I want is rsync to a separate directory tree for each backup run. Rsync is about the best technology for being able to interrupt and resume a backup that I can think of, although perhaps modern Linux backup tools can do it too (I haven't looked at them). I can get some compression by choosing e.g. ZFS as the filesystem on the backup target (that would also get me integrity checks, which I'd like).
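As a rough sketch of the 'separate directory tree per run' idea, here is a small Python helper that only builds the rsync command line (it doesn't run anything). The function name, the paths, and the choice of naming each run by date are all assumptions for illustration; the rsync options used (`-a`, `-H`, `--partial`) are standard ones. It deliberately avoids `--link-dest`, since sharing files between runs would make the backups depend on each other.

```python
import datetime
from pathlib import PurePosixPath


def rsync_backup_command(source, backup_root, run_name=None):
    """Build an rsync argv that copies `source` into its own per-run tree.

    `run_name` is a hypothetical label for the backup run; it defaults to
    today's date. Re-running with the same run_name resumes that run,
    because rsync skips files that are already up to date in the target.
    """
    if run_name is None:
        run_name = datetime.date.today().isoformat()
    dest = PurePosixPath(backup_root) / run_name
    return [
        "rsync",
        "-a",         # archive mode: recurse, keep permissions, times, owners, symlinks
        "-H",         # preserve hard links
        "--partial",  # keep partially transferred files if the run is interrupted
        source.rstrip("/") + "/",  # trailing slash: copy the contents of source
        str(dest) + "/",
    ]


if __name__ == "__main__":
    # Hypothetical paths; print the command instead of running it.
    print(" ".join(rsync_backup_command("/home", "/mnt/backup")))
```

Interrupting this and running it again later with the same run name should pick up more or less where it left off. Plain `-a` doesn't capture ACLs, extended attributes, or holes in files, which is the 'last issue' being ignored here.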

If I ignore being able to interrupt and resume backups, dump doing level 0 full backups to files that are compressed with the standalone compressor of my choice is not a bad choice (and it's my default one today). I think it has somewhat more load impact than other options, though.
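To make the dump option concrete, here is a minimal sketch that builds the shell pipeline for a level 0 dump written to standard output and piped through a standalone compressor. The helper name, the output path, and the choice of `xz` are assumptions for illustration; `dump -0 -f -` is the usual way to ask for a full dump on stdout. It only constructs the command string and doesn't run it.

```python
import shlex


def dump_pipeline(filesystem, out_file, compressor=("xz", "-c")):
    """Return a shell pipeline: level 0 dump of `filesystem`, compressed to `out_file`.

    `compressor` is any command that reads stdin and writes stdout,
    for example ("gzip", "-c") or ("zstd", "-c").
    """
    dump_cmd = ["dump", "-0", "-f", "-", filesystem]  # -0: full dump, -f -: write to stdout
    stages = [
        " ".join(shlex.quote(arg) for arg in dump_cmd),
        " ".join(shlex.quote(arg) for arg in compressor),
    ]
    return " | ".join(stages) + " > " + shlex.quote(out_file)


if __name__ == "__main__":
    # Hypothetical filesystem and target path.
    print(dump_pipeline("/home", "/mnt/backup/home-level0.dump.xz"))
```

Each such file is a complete, independent backup, but if the pipeline is interrupted partway through, the partial file is no use and the whole dump has to start over.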

The actual performance impact of making backups depends partly on the backup method and partly on how I connect the backup target to my machine (there are (bad) options that have outsized impacts). And in general Linux just doesn't seem to do very well here for me, although perhaps rsync could be made to deliberately run slowly to lower its impact.
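One way to try running a backup deliberately slowly is to wrap the command in `nice` and `ionice` and, for rsync, add `--bwlimit`. The sketch below only builds the argument list; the helper name and the example limit are assumptions, and whether `ionice` makes any real difference depends on the kernel's IO scheduler, so I wouldn't count on this fixing things.

```python
def throttled(argv, bwlimit_kbps=None):
    """Wrap a backup command so it runs at low CPU and IO priority.

    If the command is rsync and `bwlimit_kbps` is given, also add rsync's
    --bwlimit option, which caps its transfer rate (roughly in KB per second).
    """
    argv = list(argv)
    if bwlimit_kbps is not None and argv and argv[0] == "rsync":
        argv.insert(1, f"--bwlimit={bwlimit_kbps}")
    # nice -n 19: lowest CPU priority; ionice -c 3: 'idle' IO scheduling class
    return ["nice", "-n", "19", "ionice", "-c", "3"] + argv


if __name__ == "__main__":
    # Hypothetical rsync invocation limited to about 5000 KB/s.
    print(" ".join(throttled(["rsync", "-a", "/home/", "/mnt/backup/run/"], 5000)))
```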

(For my home machine, backing up to an external disk is probably the only solution that I'm happy with. Over-the-Internet backups have various issues, including my upstream bandwidth.)


Last modified: Wed Jan 13 23:23:52 2016