What I want out of backups for my home machine (in the abstract)

January 13, 2016

Over time I've come to realize that part of my problem with home backups is that I have somewhat contradictory desires for them. Today I feel like writing them down, partly to get everything straight in my own head.

What I want is all of:

  • multiple backups going back in time, not just one backup that is 'the current state'. Being able to go back in time is often handy (and it's reassuring).

    (This can be done just by having several independent disks, each with a separate backup, and in some ways that's the best option. But I don't want to buy that many disks and disk enclosures for my home backups.)

  • that these multiple backups be independent, so that damage to a single backup still leaves me able to recover all the files from another one (assuming the files were present in both).

    (This implies that I don't want deduplication or a full plus incrementals approach, because either means that a single point of damage affects a whole series of backups.)

  • these backups should be stored in some reasonably compressed form so they use up relatively little space. This is important in order to get multiple independent copies (which imply that my (raw) backups will take much more space than the live data).

  • the backups should normally be offline. Online backups are only one accident or slip-up away from disappearing.

  • making backups should not kill the performance of my machine, because otherwise that's a disincentive to actually make them.

  • the process of creating the current backup should be something that can be done incrementally, so that I can start it, decide that it's having too much of an impact or is taking too long, and stop it again without throwing away the progress to date.

  • backups should be done in a way that captures even awkward, hard-to-capture things like holes in files, special ACLs, and so on. I consider traditional dump to be ideal for this, although dump is less than ideal if I'm backing up a live filesystem.

If I ignore the last issue and the desire for compressed backups, it sounds like what I want is rsync to a separate directory tree for each backup run. Rsync is about the best technology I can think of for being able to interrupt and resume a backup, although perhaps modern Linux backup tools can do it too (I haven't looked at them). I can get some compression by choosing e.g. ZFS as the filesystem on the backup target (which would also get me integrity checks, something I'd like).
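
(As a concrete illustration of the rsync-per-run idea, here's a small Python sketch; the source and backup paths are made-up placeholders and none of this is a tested setup, just the shape of the thing.)

    #!/usr/bin/env python3
    # Sketch: one independent rsync'd directory tree per backup run.
    # SOURCE and BACKUP_ROOT are invented example paths.
    import datetime
    import os
    import subprocess
    import sys

    SOURCE = "/home/cks/"      # the live data to back up (hypothetical)
    BACKUP_ROOT = "/backups"   # mount point of the backup disk (hypothetical);
                               # if it's ZFS with compression=on, the filesystem
                               # supplies the compression.

    def run_backup():
        # Each run gets its own dated tree, e.g. /backups/2016-01-13, so the
        # backups stay completely independent of each other.
        dest = os.path.join(BACKUP_ROOT, datetime.date.today().isoformat())
        cmd = ["rsync",
               "-aHAXS",       # archive mode plus hard links, ACLs, xattrs,
                               # and an attempt to recreate holes in sparse files
               SOURCE, dest + "/"]
        # If the run is interrupted, simply re-running it makes rsync pick up
        # from whatever it already copied into dest.
        return subprocess.run(cmd).returncode

    if __name__ == "__main__":
        sys.exit(run_backup())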

If I ignore being able to interrupt and resume backups, dump doing level 0 full backups to files compressed with the standalone compressor of my choice is not a bad option (and it's my default one today). I think it has somewhat more load impact than other options, though.
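
(Again purely as a sketch: the dump-plus-compressor approach is more or less the following pipeline, with an invented filesystem device and output path, and xz standing in for whatever standalone compressor I'd actually pick.)

    #!/usr/bin/env python3
    # Sketch: a level 0 dump piped through a standalone compressor,
    # roughly 'dump -0 -f - /dev/sda1 | xz -c > outfile'.
    # The device and output paths are invented examples.
    import datetime
    import subprocess

    FILESYSTEM = "/dev/sda1"   # hypothetical ext* filesystem to dump
    OUTFILE = "/backups/root-0-%s.dump.xz" % datetime.date.today()

    with open(OUTFILE, "wb") as out:
        dump = subprocess.Popen(["dump", "-0", "-f", "-", FILESYSTEM],
                                stdout=subprocess.PIPE)
        xz = subprocess.Popen(["xz", "-c"], stdin=dump.stdout, stdout=out)
        dump.stdout.close()    # so dump gets SIGPIPE if xz dies early
        xz.wait()
        dump.wait()
        if dump.returncode != 0 or xz.returncode != 0:
            raise SystemExit("backup pipeline failed")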

The actual performance impact of making backups depends partly on the backup method and partly on how I connect the backup target to my machine (there are (bad) options that have outsized impacts). And in general Linux just doesn't seem to do very well here for me, although perhaps rsync could be made to deliberately run slowly to lower its impact.
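
(If I did try the 'make rsync run slowly' route, the obvious knobs are rsync's own --bwlimit plus wrapping it in ionice and nice; a rough, untested illustration with arbitrary numbers and placeholder paths:)

    #!/usr/bin/env python3
    # Sketch: deliberately throttling a backup rsync to reduce its impact.
    # The rate limit, priorities, and paths are arbitrary examples.
    import subprocess

    cmd = [
        "ionice", "-c3",        # idle I/O class: only use otherwise-spare disk bandwidth
        "nice", "-n", "19",     # lowest CPU priority
        "rsync", "-aHAXS",
        "--bwlimit=20000",      # cap rsync's transfer rate (the value is in KiB/s)
        "/home/cks/",           # hypothetical source
        "/backups/some-run/",   # hypothetical destination tree for this run
    ]
    subprocess.run(cmd)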

(For my home machine, backing up to an external disk is probably the only solution that I'm happy with. Backups over the Internet have various issues, including my limited upstream bandwidth.)


Comments on this page:

By liam at unc edu at 2016-01-14 09:12:31:

Pretty much what I get from Time Machine on my home machines. One local disk target, one disk on an Apple Server target, and one disk on a ZFS pool on a Nas4Free box. Time Machine is all incremental, and saves space with funky hardlink usage. I can set the ZFS pool to compress.

The 3 targets are round-robined by the client, and it handles being interrupted pretty well.

Downside is Apple's propensity to regularly trash its filesystem, which is a pain on the sparse images it uses on the remote systems. On the Apple Server it means funky command-line diskutil and hdiutil commands to try to fix the filesystem corruption; on the NAS I can roll back to a previous snapshot.

By Ewen McNeill at 2016-01-14 15:35:22:

I trust you're aware of tools like rsnapshot, which use rsync and hard links to create multiple full backups that use only the space of one full backup and increments? (There are other similar tools, but I've used rsnapshot for years.) The net effect is similar to Apple's Time Machine (which I also use), but without relying on the OS to track files that have changed and need backing up.

I use rsnapshot for network backups, from cron, but I can't see why you couldn't just run it on demand when an external drive was plugged in.

Ewen

PS: for my important machines I also image them periodically onto other external drives, so a point in time restore is just restoring the image.

By John Wiersba at 2016-01-14 18:02:53:

@Ewen: I use rsync with similar options to those used by rsnapshot. The effect is separate, (largely) deduplicated backups. Chris' second bullet requires non-deduplicated backups. However, since I keep my backups offline (Chris' fourth bullet), I don't worry about damage to files shared among backups.

By luddite@luddite.com.au at 2016-01-14 18:52:34:

As an alternative to rsync, FDT http://monalisa.cern.ch/FDT/ pjc

By cks at 2016-01-15 10:42:27:

For me, tools like rsnapshot are not what I want because they create backups that aren't independent. If an unchanged file gets damaged (by, say, disk corruption), it will be damaged in all of the backups because all of them only have hardlinks to a single copy of the file; they don't have their own separate copies.

(One could take a hybrid approach of periodic full rsync backups and then a run of rsnapshot backups, which would save some space and maybe speed the backups up. But I'm so-so on even that at the moment.)
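
(For the record, the hardlink trick that rsnapshot and company rely on boils down to rsync's --link-dest option; here's a rough sketch of a single such run, with made-up paths, which also shows why unchanged files end up shared between backups.)

    #!/usr/bin/env python3
    # Sketch: the rsync --link-dest hardlink-snapshot technique that rsnapshot
    # automates. Unchanged files are hardlinked to the previous run's copies,
    # which is exactly why these backups are not independent of each other.
    # All paths are invented examples.
    import datetime
    import os
    import subprocess

    SOURCE = "/home/cks/"
    BACKUP_ROOT = "/backups"
    LATEST = os.path.join(BACKUP_ROOT, "latest")   # symlink to the newest run

    dest = os.path.join(BACKUP_ROOT, datetime.date.today().isoformat())
    cmd = ["rsync", "-aHAXS"]
    if os.path.exists(LATEST):
        # Files unchanged since the last run become hardlinks to the copies
        # in that run instead of fresh, separate copies.
        cmd.append("--link-dest=" + os.path.realpath(LATEST))
    cmd += [SOURCE, dest + "/"]
    subprocess.run(cmd, check=True)

    # Repoint 'latest' at the run we just made.
    tmp = LATEST + ".new"
    if os.path.lexists(tmp):
        os.remove(tmp)
    os.symlink(dest, tmp)
    os.replace(tmp, LATEST)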

By John Wiersba at 2016-01-15 11:58:33:

@Chris I keep multiple physical backups, including backups stored at different offsite locations to mitigate your scenario of disk corruption affecting multiple backups due to shared inodes among backups (file-level deduplication).

What I've noticed about the value of having a backup history is that we're a bit inconsistent in how we treat that history: we bundle it into the backup, rather than thinking of it as primary data. If you lose your backup disks, you typically lose your history as well. If we could instead treat the history as part of the data, then we could lose the backup disk and still keep the history. For that reason, I'd quite like a tool that kept the history on the same storage volume as my data archive, so that I could then do a more basic backup of the archive plus history to my backup disks.
