What I want out of backups for my home machine (in the abstract)
Over time I've come to realize that part of my problem with backups at home is that I have somewhat contradictory desires for them. Today I feel like writing them down, partly to get everything straight in my own head.
What I want is all of:
- multiple backups going back in time, not just one backup that is
'the current state'. Being able to go back in time is often handy
(and it's reassuring).
(This can be done just by having several independent disks, each with a separate backup, and in some ways that's the best option. But I don't want to buy that many disks and disk enclosures for my home backups.)
- that these multiple backups be independent, so that damage to a
single backup still leaves me able to recover all the files from
another one (assuming the files were present in both).
(This implies that I don't want deduplication or a full-plus-incrementals approach, because either means that a single point of damage affects a whole series of backups.)
- these backups should be stored in some reasonably compressed form
so they use up relatively little space. This is important in order
to get multiple independent copies (which imply that my (raw)
backups will take much more space than the live data).
- the backups should normally be offline. Online backups are only
one accident or slip-up away from disappearing.
- making backups should not kill the performance of my machine,
because otherwise that's a disincentive to actually make them.
- the process of creating the current backup should be something
that can be done incrementally, so that I can start it, decide
that it's having too much of an impact or is taking too long,
and stop it again without throwing away the progress to date.
- backups should be done in a way that captures even awkward,
hard to capture things like holes in files, special ACLs, and
so on. I consider traditional dump to be ideal for this,
although dump is less than ideal if I'm backing up a live
filesystem.
If I ignore the last issue and 'backups should be reasonably
compressed', it sounds like what I want is rsync to separate
directory trees for each backup run. Rsync is about the best
technology for being able to interrupt and resume a backup that I
can think of, although perhaps modern Linux backup tools can do it
too (I haven't looked at them). I can get some compression by
choosing eg ZFS as the filesystem on the backup target (that would
also get me integrity checks, which I'd like).
If I ignore being able to interrupt and resume backups,
level 0 full backups to files that are compressed with the standalone
compressor of my choice is not a bad choice (and it's my default
one today). I think it has somewhat more load impact than other
approaches, though.
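As a sketch, a compressed level 0 backup might be a one-liner like the following. The paths, and the choice of zstd as the standalone compressor, are illustrative assumptions on my part; dump needs root and is specific to the filesystem type.

```shell
# Hypothetical: a level 0 (full) dump of the filesystem holding /home,
# compressed on the way to a file on the backup disk. Needs root; paths
# and the choice of zstd are assumptions for illustration.
dump -0 -f - /home | zstd -19 >/backup/home-$(date +%Y-%m-%d).dump.zst
```

One nice property of this form is that each day's file is a completely self-contained backup, which matches the 'independent backups' desire above.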
The actual performance impact of making backups depends partly on
the backup method and partly on how I connect the backup target to
my machine (there are (bad) options that have outsized impacts).
And in general Linux just doesn't seem to do very well here for me,
although rsync could be made to deliberately run slowly
to lower its impact.
(For my home machine, backing up to an external disk is probably the only solution that I'm happy with. Backups over the Internet have various issues, including my limited upstream bandwidth.)
Your system's performance is generally built up in layers
There are many facets and many approaches to troubleshooting performance issues, but there are also some basic principles that can really help to guide your efforts. One of them, so fundamental that it often doesn't get mentioned, is that your system and its performance are built up in layers; to troubleshoot system performance, you want to test and measure each layer, working upwards from the base layers (whatever they are).
(A similar 'working upwards' process can be used to estimate the best performance possible in any particular environment. This too can be useful, for example to assess how close to it you are, or whether the best possible performance can even meet your needs.)
To make this more concrete, suppose that you have an iSCSI based fileserver environment and the filesystems on your fileservers are performing badly. There are a lot of moving parts here; you have the physical disks on the iSCSI targets, the network link(s) between the fileservers and the iSCSI targets, the iSCSI software stack on both sides, and then the filesystem that's using the disks on the fileserver (and perhaps a RAID implementation on the iSCSI targets). Each of these layers in the stack is a chance for a performance problem to sneak in, so you want to test them systematically:
- how fast is a single raw disk on the iSCSI targets, measured locally on a target?
- how fast are several raw disks on the iSCSI targets when they're all operating at once?
- if the iSCSI targets are doing their own RAID, how fast can that go
compared to the raw disk performance?
- how fast is the network between the fileserver and the iSCSI targets?
- how fast is the iSCSI stack on the initiator and targets? Some iSCSI
target software supports 'dummy' targets that don't do any actual IO,
so you can test raw iSCSI speed. Otherwise, perhaps you can swap in a
very fast SSD or the like for testing purposes.
- how fast can the fileserver talk to a single raw disk over iSCSI? To several of them at once? To an iSCSI target's RAID array, if you're using that?
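As a concrete illustration, the per-layer tests above might use commands like the following. The tool choices (fio and iperf3) and all device and host names are my assumptions; these are command sketches to adapt, not something to run as-is.

```shell
# Layer: raw sequential read speed of one disk, measured locally on an
# iSCSI target (read-only, but double-check the device name anyway):
fio --name=rawread --filename=/dev/sdX --rw=read --bs=1M \
    --direct=1 --runtime=30 --time_based --readonly

# Layer: raw network speed between the fileserver and a target:
iperf3 -s              # run on the iSCSI target
iperf3 -c target1      # run on the fileserver

# Layer: the same disk, but read over iSCSI from the fileserver;
# comparing this with the local test shows what the iSCSI stack costs:
fio --name=iscsiread --filename=/dev/sdY --rw=read --bs=1M \
    --direct=1 --runtime=30 --time_based --readonly
```

Running the multi-disk versions is just a matter of starting several such jobs at once and watching whether per-disk throughput drops.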
By working through the layers of the stack like this, you have a much better chance of identifying where your performance is leaking away. Not all performance problems are neatly isolated to a single layer of the stack (there can be all sorts of perverse interactions across multiple layers), but many are, and checking the layers is definitely worth doing first. If nothing else you'll rule out obvious and easily identified problems, like 'our network is only running at a third of the speed we really ought to be getting'.
Perhaps you think that this layering approach should be obvious, but let me assure you that I've seen people skip it. I've probably skipped it myself on occasion, when I felt I was in too much of a hurry to really analyze the problem systematically.
PS: when assessing each layer, you probably want to look at something like Brendan Gregg's USE Method in addition to measuring the performance you can get in test situations.