Why my home backup situation is currently a bit awkward

January 26, 2016

In this recent entry I mentioned that my home backup strategy is an awkward subject. Today I want to talk about why that is so, which has two or perhaps three sides: the annoyances of hardware, the slowness of disks, and software that doesn't just do what I want, partly because I want contradictory things.

In theory, the way to good backups is straightforward. You buy an external disk drive enclosure and a disk for it, connect it to your machine periodically, and 'do a backup' (whatever that is). Ideally you will be disciplined about how frequently you do this. And indeed, relatively early on I set myself up to do this, except that back then I made a mistake; rather than get an external enclosure with both USB and eSATA, I got one with just USB because I had (on my machine at the time) no eSATA ports. To be more precise I got an enclosure with USB 2.0, because that's what was available at the time.

If you know USB 2.0 disk performance, you are now wincing. USB 2.0 disks are dog slow, at least on Linux (I believe I once got a benchmark result on the order of 15 MBytes/sec), and they also usually hammer the responsiveness of your machine into the ground. On top of that I didn't really trust the heat dissipation of the external drive case, which meant that I was nervous about leaving the drive powered on and running overnight or the like. So I didn't do too many backups to that external enclosure and drive. It was just too much of a pain for too long.

With my second external drive case and drive, I learned better (at least in theory); I bought a case with both USB and eSATA. Unfortunately the USB was still only 2.0, and then something in the combination of the eSATA port on my new machine and the case didn't work entirely reliably. I've been able to sort of work around that, but the workaround doesn't make me really happy to have the drive connected, there's still a performance impact from backups, and the heat concerns haven't gone away.

(My replacement for the eSATA port is to patch a regular SATA port through the case. This works but makes me nervous and I think I've seen it have some side effects on the machine when the drive connects or disconnects. In general, eSATA is probably not the right technology here.)

This brings me to slow disks. I can't remember how fast my last backup run went, but between the overheads of actually making backups (in walking the filesystem and reading files and so on) and the overheads of writing them out, I'd be surprised if they ran faster than 50 MBytes/sec (and I suspect they went somewhat slower). At that rate, it takes an hour to back up only 175 GB. With current disks and hardware, backups of lots of data are just going to be multi-hour things, which does not encourage me to do them regularly at the best of times.
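The arithmetic behind that estimate is simple enough to sketch in shell (the 175 GB and 50 MBytes/sec figures are the ones above; real runs vary with file counts and seek overhead):

```shell
# Back-of-the-envelope backup time: size in GB at a sustained rate in MB/s.
# Uses decimal units (1 GB = 1000 MB); filesystem walking and small files
# will only make the real number worse.
size_gb=175
rate_mbs=50
seconds=$(( size_gb * 1000 / rate_mbs ))
minutes=$(( seconds / 60 ))
echo "about ${minutes} minutes for ${size_gb} GB at ${rate_mbs} MB/s"
```

At the 15 MBytes/sec USB 2.0 figure from earlier, the same calculation gives well over three hours for the same 175 GB.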

(Life would be different if I could happily leave the backups to run when I wasn't present, but I don't trust the heat dissipation of the external drive case that much, or for that matter the 'eSATA' connection. Right now I feel I have to actively watch the whole process.)

As I wrote up in more detail here, my ideal backup software would basically let me incrementally make full backups. Lacking something to do that, the low effort system I've wound up with for most things uses dump. Dump captures exact full backups of extN filesystems and can be compressed (and I can keep multiple copies), but it's not something you can do incrementally. Running dump against a filesystem is an all or nothing affair; either you let it run for as many hours as it winds up taking, or you abort it and get nothing. Using dump also requires manually managing the process, including keeping track of old filesystem backups and removing some of them to make space for new ones.

(Life would be somewhat different if my external backup disk was much larger than my system disk, but as it happens it isn't.)

This is far from an ideal situation. In theory I could have regular, good backups; in practice there is enough friction in all of the various pieces that I have de facto bad ones, generally only made when something makes me alarmed. Since I'm a sysadmin and I preach the gospel of backups in general, this feels especially embarrassing (and awkward).

(I think I see what I want my situation to look like moving forwards, but this entry is long enough without trying to get into that.)


Comments on this page:

By Arnaud Gomes at 2016-01-27 01:53:05:

USB3 is the way to go. :-) I have one of these toaster-like disk enclosures (so no heat dissipation issues) running on USB3; an incremental back-up using rsync (that is 2+ TB on my main machine and a few hundred GB on my home servers) takes about 2h. I guess the initial full backup took much longer but once it is done I only do incrementals so I never worried about it. The machine still feels heavily loaded with a backup running, but I think it was far worse when I was using USB2.

By Legooolas at 2016-01-27 08:37:52:

I bought a cheap 2-bay NAS (without disks) (a ZyXEL NSA325 v2) and just use NFS to mount it on any machines which want to use it as a backup destination. It manages about 50MB/sec over gigabit ethernet, and has no particular negative effect on the responsiveness of my machines that are sending data to it (that I've noticed, at least!).

I also don't worry about the disk cooling as it has a small fan which does a pretty good job of that, and allows for monitoring (and probably alerting?) based on disk temperature and SMART status. Good enough for simple backups, at least :)

(Plus a small external USB3 disk for taking data to an off-site backup location)

Other NASes are available fairly cheaply, it's just the one I happened to get when it was particularly cheap and that I had spare disks to stick in it :)

By jbd at 2016-01-27 12:11:04:

I've been deeply in love with attic (https://attic-backup.org/) for a few years now.

I should have a look at its active fork borgbackup (http://borgbackup.readthedocs.org/en/stable/). restic (http://restic.github.io/) also looks quite cool.

By Kevin B. at 2016-01-27 13:42:44:

The solution I use is CrashPlan. The last time I checked, you can back up for free on the LAN and to friends over the internet. You only have to pay to back up to CrashPlan's own servers and to get some extended features. It works with most operating systems.

http://www.code42.com/crashplan/features/compare/

I have a Linux server that's running all the time with an external 5 TB USB 3.0 drive, and I also back up to CrashPlan's servers.

By gsauthof (gsauthof) at 2016-01-27 15:49:39:

USB 2.0 is not that bad - depending on the hardware/kernel I usually get 24 to 33 MiB/s.

I was in a similar situation - a machine with 250 GB, XFS, only USB 2.0, and the need for incremental and full backups.

In that situation I just created a BTRFS filesystem on the backup drive and used rsync and the filesystem snapshot feature - i.e. rsync first and then do a snapshot.

Sure, the very first backup takes very long, but the ones after that are quite fast (including creating a snapshot, assuming some kind of 'sane' usage pattern).
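The rsync-then-snapshot round described above can be sketched like this (the paths are hypothetical, and the two commands are shown rather than run since they need a real btrfs-formatted backup drive):

```shell
# Assumes the backup drive is formatted as btrfs and mounted at /mnt/backup,
# with a subvolume 'current' holding the live mirror. Paths are hypothetical.
SRC=/home
CURRENT=/mnt/backup/current
SNAPDIR=/mnt/backup/snapshots
STAMP=$(date +%Y%m%d)

# 1. Mirror the source into the live copy (--delete keeps it an exact mirror):
#    rsync -aHAX --delete "$SRC/" "$CURRENT/"
# 2. Freeze that state as a cheap, read-only snapshot:
#    btrfs subvolume snapshot -r "$CURRENT" "$SNAPDIR/$STAMP"
```

Each snapshot then looks like an independent full backup while only storing changed blocks, which is essentially the "incrementally make full backups" property the entry asks for.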

I created some scripts for automating all this (especially for cleaning up outdated snapshots, according to a backup schedule):

https://github.com/gsauthof/btrarch

Nowadays, I've switched from XFS to BTRFS (on the main drive), thus I directly use the btrfs send/receive features for doing backups (also scripted for maintaining the snapshots on the receiving side).

"You buy an external disk drive enclosure and a disk for it, connect it to your machine periodically, and 'do a backup' (whatever that is)."

The best definition of "backup" I've managed to come up with: a coherent and recoverable copy of the data on independent media.

By Ewen McNeill at 2016-01-27 20:36:56:

I'd echo the comment about USB 2.0 being tolerable for backup speeds -- from memory my USB 2.0 drives usually manage about 30MB/s sustained (480Mbps/10, less some overhead and request latency). That's okay for backups if you're not hanging around waiting for them. My "always incremental" backup (OS X Time Machine) is to a USB 2.0 drive, and mostly I don't notice it taking that long on its "every hour" updates. (I do, however, frequently notice OS X's annoying habit of waiting on all disks to spin up any time it wants to access any of them; my flash drive spins up instantly, but the spinning rust... takes a while, and whatever processes wanted to access the disk hang until the spinning rust is up to speed, even if they just needed to access the flash drive. :-( Ironically, I know Microsoft Windows has had an optimisation for just that case for years, because they're aware of the poor user experience that results from "spin up all drives and wait for all of them" on any access.)

OTOH, USB 3.0 is very effective for driving "spinning rust" drives. My backups (think "whole drive rsync" to USB 3) typically take about 20 minutes to update a month's worth of changes (4 external USB 3 drives in rotation; typically updated one per week). It feels like "basically native speed" for any "spinning rust" drive. That's handy for "update the drive I'm going to take offsite"; for USB 2 I'd need to leave it running and at least have lunch, if not go away overnight. (I think USB 3 is probably a better choice than eSATA at this point; the hotplugging story is much better.)

FWIW, it occurs to me if the external drive was big enough, you could keep multiple rsync-updated "full backups" on the external drive (ie "/copy1", "/copy2", ...), and then rsync update one each backup round. Personally I wouldn't trust that as much as four separate drives, but YMMV. (One could also potentially make a hybrid "hard links plus separate files" set out of those -- eg, hardlinks for large files that could be recreated/obtained from elsewhere, like conference videos, and the rest separate in each copy; IIRC rsync will respect those hard links providing the file doesn't change.)
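The hard-link variant of this is what rsync's --link-dest option does: files in the new copy that are unchanged relative to the previous copy become hard links into it instead of fresh data. A tiny self-contained demonstration, with throwaway directories standing in for /copy1 and /copy2 on the external drive:

```shell
# Demonstrate rsync --link-dest: unchanged files are hard-linked, not copied.
SRC=$(mktemp -d); COPY1=$(mktemp -d); COPY2=$(mktemp -d)
echo "a large unchanging file" > "$SRC/video.mkv"

# First full backup round:
rsync -a "$SRC/" "$COPY1/"

# Second round: anything identical to what's in $COPY1 is hard-linked into
# $COPY2 rather than transferred and stored again.
rsync -a --delete --link-dest="$COPY1" "$SRC/" "$COPY2/"
```

Afterwards both copies look like complete full backups, but the unchanged file occupies disk space only once.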

Finally it sounds like your biggest issue is that you don't really trust your external drive (eg, to run by itself for hours). Just saying.

Ewen

PS: My home server runs with only eSATA external drives, and has for years (two generations of hardware). The only issues I've had (including disconnects) seem to be related to dying power supplies for them. I did deliberately choose enclosures for good heat dissipation though...
