The crucial difference between online and offline backups

May 17, 2009

At one level, the difference between online backups and offline backups is that online backups are, well, online; you can make them and get at them without having to load tapes (or hard drives), connect your external USB hard drive, or whatever. Online backups have two advantages: they involve little or no physical shuffling around of things, and they make for very rapid restores of data.

These advantages should not be understated. It's much easier to automate your backups and make sure that they happen all the time, even on weekends and holidays and when you are insanely busy, if they don't require anyone to actually do anything physical. And making restores fast and easy keeps them from draining valuable staff time, especially if the most common restore request is for just a small amount of data.

(With large restores, most of the time can be taken up with writing data back to disk. But with small restores that write only a little bit of data to disk, almost all of the time goes to overhead instead, so reducing the overhead can make a drastic difference.)
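To make the arithmetic concrete, here is a minimal sketch that assumes a toy model: a restore costs a fixed overhead (fetching and loading media, finding the right backup) plus the time to write the data back at some disk speed. The overhead and throughput figures are hypothetical, chosen only for illustration.

```python
# Toy model of restore time: fixed per-restore overhead plus the time spent
# actually writing data back to disk. All numbers are hypothetical.

MB = 1_000_000
GB = 1_000 * MB

WRITE_SPEED = 100 * MB      # assumed disk write throughput, bytes per second
OFFLINE_OVERHEAD = 15 * 60  # assumed: fetch and load media, locate the backup (seconds)
ONLINE_OVERHEAD = 30        # assumed: backup is already mounted and reachable (seconds)


def restore_time(data_bytes: int, overhead_s: float) -> float:
    """Total restore time in seconds: fixed overhead plus write time."""
    return overhead_s + data_bytes / WRITE_SPEED


def overhead_fraction(data_bytes: int, overhead_s: float) -> float:
    """Share of the total restore time spent on overhead rather than writing."""
    return overhead_s / restore_time(data_bytes, overhead_s)


def speedup(data_bytes: int) -> float:
    """How many times faster the online restore is than the offline one."""
    return restore_time(data_bytes, OFFLINE_OVERHEAD) / restore_time(data_bytes, ONLINE_OVERHEAD)


if __name__ == "__main__":
    for label, size in [("10 MB", 10 * MB), ("500 GB", 500 * GB)]:
        print(
            f"{label}: offline {restore_time(size, OFFLINE_OVERHEAD):.1f}s "
            f"({overhead_fraction(size, OFFLINE_OVERHEAD):.0%} overhead), "
            f"online {restore_time(size, ONLINE_OVERHEAD):.1f}s, "
            f"speedup {speedup(size):.2f}x"
        )
```

Under these assumed numbers, cutting the overhead makes the 10 MB restore roughly 30 times faster but the 500 GB restore only about 17% faster.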

However, the crucial difference between online backups and offline ones is that online backups can easily be destroyed, whether by accident or malice. By contrast, destroying offline backups takes actual physical work and is much harder to do by accident (although not impossible). It's thus a good idea to have at least some offline backups, just in case, even if online backups are much easier and more convenient.

Comments on this page:

From at 2009-05-17 05:46:58:

I've been following your backup discussion and can't find much to disagree with.

This is strictly for home systems. As they say, I've tried them all, and the most reliable setup I've found for home is probably the oldest, resembling the old practices with tape: (a) full rather than incremental, (b) daily and (c) monthly (d) online backups onto (e) hard drives, using nothing more or less than the classical (f) dump and restore.

As for (f), I still trust dump much more than several other "modern" solutions. Looking back over a decade or so, I've had at least ten complete disk failures, and restore has always been able to give me an exact copy of the old system. It's an odd thing for a technical person to say, but there is a trust element with backup systems. (The feature creep of things like Bacula, in particular, makes me a little nervous.)

I feel lucky that I'm no longer the one doing the backups at work, if only because of all the non-technical issues such as liability.


By cks at 2009-05-17 16:10:43:

Online backups beat having no backups, and I think that there are definitely situations where they're the only really feasible alternative.

(And I agree with you about dump and versions thereof; they've always been good to me, and I trust them quite a bit because I understand how they work and what they're doing.)

From at 2011-02-20 15:53:40:

If your dedicated backup server, which doesn't allow remote login, is configured to run NFS as a user that the kernel allows only to append data and create new files, but not to overwrite or delete existing data, then what's the problem? This assumes, of course, that the backup server has no kernel vulnerabilities, but you can protect against data loss due to kernel exploits by running duplicate backup servers with different kernels.

The easiest way to protect backup hard drives against lightning strikes is to keep them offline, and the easiest way to protect them against EMP, as well as floods, is to store them in ammo boxes. But protection against these physical risks can (with some effort) also be achieved with online drives. Protection against fire and theft isn't significantly different for online vs. offline drives.

The easiest way to ensure long life of backup drives is to store them offline and thus spun down, but an online system can also be configured to spin down the drives except when a backup, restore, or scrub is in progress.

A major advantage of online backups is automatic scrubbing, which detects bitrot early enough to recover from other copies. (This also allows automatic migration of ancient backups to new drives, so long as you keep buying and attaching new drives fast enough to replace failing ones.) Given this advantage, a well-designed online backup system is less likely to lose your data than an offline one. If you aim for the highest reliability with a well-designed online system plus an offline system, you'll further increase, not decrease, reliability by converting the offline system into a second online system. This assumes that the probability of bitrot is higher than the probability of simultaneous exploits of two different kernels; if it isn't, you can use three independent systems with different kernels, and beyond that, you're a good candidate for a tinfoil hat.
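In its simplest form, scrubbing means periodically re-reading every backup file and comparing it against a checksum recorded when the backup was written. Here is a minimal sketch, assuming a hypothetical manifest of SHA-256 digests kept alongside the backups (filesystems such as ZFS do this at the block level instead); the function names are illustrative only.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Read a file in chunks and return its SHA-256 hex digest."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(backup_dir: Path) -> dict[str, str]:
    """Record a digest for every file, keyed by relative path, when the backup is written."""
    return {
        str(p.relative_to(backup_dir)): file_digest(p)
        for p in sorted(backup_dir.rglob("*"))
        if p.is_file()
    }


def scrub(backup_dir: Path, manifest: dict[str, str]) -> list[str]:
    """Re-read every recorded file; return those that are missing or no longer match."""
    bad = []
    for rel, expected in manifest.items():
        p = backup_dir / rel
        if not p.is_file() or file_digest(p) != expected:
            bad.append(rel)
    return bad
```

A scrub only tells you that a copy has gone bad; recovering the data still depends on having another good copy elsewhere.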

For online backups, you do have to buy more computers in order to attach the drives to the network. But these computers can be cheap; they don't have to be much more than Ethernet-to-SATA converters. And for drives used just for backups, you can attach many of them to each computer to amortize the cost of the computers. You do have to pay more for effective lightning, EMP, and flood protection, so offline backups are cheaper. However, the extra costs for online backups aren't extreme, and with offline backups you still have the time expense of periodically attaching the offline media in order to scrub them.

Written on 17 May 2009.


Last modified: Sun May 17 00:44:45 2009