Backups and archives

February 9, 2009

When I am thinking of backup issues, I tend to strongly separate backups from archives, and in fact I believe that there are two different sorts of backups with different characteristics. Since I think that this is an important issue, let me explain my distinctions.

To start with, to me the difference between backups and archives is that backups are for recovering after a system failure, while archives are for storing data that you no longer keep online. Here you have to consider 'system failure' somewhat broadly, so that it includes not just the disks melting down but also things like users, automated programs, or sysadmins accidentally removing or damaging files.

Within backups, there are two sorts: short term backups and long term backups. A short term backup is one that you can pretty much count on restoring onto exactly the same sort of system, in the same system environment, that you have now. A long term backup is one where you cannot count on this; you are keeping the backup for long enough that you may have done things like upgrade operating system versions, and you cannot now build an 'old' system just to restore some data.

If you have short term backups, it's perfectly acceptable to use a convenient data format that is very system specific (for example, ZFS's zfs send). But if you have long term backups, you run into potential issues with such things (for example, zfs send is explicitly not guaranteed to work across different Solaris versions), and you need a more portable backup format. How much more portable depends on how radical a change you expect, which in turn depends on how far back you need to be able to go. If you need to go far enough back, you get into archiving territory with all of the headaches that that implies.
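
As an illustration of what a 'more portable' format might look like in practice, here is a minimal sketch. It assumes tar in the POSIX pax variant as the portable format, which is only one possible choice and is not named above, and the function names are made up for the example. Plain tar will not necessarily preserve every system-specific attribute (ACLs, for instance), which is part of the tradeoff against something like zfs send.

```python
import tarfile
from pathlib import Path


def write_portable_backup(source_dir, archive_path):
    """Archive source_dir into a POSIX pax-format tar file.

    Hypothetical helper: pax is chosen here purely as an example of a
    format that many different systems can read.
    """
    source = Path(source_dir)
    with tarfile.open(archive_path, "w", format=tarfile.PAX_FORMAT) as tar:
        # Store paths relative to the directory name, not the absolute path,
        # so the restore does not depend on the original filesystem layout.
        tar.add(str(source), arcname=source.name)
    return archive_path


def list_backup(archive_path):
    """Return the sorted member names stored in the archive."""
    with tarfile.open(archive_path, "r") as tar:
        return sorted(tar.getnames())
```

The point of reading the archive back with list_backup() is simply that nothing about the restore side depends on the system that wrote it.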

(The other thing you need for long term backups is to keep track of how things got relocated and shuffled around in your filesystems, so that you can easily find out where they were in the past in order to do restores. For example, can you easily recover where a user's home directory or a particular database backup was six months ago?)
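
One way to keep track of this is a simple dated log of moves that can be queried by date. The following is only a sketch under assumptions: the log format, the names, and the paths are all invented for illustration, and a real site would load the history from wherever it actually records filesystem changes.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical relocation log: for each item, a date-sorted list of
# (date the item moved to this path, path). All entries are invented.
LOCATION_HISTORY = {
    "home:alice": [
        (date(2008, 1, 15), "/h/staff1/alice"),
        (date(2008, 9, 2), "/h/staff3/alice"),
        (date(2009, 1, 20), "/h/fac2/alice"),
    ],
}


def location_on(name, when, history=LOCATION_HISTORY):
    """Return the path `name` lived at on `when`, or None if unknown."""
    moves = history.get(name, [])
    dates = [moved for moved, _ in moves]
    # Count the moves that happened on or before `when`.
    idx = bisect_right(dates, when)
    if idx == 0:
        return None  # no record yet at that date
    return moves[idx - 1][1]


# Where was this home directory six months before 9 February 2009?
print(location_on("home:alice", date(2008, 8, 9)))  # prints /h/staff1/alice
```

With something like this, the restore request becomes 'fetch that path from the backup made on that date' instead of a hunt through old backup indexes.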

The archive versus backup distinction is especially important because true archiving is very hard, fundamentally because archiving has very ambitious goals and represents a total commitment (the data exists only in the archives). Backups are much simpler because they have much smaller goals and generally a much shorter time span, so they do not have to be anywhere near as durable (both in the media and in your ability to do something useful with the media).

It is my personal belief that archiving is different enough from backups that you should not try to do real archiving with the same system that you use for backups. But if you actually want to use your backups as archives too, you must explicitly design for this and consider the archiving problems. You cannot just assume that your backup format, approach, or system makes for a good archival system, especially as you want to go further and further back in time.

Sidebar: why you want multiple backups

As a side note, somewhat in reaction to something I read recently, I do not feel that backups turn into archives just because you have more than one of them. You should make and keep multiple backups because you want insurance against all of the different sorts of backup failures, which include media failures, backup software failures, and failures to notice damage immediately.

Archives need insurance too, but it takes different forms: you make multiple copies of the archives, and you may do so on different sorts of media or in different formats in case one form turns out to be less durable than expected.

(There are people who make multiple copies of individual backups, but they tend to be either exceptionally paranoid or working with exceptionally high-value data.)

