True point in time restores may be hard

February 11, 2009

One of the things that people ask for from their backup system, often for sensible reasons, is the ability to recreate the system exactly as it was at some past time; these are usually called 'point in time' restores. Point in time restores for the recent past are usually reasonably easy, but doing them for points significantly far in the past can be very difficult.

The problem is that the dependencies involved start to grow and grow. For example, if you want to be able to restore your accounting system to the exact state it was in four years ago, you don't just need a full set of the data from four years ago; you're also probably going to need the exact version of the software that you were using four years ago, the same operating system version you were running back then, and then old hardware to run it on (the four year old OS probably doesn't run on modern hardware, because it lacks the necessary drivers for things like, say, SATA disks).

Once you have all of that, you may still need new license keys, or at least license keys that haven't expired. In a world that increasingly uses digital signatures, you may also need new, unexpired SSL certificates, newly signed code for applets, and so on. (You can try turning back the clock to four years ago, but then your old system may not interact very well with the rest of your network.)

Now, this is an extreme and possibly artificial example. But it illustrates the important issue: data has dependencies. If you need to be able to deal with that data the same way you did in the past, you need the data's dependencies or something that is a good enough substitute for them. And those dependencies have other dependencies, and so on.

(This is one of the problems with reliable archives.)
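(As a rough illustration of how dependencies pile up, here is a minimal Python sketch that walks a dependency map and collects everything you would need, directly or indirectly, to use a piece of old data. The `DEPENDS_ON` map and the `restore_requirements` function are hypothetical names made up for this example, and the entries just mirror the accounting system example above; a real inventory would be larger and messier.)

```python
# Hypothetical dependency map for the accounting system example.
# The entries are illustrative assumptions, not a real inventory.
DEPENDS_ON = {
    "accounting data": ["accounting software"],
    "accounting software": ["old OS version", "license keys"],
    "old OS version": ["old hardware"],
    "license keys": [],
    "old hardware": [],
}


def restore_requirements(item, depends_on):
    """Return everything needed, directly or indirectly, to use item."""
    needed = []      # dependencies in the order we discover them
    seen = set()     # guards against revisiting shared or cyclic entries
    stack = [item]
    while stack:
        current = stack.pop()
        for dep in depends_on.get(current, []):
            if dep not in seen:
                seen.add(dep)
                needed.append(dep)
                stack.append(dep)
    return needed


print(restore_requirements("accounting data", DEPENDS_ON))
```

Even in this toy map, asking for just the data pulls in the software, the OS, the license keys, and the hardware. Each new entry you add can bring its own list along with it.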

Fortunately, this isn't always the sysadmin's problem. Reasonably often we're just tasked with being able to restore the raw data exactly as it was at some point in time, and interpreting it correctly is (in theory) someone else's concern.


Comments on this page:

From 71.250.234.178 at 2009-02-11 10:50:37:

In the future it should get easier, thanks to virtualization, but I don't think it's going to be cheap and easy until data dedup is everywhere.

By cks at 2009-02-11 12:58:27:

I hadn't thought about the approach of just archiving the entire virtual machine image but you're right, it does simplify the problem a lot (if you can count on the machine image format staying usable). Another advantage of virtualization to keep in mind.

My attitude so far is that data de-duplication only helps solve the easy part of the problem, because I don't think that storing the (non-unique) data is the problem. (I may be wrong here, since I haven't tried to do this kind of virtual machine archiving.)

From 83.145.208.36 at 2009-02-11 13:18:39:

Indeed.

And the picture becomes even darker when you start to think about all those national projects involved in creating "electrical records" from various paper-based databases.

- j.

From 83.145.208.36 at 2009-02-11 13:21:17:

Addition to above: it is the phrase "data has dependencies" that is eerie, regardless of the context.

- j.

Written on 11 February 2009.


Last modified: Wed Feb 11 00:35:30 2009