There are two approaches to disaster recovery plans

October 3, 2015

One thing I've come to believe is that there are two ways to approach your disaster recovery planning (even if you're being quite abstract about it). The first approach is to think about how you'd restore your services, while the second approach is to think about how you'd get your users working again.

Thinking about how you'd restore your services is very sexy. It leads to planning out things like alternate server rooms, temporary cloud hosting, essential initial infrastructure, and what you'd buy if you needed replacement hardware in a hurry. If you carry it all the way through you wind up with binders of multi-step plans, indexes of what backups are where, and so on.

Thinking about how you'd get your users working again can wind up in a much less comfortable and sexy place, because the first question you have to ask is what your users really need in order to get their work done (or at least to get their important work done). Asking this question can confront you with the reality that a lot of your services and facilities are not really essential for your users, and that what they really care about may not be things you consider important. The first stage disaster recovery plans that can result from this may wind up being much more modest and less sexy than the 'let's rebuild the machine room' sort of plans.

(For example, at places like ours the first stage disaster recovery plan might be 'buy everyone important a laptop if they don't already have one, maybe restore some people's address books, and have them all go set up Gmail accounts and get back in touch with the people they email'.)

Focusing on what your users need to get working again doesn't mean not also having the first sort of disaster recovery plans, since presumably you are going to want to get all your services back eventually. But I think it puts them in the right perspective. The important thing is that necessary work gets done; your services are just a means to that end.

(This is kind of the flipside of what I wrote a while back about the purpose of computer disaster recovery.)

(Of course, if you have an actual disaster without preallocated resources you may find out that some of your services are not important enough any more to come back, or to come back in anywhere near their original form. There's nothing like starting from scratch to cause drastic reassessments of situations.)

Written on 03 October 2015.
Last modified: Sat Oct 3 02:23:40 2015