Wandering Thoughts archives

2013-01-20

Disaster recovery for computers is a means, not an end in itself

When you draw up disaster recovery plans for your organization's computers, there is something very important to remember: the ultimate goal of a DR plan for computers is to help the organization to keep working in the face of a disaster. On the one hand, this sounds obvious. On the other hand, there is a huge difference between allowing the organization's computers to keep working after a disaster and allowing the organization to keep working after a disaster. The difference is that there are plenty of other things that your organization may (also) need in order to keep functioning.

(Of course there are organizations where computing is the most important thing about them and is basically the only thing that they need.)

This matters because, in the broad view, there is no point in the organization's computers being back if the organization is not otherwise functioning. There is especially no point in spending money (or preallocating resources) to make computing survive when the organization doesn't. Doing so is the equivalent of planning to carefully construct and paint a single wall of a house all by itself, without the rest of the house. It's a very nice wall, very well constructed, you've thought of all of the contingencies in building it, but it has no point. All your planning effort is wasted effort.

(It's easy to overlook this if your job is to care very, very much about that one wall.)

Or in short, computing disaster recovery is just one component of overall disaster recovery. It is often not complete by itself.

One consequence of this is that if the organization doesn't or can't have a disaster recovery plan for the other things that it needs to function, a computing DR plan may be more or less pointless. Or at least you don't need a comprehensive DR plan; all you need is a DR plan that covers the contingencies where the only important thing that the organization has lost is the computers. In other words, there may well be some risks that are not worth mitigating in your computer DR plan because the risk would also destroy other things that the organization needs to function and there are no plans for how to recover from them.

(Again, disaster preparation is different from disaster recovery plans. You can be prepared to (eventually) recover from a building going up in flames without having a specific plan for it.)
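
(As a rough illustration of the filtering described above, here is a minimal Python sketch. Everything in it is hypothetical: the Risk class, the worth_covering function, and the resource names are made up for the example, and real risk assessment is nowhere near this tidy. The idea is simply that a risk belongs in the computing DR plan only if everything else it destroys that the organization needs has its own recovery plan.)

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Risk:
    """A hypothetical risk and the set of resources it would destroy."""
    name: str
    destroys: frozenset


def worth_covering(risk, needed, recoverable, computing="computers"):
    """Return True if the computing DR plan should bother with this risk.

    needed: resources the organization needs in order to function.
    recoverable: non-computing resources that have their own recovery plan.
    """
    if computing not in risk.destroys:
        # Not a computing disaster, so not the computing DR plan's problem.
        return False
    # The other needed things that this risk also takes out.
    others_lost = (risk.destroys & needed) - {computing}
    # Only worth covering if all of them can be recovered too.
    return others_lost <= recoverable


needed = frozenset({"computers", "building", "lab equipment"})
no_other_plans = frozenset()

flood = Risk("machine room flood", frozenset({"computers"}))
fire = Risk("building fire", frozenset({"computers", "building", "lab equipment"}))

for risk in (flood, fire):
    print(risk.name, worth_covering(risk, needed, no_other_plans))
```

(With no recovery plans for anything else, only the machine room flood is worth covering. If the organization later gets plans for the building and the lab equipment, the building fire becomes worth covering as well.)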

On the other hand there are some organizations where the only thing that the organization really needs to keep going is its computers and maybe some people to answer the email. In these organizations, computing DR is organizational DR and it may well make sense to pay a lot of attention to a lot of risks and to try to mitigate them. Understanding what sort of organization you're in and what the organization's crucial resources actually are is a big part of good, sensible DR planning.

(The corollary of this is that there are no one size fits all answers for what risks you should consider in computing DR planning.)

tech/DisasterRecoveryPurpose written at 22:39:28

Real disaster recovery plans require preallocated resources

Here is one core thing about meaningful disaster recovery plans: they all require preallocation of resources. This may range from actual servers in actual racks in an actual machine room, all humming and ready to go the moment that you need them, all the way to simply a bunch of money that is reserved for disaster recovery so that you can immediately start buying new hardware and renting colocation space (or simply getting more cloud computing capacity).

If you do not have these preallocated resources, you do not really have a disaster recovery plan; you don't have something you can immediately start executing in any meaningful way and especially you don't have a plan with a time bound. Without preallocated resources, step zero of your DR plan is 'magically get money and other resources from somewhere' and magic is unpredictable and uncertain.

The problem with the preallocated resources that a meaningful DR plan requires is that they are completely unproductive now, whether they are servers that are basically unused or money that is simply sitting there not being spent. As a result there is always going to be a temptation and pressure to take these unproductive resources and do something with them; to claim servers or machine room space or money for some more urgent need.

This temptation is not stupid. At the extreme, it's completely wrong to insist on not using the preallocated DR resources if it means that the organization goes out of business in the meantime. The relative priority of allocating resources to DR versus allocating resources to something else is always a tradeoff and a risk assessment. Sometimes DR will lose and thus it will lose resources. How often DR loses is partly a function of the organization's relative priorities and partly a function of how prosperous the organization is (i.e., how many surplus resources it has in general).

I will give you a corollary: if your organization is low on resources and it does not prioritize disaster recovery very highly, I feel that there is very little point in creating a meaningful disaster recovery plan. The odds are simply very low that you will be able to hold on to your preallocated resources until a disaster happens, so you will be left with a beautiful plan and either no means of carrying it out or the ability to execute only random portions of it.

(Note that you can still be prepared for disasters even without having a DR plan. To simplify, DR preparation is having offsite backups while a DR plan is knowing what you're going to restore them onto.)

sysadmin/DisasterRecoveryPreallocation written at 00:37:41
