Fumbling towards understanding how to use DRBD

December 13, 2010

For a long time I thought of DRBD as a way of getting shared storage or network based RAID-1 (as its website puts it), and I didn't entirely see the point. But there's a different way of looking at it, one that's quite eye-opening and which I was recently exposed to somewhere.

A standard high availability setup for something like virtualization has four machines: two frontends and two backend disk storage machines, with the storage mirrored between the backends and both frontends seeing all storage (this is our setup, for example). If one frontend machine fails, you just bring the services up on the other; if one backend machine fails, you still have a copy of the data on the other one. This is a traditional shared storage setup.

But many services these days have relatively modest disk space demands (virtualization is again a common example). If all you need is a TB or two, the kind of disk space it's easy to fit into a modern server, it seems rather wasteful to use two entire backend machines (and possibly a couple of switches) to deliver it. So let's do without them.

Take two frontend machines with as much disk as possible and split their data space in half. One half is used to host local services, and is replicated to the other frontend with DRBD; the other half is the replica target for the other frontend's local data space. All services get local disk speeds for reads (and maybe close to it for writes). If one frontend fails, the other has a full copy of its data; it declares itself the primary for that half and starts up the services that normally run on the other frontend.
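
To make the layout concrete, here is a minimal sketch in Python that models the two frontends and what happens on failover. It is only a toy model of the idea, not anything DRBD itself provides; the names (`Node`, `build_pair`, `fail_over`, `fe1`, `fe2`) are all made up for illustration, and the 2 TB disk size is an assumed figure.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One frontend machine (hypothetical model, not a DRBD API)."""
    name: str
    disk_tb: float
    alive: bool = True
    # Names of the data halves this node is currently primary for.
    primary_for: set = field(default_factory=set)


def build_pair(disk_tb: float):
    """Two frontends; each starts out primary for its own half only."""
    fe1 = Node("fe1", disk_tb, primary_for={"fe1-data"})
    fe2 = Node("fe2", disk_tb, primary_for={"fe2-data"})
    return fe1, fe2


def usable_tb(node: Node) -> float:
    """Half the disk hosts local services; the other half holds the peer's replica."""
    return node.disk_tb / 2


def fail_over(failed: Node, survivor: Node) -> None:
    """The survivor declares itself primary for the failed node's half."""
    failed.alive = False
    survivor.primary_for |= failed.primary_for
    failed.primary_for = set()


fe1, fe2 = build_pair(disk_tb=2.0)  # assumed disk size
fail_over(failed=fe1, survivor=fe2)
print(sorted(fe2.primary_for), usable_tb(fe2))
```

After the failover, `fe2` is primary for both halves and would start the services that normally run on `fe1`, working from its local replica of `fe1`'s data.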

This approach doesn't scale up as well as an actual SAN; as you add more frontends that need to be able to replace each other, you lose an increasing amount of disk space to data replicas. But it has the great virtue that it works quite efficiently at a small scale, where it lets you use about the minimum number of machines possible (since you're always going to need two machines for frontend redundancy).
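
A rough back-of-the-envelope sketch of the scaling cost, in Python. It rests on one loud assumption that the text above does not spell out: that every frontend in a group keeps a full replica of every other frontend's data, so that any one of them can take over for any other. Under that assumption only 1/N of each machine's disk is left for its own services. The function name `usable_fraction` is made up for illustration.

```python
def usable_fraction(frontends: int) -> float:
    """Fraction of each disk left for local data.

    Assumption: every frontend holds a full replica of every other
    frontend's data, so each disk is split into `frontends` equal parts.
    """
    if frontends < 2:
        raise ValueError("need at least two frontends for redundancy")
    return 1 / frontends


for n in (2, 3, 4):
    print(f"{n} frontends: {usable_fraction(n):.0%} of each disk is usable")
```

Other replication arrangements (for example, pairing frontends up) would lose less space, at the price of each machine only being able to stand in for its partner.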

(It turns out that this is another story of me not reading the documentation, since I think this is kind of spelled out on the DRBD website. In my defense, it never sounded interesting enough to make me want to read the website; 'networked RAID-1' is not really something I think of as very attractive, and iSCSI and AOE are both more broadly supported for general network disk access.)

Comments on this page:

From at 2010-12-14 07:57:37:

You might want to have a look at Google's Ganeti project:


which is a tool to manage virtual machines on a DRBD based backend.

From at 2010-12-14 08:51:50:

I also work for a university and I run a hosting service for researchers. I have two data centers at my disposal about 50 miles apart. I use DRBD to cut down the sync time as part of a migration between the data centers. Doing rsync on 100GB of small files is avoidable if the block devices are synced.

Written on 13 December 2010.

Last modified: Mon Dec 13 22:17:01 2010