Fumbling towards understanding how to use DRBD
For a long time I thought of DRBD as a way of getting shared storage or network based RAID-1 (as its website puts it), and I didn't entirely see the point. But there's a different way of looking at it, one that I was recently exposed to somewhere and that's quite eye-opening.
A standard high availability setup for something like virtualization has four machines: two frontends and two backend disk storage machines, with the storage mirrored between the backends and both frontends seeing all storage (this is our setup, for example). If one frontend machine fails, you just bring the services up on the other; if one backend machine fails, you still have a copy of the data on the other one. This is a traditional shared storage setup.
But many services these days have relatively modest disk space demands (virtualization is again a common example). If all you need is a TB or two, the kind of disk space it's easy to fit into a modern server, it seems rather wasteful to use two entire backend machines (and possibly a couple of switches) to deliver it. So let's do without them.
Take two frontend machines with as much disk as possible and split their data space in half. One half is used to host local services, and is replicated to the other frontend with DRBD; the other half is the replica target for the other frontend's local data space. All services get local disk speed for reads (and maybe close to it for writes). If one frontend fails, the other has a full copy of its data; it declares itself the primary for that half and starts up the services that normally run on the other frontend.
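(As a concrete illustration, a DRBD setup along these lines would use two resources, one normally primary on each frontend, each replicating to the other machine. The following is only a sketch in the style of a DRBD 8.x configuration; the hostnames, partitions, and addresses are all made up, and a real setup needs more than this.)

```
# Hypothetical two-frontend cross-replication layout.
# r0: frontend1's local service data, replicated to frontend2.
resource r0 {
  protocol C;                 # synchronous replication
  on frontend1 {
    device    /dev/drbd0;
    disk      /dev/sda5;      # frontend1's local service half
    address   192.168.10.1:7788;
    meta-disk internal;
  }
  on frontend2 {
    device    /dev/drbd0;
    disk      /dev/sda5;      # replica target on frontend2
    address   192.168.10.2:7788;
    meta-disk internal;
  }
}

# r1: the mirror image, frontend2's local data replicated to frontend1.
resource r1 {
  protocol C;
  on frontend2 {
    device    /dev/drbd1;
    disk      /dev/sda6;      # frontend2's local service half
    address   192.168.10.2:7789;
    meta-disk internal;
  }
  on frontend1 {
    device    /dev/drbd1;
    disk      /dev/sda6;      # replica target on frontend1
    address   192.168.10.1:7789;
    meta-disk internal;
  }
}
```

In normal operation frontend1 is primary for r0 and frontend2 is primary for r1; on a failure, the survivor promotes itself to primary for the other resource as well and starts the failed machine's services.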
This approach doesn't scale up as well as an actual SAN; as you add more frontends that need to be able to replace each other, you lose an increasing amount of disk space to data replicas. But it has the great virtue that it works quite efficiently at a small scale, where it lets you use about the minimum number of machines possible (since you're always going to need two machines for frontend redundancy).
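(One way to make the scaling cost concrete: under the assumption that every frontend must hold a complete replica of every other frontend's data, so that any machine can replace any other, each machine's disk is split N ways and only 1/N of the total raw space holds unique data. This little model is my own illustration, not anything from DRBD itself.)

```python
# Illustrative capacity model, assuming full cross-replication:
# every one of N frontends keeps a replica of every peer's data.
def usable_fraction(n_frontends: int) -> float:
    """Fraction of total raw disk holding unique (non-replica) data."""
    return 1.0 / n_frontends

for n in (2, 3, 4, 8):
    print(f"{n} frontends: {usable_fraction(n):.0%} usable")
```

At two frontends you lose half your space, which is no worse than ordinary RAID-1; at eight you'd be keeping only an eighth, which is where a real SAN starts to win.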
(It turns out that this is another story of me not reading the documentation, since I think this is kind of spelled out on the DRBD website. In my defense, it never sounded interesting enough to make me want to read the website; 'networked RAID-1' is not really something I think of as very attractive, and iSCSI and AOE are both more broadly supported for general network disk access.)