Wandering Thoughts archives

2010-12-13

Fumbling towards understanding how to use DRBD

For a long time I thought of DRBD as a way of getting shared storage or network based RAID-1 (as its website puts it), and I didn't entirely see the point. But there's a different way of looking at it, one that's quite eye-opening and which I was recently exposed to somewhere.

A standard high availability setup for something like virtualization has four machines: two frontends and two backend disk storage machines, with the storage mirrored between the backends and both frontends seeing all storage (this is our setup, for example). If one frontend machine fails, you just bring the services up on the other; if one backend machine fails, you still have a copy of the data on the other one. This is a traditional shared storage setup.

But many services these days have relatively modest disk space demands (virtualization is again a common example). If all you need is a TB or two, the kind of disk space it's easy to fit into a modern server, it seems rather wasteful to use two entire backend machines (and possibly a couple of switches) to deliver it. So let's do without them.

Take two frontend machines with as much disk as possible and split their data space in half. One half is used to host local services, and is replicated to the other frontend with DRBD; the other half is the replica target for the other frontend's local data space. All services get local disk speeds for reads (and maybe close to it for writes). If one frontend fails, the other has a full copy of its data; it declares itself the primary for that half and starts up the services that normally run on the other frontend.

This approach doesn't scale up as well as an actual SAN; as you add more frontends that need to be able to replace each other, you lose an increasing amount of disk space to data replicas. But it has the great virtue that it works quite efficiently at a small scale, where it lets you use about the minimum number of machines possible (since you're always going to need two machines for frontend redundancy).
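To put rough numbers on the replica overhead, here is a back-of-the-envelope sketch in Python. It assumes the simplest extension of the two-machine layout, where every frontend holds a full replica of every other frontend's data so that any machine can take over for any other. Both that model and the `usable_fraction` name are illustrative assumptions, not something DRBD dictates.

```python
def usable_fraction(frontends: int) -> float:
    """Fraction of raw disk left for primary data, ASSUMING every frontend
    keeps a full replica of every other frontend's data."""
    if frontends < 2:
        raise ValueError("need at least two frontends for redundancy")
    # Each machine stores its own share plus (frontends - 1) replica shares,
    # so only 1/frontends of its disk holds primary data.
    return 1 / frontends


for n in (2, 3, 4):
    print(f"{n} frontends: {usable_fraction(n):.0%} of raw disk is usable")
```

With two frontends this is the 'split in half' case above; under the same assumption, four frontends would leave only a quarter of the raw disk for primary data.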

(It turns out that this is another story of me not reading the documentation, since I think this is kind of spelled out on the DRBD website. In my defense, it never sounded interesting enough to make me want to read the website; 'networked RAID-1' is not really something I think of as very attractive, and iSCSI and AOE are both more broadly supported for general network disk access.)

linux/UnderstandingDRBD written at 22:17:01

A program that I want to write: a 'sink' SMTP server

Mostly for historical reasons, my office workstation still runs its own mailer and I still get a very small amount of email to it. I get many, many more spam attempts, because for very many years (in the pre-spam days) it was the primary email address that I used and I used it widely. Over the years I've put up an ever-increasing set of anti-spam precautions that wind up rejecting almost all attempted SMTP connections.

(Technically I wind up dropping connections after sending a 5xx non-greeting banner.)

There are two drawbacks to this approach. First, a lot of mailers out there don't like it when their SMTP connection attempt is refused this way and immediately retry it (some common Microsoft mailer, probably Exchange, is especially prone to this). These retries clutter up my logs and annoy me. Second, I'm curious enough that I'd like to know what sort of spam the spammers are trying to send me, or at least information like what addresses here they're trying to spam; after all, this could be a great way to build up an interesting corpus of current spam.

What I need here is some sort of 'sink' SMTP server. This would be a very simple SMTP server that would cheerfully accept more or less anything you told it, log it all, and reply with a 5xx error after the end of the DATA phase (if mailers still explode at this I might make it lie and accept it with a 2xx). Since I expect to get a decent amount of spam (some of it in very bursty waves) I would like this server to be reasonably efficient, even in the face of vaguely hostile clients.
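As a rough illustration of the idea, here is a minimal sketch of such a sink using Python's asyncio streams. Python is used purely for illustration; the names `handle_client` and `serve`, the `sink.example` banner, the port, and the `MAX_LINE` and `IDLE_TIMEOUT` limits are all made up for this example. It says yes to every command, logs what it sees, and answers the end of DATA with a 5xx. It ignores SMTP details such as dot-stuffing and keeps the message body in memory, which a real version facing hostile clients would want to cap or spool to disk.

```python
import asyncio
import logging

log = logging.getLogger("smtp-sink")

MAX_LINE = 4096      # hypothetical cap on line length, to bound memory use
IDLE_TIMEOUT = 60    # hypothetical seconds to wait for a slow client


async def handle_client(reader, writer):
    """Accept whatever the client says, log it, and reject after DATA."""
    peer = writer.get_extra_info("peername")

    async def reply(text):
        writer.write(text.encode("ascii") + b"\r\n")
        await writer.drain()

    in_data, body = False, []
    try:
        await reply("220 sink.example ESMTP")
        while True:
            raw = await asyncio.wait_for(reader.readline(), IDLE_TIMEOUT)
            if not raw:                      # client hung up
                break
            line = raw.rstrip(b"\r\n")
            if in_data:
                if line == b".":             # end of the DATA phase
                    log.info("%s: message of %d lines", peer, len(body))
                    in_data = False
                    await reply("554 5.7.1 Message rejected")
                else:
                    body.append(line)
                continue
            log.info("%s: %r", peer, line)
            verb = line.split(b" ", 1)[0].upper()
            if verb == b"DATA":
                in_data, body = True, []
                await reply("354 End data with <CR><LF>.<CR><LF>")
            elif verb == b"QUIT":
                await reply("221 Bye")
                break
            else:                            # HELO, MAIL FROM, RCPT TO, ...
                await reply("250 OK")
    except (asyncio.TimeoutError, ConnectionError, ValueError) as exc:
        # ValueError is what readline() raises for an over-long line.
        log.info("%s: dropped (%r)", peer, exc)
    finally:
        writer.close()


async def serve(host="127.0.0.1", port=2525):
    """Listen on a test port; start with asyncio.run(serve())."""
    server = await asyncio.start_server(handle_client, host, port,
                                        limit=MAX_LINE)
    async with server:
        await server.serve_forever()
```

Switching to the 'lie and accept' behaviour would just mean changing the 554 reply to a 250.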

Since I still get real email, I can't have this be a normal server that listens on the SMTP port and just accepts connections (the real email has to go to the real mailer's SMTP agent). Fortunately I already do most of my anti-spam precautions in an inetd-like frontend, so I can have the frontend pass 'spam' connections to the sink SMTP server instead of just rejecting them (by passing the new file descriptor to the sink SMTP server over a Unix domain socket).
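The descriptor hand-off itself is a small amount of code in languages that expose it. As a sketch only, here is what both sides might look like in Python, using `socket.send_fds()` and `socket.recv_fds()` (available on Unix in Python 3.9 and later). The function names `hand_off` and `receive_connection` and the `b"spam"` tag are hypothetical; `channel` stands for an already-connected Unix domain socket between the frontend and the sink.

```python
import socket


def hand_off(channel: socket.socket, conn: socket.socket) -> None:
    """Frontend side: pass an accepted connection's descriptor to the sink."""
    # At least one byte of ordinary data has to accompany the descriptor.
    socket.send_fds(channel, [b"spam"], [conn.fileno()])
    conn.close()  # the sink now holds its own copy of the descriptor


def receive_connection(channel: socket.socket) -> socket.socket:
    """Sink side: receive one descriptor and wrap it as a socket object."""
    _msg, fds, _flags, _addr = socket.recv_fds(channel, 16, 1)
    if not fds:
        raise ConnectionError("frontend closed the channel or sent no fd")
    # Family and type are detected from the descriptor itself.
    return socket.socket(fileno=fds[0])
```

After `hand_off()` returns, the frontend is done with the connection and can go back to accepting new ones, while the sink talks SMTP on the socket returned by `receive_connection()`.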

Of course, I'd love to use this to try out one of the languages I'd like to learn. Now that I've had it pointed out to me by a commenter, node.js is the obvious candidate; it's practically built for this, plus it already has support for passing file descriptors over Unix domain sockets. Go is possible and I'd definitely like to explore it, but it doesn't have support for file descriptor passing and it's not clear how to add it.

spam/SinkSMTPServerDesire written at 01:53:48


