We're (temporarily) moving to three way mirrored disks on our servers

April 15, 2020

We've been using mirrored (system) disks on our servers for a fair while now. Initially we reserved mirroring for 'important' systems, but after a few too many failures and close calls we decided to make it pervasive for anything except test machines and completely generic ones. Then, a week ago we had a disk failure on our central Exim mail server, which handles all internal deliveries and forwarding. On the one hand this wasn't a problem, because it had mirrored system disks and so one disk was still fine. On the other hand it was a problem because now we were running a critical machine with no disk redundancy. In the end one of my co-workers made a special trip in to swap out the bad disk for a new good disk.

In the current circumstances, having to make a special trip into the office is not a great thing, and we'd like to avoid it. Fortunately we've realized that we can, with a simple change in how we build servers. Many of our basic 1U servers have four drive bays, although we normally only use two, and we have plenty of drives sitting on the shelf. So we're going to be setting up any new and replacement servers with three drives in a three way mirror (and perhaps with a fourth drive in the system, just in case); that way, the system still has redundancy if a single drive fails. We'll still want to replace the failed drive eventually, but it can wait until someone has to be in the office for another reason (for example to swap backup 'tapes').
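To make the 'still has redundancy' point concrete, here is a minimal Python sketch. It assumes the mirrors are Linux software RAID (MD) arrays, which this entry doesn't actually say, and it reads the "[configured/working]" member counts that MD reports in /proc/mdstat. The function names and the sample text are hypothetical illustrations, not our real tooling or output from a real machine.

```python
import re

# Matches the "[configured/working]" member counts that Linux MD prints
# in /proc/mdstat status lines, for example "[3/2] [UU_]".
_COUNTS = re.compile(r"\[(\d+)/(\d+)\]")
# Matches the header line of an active RAID-1 (mirror) array.
_HEADER = re.compile(r"^(md\d+)\s*:\s*active\s+raid1\b")


def mirror_status(mdstat_text):
    """Return {array_name: (configured, working)} for each raid1 array."""
    status = {}
    current = None
    for line in mdstat_text.splitlines():
        header = _HEADER.match(line)
        if header:
            current = header.group(1)
            continue
        counts = _COUNTS.search(line)
        if current and counts:
            status[current] = (int(counts.group(1)), int(counts.group(2)))
            current = None
    return status


def still_redundant(working):
    """A mirror keeps redundancy while at least two members are working."""
    return working >= 2


# Hypothetical sample text (assumption): a three way mirror and a two way
# mirror that have each lost one disk.
SAMPLE = """\
Personalities : [raid1]
md0 : active raid1 sdc1[2](F) sdb1[1] sda1[0]
      1048512 blocks super 1.2 [3/2] [UU_]

md1 : active raid1 sdb2[1](F) sda2[0]
      2096128 blocks super 1.2 [2/1] [U_]

unused devices: <none>
"""

if __name__ == "__main__":
    for name, (configured, working) in sorted(mirror_status(SAMPLE).items()):
        print(name, configured, working, still_redundant(working))
```

In the sample, the three way mirror that has lost one disk still counts as redundant, while the two way mirror in the same state does not. That is the whole difference between 'replace it when someone is next in the office' and 'someone makes a special trip'.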

We probably won't try to add extra drives to existing servers because even for machines with four drive bays, none of our current crop of 1U machines have hot-swappable bays; all of them require shutting down the system even to add a drive. Shutting down an important running system just to add redundancy is probably not a good tradeoff (even if someone is in the office for other reasons).

(We're building an updated replacement central mail server for various reasons and the new hardware we're using does have four disks in it, as a three way mirror and a spare just because why not.)

Comments on this page:

From at 2020-04-15 02:42:28:

What exactly makes a "hot-swappable" disk bay? I had assumed that all SAS and SATA connections are always hot-swappable, with maybe the only difference being whether the OS gets automatically poked about new connections or not...

(My personal HP MicroServer also says "non-hotswap", but it seems that in HP's language that only means the trays don't have any of the fancy electronics but doesn't stop the controller from correctly reporting new or removed disks to Linux, which just leaves me even more confused about it.)

By dozzie at 2020-04-15 04:55:54:

What exactly makes a "hot-swappable" disk bay?

Disk chassis, mainly. Not all are designed so that you can remove a disk while the server is operating, e.g. some servers may not have the disk bays exposed on the front panel at all.

@cks: There's always the option of using a two-way mirror with a hot spare (MD RAID can do that) or with an unused disk that will get added to the RAID when one of the used disks fails. Not sure if there's much benefit in doing so; maybe for SSDs (fewer writes), or that it preserves the current setup.

Written on 15 April 2020.

Last modified: Wed Apr 15 00:59:44 2020