We're (temporarily) moving to three way mirrored disks on our servers
We've been using mirrored (system) disks on our servers for a fair while now. Initially we reserved this for 'important' systems, but after a few too many failures and close calls we decided to make it pervasive for anything except test machines and completely generic ones. Then a week ago we had a disk failure on our central Exim mail server, which handles all internal deliveries and forwarding. On the one hand this wasn't a problem, because it had mirrored system disks and so one disk was still fine. On the other hand it was a problem, because now we were running a critical machine with no disk redundancy. In the end one of my co-workers made a special trip in to swap out the bad disk for a new good disk.
In the current situation, where swapping a disk means someone making a special trip into the office, this is not a great thing. Fortunately we've realized that we can avoid it with a simple change in how we build servers. Many of our basic 1U servers have four drive bays, although we normally only use two, and we have plenty of drives sitting on the shelf. So we're going to be setting up any new and replacement servers with three drives in a three way mirror (and perhaps with a fourth drive in the system, just in case); that way, the system still has redundancy if a single drive fails. We'll still want to replace the failed drive eventually, but it can wait until someone has to be in the office for another reason (for example to swap backup 'tapes').
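To make the arithmetic concrete, here is a minimal Python sketch of how much redundancy is left after drive failures in an N-way mirror. It is only an illustration; the function names `surviving_copies` and `status` are made up for this example and don't correspond to any tool we actually run, and it assumes the simple model where every drive in the mirror holds a complete copy.

```python
def surviving_copies(mirror_ways: int, failed_drives: int) -> int:
    """Return how many complete copies remain in an N-way mirror.

    Hypothetical helper: assumes each drive in the mirror holds a full copy.
    """
    if mirror_ways < 1 or failed_drives < 0:
        raise ValueError("mirror_ways must be >= 1 and failed_drives >= 0")
    return max(mirror_ways - failed_drives, 0)


def status(mirror_ways: int, failed_drives: int) -> str:
    """Describe the state of the mirror after the given number of failures."""
    copies = surviving_copies(mirror_ways, failed_drives)
    if copies == 0:
        return "data lost"
    if copies == 1:
        # The system is up, but one more failure loses everything.
        return "running, no redundancy"
    return "running, still redundant"


if __name__ == "__main__":
    # Compare our old two-way mirrors with the new three-way mirrors.
    for ways in (2, 3):
        for failed in range(ways + 1):
            print(f"{ways}-way mirror, {failed} failed: {status(ways, failed)}")
```

In this model a single failure leaves a two-way mirror with no redundancy, while a three-way mirror is still redundant, which is exactly the difference that lets the replacement wait.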
We probably won't try to add extra drives to existing servers because even for machines with four drive bays, none of our current crop of 1U machines have hot-swappable bays; all of them require shutting down the system even to add a drive. Shutting down an important running system just to add redundancy is probably not a good tradeoff (even if someone is in the office for other reasons).
(We're building an updated replacement central mail server for various reasons and the new hardware we're using does have four disks in it, as a three way mirror and a spare just because why not.)
Written on 15 April 2020.