Our broad reasons for and approach to mirroring disks

September 20, 2024

When I talked about our recent interest in FreeBSD, I mentioned the issue of disk mirroring. One of the questions this raises is what we use disk mirroring for, and how we approach it in general. The simple answer is that we mirror disks for extra redundancy, not for performance, but we don't go too far to get extra redundancy.

The extremely thorough way to do disk mirroring for redundancy is to mirror with different makes and ages of disks on each side of the mirror, to try to avoid both age related failures and model or maker related issues (either firmware problems or discovering that the maker used some common problematic component). We don't go this far; we generally buy a block of whatever SSD is considered good at the moment, then use them in pairs for a while, either fresh in newly deployed servers or re-used in a server being re-deployed. One reason we tend to do this is that we generally get 'consumer' drives, and finding decent consumer drives is hard enough at the best of times without having to find two different vendors of them.

(We do have some HDD mirrors, for example on our Prometheus server, but these are also almost always paired disks of the same model, bought at the same time.)

Because we have backups, our redundancy goals are primarily to keep servers operating despite having one disk fail. This means that it's important that the system keep running after a disk failure, that it can still reboot after a disk failure (including a failure of its first, primary disk), and that the disk can be replaced and put into service without downtime (provided that the hardware supports hot swapping the drive). The less this is true, the less useful a system's disk mirroring is to us (this includes 'hardware' mirroring, which may require a trip through the BIOS to trigger a rebuild after a disk replacement, and that means downtime). It's also vital that the system be able to tell us when a disk has failed. Not being able to reliably tell us this is how you wind up with systems running on a single drive until that single drive fails too.
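As a concrete illustration of the 'tell us when a disk has failed' part, here is a minimal sketch of the kind of check you could run from cron against Linux software RAID mirrors. This isn't our actual monitoring, just the general shape of the idea: scan /proc/mdstat for arrays with a missing member, and only produce output (and thus cron email) when something is wrong.

    #!/usr/bin/env python3
    # Minimal sketch: report Linux software RAID arrays with missing members.
    # /proc/mdstat status lines look like "... [2/2] [UU]"; an '_' in the
    # bracketed status (for example "[U_]") means a failed or missing member.
    import re
    import sys

    def degraded_arrays(mdstat="/proc/mdstat"):
        bad = []
        current = None
        with open(mdstat) as f:
            for line in f:
                m = re.match(r"(md\d+)\s*:", line)
                if m:
                    current = m.group(1)
                    continue
                m = re.search(r"\[\d+/\d+\]\s+\[([U_]+)\]", line)
                if m and current and "_" in m.group(1):
                    bad.append(current)
        return bad

    if __name__ == "__main__":
        bad = degraded_arrays()
        for name in bad:
            print(f"RAID array {name} is degraded")
        sys.exit(1 if bad else 0)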

On our ZFS fileservers it would be quite undesirable to have to restore from backups, so we have an elaborate spares system that uses extra disk space on the fileservers (cf) and a monitoring system to rapidly replace failed disks. On our regular servers we don't (currently) bother with this, even on servers where we could add a third disk as a spare to the two system disks.
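Our real spares system is considerably more elaborate than this, but as a rough sketch of the shape of the idea (with made up pool and device names, and only printing the replacement command rather than running it):

    #!/usr/bin/env python3
    # Deliberately simplified sketch of "replace a failed ZFS disk with a
    # spare": check pool health with 'zpool status -x' and, for each faulted
    # device, print the 'zpool replace' a human (or a more careful script)
    # would run.  The pool and spare names here are invented for illustration.
    import subprocess

    SPARES = {"tank": "/dev/sdq4"}   # hypothetical pool -> spare device map

    def unhealthy_pools():
        out = subprocess.run(["zpool", "status", "-x"],
                             capture_output=True, text=True, check=True).stdout
        if "all pools are healthy" in out:
            return []
        # 'zpool status -x' only lists pools with problems.
        return [line.split()[1] for line in out.splitlines()
                if line.strip().startswith("pool:")]

    def faulted_devices(pool):
        out = subprocess.run(["zpool", "status", pool],
                             capture_output=True, text=True, check=True).stdout
        return [line.split()[0] for line in out.splitlines()
                if len(line.split()) >= 2
                and line.split()[1] in ("FAULTED", "UNAVAIL")]

    if __name__ == "__main__":
        for pool in unhealthy_pools():
            spare = SPARES.get(pool)
            for dev in faulted_devices(pool):
                if spare:
                    print(f"would run: zpool replace {pool} {dev} {spare}")
                else:
                    print(f"{pool}: {dev} failed and no spare is configured")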

(We temporarily moved to three way mirrors for system disks on some critical servers back in 2020, for relatively obvious reasons. Since we're now in the office regularly, we've moved back to two way mirrors.)

Our experience so far with both HDDs and SSDs is that we don't really seem to have clear age related or model related failures that take out multiple disks at once. In particular, we've yet to lose both disks of a mirror before one could be replaced, despite our habit of using SSDs and HDDs in basically identical pairs. We have had a modest number of disk failures over the years, but they've happened one at a time rather than in clusters.

(It's possible that at some point we'll run a given set of SSDs for long enough that they start hitting lifetime limits. But we tend to grab new SSDs when re-deploying important servers. We also have a certain amount of server generation turnover for important servers, and when we use the latest hardware it also gets brand new SSDs.)


Comments on this page:

By vowhite at 2024-09-22 21:40:02:

This means that it's important that the system keep running after a disk failure, that it can still reboot after a disk failure (including of its first, primary disk)

Is there much of a trick to this, or does it just work in Ubuntu? I haven't actually tested whether my system's UEFI will fall back to a second disk, though I'm aware that certain filesystems including btrfs won't mount a "degraded" array unless a mount option is specified.

By cks at 2024-09-23 22:54:38:

This is a good question so I wrote up what I know about getting (mostly) redundant UEFI boot disks. What I forgot to mention is that as far as I know, if you set up software RAID mirrors in the installer, Ubuntu will mark them so that they'll boot in degraded mode (with only one of two disks available). I think there may be a boot delay while systemd waits in hopes that the missing disk will show up, but we've never had a problem having them come up.

(These days, systemd and udev will normally bring up most degraded software RAID arrays.)

By vowhite at 2024-09-24 11:42:30:

What I forgot to mention is that as far as I know, if you set up software RAID mirrors in the installer, Ubuntu will mark them so that they'll boot in degraded mode (with only one of two disks available).

Just for anyone who doesn't know, the btrfs RAID mode is kind of its own thing, independent of mdadm and Linux's "software RAID" support. So it might not be considered an "array" for the purposes of the installer, in which case it will need its own "degraded" setting.

Doing mirroring at the filesystem level has some advantages. For example, if the disks mis-match, btrfs (and probably ZFS and bcachefs) can read all copies, use the checksums to determine the correct one, and re-write its mirror(s). It also makes the disks non-identical in another way, and I'm a bit more wary than you about identical setups. An ex-sysadmin co-worker of mine once mentioned having two same-batch HDD failures bring down an entire array. Some models were infamously bad, like the "Deathstars" and the 3TB Seagates. For SSDs, I'm more worried about firmware bugs, particularly when the contents and write-patterns are the same; there's a long history of SSD firmware bugs.
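As a toy illustration of the 'use the checksums to determine the correct one' part (this is just the idea, not how btrfs or ZFS actually implement it internally):

    import hashlib

    # Toy sketch of checksum-driven mirror repair: given the two copies of a
    # block and the checksum the filesystem recorded when it wrote the block,
    # keep whichever copy still matches and rewrite the other one from it.
    def repair_block(copy_a: bytes, copy_b: bytes, recorded_sum: str):
        def ok(data: bytes) -> bool:
            return hashlib.sha256(data).hexdigest() == recorded_sum

        if ok(copy_a) and ok(copy_b):
            return copy_a, copy_b      # both copies are fine, nothing to do
        if ok(copy_a):
            return copy_a, copy_a      # copy B was bad; rewrite it from A
        if ok(copy_b):
            return copy_b, copy_b      # copy A was bad; rewrite it from B
        # No good copy left, so this is a genuine read error.
        raise IOError("both copies fail their checksum")

Plain block-device mirroring has no recorded checksum, so when the two copies disagree it typically can only pick one side arbitrarily.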

I'm just building occasional home-use systems, and usually have no trouble finding two decent-looking different-model storage devices. Except the one time I happened to buy the first drive just before that Thailand flood drove up prices. About a week later, I walked into a computer store to try to buy a second drive, and the clerk just laughed—"we sold out days ago, and if any computer store still has stock it's not gonna be $80 anymore". Luckily, the people at Staples were as clueless as expected, so they still had stock with their usual 50% over-pricing.

By Miksa at 2024-09-25 10:45:46:

SSD firmware bugs can certainly be a thing. I have seen an amusing number of advisories for firmware updates that would fix failures at 32,768 or 65,536 power-on hours (that is, 2^15 and 2^16). I'm actually waiting for the next maintenance window for one server with this exact issue.

https://support.hpe.com/hpesc/public/docDisplay?docId=emr_na-a00142174en_us
