Wandering Thoughts archives

2020-04-15

Some ways that servers make their disks not hot-swappable

In yesterday's entry, I mentioned that none of our current crop of basic 1U servers have hot-swappable drive bays. A commenter then asked:

What exactly makes a "hot-swappable" disk bay? I had assumed that all SAS and SATA connections are always hot-swappable, with maybe the only difference being whether the OS gets automatically poked about new connections or not...

Since I am a little bit grumpy about this, here are some of the ways that servers can have drive bays that aren't hot-swappable (I'm sure there are more, because server vendors have many ways of making your life more annoying).

The most straightforward way is exemplified by our Dell R210 IIs, which do not so much have drive bays as drive cages. Their drive 'bays' are fully enclosed inside the chassis, and the drives can only be removed by taking the top off and then disconnecting separate power and SATA cables. In theory perhaps you could do this while an R210 II was running, but I wouldn't want to attempt it.

The more clever way is that of our Dell R310s (in our configuration). These have externally accessible drive carriers that slide out the front of the chassis, so it looks like you should be able to hot swap them, but the carriers are held in place by latches at the back that can only be accessed by taking the top of the case off, and they still plug into ribbon cables. For the Dell R310, Dell calls this the 'cabled hard drive chassis' and it is of course the cheaper option.

The final optional piece of cleverness is that some generations of Dell servers come by default with the BIOS set to explicitly disable any SATA ports that didn't have drives connected in the factory configuration, which generally means all but the first SATA port. This wouldn't stop a hot swap of an existing drive, because you'd already have enabled the SATA port, but it does mean that you can't add an additional hard drive without a trip through the BIOS.

(This can lead to comedy moments when you're setting up a server that's supposed to have a two way or three way mirror and wondering why the installer and so on can only see one drive. Since you have to manually connect the cabling, it's possible to spend some time wondering if you plugged everything in right or the drive is dead before you realize what the real problem is.)

NonHotswapDisksWays written at 22:06:19

We're (temporarily) moving to three way mirrored disks on our servers

We've been using mirrored (system) disks on our servers for a fair while now. Initially we reserved it for 'important' systems, but after a few too many failures and close calls we decided to make it pervasive for anything except test machines and completely generic ones. Then, a week ago we had a disk failure on our central Exim mail server, which handles all internal deliveries and forwarding. On the one hand this wasn't a problem, because it had mirrored system disks and so one disk was still fine. On the other hand it was a problem because now we were running a critical machine with no disk redundancy. In the end one of my co-workers made a special trip in to swap out the bad disk for a new good disk.

In the current times, making a special trip into the office is not a great thing. Fortunately we've realized that we can avoid such trips with a simple change in how we build servers. Many of our basic 1U servers have four drive bays, although we normally only use two, and we have plenty of drives sitting on the shelf. So we're going to be setting up any new and replacement servers with three drives in a three way mirror (and perhaps with a fourth drive in the system, just in case); that way, the system still has redundancy if a single drive fails. We'll still want to replace the failed drive eventually, but it can wait until someone has to be in the office for another reason (for example to swap backup 'tapes').
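(The arithmetic here is simple, but to spell it out, here is a minimal Python sketch. The function names are made up for illustration and this is not something we actually run; it only counts intact copies and says nothing about how likely failures are.)

```python
def surviving_copies(mirror_ways: int, failed_drives: int) -> int:
    """Return how many intact copies an N-way mirror has after some drives fail."""
    if mirror_ways < 1:
        raise ValueError("a mirror needs at least one drive")
    if not 0 <= failed_drives <= mirror_ways:
        raise ValueError("failed_drives must be between 0 and mirror_ways")
    return mirror_ways - failed_drives


def still_redundant(mirror_ways: int, failed_drives: int) -> bool:
    """Return True if the data would survive one more drive failure."""
    # Redundancy means at least two intact copies remain.
    return surviving_copies(mirror_ways, failed_drives) >= 2


if __name__ == "__main__":
    # Compare our old two way mirrors with the new three way ones
    # after a single drive has failed.
    for ways in (2, 3):
        print(f"{ways}-way mirror, one failed drive: "
              f"redundant = {still_redundant(ways, 1)}")
```

With one failed drive, a two way mirror is down to a single copy (which is when someone has to make a special trip), while a three way mirror still has two copies and the replacement can wait.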

We probably won't try to add extra drives to existing servers because even for machines with four drive bays, none of our current crop of 1U machines have hot-swappable bays; all of them require shutting down the system even to add a drive. Shutting down an important running system just to add redundancy is probably not a good tradeoff (even if someone is in the office for other reasons).

(We're building an updated replacement central mail server for various reasons and the new hardware we're using does have four disks in it, as a three way mirror and a spare just because why not.)

MovingToThreeWayMirrors written at 00:59:44

