Why 'hotplug' approaches to device handling are the right way
The other day I mentioned that modern versions of Linux handle things like activating software RAID devices through their mechanisms for dynamically appearing devices, which is often called 'hotplugging'. Some people don't like this approach. One way people put it is that hotplug is fine for the desktop (or laptop), where devices come and go, but servers have a constant set of hardware and so shouldn't need the complexity and inherent unpredictable asynchronicity of the whole process. It's my strong opinion that this is wrong in practice: even on servers with constant hardware, hotplug is the right approach on technical grounds.
The problem, even on servers, can be summed up as 'device enumeration'. It's been a long time since the operating system could rapidly and predictably detect devices (since at least the introduction of SCSI and probably even before then). Modern buses and environments require a significant amount of time and probing before you can be sure that you've seen every device available, and a server usually has several buses to wait on. Things get worse still once you add software-based devices such as iSCSI disks, because you may need to bring up parts of the operating system (networking, at a minimum) before you can see all of the devices you need to fully boot.
You can try to make all of this work by adding longer and longer delays before you look for all of the necessary devices (and then start throwing errors if they're not there), and then layer a bunch of complex topology and specific dependency awareness on top of that to get things like iSCSI to work. But all of this is fundamentally a hack and it can easily break down (as it has in the past when people tried this). You wind up telling sysadmins to insert 'sleep 30' statements in places to get their systems to work, and so on. No one likes this.
A hotplug-based asynchronous system is the simpler, better approach. You give up on trying to wait 'long enough' for device enumeration to finish and simply accept that it's an asynchronous process: you handle disks and other devices as they appear, and as parts of the system become ready you boot more and more fully (if something goes badly wrong, you let the operator abort the process). At a stroke this replaces a rickety collection of hacks with a simple, general approach. As a bonus, it gracefully handles things going a little bit wrong, for example your external disk enclosure powering up and becoming visible a bit after the server starts booting instead of before.
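To make the shape of this concrete, here is a minimal Python sketch of the idea. It is a toy, not how udev or systemd actually represent things: the 'unit' model with `needs` and `provides`, and all of the device and unit names, are hypothetical. Each unit starts as soon as every device it needs has appeared, in whatever order the events arrive, and a started unit can itself make a new device appear (the iSCSI case). There is no delay to tune; whatever is still blocked can simply be shown to the operator.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class Unit:
    """Something that can start once certain devices exist (hypothetical model)."""
    name: str
    needs: frozenset[str]            # devices that must have appeared first
    provides: Optional[str] = None   # a device this unit makes appear once started
    started: bool = False


class HotplugBoot:
    """Start units as their devices show up, in whatever order events arrive."""

    def __init__(self, units: list[Unit]) -> None:
        self.units = units
        self.seen: set[str] = set()
        self.start_order: list[str] = []

    def device_appeared(self, dev: str) -> None:
        # A queue, because starting one unit can make another device appear
        # (e.g. logging in to an iSCSI target produces a new disk).
        pending = deque([dev])
        while pending:
            self.seen.add(pending.popleft())
            for unit in self.units:
                if not unit.started and unit.needs <= self.seen:
                    unit.started = True
                    self.start_order.append(unit.name)
                    if unit.provides is not None:
                        pending.append(unit.provides)

    def waiting_on(self) -> dict[str, set[str]]:
        """What is still blocked and on which devices, for the operator to look at."""
        return {u.name: set(u.needs - self.seen)
                for u in self.units if not u.started}


def example_units() -> list[Unit]:
    # All names here are made up for illustration.
    return [
        Unit("network", frozenset({"eth0"}), provides="net-up"),
        Unit("iscsi-login", frozenset({"net-up"}), provides="sdk"),
        Unit("md0", frozenset({"sda", "sdb", "sdk"}), provides="md0"),
        Unit("mount /data", frozenset({"md0"})),
    ]


if __name__ == "__main__":
    boot = HotplugBoot(example_units())
    for dev in ["sdb", "eth0"]:          # arrival order is arbitrary
        boot.device_appeared(dev)
    print(boot.waiting_on())             # {'md0': {'sda'}, 'mount /data': {'md0'}}
    boot.device_appeared("sda")          # the late disk finally shows up
    print(boot.start_order)              # ['network', 'iscsi-login', 'md0', 'mount /data']
```

The late disk is the 'enclosure powers up a bit after the server' case: nothing times out, the RAID array and the mount just start later.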
Having said that, I don't think anyone today has a perfect hotplug-based boot system. But I do think they're getting there, and they're much more likely to manage it than anything built on the old linear way of booting.
The case of the disappearing ESATA disk
This is a mystery (i.e., I have no answers yet), and also a story of what I think is the perversity of hardware (though I can't be sure yet). I'm writing it up partly because I rarely see sysadmins writing up our problems, with the result that I think it's easy to underestimate how weird things sometimes get out there.
We have a server with an external SATA disk enclosure. The enclosure has three port-multiplier-based (E)SATA channels, each with five drive bays on it; we currently have ten disks in the enclosure, all identical, taking up the full capacity of two channels. The server is running 64-bit Ubuntu 12.04. We recently moved the server from our test area to our production machine room, which was when we discovered the mystery: under specific circumstances, exactly one disk is not seen by the server.
If you power off both the external enclosure and the server, then the first time the server boots it will not see one specific disk bay on the enclosure. This is not just a matter of the disk in that bay responding too slowly; the disk stays invisible no matter how long you let it sit. Rebooting the server will make the disk reappear, as will hotplugging the disk (pulling out its disk sled just enough to cut power, then pushing it back in). This doesn't happen if only the server itself is powered down; as long as the disk enclosure stays powered on, all is fine. So far this could be any of a long list of things. Unfortunately this is where it gets weird. First, it's not the disk itself; we've swapped disks between bays and the problem stays with the specific bay. Next, it's not a straightforward hardware failure in the enclosure or anything directly related to it; at this point we've swapped the disk enclosure itself (with a spare), the ESATA cables, and the ESATA controller card in the server.
(To cut a long story short, it's quite possible that the problem has been there all along. We also don't have any other copies of this model of disk enclosure around that we can be sure don't have the problem, and since we have two more of these enclosures in production, this is making me nervous.)
One of the many things that really puzzles me about this is trying to come up with an explanation for why this could be happening. For instance, why does the disk become visible if we merely reboot the server?
I don't usually run into problems like these, which I'm generally very thankful for. But every so often something really odd comes up and apparently this is one of those times.
(Also, I guess power-fail tests are going to have to become a standard thing that we do before we put machines into production. If this kind of fault can happen once, it can happen more than once, and we'd really rather not find out about it the first time we have to power cycle all of this stuff in production.)
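One cheap way to make such a power-fail test mechanical is to record which disks the kernel can see while everything is healthy, then compare after each cold start. A minimal Python sketch, assuming a udev-based system that populates `/dev/disk/by-id` with `ata-...` names for SATA disks (plus `-partN` names for their partitions); the baseline and how you store it are hypothetical. By-id names are based on model and serial number, so unlike `sdX` names they don't shift when one disk goes missing. Note that they identify the disk, not the bay, which matters here since the fault follows the bay.

```python
import re
from pathlib import Path

# Assumption: udev creates /dev/disk/by-id/ata-<model>_<serial> links for
# SATA disks, plus ata-...-partN links for their partitions.
BY_ID = Path("/dev/disk/by-id")
PARTITION = re.compile(r"-part\d+$")


def current_disks(by_id: Path = BY_ID) -> set[str]:
    """Whole-disk ATA entries currently present (partition links are skipped)."""
    if not by_id.is_dir():
        return set()
    return {p.name for p in by_id.iterdir()
            if p.name.startswith("ata-") and not PARTITION.search(p.name)}


def missing_disks(baseline: set[str], by_id: Path = BY_ID) -> set[str]:
    """Disks recorded in the baseline that the kernel can't see right now."""
    return baseline - current_disks(by_id)
```

You would save `current_disks()` from a known-good boot, then power off both the server and the enclosure, boot, and treat any non-empty `missing_disks(baseline)` as a failed test.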
PS: Now you may be able to guess why I have a sudden new interest in how modern Linux assembles RAID arrays. It certainly hasn't made testing any easier that the drives have a RAID-6 array on them that we'd rather not have explode, especially when resyncs take about 24 hours.
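The reason one vanished disk is survivable but expensive comes down to RAID-6 arithmetic: two disks' worth of parity means any two members can be missing and the data is still all there. A minimal Python sketch that just classifies an array's state from how many members are visible; the ten-member example is hypothetical, since the array may not span every disk in the enclosure. Whatever assembles the array incrementally still has to decide whether to start it degraded or keep waiting; this sketch doesn't make that decision. A degraded start also means a rebuild once the disk returns, unless something like an md write-intent bitmap allows a quick re-add.

```python
from enum import Enum


class ArrayState(Enum):
    COMPLETE = "complete"          # every member present
    DEGRADED = "degraded"          # can run, with reduced redundancy
    UNSTARTABLE = "unstartable"    # too many members missing to reconstruct data


def raid6_state(members: int, present: int) -> ArrayState:
    """State of a RAID-6 array given how many of its members are visible.

    RAID-6 keeps two disks' worth of parity, so any two members can be
    missing and all of the data can still be reconstructed.
    """
    if members < 4:
        raise ValueError("RAID-6 needs at least four members")
    if not 0 <= present <= members:
        raise ValueError("present must be between 0 and members")
    missing = members - present
    if missing == 0:
        return ArrayState.COMPLETE
    if missing <= 2:
        return ArrayState.DEGRADED
    return ArrayState.UNSTARTABLE
```

With a hypothetical ten-member array, `raid6_state(10, 9)` is `DEGRADED`: the array can run, but with only one disk of redundancy left until the missing disk is back and rebuilt.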
Sidebar: Tests we should do
Since I've been coming up with these ideas in the course of writing this entry, I'm going to put them down here:
- Reorder the ESATA cables (changing the mapping between ESATA controller card ports and the enclosure's channels). If the missing disk moved to the other channel, it would mean that the problem isn't in the enclosure but is something upstream.
- 'Hotswap' another drive on the channel to see if the invisible disk then becomes visible due to the full channel reset et al.
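The reasoning behind the cable-swap test is small enough to write down as a decision table. A minimal Python sketch, assuming you can identify the missing bay by (channel, slot) and that slot N on each channel hangs off the same port-multiplier port; the channel labels and slot numbering are hypothetical.

```python
from typing import NamedTuple, Optional


class Bay(NamedTuple):
    channel: str  # enclosure channel label, e.g. "A" or "B" (hypothetical)
    slot: int     # position on that channel's port multiplier (hypothetical numbering)


def diagnose_cable_swap(before: Bay, after: Optional[Bay]) -> str:
    """Interpret which bay goes missing after swapping the two ESATA cables.

    Assumes bays in the same slot on each channel hang off the same
    port-multiplier port, so a fault upstream of the enclosure would
    reappear in the same slot on the other channel.
    """
    if after is None:
        return "inconclusive: nothing went missing after the swap"
    if after == before:
        return "enclosure: the fault stayed with the bay"
    if after.channel != before.channel and after.slot == before.slot:
        return "upstream: the fault followed the cable or controller port"
    return "inconclusive: a different slot went missing"
```

The 'inconclusive' branches matter as much as the others: a fault that only shows up after a cold start might not reproduce on the first try after the swap.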
I'm already planning to roll more recent kernels than the normal Ubuntu 12.04 one onto the machine to see what happens, but that's getting close to grasping at straws.