Why 'hotplug' approaches to device handling are the right way

November 30, 2013

The other day I mentioned that modern versions of Linux handle things like activating software RAID devices through their mechanisms for dynamically appearing devices, which are often called 'hotplugging'. Some people don't like this approach; one way that people put it is that hotplug is fine for the desktop (or laptop) where devices come and go, but servers have a constant set of hardware and so shouldn't need the complexity and inherently unpredictable asynchronicity of the whole process. It's my strong opinion that this is wrong in practice; even on servers with constant hardware, hotplug is the right approach on technical grounds.

The problem, even on servers, can be summed up as 'device enumeration'. It's been a long time since the operating system could rapidly and predictably detect devices (since at least the introduction of SCSI and probably even before then). Modern busses and environments require a significant amount of time and probing before you can be sure that you've seen every device available. Multiply this by a number of busses and things get even worse. They get worse still once you add software-based devices such as iSCSI disks, because you may need to bring up parts of the operating system before you can see all of the devices you need to fully boot.

You can try to make all of this work by adding longer and longer delays before you look for all of the necessary devices (and then start throwing errors if they're not there), and then layering a bunch of complex topology and specific dependency awareness on top of it to get things like iSCSI to work. But all of this is fundamentally a hack and it can easily break down (and it has in the past when people tried this). You wind up telling sysadmins to insert 'sleep 30' statements in places to get their systems to work and so on. No one likes this.
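To make the fragility concrete, here is a minimal sketch of the 'wait, then look once' style of boot. It is a toy model with simulated time, not how any real init system is written; the function name linear_boot, the device names, and the arrival times are all made up for illustration.

```python
def linear_boot(arrival_times, required, settle_delay):
    """Toy model of a linear boot (hypothetical, for illustration only).

    arrival_times maps a device name to the simulated second at which it
    becomes visible. We wait a fixed settle_delay and then look exactly once.
    """
    # Only devices that showed up within the delay are seen at all.
    visible = {name for name, t in arrival_times.items() if t <= settle_delay}
    missing = sorted(set(required) - visible)
    if missing:
        # This is the point where a real system would throw boot errors.
        raise RuntimeError(f"missing after {settle_delay}s: {missing}")
    # Boot proceeds only once the whole delay has elapsed,
    # even if everything was actually ready much earlier.
    return settle_delay


# Assumed arrival times: local disks are fast, the iSCSI disk is slow.
arrivals = {"sda": 0.5, "sdb": 0.7, "iscsi0": 12.0}
```

With these assumed numbers a 5 second delay is fine if you only need the local disks, fails once you also need iscsi0, and a 30 second delay 'fixes' it at the cost of making every boot wait 30 seconds. Any delay you pick is either too long or not long enough for some machine.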

A hotplug-based asynchronous system is the simpler, better approach. You give up on trying to wait 'long enough' for device enumeration to finish and simply accept that it's an asynchronous process: you handle disks and other devices as they appear, and as parts of the system become ready you boot more and more fully (and if something goes badly wrong you let the operator abort the process). At a stroke this replaces a rickety collection of hacks with a simple, general approach. As a bonus it gracefully handles things going a little bit wrong (for example, your external disk enclosure powering up and becoming visible a bit after the server starts booting, instead of before).
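The event-driven idea can be sketched in the same toy style. This is only an illustration under assumptions, not the way udev or any real init system is implemented; hotplug_boot, the unit names, and the event times are hypothetical.

```python
def hotplug_boot(events, units):
    """Toy model of a hotplug-style boot (hypothetical, for illustration only).

    events is a list of (time, device) pairs for devices appearing.
    units maps a unit name to the set of devices it needs.
    Returns (started, still_waiting): started is a list of (time, unit) in
    the order units became ready; still_waiting lists units never satisfied.
    """
    present = set()
    pending = {unit: set(needs) for unit, needs in units.items()}
    started = []
    for when, device in sorted(events):
        present.add(device)
        # React to the event: start every unit whose devices are all here.
        for unit in sorted(pending):  # sorted() copies, so deleting is safe
            if pending[unit] <= present:
                started.append((when, unit))
                del pending[unit]
    return started, sorted(pending)


# Assumed events: two local disks, then a slow iSCSI disk.
events = [(0.5, "sda"), (0.7, "sdb"), (12.0, "iscsi0")]
units = {"md0": {"sda", "sdb"}, "data-fs": {"iscsi0"}}
started, waiting = hotplug_boot(events, units)
```

Here md0 is started as soon as its second disk appears, and data-fs is started whenever iscsi0 turns up, with no delay tuned to any of it. A unit whose device never arrives simply stays in the waiting list, which is the point where a real system would give the operator a chance to intervene.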

Having said that, I don't think anyone today has a perfect hotplug-based boot system. But I do think they're getting there, and they're much more likely to manage it than anything built on the old linear way of booting.


Last modified: Sat Nov 30 23:24:22 2013