Wandering Thoughts archives

2013-11-30

Why 'hotplug' approaches to device handling are the right way

The other day I mentioned that modern versions of Linux handle things like activating software RAID devices through their mechanisms for dynamically appearing devices, which is often called 'hotplugging'. Some people don't like this approach; one way they put it is that hotplug is fine for the desktop (or laptop), where devices come and go, but servers have a constant set of hardware and so shouldn't need the complexity and inherent unpredictable asynchronicity of the whole process. It's my strong opinion that this is wrong in practice and that even on servers with constant hardware, hotplug is the right approach on technical grounds.

The problem, even on servers, can be summed up as 'device enumeration'. It's been a long time since the operating system could rapidly and predictably detect devices (since at least the introduction of SCSI and probably even before then). Modern busses and environments require a significant amount of time and probing before you can be sure that you've seen every device available. Multiply this across a number of busses and the delays add up. Things get worse still once you add software-based devices such as iSCSI disks, because you may need to bring up parts of the operating system before you can even see all of the devices you need to fully boot.

You can try to make all of this work by adding longer and longer delays before you look for all of the necessary devices (and then start throwing errors if they're not there), and then layering a bunch of complex topology and dependency awareness on top of it to get things like iSCSI to work. But all of this is fundamentally a hack and it can easily break down (and it has in the past when people tried this). You wind up telling sysadmins to insert 'sleep 30' statements in various places to get their systems to work, and so on. No one likes this.
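(To make the ricketiness concrete, here's a minimal sketch of what that style of boot script logic tends to look like; /dev/md0 is just a stand-in device name and the 30 is exactly the sort of number that keeps getting raised:)

    #!/bin/sh
    # the 'wait long enough' hack: poll for a device that ought to
    # exist and give up after an arbitrary timeout
    tries=0
    while [ ! -b /dev/md0 ] && [ "$tries" -lt 30 ]; do
        sleep 1
        tries=$((tries + 1))
    done
    [ -b /dev/md0 ] || { echo "/dev/md0 never showed up" >&2; exit 1; }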

A hotplug-based asynchronous system is the simpler, better approach. You give up on trying to wait 'long enough' for device enumeration to finish and simply accept that it's an asynchronous process: you handle disks and other devices as they appear, and as parts of the system become ready you boot more and more fully (and if something goes badly wrong, you let the operator abort the process). At a stroke this replaces a rickety collection of hacks with a simple, general approach. As a bonus it gracefully handles things going a little bit wrong (for example, your external disk enclosure powering up and becoming visible a bit after the server starts booting, instead of before).
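(To show what this looks like in practice for the software RAID case: distributions ship a udev rule roughly along the following lines. I'm paraphrasing from memory rather than quoting any particular distribution, and the real rules have more conditions and housekeeping, but the idea is that each newly appeared RAID member gets fed to mdadm as it shows up and mdadm starts the array the moment the last piece arrives:)

    # sketch of a hotplug-driven RAID assembly rule (paraphrased)
    SUBSYSTEM=="block", ACTION=="add", ENV{ID_FS_TYPE}=="linux_raid_member", RUN+="/sbin/mdadm --incremental $env{DEVNAME}"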

Having said that, I don't think anyone today has a perfect hotplug-based boot system. But I do think they're getting there, and they're much more likely to manage it than anything built on the old linear way of booting.

WhyHotplugRight written at 23:24:22

2013-11-08

The spectrum of options when netbooting systems

Suppose that you want to run your servers without local system disks; instead you will boot them over the network and somehow supply all of what they need to operate that way. In the abstract you need up to three or four different things for this: a potentially read-only version of the system filesystem(s), a way to get machine-specific configuration onto each server, some per-server volatile writeable space, and perhaps some per-server permanent writeable space. Life is easier if servers don't need the last of these and are effectively either volatile or read-only.

There's a spectrum of options to provide these that I can think of (a couple of them are sketched concretely after the list):

  • boot to a ramdisk. The ramdisk can be generic if the servers get their machine-specific configuration through some other method (including automation frameworks like Chef and Puppet).

    The advantage of this setup is that once booted, a machine is self-contained. The drawbacks are the lack of innate non-volatile writeable storage and the amount of memory that the ramdisk image may take up. This is probably best used with very small base system images, unless you enjoy losing gigabytes of expensive server RAM.

  • boot to a ramdisk and mount a read-write network filesystem for any non-volatile storage needs.

  • boot to a per-machine read-write network filesystem. This requires a potentially big fileserver and managing all of those filesystems, but it looks the most like normal system disks. The drawback is that it's not clear how much you gain over just having local disks, which is why this sort of plain old-fashioned diskless machine has fallen out of favour.

    (You can make some subset of the system filesystems read-only and shared, assuming that your operating system cooperates.)

  • boot to a generic read-only network filesystem and then overlay it with another filesystem (or more) for writeable storage and machine-specific configuration. The overlay may be in a ramdisk or in another network filesystem or both (for different bits); if you use a ramdisk as the overlay, servers must get their specific configuration through some other method.
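
    (To sketch the mechanics of this last option: using overlayfs-style mount syntax with made-up directory names, the initramfs winds up doing something roughly like the following before switching into the assembled root. Other union filesystems differ in the details.)

        # generic read-only system image from the fileserver
        mkdir -p /ro /rw /newroot
        mount -t nfs -o ro fileserver:/exports/genroot /ro
        # volatile writeable layer (could equally be another network filesystem)
        mount -t tmpfs tmpfs /rw
        mkdir -p /rw/upper /rw/work
        # merge them; /newroot is what the system then switch_root's into
        mount -t overlay overlay -o lowerdir=/ro,upperdir=/rw/upper,workdir=/rw/work /newroot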

(I'm stretching 'network filesystem' to include 'network disk space', for example through iSCSI. I'm also probably overlooking some options.)
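(For concreteness, the first and third options in the list map onto boot loader configuration along these lines; this is a pxelinux sketch with made-up names, a placeholder fileserver address, and fewer kernel arguments than a real setup would need:)

    # boot a self-contained generic ramdisk image; the image carries its own init
    LABEL ramdisk
      KERNEL vmlinuz
      APPEND initrd=rootfs.img

    # mount a per-machine root filesystem over NFS instead
    LABEL nfsroot
      KERNEL vmlinuz
      APPEND initrd=initrd.img root=/dev/nfs nfsroot=192.0.2.10:/exports/server1 ro ip=dhcp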

Any option involving a network filesystem makes all booted servers depend on the fileserver(s) providing their system filesystems; if a fileserver goes down or stalls, they probably will too (they might survive if everything they need is already loaded into memory and running). Note that merely having multiple copies of the fileserver doesn't help; you must be able to have clients transparently migrate from one to the other without a reboot (unless reboots are tolerable in your environment).

Any solution except per-machine read-write network filesystems requires some mechanism to update and (re)build the master images or filesystems. Unless you're lucky, these will not be part of the operating system's normal system management and there will be friction and pain. Some mechanisms may give you problems with things being updated out from underneath running servers, or with getting running servers to pick up updates at all (again, this is not a problem if you can reboot servers on a whim).
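(As a rough sketch of what such a mechanism often boils down to, with made-up, Debian-flavoured paths: you keep a master tree somewhere, update it with the ordinary tools pointed at a chroot, and then push the result to wherever the fileserver exports it from, hoping the running clients cope:)

    # update the master tree, then push it to the fileserver's export
    chroot /srv/build/genroot apt-get update
    chroot /srv/build/genroot apt-get -y upgrade
    rsync -a --delete /srv/build/genroot/ fileserver:/exports/genroot/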

Some environments aggressively don't want their systems writing to 'local' storage for anything beyond (maybe) configuration file updates. Things like logs should be shipped off the individual machines to log aggregators and so on, while all system modifications and custom setup are obtained through a configuration management system instead of saved on local storage (and it's a feature if sysadmins get conditioned that they can't make local changes on a specific server that stick).
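(Shipping the logs off-box is the easy part of this; in rsyslog, for example, it's a one-line drop-in, with the log host being whatever your environment uses:)

    # /etc/rsyslog.d/forward.conf -- send all syslog traffic to a central host over TCP
    *.* @@loghost.example.com:514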

NetbootingRootFSSpectrum written at 02:49:48

