2013-11-08
The spectrum of options when netbooting systems
Suppose that you want to run your servers without local system disks; instead you will boot them over the network and somehow supply all of what they need to operate that way. In the abstract you need up to three or four different things for this: a potentially read-only version of the system filesystem(s), a way to get machine-specific configurations on to each server, some per-server volatile writeable space, and perhaps some per-server permanent writeable space. Life is easier if servers don't need the latter and are effectively either volatile or read-only.
There is a spectrum of options to provide these that I can think of:
- boot to a ramdisk. The ramdisk can be generic if the servers get
their machine-specific configuration through some other method
(including automation frameworks like Chef and Puppet).
The advantage of this setup is that once booted a machine is self-contained. The drawbacks are the lack of innate non-volatile writeable storage and the amount of memory that a ramdisk image may take up. This is probably best used with very small base system images unless you enjoy losing gigabytes of expensive server RAM.
- boot to a ramdisk and mount a read-write network filesystem for
any non-volatile storage needs.
- boot to a per-machine read-write network filesystem. This requires
a potentially big fileserver and managing all of those filesystems but
looks the most like normal system disks. The drawback is that it's not
clear how much you gain over just having local disks, which is why
this sort of plain old-fashioned diskless machine has fallen out of
favour.
(You can make some subset of the system filesystems read-only and shared, assuming that your operating system cooperates.)
- boot to a generic read-only network filesystem and then overlay it with another filesystem (or more) for writeable storage and machine-specific configuration. The overlay may be in a ramdisk or in another network filesystem or both (for different bits); if you use a ramdisk as the overlay, servers must get their specific configuration through some other method.
(I'm stretching 'network filesystem' to include 'network disk space', for example through iSCSI. I'm also probably overlooking some options.)
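To make the overlay option a bit more concrete, here is a toy Python model of the lookup rules involved. It is only a sketch of the general idea (reads fall through to the shared read-only layer, writes land in a per-server layer); the `OverlayFS` class, its method names, and the sample paths are all made up for illustration, and this is not how any real overlay or union filesystem is implemented.

```python
class OverlayFS:
    """Toy model: a shared read-only lower layer plus a per-server upper layer.

    Hypothetical illustration only; real overlay filesystems deal with
    directories, permissions, copy-up of partial writes, and much more.
    """

    # Marker stored in the upper layer for a file that has been deleted there.
    _WHITEOUT = object()

    def __init__(self, lower):
        self.lower = lower  # generic system image, shared and never modified
        self.upper = {}     # per-server changes (a ramdisk or network storage)

    def read(self, path):
        # The upper layer wins if it has anything to say about this path.
        if path in self.upper:
            data = self.upper[path]
            if data is self._WHITEOUT:
                raise FileNotFoundError(path)
            return data
        # Otherwise fall through to the shared read-only image.
        if path in self.lower:
            return self.lower[path]
        raise FileNotFoundError(path)

    def write(self, path, data):
        # Writes never touch the lower layer, so other servers are unaffected.
        self.upper[path] = data

    def delete(self, path):
        self.read(path)  # raises FileNotFoundError if the file is not visible
        self.upper[path] = self._WHITEOUT


# One generic image shared by two (hypothetical) servers.
base = {"/etc/hostname": "generic", "/bin/sh": "<shell binary>"}
web1 = OverlayFS(base)
web2 = OverlayFS(base)

# Machine-specific configuration goes into web1's own upper layer.
web1.write("/etc/hostname", "web1")
```

In this model `web1` sees its own `/etc/hostname` while `web2` still sees the generic one, and throwing away `web1.upper` (as a reboot would with a ramdisk overlay) returns it to the generic image. That last point is why a ramdisk overlay needs the machine-specific configuration to come from somewhere else on each boot.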
Any option involving a network filesystem makes all booted servers depend on the fileserver(s) providing their system filesystems; if a fileserver goes down or stalls, its clients probably will too (they might survive if everything they need is already loaded into memory and running). Note that merely having multiple copies of the fileserver doesn't help; you must be able to have clients transparently migrate from one to the other without a reboot (unless reboots are tolerable in your environment).
Any solution except per-machine read-write network filesystems requires some mechanism to update and (re)build the master images or filesystems. Unless you're lucky these will not be part of the operating system's normal system management and there will be friction and pain. Some mechanisms may give you problems either with running servers having things updated out from underneath them or with getting running servers to pick up updates (again this is not a problem if you can reboot servers on a whim).
Some environments aggressively don't want their systems writing to 'local' storage for anything beyond (maybe) configuration file updates. Things like logs should be shipped off the individual machines to log aggregators and so on, while all system modifications and custom setup are obtained through a configuration management system instead of saved on local storage (and it's a feature if sysadmins get conditioned to the idea that they can't make local changes on a specific server that stick).