My views on network booting as an alternative to system disks

November 10, 2013

In a comment on my entry on the potential downsides of SSDs as system disks, zwd asked if I'd considered skipping the need for system disks by just PXE booting the systems instead (as some Illumos distributions are now recommending). The short answer is no, but I have enough thoughts about this to warrant a long answer.

My view is that network booting is at its best when you have a large and mostly homogeneous set of servers that basically run a constant set of things with little local state or local configuration. In this environment you don't want to bother taking the time to install to the local disks on today's server, and it simplifies life if you can upgrade machines just by rebooting them. With little local state, the difficulty of having state in a diskless environment doesn't cause too much heartburn in practice. Running a constant set of programs generally reduces the load on your 'system filesystem' fileserver and may make it practical to have an all-in-RAM system image.

With that said, a diskless environment is almost intrinsically more complicated than local disks in theory and definitely more complicated in practice today. While you have a spectrum of options, none of them are as simple and as resilient as local disks; they all require some degree of external support and create complications around things like software upgrades. Some of the options require significant infrastructure. All of them create additional dependencies that must be working before your servers will boot. In a large environment the simplifications elsewhere make up for this.
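To make the dependency point a bit more concrete, here is a toy sketch in Python. Everything in it is made up for illustration: the service names, the server names, and the two dependency sets are my assumptions, not a description of any real environment or any particular netboot setup. It just models a server as bootable only when everything it needs at boot time is available.

```python
# Hypothetical model: which services must be up before a server can boot.
# The service names and both dependency sets are illustrative assumptions.

LOCAL_DISK_BOOT = {"power"}
NETBOOT = {"power", "network", "dhcp", "tftp", "system-fileserver"}


def can_boot(requirements, available):
    """A server boots only if every service it depends on is available."""
    return requirements <= available


def bootable(servers, available):
    """Return the sorted names of the servers that could boot right now."""
    return sorted(
        name for name, reqs in servers.items() if can_boot(reqs, available)
    )


# Hypothetical servers: one with local system disks, two netbooted.
servers = {
    "login1": LOCAL_DISK_BOOT,
    "web1": NETBOOT,
    "compute1": NETBOOT,
}

# Everything is up except the fileserver that holds the system images.
available = {"power", "network", "dhcp", "tftp"}
print(bootable(servers, available))
```

In this made-up scenario only the server with local disks can come back up while the system image fileserver is down. A real environment has more nuance (redundant boot servers, all-in-RAM images and so on), but every extra entry in the netboot set is one more thing that has to be working first.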

We aren't a large environment. In fact we're a very bad case for netbooting. Our modest number of systems is significantly heterogeneous, the systems have potentially significant local state, a given system often runs a wide variety of software (very wide, for systems that users log in to), and we don't want to reboot them at all in normal conditions. Some servers are already dependent on central NFS fileservers, but there are other servers that we very much want to keep working even if the fileserver environment has problems. And of course the components of the fileserver environment are themselves a crucial central point that we want to work almost no matter what, with as few external dependencies as possible (ideally none beyond 'there is a network'). Single points of failure that can potentially take down much of our infrastructure give us heartburn. On top of this, I don't believe that diskless booting is well supported by the majority of the OSes and Linux distributions that we use; we'd almost certainly be going off the beaten and fully supported path in terms of installation and system management (and might have to build some tools of our own).

In short: we'd save very little (or basically nothing) by using network booted diskless servers and we'd get a whole bunch of problems to go with it. We'd need additional boot servers and relatively heavy duty fileservers to serve up the system filesystems and store 'local' state, and we'd have non-standard system management that would be more difficult than what we have today. Even if I felt enthused about this (which I don't), it would be a very hard sell to my co-workers; they would rationally ask 'what are we getting for all of this extra complexity and overhead?' and I would have no good answer.

(We don't install or reinstall systems anywhere near often enough that 'faster and easier installs' would be a good answer.)

Last modified: Sun Nov 10 02:38:17 2013