Why diskless Unix machines lost out

May 9, 2010

A long time ago, diskless Unix machines were all the rage. These days, they've all but vanished (although they live on in some specialized applications, such as LTSP). From my perspective, what made diskless machines lose out is threefold: performance issues, cheap disks, and complexity. Of these, the real killer issue was complexity.

The hard core of the performance issues is not so much how fast a single machine could access its 'disks' (although this should not be underestimated itself) but how fast a whole bunch of them could do this all at once. Gigabit Ethernet may be as fast or faster than modern disks, but that's only if you're the only one trying to talk to your fileserver; have a few people trying to do that, and the fileserver's network becomes a bottleneck very fast even if its disks can keep up.

(You can argue that single client/single server numbers are now comparable for local disks and NFS filesystems. This may be true (I haven't measured), but it definitely wasn't in the past. For a long time, NFS was slower to significantly slower than local disks, especially for certain sorts of write operations.)
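The contention argument above is easy to see with some back-of-envelope arithmetic. The following sketch is my own illustration (the figures for link and disk throughput are assumptions, not numbers from the article): it divides a gigabit link's roughly 125 MB/s fairly among active clients and compares each client's share against a fixed local-disk figure.

```python
# Back-of-envelope sketch of fileserver link contention.
# Assumed figures (not from the article): 1 Gbps ~ 125 MB/s of link
# bandwidth, and ~100 MB/s for a single local disk's throughput.

GIGABIT_MB_S = 125.0     # usable bandwidth of a gigabit Ethernet link
LOCAL_DISK_MB_S = 100.0  # illustrative throughput of one local disk

def per_client_throughput(clients, link_mb_s=GIGABIT_MB_S):
    """Ideal fair share of the fileserver's link per active client."""
    return link_mb_s / clients

for n in (1, 2, 5, 10, 25):
    share = per_client_throughput(n)
    verdict = "beats" if share > LOCAL_DISK_MB_S else "loses to"
    print(f"{n:3d} clients: {share:6.1f} MB/s each ({verdict} a local disk)")
```

Even in this idealized fair-share model (real NFS traffic is burstier and adds protocol overhead), the per-client share drops below a single disk's throughput as soon as a handful of clients are active at once.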

The hard core of the complexity of diskless machines is that Unix systems were never redesigned to allow you to really share all filesystems between machines, especially including the root filesystem. Instead, they always needed their own root filesystem and per-machine storage and some degree of per-machine administration to go with it. Once you have to have this storage and administration, where the actual storage lives is usually a secondary issue; you might as well put it where it is cheap and fast and common and does not cause you problems.

(There's nothing intrinsic in Unix that requires a per-machine root filesystem (eg, see here), but no one has ever seriously tried to build a general-purpose Unix that way.)

(This entry was inspired by reading Scalable day-to-day diskless booting, because I disagree strongly with their view that having the OS on local disks is incompatible with large scale administration. The truth is that building systems that use a single common system image is both hard and completely unsupported by current Unixes; you can probably make it work, but you'll be building it from scratch on your own. If you don't have a single system image, you need automation regardless of where your separate system images live and how much or little space they take up.)


Comments on this page:

From 70.130.186.117 at 2010-06-02 09:04:56:

We have had good luck booting multiple FreeBSD and Fedora systems from a NetApp fileserver, and performance is not a problem for us. We do have a /tmp and swap on each machine. I have a complete description of the FreeBSD implementation at http://www.nber.org/sys-admin/FreeBSD-diskless.html - it isn't hard at all. Fedora is a bit more difficult and we haven't done documentation for it yet.

I would say that it is reasonable to do what you know. If you haven't ever done a diskless boot and have lots of experience with imaging, there isn't a necessity to change. But similarly, if you know how to diskless boot, or have clear docs, I don't see any serious problems. The NetApp can certainly handle a few score of machines without serious boot delays, and during operation most I/O is for user files (at least on our systems) so diskless versus not will have no effect.

Daniel Feenberg NBER feenberg@nber.org


