2010-05-09
Why diskless Unix machines lost out
A long time ago, diskless Unix machines were all the rage. These days, they've all but vanished (although they live on in some specialized applications, such as LTSP). From my perspective, diskless machines lost out for three reasons: performance issues, cheap disks, and complexity. Of these, the real killer was complexity.
The hard core of the performance issues is not so much how fast a single machine could access its 'disks' (although this itself should not be underestimated) but how fast a whole bunch of them could do so all at once. Gigabit Ethernet may be as fast as or faster than modern disks, but only if you're the only one trying to talk to your fileserver; have a few people trying to do that, and the fileserver's network link becomes a bottleneck very fast, even if its disks can keep up.
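To make the sharing problem concrete, here is a back-of-envelope sketch. It assumes the fileserver has a single gigabit link that active clients split evenly, ignores protocol overhead, and compares each client's share against a hypothetical local disk doing 100 MB/s. Both figures are illustrative assumptions, not measurements.

```python
# Back-of-envelope sketch: how one fileserver network link is split among
# diskless clients. All figures are illustrative assumptions, not measurements.

GIGABIT_BYTES_PER_SEC = 1_000_000_000 / 8  # raw gigabit line rate, no protocol overhead
ASSUMED_LOCAL_DISK_BYTES_PER_SEC = 100_000_000  # hypothetical local disk throughput


def per_client_share(link_bytes_per_sec: float, active_clients: int) -> float:
    """Bandwidth each client gets if all active clients share the link evenly."""
    if active_clients < 1:
        raise ValueError("need at least one active client")
    return link_bytes_per_sec / active_clients


def main() -> None:
    # Show how quickly each client's share drops below the assumed local disk.
    for clients in (1, 2, 5, 10, 20):
        share = per_client_share(GIGABIT_BYTES_PER_SEC, clients)
        ratio = share / ASSUMED_LOCAL_DISK_BYTES_PER_SEC
        print(f"{clients:2d} clients: {share / 1e6:6.1f} MB/s each "
              f"({ratio:.2f}x the assumed local disk)")


if __name__ == "__main__":
    main()
```

With just two busy clients, each one's share already falls below the assumed local disk, and real NFS traffic adds protocol overhead on top of this idealized split.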
(You can argue that single client/single server numbers are now comparable for local disks and NFS filesystems. This may be true (I haven't measured), but it definitely wasn't in the past. For a long time, NFS was slower to significantly slower than local disks, especially for certain sorts of write operations.)
The hard core of the complexity of diskless machines is that Unix systems were never redesigned to let you really share all filesystems between machines, above all the root filesystem. Instead, each machine always needed its own root filesystem and per-machine storage, plus some degree of per-machine administration to go with it. Once you have to have this storage and administration, where the actual storage lives is usually a secondary issue; you might as well put it where it is cheap, fast, and common, and does not cause you problems.
(There's nothing intrinsic in Unix that requires a per-machine root filesystem (eg, see here), but no one has ever seriously tried to build a general-purpose Unix that way.)
(This entry was inspired by reading Scalable day-to-day diskless booting, because I disagree strongly with their view that having the OS on local disks is incompatible with large scale administration. The truth is that building systems that use a single common system image is both hard and completely unsupported by current Unixes; you can probably make it work, but you'll be building it from scratch on your own. If you don't have a single system image, you need automation regardless of where your separate system images live and how much or little space they take up.)