2013-11-06
Why you might not want to use SSDs as system disks just yet
I wrote recently about our planned switch to using SSDs as system disks. This move may be somewhat more daring and risky than I've made it sound so far, so today I feel like running down all of the reasons I know of that you might not want to do this (or at least not do it just yet). Note that not all of this is deeply researched; a bunch of it comes from ambient stories and gossip floating around.
The current state of hard drives is that they are a mature technology. As a mature technology, the guts are basically the same between all hard drives, across different models and different manufacturers, and the main 'enterprise' drive upsell is often about the firmware (and somewhat about how you connect to them). As such the consumer 7200 RPM SATA drive you can easily buy is mostly the same as an expensive 7200 RPM nearline SAS drive. Which is, of course, why so many people buy more or less consumer SATA drives for all sorts of production use.
My impression is that this is not the case for SSDs. Instead SSDs are a rapidly evolving, immature technology, with all that that implies. SSDs are not homogeneous; they vary significantly between manufacturers, product lines, and even product generations. Unlike hard drives, you can't assume that any SSD you buy from a major player in the market will be decent or worth its price (but on the other hand it can sometimes be an underpriced gem). There are also real and significant differences between 'enterprise' SSDs and ordinary consumer SSDs; the two are not small variants of each other, and ordinary consumer SSDs may not be up to actual production usage in server environments.
You can find plenty of horror stories about specific SSDs out there. You can also find more general horror stories about SSD behavior under exceptional conditions; one example I've seen recently is Understanding the Robustness of SSDs under Power Fault [PDF] (from FAST '13), which is about what it says. Let's scare everyone with a bit from the abstract:
Our experimental results reveal that thirteen out of the fifteen tested SSD devices exhibit surprising failure behaviors under power faults, including bit corruption, shorn writes, unserializable writes, metadata corruption, and total device failure.
Most of their SSDs were from several years ago, so things may be better with current SSDs. Or maybe not. We don't necessarily know, and that's part of the problem with SSDs. SSDs are very complex devices and vendors have every reason to gloss over inconvenient details and (allegedly) make devices that lie about things to you so that they look faster or healthier.
(It's widely reported that some SSDs simply ignore cache flush commands from the host instead of dutifully and slowly committing pending writes to the actual flash. And we're not talking about SSDs that have supercapacitors so that they can handle power failures.)
On a large scale, none of this is particularly surprising or novel (not even the bit about ignoring cache flushes). We saw the same things in the hard drive industry before it became a mature field, including manufacturers being 'good' or 'bad' and there being real differences between the technology of different manufacturers and between 'enterprise' and consumer drives. SSDs are just in the early stages of the same process that HDs went through in their time.
Ultimately that's the large-scale reason to consider avoiding SSDs for casual use, such as for system drives. If you don't actively need them or really benefit from them, why take the risks that come from being a pioneer?
(This is the devil's advocate position and I'm not sure how much I agree with it. But I put the arguments for SSDs in the other entry.)
Modern versions of Unix are more adjustable than they used to be
One of the slow changes in modern Unix over the past ten to fifteen years has been a significant increase in modularity, and with it an increase in how adjustable a number of core things are without major work. This has generally not been something that ordinary users notice, because it happens at the level of system-wide configuration.
Undoubtedly this all sounds abstract, so let's get concrete. The first example here is the relative pervasiveness of PAM. In the pre-PAM world, implementing additional password strength checks or special custom rules for who could su to whom took non-trivial modifications to the source for passwd and su (or sudo). In the modern world both are simple PAM modules, as are things like taking special custom actions when a password is changed.
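As a concrete illustration (the module names and options here are one plausible Linux setup, not a universal recipe; the exact modules, files, and options vary between distributions and PAM implementations), both kinds of policy become a line or two of configuration rather than source changes:

    # /etc/pam.d/passwd -- extra password strength requirements
    password  requisite  pam_pwquality.so retry=3 minlen=12

    # /etc/pam.d/su -- only members of the 'wheel' group may su to root
    auth      required   pam_wheel.so use_uid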
My next example is nsswitch.conf. There was a day in the history of Unix when adding DNS lookups to programs required recompiling them against a library with a special version of gethostbyname() et al. These days, how any number of things get looked up is not merely something that you can configure but something you can control; if you want or need to, you can add a new sort of lookup yourself as an aftermarket, do-it-yourself addition. This can be exploited for clever hacks that don't require changing the system's programs in any particular way, just taking advantage of how they work (although there are limits imposed by this approach).
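For a sketch of what this looks like in practice on glibc-based Linux systems: nsswitch.conf names the lookup sources for each database, and a hypothetical extra source (called 'mydirectory' here purely for illustration) is simply listed alongside the standard ones. glibc then loads libnss_mydirectory.so.2 and calls functions in it with conventional names such as _nss_mydirectory_getpwnam_r(); no existing program has to change or be recompiled.

    # /etc/nsswitch.conf (excerpt)
    # 'mydirectory' is a made-up example of an aftermarket lookup module
    passwd:  files mydirectory
    group:   files mydirectory
    hosts:   files dns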
(Actually, now that I'm writing this entry, I'm not sure that there have been any major moves in this sort of core modularity beyond NSS and PAM, although there certainly are more options for things like your cron daemon and your syslog daemon if you feel like doing wholesale replacement of programs.)
One of the things these changes do is reduce the need for operating system source, since they reduce your need for custom versions of operating system commands.
(Of course you can still wind up needing OS source in order to figure out how to write your PAM or NSS module.)
Sidebar: best practices have improved too
One of the practical increases in modularity has come from an increasing number of programs (such as many versions of cron) scanning directories instead of just reading a single file. As we learned starting no later than System V init's rc directories versus BSD's monolithic /etc/rc, a bunch of files in a directory is often easier to manage than a monolithic single file, because you can have all sorts of people dropping files in and updating their own files without colliding with each other. Things like Linux package management have strongly encouraged this approach.
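As a small, hypothetical example of the directory-scanning pattern: on Linux systems where cron reads /etc/cron.d, packages and people can each own their own drop-in file instead of all editing one shared crontab (the file name and command below are made up for illustration).

    # /etc/cron.d/local-backup -- a made-up drop-in file
    # m  h  dom mon dow  user  command
    30   3  *   *   0    root  /usr/local/sbin/run-backup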