My views on various bits of disk drive technology today

August 19, 2013

trs80 asked (in a comment on this entry) about my views on various aspects of modern disk drives. This is actually an interesting subject for me because things are happening in the disk drive world that I hadn't been aware of until recently.

The big change that had passed me by entirely is the shift away from regular sized 3.5" hard drives to 2.5" (aka Small Form Factor) drives, what used to be 'laptop drives'. I'm used to thinking of these as things you wouldn't use outside of laptops, but this is not at all the case any more. There are any number of 7200 RPM drives that will run happily in 24x7 setups, have long warranty periods, and apparently don't cost hugely more than 3.5" drives of the same capacity. Using 2.5" drives is quite attractive in general because you get more drives into the same space and you can put SSDs in your drive bays without hassles (all SSDs that I've ever seen are 2.5" drives).

The bad news is that unfortunately we can't use them right now (although we'd like to). The largest 2.5" drives we've been able to spot are in the 1TB to 1.5TB range and this is just a bit too small for us. We're targeting 2TB drives in our hardware refresh as being a good balance between disk space and keeping our overall IOPS up (and also as being a decent size jump over our current set of drives).
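The tradeoff between drive size and aggregate IOPS can be sketched with some back-of-the-envelope numbers (the pool size and per-drive IOPS figure below are illustrative assumptions, not our actual numbers): for a fixed total capacity, smaller drives mean more spindles and thus more aggregate random IOPS.

```python
# Hypothetical numbers for illustration only.
TOTAL_TB = 24            # assumed total pool capacity
IOPS_PER_DRIVE = 100     # rough random IOPS for a 7200 RPM SATA drive

for drive_tb in (1, 2, 4):
    drives = TOTAL_TB // drive_tb
    print(f"{drive_tb}TB drives: {drives} spindles, "
          f"~{drives * IOPS_PER_DRIVE} aggregate random IOPS")
```

Bigger drives shrink the spindle count (and the cost), which is why 2TB is a balance point rather than simply 'the biggest drives we can buy'.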

This also means that we're targeting consumer level 7200 RPM SATA drives. I don't have any opinion on the relative merits of enterprise drives (whether 7200 RPM or faster) versus consumer drives because we've never had the money to buy anything but consumer drives. If money rained from the sky I think I'd still want to study the issue carefully because I'm far from convinced that 'enterprise' drives are worth the money, especially these days when you have much better options for high IO rates.

(After all, an SSD will basically destroy even a 15k RPM hard drive on IO rates. At that point you're down to 'higher reliability', if there is any.)
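The gap is easy to see from first principles: a random read on a spinning disk costs roughly an average seek plus half a rotation. A rough sketch (the seek times below are typical assumed figures, not measurements):

```python
def disk_iops(rpm, avg_seek_ms):
    """Rough theoretical random-read IOPS for a spinning disk:
    one I/O costs an average seek plus half a rotation."""
    half_rotation_ms = 60_000 / rpm / 2   # average rotational latency
    return 1000 / (avg_seek_ms + half_rotation_ms)

print(round(disk_iops(7200, 8.5)))    # roughly 80 IOPS, consumer SATA
print(round(disk_iops(15000, 3.5)))   # roughly 180 IOPS, 15k enterprise
```

Even a modest SSD does tens of thousands of random IOPS, so the 15k drive's two-or-so-times edge over 7200 RPM is irrelevant next to any SSD.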

Money also dictates our limited use of SSDs. Today they are simply not price competitive for large storage; as trs80 mentioned they are something like 10x the per-GB cost of HDs (at least). They are excellent if you need the IO rates or if you have only modest storage needs (where the extra cost is small in absolute dollars), but this is not our space situation. We can only really afford to use SSDs as caches and write accelerators for regular (slow) hard drives. How much we'll be able to do this depends on how the budget comes out.

(We have already sprinkled SSDs over some important filesystems and we may do more of this, but we can't even vaguely afford to do it for everyone. And there is so far very little evidence that our users want and would be willing to pay for the drastically increased IO rates of an all-SSD ZFS pool.)
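Using SSDs as caches and write accelerators maps directly onto ZFS's cache (L2ARC) and separate log (SLOG) vdevs. As a sketch, with a hypothetical pool name and device names:

```shell
# Add an SSD to pool 'tank' as an L2ARC read cache
# ('tank', sdc, sdd, and sde are placeholder names).
zpool add tank cache sdc

# Add a mirrored pair of SSDs as a separate ZIL/SLOG
# to accelerate synchronous writes.
zpool add tank log mirror sdd sde

# Verify the new cache and log vdevs.
zpool status tank
```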

Comments on this page:

From at 2013-08-19 05:20:08:

As far as I can tell, the enterprise disks are the same hardware with various firmware bits twiddled. For example, the "Quiet" vs "Fast" knob is set to one extreme by default on consumer disks and the other on enterprise disks. I think this can also mean the drive tries to be "quiet" and doesn't move the heads so quickly, which means it can be slow at retrieving data. This leads to the controller timing it out and kicking it from the array when it's a perfectly fine disk, just slow to respond. smartctl calls this "automatic acoustic management" and often you can configure it yourself.

I've also heard stories about enterprise drives having different timing characteristics based on retry behaviour. A consumer drive will retry several times, because it's likely to be the only copy of the data, and this can lead to very slow behaviour. The enterprise disks will retry a couple of times, then just report "nup, sorry" to the controller. The controller will kick it from the array, but will recover the data via whatever RAID mechanism is necessary.

Obviously the two behaviours above only show up when the controller timeouts are lax or strict, respectively.

smartctl also has power management knobs, which I assume will be set differently. Do you expect your server drive to spin down? If so, will the controller kick it for a timeout when you try to access the data?

This is all hearsay, so take the advice with a pinch of salt. I've not done the experiments. But if you're thinking of doing them try twiddling these knobs.
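As a concrete starting point, these knobs can be queried and set with smartctl itself (support varies by drive and smartmontools version; /dev/sda is a placeholder):

```shell
# Acoustic management: 254 = fastest/loudest, 128 = quietest.
smartctl --get=aam /dev/sda
smartctl --set=aam,254 /dev/sda

# Advanced power management: high values avoid aggressive spin-down.
smartctl --get=apm /dev/sda
smartctl --set=apm,254 /dev/sda

# Error recovery (retry) timeouts, in deciseconds; this is the
# consumer-vs-enterprise retry knob. 70,70 caps read and write
# retries at 7 seconds each.
smartctl -l scterc /dev/sda
smartctl -l scterc,70,70 /dev/sda
```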

-- PerryLorier
By cks at 2013-08-19 16:06:23:

We have seen long retry timeouts on disk errors in our specific environment but they so far haven't been a big problem. Generally when they start happening the disk is sick and we're going to want to replace it anyways.

We haven't seen (visible) problems with power saving modes or acoustic management stuff and so on. My impression is that decent 7200 RPM 'consumer' SATA drives don't have that sort of thing enabled on them; that's the sort of feature you find in 5400 RPM drives that are advertised as 'green', low-noise, or low-power.

Written on 19 August 2013.


Last modified: Mon Aug 19 00:51:19 2013