An obvious reminder: disks can and do die abruptly

July 13, 2014

Modern disks have a fearsome array of monitoring features in the form of all of their SMART attributes, and hopefully you are running something that monitors them and alerts you to trouble. In an ideal world, disks would decay gradually and give you plenty of advance warning about an impending death, letting you make backups and prepare the replacement and so on. And sometimes this does happen (and you get warnings from your SMART monitoring software about 'impending failure, back up your data now').

Sometimes, though, it doesn't. As an illustration of this, a disk on my home machine just went from apparently fine to 'very slow IO' to SMART warnings about 8 unreadable sectors to very dead in the space of less than 24 hours. If I had acted very fast I might have been able to make a backup of it before it died, but only because I both noticed the suddenly slow system and was able to diagnose it. Otherwise, well, the time between getting the SMART warnings and the death was about half an hour.
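To make the 'SMART warnings about unreadable sectors' part concrete, here is a minimal sketch of the sort of check a monitoring script might run over the attribute table that `smartctl -A` prints. The function name `bad_sector_counts`, the choice of watched attributes, and the sample text are all my own illustration (the sample is made up, not output from the disk that died). It also assumes the usual ten-column ATA attribute layout, which not every drive reports.

```python
# Attribute names as printed by smartctl -A that commonly indicate
# unreadable or remapped sectors. Which ones to watch is an assumption.
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable")


def bad_sector_counts(smartctl_output):
    """Return {attribute_name: raw_value} for watched attributes with a nonzero raw value."""
    problems = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows have ten columns and start with a numeric ID;
        # this skips the header line and anything else.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        name, raw = fields[1], fields[9]
        if name in WATCHED and raw.isdigit() and int(raw) > 0:
            problems[name] = int(raw)
    return problems


# Hypothetical sample in the usual smartctl -A layout.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       8
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       8
"""

print(bad_sector_counts(SAMPLE))
```

A check like this only tells you what the disk is willing to admit at the moment you ask. As the timeline above shows, that can be as little as half an hour before the disk is gone.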

As it happened, I did not leap to get a backup of it right away because it's only one half of a mirror pair (I did make a backup once it had actually failed). The possibility of abrupt disk failure is one large reason that I personally insist on RAID protection for any data that I care about. There may not be enough time to save data off a dying disk, and having to restore from backups is disruptive (and backups are almost always incomplete).

I'm sure that everyone who runs a decent number of disks is already well aware of the possibility of abrupt disk death, and certainly we've had it happen to us at work. But it never hurts to have a pointed reminder of it smack me in the forehead every so often, even if it's a bit annoying.

(The brave future of SSDs instead of spinning mechanical disks may generally do better than this, although we'll have to see. We have experienced some abrupt SSD deaths, although that was with moderately early hardware. It's possible that SSDs will turn out to mostly have really long service lifetimes, especially if they're not written to particularly heavily.)


Last modified: Sun Jul 13 22:37:52 2014