Wandering Thoughts archives

2014-07-13

An obvious reminder: disks can and do die abruptly

Modern disks have a fearsome array of monitoring features in the form of all of their SMART attributes, and hopefully you are running something that monitors them and alerts you to trouble. In an ideal world, disks would decay gradually and give you plenty of advance warning about an impending death, letting you make backups and prepare the replacement and so on. And sometimes this does happen (and you get warnings from your SMART monitoring software about 'impending failure, back up your data now').
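A minimal sketch of the kind of check SMART monitoring performs, assuming text in the attribute-table layout that `smartctl -A` prints. Treating attributes 5 (Reallocated_Sector_Ct), 197 (Current_Pending_Sector) and 198 (Offline_Uncorrectable) as the ones to watch is an assumption for illustration. The sample table is made up and is not output from the disk in this entry.

```python
# Hypothetical watch list: attributes commonly treated as early signs of
# media failure (an illustrative choice, not a standard).
WATCHED_IDS = {5, 197, 198}


def parse_smart_attributes(text: str) -> dict[int, tuple[str, int]]:
    """Parse the attribute table printed by `smartctl -A`.

    Returns {attribute_id: (attribute_name, raw_value)}. Header and other
    non-table lines are skipped because they don't start with a numeric ID.
    """
    attrs: dict[int, tuple[str, int]] = {}
    for line in text.splitlines():
        fields = line.split()
        # Columns: ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE
        #          UPDATED WHEN_FAILED RAW_VALUE
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        raw = fields[9]
        # RAW_VALUE may carry trailing text, e.g. "36 (Min/Max 20/45)";
        # fields[9] is then just the leading number. Skip odd formats.
        if raw.isdigit():
            attrs[int(fields[0])] = (fields[1], int(raw))
    return attrs


def warnings_for(text: str) -> list[str]:
    """Return one warning per watched attribute with a nonzero raw value."""
    warnings = []
    for attr_id, (name, raw) in sorted(parse_smart_attributes(text).items()):
        if attr_id in WATCHED_IDS and raw > 0:
            warnings.append(f"{name} (ID {attr_id}) raw value is {raw}")
    return warnings


# Made-up sample in smartctl's table layout.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
194 Temperature_Celsius     0x0022   036   045   000    Old_age   Always       -       36 (Min/Max 20/45)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       3
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       3
"""

if __name__ == "__main__":
    for warning in warnings_for(SAMPLE):
        print(warning)
```

A check like this only helps if it runs often and someone acts on it promptly.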

Sometimes, though, it doesn't. As an illustration of this, a disk on my home machine just went from apparently fine to 'very slow IO' to SMART warnings about 8 unreadable sectors to very dead in the space of less than 24 hours. If I had acted very fast I might have been able to make a backup of it before it died, but only because I both noticed the suddenly slow system and was able to diagnose it. Otherwise, well, the time between getting the SMART warnings and the death was about half an hour.

As it happened I did not leap to get a backup of it right away because it's only one half of a mirror pair (I did make a backup once it had actively failed). The possibility of abrupt disk failure is one large reason that I personally insist on RAID protection for any data that I care about; there may not be enough time to save data off a dying disk and having to restore from backups is disruptive (and backups are almost always incomplete).
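A mirror only protects you if you notice when one half of it has died. The following minimal Python sketch assumes Linux software RAID (md) and its `/proc/mdstat` format; the entry doesn't say which RAID implementation is involved. It flags an array as degraded when its `[n/m]` status shows fewer active members than expected, or when the `[UU]` map contains a `_`. The sample text is illustrative. In practice, `mdadm --monitor` can send these alerts for you.

```python
import re

# An array's first line, e.g. "md0 : active raid1 sdb1[1] sda1[0]".
ARRAY_RE = re.compile(r"^(md\w+)\s*:")
# Its status line, e.g. "976630336 blocks super 1.2 [2/2] [UU]".
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")


def degraded_arrays(mdstat: str) -> list[str]:
    """Return the names of md arrays that are missing members."""
    degraded = []
    current = None
    for line in mdstat.splitlines():
        m = ARRAY_RE.match(line)
        if m:
            current = m.group(1)
            continue
        s = STATUS_RE.search(line)
        if s and current is not None:
            wanted, active, member_map = int(s.group(1)), int(s.group(2)), s.group(3)
            if active < wanted or "_" in member_map:
                degraded.append(current)
            current = None  # only the first status line belongs to the array
    return degraded


# Illustrative contents: md1 has lost a member (marked "(F)"), md0 is fine.
SAMPLE_MDSTAT = """\
Personalities : [raid1]
md1 : active raid1 sda2[0] sdb2[1](F)
      1048512 blocks super 1.2 [2/1] [U_]

md0 : active raid1 sdb1[1] sda1[0]
      976630336 blocks super 1.2 [2/2] [UU]
      bitmap: 0/8 pages [0KB], 65536KB chunk

unused devices: <none>
"""

if __name__ == "__main__":
    print(degraded_arrays(SAMPLE_MDSTAT))
```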

I'm sure that everyone who runs a decent number of disks is already well aware of the possibility of abrupt disk death, and we've certainly had it happen to us at work. But it never hurts to have a pointed reminder of it smack me in the forehead every so often, even if it's a bit annoying.

(The brave future of SSDs instead of spinning mechanical disks may generally do better than this, although we'll have to see. We have experienced some abrupt SSD deaths, although that was with moderately early hardware. It's possible that SSDs will turn out to mostly have really long service lifetimes, especially if they're not written to particularly heavily.)

tech/AbruptDiskDeath written at 22:37:52

Early impressions of CentOS 7

For reasons involving us being unimpressed with Ubuntu 14.04, we're building our second-generation iSCSI backends on top of CentOS 7 (basically because it came out just in time). We have recently put the first couple of them into production, so now seems a good time to report my early impressions of CentOS 7.

I'll start with the installation, which has impressed me in two different ways. The first is that it does RAID setup the right way: you define filesystems (or swap areas), tell the installer that you want them to be RAID-1, and it magically figures everything out and does it right. The second is that it is the first installer I've ever used that can reliably and cleanly reinstall itself over an already-installed system (and it's even easy to tell it how to do this). You would think that this would be trivial, but I've seen any number of installers explode; a common failure in Linux installers is that they assemble any existing RAID arrays on the disks and then fail to completely disassemble them before trying to repartition the disks. CentOS 7 has no problems with this, which is something that I really appreciate.

(Some installers are so bad that one set of build instructions I wrote recently started out with 'if these disks have been used before, completely blank them out with dd beforehand using a live CD'.)

Some people will react badly to the installer being graphical and perhaps somewhat confusing. I find it okay, but I don't think it's perfect. It is kind of nice to be able to do steps in basically whatever order works for you instead of being forced into a linear order, but on the other hand it's possible to overlook some things.

After installation, everything has been trouble-free so far. While I think CentOS 7 still uses NetworkManager, it does so far better than Red Hat Enterprise Linux 6 did; in other words, the networking works and I don't particularly notice that it's using NetworkManager behind the scenes. We can (and do) set things up in /etc/sysconfig/network-scripts in the traditional manner. CentOS 7 defaults to 'consistent network device naming', but unlike Ubuntu 14.04 it works and the names are generally sane. On our hardware we get Ethernet device names of enp1s0f0, enp1s0f1, and enp7s0; the first two are the onboard 10G-T ports and the third is the add-on 1G card. We can live with that.
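Names like these follow the PCI-location form of the predictable naming scheme: a type prefix (`en` for Ethernet), then `p<bus>s<slot>` and an optional `f<function>`. The following small Python sketch decodes that form only. It is based on the systemd/udev naming conventions as generally documented, not on Red Hat's code, and it deliberately ignores the onboard-index (`eno<N>`), hotplug-slot (`ens<N>`) and MAC-based (`enx<MAC>`) forms.

```python
import re
from typing import Optional

# PCI-location names such as "enp1s0f0": type, PCI bus, slot, optional
# function, optional device port (the port is matched but not reported).
NAME_RE = re.compile(
    r"^(?P<kind>en|wl|ww)p(?P<bus>\d+)s(?P<slot>\d+)(?:f(?P<function>\d+))?(?:d\d+)?$"
)
KINDS = {"en": "Ethernet", "wl": "WLAN", "ww": "WWAN"}


def decode_pci_name(name: str) -> Optional[dict]:
    """Decode a PCI-location interface name, or return None for other forms."""
    m = NAME_RE.match(name)
    if m is None:
        return None
    return {
        "type": KINDS[m.group("kind")],
        "pci_bus": int(m.group("bus")),
        "slot": int(m.group("slot")),
        # The function is omitted for single-function devices; treat as 0.
        "function": int(m.group("function") or 0),
    }


if __name__ == "__main__":
    for ifname in ("enp1s0f0", "enp1s0f1", "enp7s0", "eth0"):
        print(ifname, decode_pci_name(ifname))
```

Decoding enp1s0f0 and enp1s0f1 gives the same bus and slot with functions 0 and 1. That is consistent with two ports on one dual-port controller. The sketch can't tell onboard ports from an add-on card; that information comes from knowing the hardware.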

(The specific naming scheme that CentOS 7 normally uses is described in the Red Hat documentation here, which I am sad to note needs JavaScript to really see anything.)

CentOS 7 uses systemd and has mostly converted things away from /etc/init.d startup scripts. Some people may have an explosive reaction to this shift, but I don't; I've been using systemd on my Fedora systems for some time and I actually like it and think it's a pretty good init system (see also the second sidebar here). Everything seems to work in the usual systemd way and I didn't have any particular problems adding, e.g., a serial getty. I did quite appreciate that systemd automatically activated a serial getty because a serial console was configured on the kernel command line.
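The automatic serial getty comes from systemd's getty generator. The following Python sketch is a simplified model of the idea, not the generator's actual logic, which works from the kernel's list of active consoles rather than parsing the command line itself. It takes each `console=` device from a kernel command line, skips virtual consoles such as `tty0`, and maps the rest to `serial-getty@<device>.service` instances. The command line in the example is made up.

```python
import re

# Virtual consoles (tty0, tty1, ...) get ordinary gettys, not serial ones.
VT_RE = re.compile(r"^tty\d+$")


def serial_getty_units(cmdline: str) -> list[str]:
    """Map console= devices on a kernel command line to serial-getty units."""
    units = []
    for arg in cmdline.split():
        if not arg.startswith("console="):
            continue
        # "console=ttyS0,115200n8" -> "ttyS0"; drop the speed/parity options.
        device = arg[len("console="):].split(",", 1)[0]
        if not device or VT_RE.match(device):
            continue
        unit = f"serial-getty@{device}.service"
        if unit not in units:  # keep order, avoid duplicates
            units.append(unit)
    return units


if __name__ == "__main__":
    print(serial_getty_units("ro quiet console=tty0 console=ttyS0,115200n8"))
```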

Overall I guess the good news is that I don't have anything much to say because stuff just works and I haven't run into any unpleasant surprises. The one thing that stands out is how nice the installer is.

linux/CentOS7EarlyImpressions written at 01:00:20

