Planning out a disk shuffle for my office workstation
As I mentioned yesterday, my office workstation lost one of its remaining HDs the other day. This HD is one side of a pair of 1 TB drives that's used for various mirrored partitions, so I haven't lost anything (unless the other side also fails before Monday, so let's hope not), but now I need to replace it with one of the ready spares I have sitting around for exactly this eventuality.
The obvious way to replace the failed disk is to do literally that; put in the new disk, partition it up to basically match the existing setup, and attach appropriate partitions to software RAID devices and my ZFS pool. That would certainly work, but it's not necessarily the right answer because my office workstation's current disk setup has mutated over time and the actual current setup of much of it is not what I would build from scratch.
Specifically, right now I have five drives with the following odd split up:
- two paired SSDs, used for
/and a SSD ZFS pool where my home directory and other important things live. (That my home directory and so on is on the SSDs is one reason I am not too worried about the current disk failure.)
- two paired 1 TB HDs that used to be my only
two drives. Because of that they still have partitions for my old
HD-based root filesystem (in two copies), the
/bootpartition I just recently migrated into the SSD
/, and finally my HD-based ZFS pool, which used to hold everything but now mostly holds less important data. Oh, and it turns out they also have my swap partition.
- One 500 GB HD that I used as overflow for unimportant virtual machines, back in the days when I thought I needed to conserve space on the mirrored 1 TB HDs (in fact it may date from the days when these were actually 750 GB HDs).
My replacements for the 1 TB drives are 1.5 TB, so this already gives me a space expansion, and there's now any number of things on those two 1 TB drives I don't need any more, and also one thing I would like to have. So my current plan is to replace both 1 TB drives (the dead and the still alive one) and set up the space on the new pair of 1.5 TB drives as follows:
- a small 1 GB swap partition, because it still seems like a good idea to give the Linux kernel some swap space to make it happy.
- a 200 GB or so ext4 partition, which will be used for an assortment
of things that aren't important enough to go in the SSD
/but that I don't want to have in ZFS, partly because they're things I may need to get back access to my ZFS pools if package upgrades go wrong.
- a 100 GB backup
/partition. As is my current habit, before major events like Fedora upgrades I'll copy the SSD
/to here so that I can kludge together a reversion to a good environment if something goes badly wrong.
- all the remaining space goes to my HD-based ZFS pool, which should expand the space a decent amount.
(All partitions and ZFS and so on will be mirrored between the two drives.)
Given the likely space expansion in my HD-based ZFS pool, I think I'll also get rid of that overflow 500 GB HD by folding its 120 GB or so of actual space usage into the main HD-based ZFS pool. It was always sort of a hack and a bit awkward (on top of having no redundancy). Plus this will simplify everything, and I can always put the drive (or a bigger replacement drive) back in and redo this if I turn out to actually need a bunch more space for virtual machines.
In a way, I'm grateful that my 1 TB HD finally died and also that it happened under the circumstances it did, where I couldn't immediately rush into replacing it in the most obvious way possible and had forced time to sit back and think about whether the obvious path was the right one. I'm probably going to wind up with a nicer, more sensibly set up system as a result of this disk failure, and I probably never would have done this rearrangement without being pushed.
Sometimes laziness doesn't pay off
My office workstation has been throwing out complaints about some of its disks for some time, which I have been quietly clearing up rather than replace the disks. This isn't because these are generally good disks; in fact they're some Seagate 1TB drives which we haven't had the best of luck with. I was just some combination of too lazy to tear my office machine apart to replace a drive and too parsimonious with hardware to replace a disk drive before it failed.
(Working in a place with essentially no hardware budget is a great way to pick up this reflex of hardware parsimony. Note that I do not necessarily claim that it's a good reflex, and in some ways it's a quite inefficient one.)
Recently things got a bit more extreme, when one disk went from nothing to suddenly reporting 92 new 'Offline uncorrectable sector' errors (along with 'Currently unreadable (pending) sectors', which seems to travel with offline uncorrectable sectors). I looked at this, thought that maybe this was a sign that I should replace the disk, but then decided to be lazy; rather than go through the hassle of a disk replacement, I cleared all the errors in the usual way. Sure, the disk was probably going to fail, but it's in a mirror and when it actually did fail I could swap it out right away.
(I actually have a pair of disks sitting on my desk just waiting to be swapped in in place of the current pair. I think I've had them set aside for this for about a year.)
Well, talking about that idea, let's go to Twitter:
@thatcks: I guess I really should have just replaced that drive in my office workstation when it reported 92 Offline uncorrectable sectors.
@thatcks: At 5:15pm, I'm just going to hope that the other side of the mirrored pair survives until Monday. (And insure I have pretty full backups.)
Yeah, I had kind of been assuming that the disk would fail at some convenient time, like during a workday when I wasn't doing anything important. There are probably worse times for my drive to fail than right in front of me at 5:15 pm immediately before a long weekend, especially when I have a bike ride that evening that I want to go to, but I can't think of many that are more annoying.
(The annoyance is in knowing that I could swap the drive on the spot, if I was willing to miss the bike ride. I picked the bike ride, and a long weekend is just short enough that I'm not going to come in in the middle of it to swap the drive.)
I have somewhat of a habit of being lazy about this sort of thing. Usually I get away with it, which of course only encourages me to keep on being lazy and do it more. Then some day things blow up in my face, because laziness doesn't always pay off. I need to be better about getting myself to do those annoying tasks sooner or later rather than putting them off until I have no choice about it.
(At the same time strategic laziness is important, so important that it can be called 'prioritization'. You usually can't possibly give everything complete attention, time, and resources, so you need to know when to cut some nominal corners. This usually shows up in security, because there are usually an infinite number of things that you could be doing to make your environment just a bit more secure. You have to stop somewhere.)