An annoying omission in the Solaris 8 DiskSuite toolset
We had one of our SAN RAID controllers die today (just the controller; the disks and the data were fine), and as a result I ran across an annoying omission in the DiskSuite toolset.
When the controller went kaboom, all of its logical drives stopped responding and DiskSuite marked all of the submirrors involved as needing maintenance. When the controller was replaced, all of the logical drives came back again, but the problem is that DiskSuite has no direct way to clear the 'needs maintenance' state on the affected submirrors.
For mirror devices with more than one submirror, the DiskSuite approach
is the same as the Linux approach: you remove the failed submirror with
metadetach and then re-add it with metattach. I'd prefer a command
that just cleared the error status (and started the necessary resync),
but the whole process can be done while the filesystem is live and is
not too onerous.
(Since we had 28 mirrors with this problem, we wrote some stuff to automate it.)
The real fun and irritation comes with mirrors that have only one submirror. To clear what is effectively a status flag, you must tear down the top-level mirror device and recreate it. Since this cannot be done while the mirror is in use, you must unmount the filesystem beforehand (and remount it afterwards). This is an especially irritating omission because DiskSuite itself is still perfectly happy to do IO to the nominally failed submirror, so it really is just a harmless status flag (unlike the multiple submirror case, where DiskSuite needs to actively do work to fix things up).
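The teardown and rebuild can be sketched as a fixed sequence of commands. The Python below is only an illustration under some assumptions: the helper name rebuild_one_way_mirror is invented, the mount point is assumed to be listed in /etc/vfstab so that umount and mount can be given just the directory, and the mirror is assumed to be recreated under its old name with 'metainit MIRROR -m SUBMIRROR'. It returns the commands instead of running them.

```python
def rebuild_one_way_mirror(mirror, submirror, mountpoint):
    """Return the commands to recreate a single-submirror mirror.

    Assumes mountpoint has an /etc/vfstab entry that refers to the
    mirror device, so remounting needs only the directory name.
    """
    return [
        ["umount", mountpoint],                  # the mirror must not be in use
        ["metaclear", mirror],                   # remove only the top-level mirror
        ["metainit", mirror, "-m", submirror],   # recreate it as a one-way mirror
        ["mount", mountpoint],                   # put the filesystem back
    ]
```

For example, rebuild_one_way_mirror("d5", "d15", "/export") gives four command lists to run in order. If any step fails, stop there, since the later steps depend on the earlier ones.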
I can see leaving the status marker present until explicitly cleared, so you can scan a system and see which devices had problems and which didn't after an incident. But DiskSuite should provide you with a direct way to acknowledge and clear the warning flag, especially if it's going to be willing to do IO to the 'failed' device anyways.
Given that we could work around the issue, this may seem like a petty
complaint. But most of our Solaris servers have their root and
filesystems in DiskSuite mirrors, and it could be an interesting comedy
hour if we ever have a temporary controller glitch on those drives.