How to clear Solaris Volume Manager metadb replicas on Solaris 10 x86

September 25, 2007

It is possible to get a DiskSuite metadb replica into a sufficiently damaged state that it will panic the system in early boot, which is especially irritating when you aren't actually using DiskSuite for anything. When this happens, you need to clear out the damaged metadb replicas.

(If you get past boot by turning off svc:/system/metainit, the system instead panics when you run metadb or metainit, which is not very helpful for actually dealing with the problem.)

If you don't care about leaving the actual bits intact for potential analysis, I believe you can just dd /dev/zero over the appropriate slice. (Do not do this, however, if you have been tempted into using that conveniently spare slice 8 as a metadb replica; on x86, slice 8 is the boot slice and holds the boot blocks.)

The less brutal way out is to boot into the rescue environment, edit /kernel/drv/md.conf and /etc/lvm/mddb.cf on your real root filesystem (mounted under /a) to remove that slice's replica entries (you must edit both files), rebuild the boot archive with bootadm update-archive -R /a, and reboot.
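If you would rather script the two edits than do them by hand, they can be sketched in Python. This is a minimal sketch under assumptions, not a documented tool: it assumes that non-comment lines in mddb.cf start with whitespace-separated driver, minor number, block, device id, and checksum fields, and that md.conf records replicas as space-separated driver:minor:block:devid entries inside mddb_bootlistN="..." lines. The function names strip_mddb_cf and strip_md_conf are hypothetical, you have to know the driver name and minor number of the slice you are removing, and the sample values are made up.

```python
import re

# Hypothetical helpers. The file layouts assumed here are based on typical
# Solaris 10 mddb.cf and md.conf contents; verify them on your own system.


def strip_mddb_cf(text: str, driver: str, minor: int) -> str:
    """Remove replica lines for (driver, minor) from mddb.cf-style text.

    Assumes each non-comment line begins with whitespace-separated
    'driver minor_t daddr_t device-id checksum' fields.
    """
    kept = []
    for line in text.splitlines(keepends=True):
        fields = line.split()
        is_entry = len(fields) >= 2 and not fields[0].startswith("#")
        if is_entry and fields[0] == driver and fields[1] == str(minor):
            continue  # a replica on the slice being removed
        kept.append(line)
    return "".join(kept)


# Matches mddb_bootlistN="..."; lines; group 3 keeps the rest of the line.
BOOTLIST = re.compile(r'^(mddb_bootlist\d+=")([^"]*)(";[^\n]*\n?)', re.MULTILINE)


def strip_md_conf(text: str, driver: str, minor: int) -> str:
    """Remove 'driver:minor:...' entries from mddb_bootlistN lines.

    Assumes each bootlist value is a space-separated list of
    'driver:minor:block:device-id' entries. A bootlist line left with
    no entries is dropped entirely.
    """
    prefix = f"{driver}:{minor}:"

    def rewrite(m):
        entries = [e for e in m.group(2).split() if not e.startswith(prefix)]
        if not entries:
            return ""
        return m.group(1) + " ".join(entries) + m.group(3)

    return BOOTLIST.sub(rewrite, text)


if __name__ == "__main__":
    # Made-up sample contents: two replicas, on cmdk minors 7 and 71.
    sample_cf = (
        "#driver minor_t daddr_t device id checksum\n"
        "cmdk 7 16 id1,cmdk@A -100\n"
        "cmdk 71 16 id1,cmdk@B -200\n"
    )
    sample_conf = 'mddb_bootlist1="cmdk:7:16:id1,cmdk@A cmdk:71:16:id1,cmdk@B";\n'
    print(strip_mddb_cf(sample_cf, "cmdk", 7), end="")
    print(strip_md_conf(sample_conf, "cmdk", 7), end="")
```

Matching on the full "driver:minor:" prefix keeps a minor of 7 from also matching 71. On a real system you would run this against the copies under /a, save the originals first, and still rebuild the boot archive afterward.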

(If you are masochistic, you can instead go through the dance necessary to turn off the metainit service, bring the system up in single-user mode, make these edits, and then turn metainit back on. But the rescue environment way is simpler.)

Disclaimer: recovering from dropping below metadb replica quorum is beyond the scope of this entry. Besides, I haven't had to do it yet.


Comments on this page:

From 24.150.20.189 at 2007-10-20 08:22:26:

Could you elaborate a bit on the state that leads to this level of corruption? We played around with a RAID-1 set where we brought the machine up without the second disk connected. After Solaris 10 update 3*, the machine will drop into single-user mode and allow you to remove the replicas that existed on the 'failed' disk, thus bringing you back to quorum in the eyes of DiskSuite. Did you experience corruption in all of the metadb slices?

Prior to update 3, there was a bug that prevented you from reaching a state where you could clean up the metadbs.

*I'm relying on memory that update 3 is the magic number. It may have been fixed in update 2.

-Ben

By cks at 2007-10-21 21:20:29:

The simple answer got long enough that I turned it into an entry of its own, CorruptingMetadb. Note that I will be really surprised if anyone can reproduce this, since I rather suspect that it takes a very special degree of corruption to do it.

Written on 25 September 2007.

Last modified: Tue Sep 25 22:16:07 2007