How I got a corrupted metadb replica that panicked Solaris 10 x86
Since I got asked this in a comment on my entry about clearing metadb
replicas, here is what I remember of how I managed to
get a metadb replica so corrupted that it panicked Solaris 10u3 x86.
- I wanted to experiment with metasets on my test machine, so I
needed a local metadb replica. Because I didn't know about this in
advance, I didn't have a spare partition, and because I didn't know
any better, I put the local metadb replica in that tempting slice 8.
(Since I was only really interested in metasets, I didn't do any
local DiskSuite stuff, although I did make and delete metasets
and so on.)
- sometime later I rebooted the system and it didn't even make it as
far as starting GRUB; I believe it gave some initial GRUB message
and then hung.
(I had been crashing the system repeatedly due to some interesting
tests, so I did not think too much of this at the time.)
- I booted the machine with a Fedora Core 7 live CD and poked around,
verifying that the filesystems were still there.
- after a while I found the installgrub command, booted the Solaris
install CD rescue environment, and ran it to get the machine back
to a bootable state. (I believe I may have also rebuilt the boot
archive at this point on general principle, since I was getting
used to it breaking if I sneezed on the system.)
- the test Solaris install would then boot but panic, which led me
to find out how you boot Solaris 10 x86 in really single user mode.
- turning off the metainit service let the system boot, but the
moment I typed metainit it would panic.
- because I was in a hurry and needed the system for other tests,
I ignorantly tried to recover the system by erasing the metadb
replica by dd'ing zeroes all over slice 8. This destroyed the
system completely, since it wiped out the slice partitioning.
(If I had been really clever I would have saved a dd image of
slice 8 before doing this, but I was very irritated with Solaris
10u3 x86 at this point.)
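
The "save a dd image first" idea from that last step can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses a scratch file as a stand-in for slice 8 instead of a real raw device (the real device name depends on the machine and is not given here), and the variable names SLICE and BACKUP are made up for the example.

```shell
#!/bin/sh
# Stand-in for the raw slice device: a scratch file, NOT a real disk.
# On a real system SLICE would be the raw device for slice 8; that
# name is machine-specific and deliberately not guessed at here.
SLICE=$(mktemp)
BACKUP=$(mktemp)

# Fill the stand-in with 16 sectors of data so there is something to copy.
dd if=/dev/urandom of="$SLICE" bs=512 count=16 2>/dev/null

# Step 1: save an image of the slice before touching it.
dd if="$SLICE" of="$BACKUP" bs=512 2>/dev/null

# Step 2: verify the copy byte for byte before doing anything destructive.
if cmp -s "$SLICE" "$BACKUP"; then
    echo "backup verified"
else
    echo "backup differs from source; do not proceed" >&2
    exit 1
fi
```

Only once the comparison succeeds would it make sense to overwrite anything, and with the image in hand the original contents could be written back with dd in the other direction.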
On the whole it was a very educational experience and led me to look into a number of useful things so I would
be better prepared for a future emergency on any production machines
we wind up with.
I have one captured panic message from the system, and I still have the
system disk (which has more in syslog, and it would be possible to extract
them if I could reconstruct the necessary slice partitioning). I have since
tried a bit to reproduce this in a VMware Solaris image but haven't been
successful, so it is not a simple, easy-to-reproduce issue.
(The Solaris 10u3 install I was using was current on all recommended
patches and on all released patches that applied to a number of areas
of interest to us, including ZFS, iSCSI, and DiskSuite.)