Try things out with new machines
I just gave someone at work this advice today, so I might as well repeat it here. We were bringing up a new machine, a nice dual-processor server with hardware RAID-1, and I suggested to him that before he even thought about putting it into production, he take the opportunity to yank one of the drives and see what happened.
There are occasions where you would do this to a machine in production, but not very many. New hardware (or idle hardware) is about your only chance to experiment, to see what happens, what goes wrong, and how to fix it.
For example, with hardware RAID there's a collection of interesting questions:
- how does the machine react to a drive going missing?
- does your monitoring notice the problem? (You have monitoring, right?)
- how do you re-add a 'replacement' drive?
- does anything odd happen if you just plug the old 'dead' drive you pulled back in and don't do anything else?
I certainly don't want to be finding this sort of thing out on a machine that's in production. (The users will probably be very irate if I make a mistake in the RAID BIOS and eat the good disk.)
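The monitoring question is the easiest one to turn into something concrete. As a rough sketch only: hardware RAID controllers each have their own vendor status tools, so this example swaps in Linux software RAID instead, where the kernel reports array state as text in /proc/mdstat. The function name `degraded_arrays` and the sample text are made up for illustration. The sample imitates the mdstat layout and was not captured from a real machine.

```python
import re

# Matches an array header line such as "md0 : active raid1 sdb1[1] sda1[0]".
ARRAY_RE = re.compile(r"^(md\d+)\s*:")
# Matches the member-status field such as "[2/1] [U_]".
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")


def degraded_arrays(mdstat_text):
    """Return the names of arrays with fewer active members than expected.

    Hypothetical helper: assumes Linux software RAID (md) status text,
    not the output of any hardware RAID controller.
    """
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        header = ARRAY_RE.match(line)
        if header:
            current = header.group(1)
            continue
        status = STATUS_RE.search(line)
        if status and current is not None:
            expected = int(status.group(1))
            active = int(status.group(2))
            # "_" marks a missing member in the per-device map.
            if active < expected or "_" in status.group(3):
                degraded.append(current)
            current = None
    return degraded


# Illustrative sample only: md0 is healthy, md1 has lost one mirror half.
SAMPLE = """\
Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0]
      104320 blocks [2/2] [UU]

md1 : active raid1 sda2[0]
      8385856 blocks [2/1] [U_]

unused devices: <none>
"""

if __name__ == "__main__":
    print(degraded_arrays(SAMPLE))
```

On a real Linux software RAID machine you would feed it the contents of /proc/mdstat before and after pulling the drive. The point of the exercise is less the script than checking that whatever you actually use for monitoring raises an alarm when the drive disappears.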
And if something goes wrong, new machines have expendable software; you can always reinstall it, there's nothing very important on the disks. (I just reinstalled my Solaris 9 test machine yesterday and today, because the first time around the / filesystem was too small.)
In fact, reinstalling your new machine can often result in a cleaner configuration, because the second time around you know much more about setting the machine up and just what you want and need. (And you'll have stubbed your toe already.)
So take the opportunity to live excitingly. Yank the UPS's power cord (for extra fun, plug it back in at the last moment). Pull that RAID disk. Have an 'accident' that locks you out of the root account. It's fun and educational, and only occasionally horrifying.