Wandering Thoughts archives

2008-07-13

How ZFS helps out with the big RAID-5 problem

It's time for me to say something nice about ZFS for a change, because ZFS can make the big RAID-5 problem significantly less of a problem for many people. ZFS offers two significant advantages:

  • because it knows which parts of the array actually contain live data, it doesn't need to read the entire contents of the surviving disks during a rebuild, only the parts that are in use. Less data read means less chance of an unrecoverable read error.

    (How much of an improvement this is depends on how full your pools are; if you routinely run with very full pools, you are reading most of your disks anyway.)

  • ZFS has mechanisms for identifying and tracking damaged files, so even if you hit an unrecoverable read error (UER) you will not lose the entire pool, just the affected file(s). Since ZFS defaults to making multiple copies of filesystem metadata (even in raidz pools), you may not lose anything at all if you are lucky enough to have the UER hit a directory or the like instead of an actual file.
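To put rough numbers on the first point, here is a small Python sketch. It assumes a UER rate of one error per 10^14 bits read (a figure commonly quoted on consumer drive spec sheets, used here purely as an assumption), treats every bit read as an independent trial, and ignores parity and metadata overhead. The disk count and sizes are hypothetical. A conventional RAID-5 rebuild corresponds to reading 100% of the surviving disks; ZFS reads roughly the fraction that holds live data.

```python
import math

# Assumed UER rate: 1 error per 1e14 bits read (an assumption for
# illustration, not a measurement of any particular disk).
UER_PER_BIT = 1e-14


def p_uer(bytes_read: float, uer_per_bit: float = UER_PER_BIT) -> float:
    """Probability of at least one UER when reading bytes_read bytes,
    treating each bit as an independent trial: 1 - (1 - p)^bits."""
    bits = bytes_read * 8
    # expm1/log1p keep the result accurate when p is tiny
    return -math.expm1(bits * math.log1p(-uer_per_bit))


TB = 10**12
surviving_capacity = 4 * TB  # hypothetical: four surviving 1 TB disks

# Compare reading only the live fraction (ZFS-style) against a full read.
for fill in (0.25, 0.5, 0.9, 1.0):
    print(f"{fill:4.0%} full: P(UER) ~ {p_uer(surviving_capacity * fill):.1%}")
```

The exact percentages depend entirely on the assumed error rate, but the shape of the result does not: the chance of hitting a UER grows with the amount of data read, so a mostly empty pool gets most of the benefit.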

(One reason that many RAID-5 implementations give up and declare the entire array dead if they hit a UER during array reconstruction is that they have no mechanism for recording that only part of the array is damaged; either they pretend that the array is entirely healthy or they kill it entirely, and they opt for the latter for 'safety'. As the chance of a UER during reconstruction rises, this may change.)

I think that the ZFS people would still strongly suggest that you limit your raidz pool sizes, use raidz2, or both, but at least ZFS gives you better odds if you have to run with raidz instead of raidz2.

(As an aside, it is worth noting that this is one place where RAID-6 is clearly better than RAID-5 plus a hot spare for the same number of disks, as covered in the last entry.)

ZFSAndBigRaidProblem written at 23:29:44

2008-07-08

How to force Solaris to renumber network devices

Let us suppose, as a not entirely hypothetical example, that you are transplanting the system disks of a Solaris 10 install from a SunFire X2100 to a SunFire X2200. While both machines have four onboard network ports, two Broadcom and two nVidia, they are not quite the same hardware, enough so that if left to its own devices Solaris will consider the new machine's NICs to be bge2, bge3, nge2, and nge3 (instead of bge0, bge1, nge0, and nge1).

This isn't a desirable result, because it makes the transplanted machines different from any future machines you install on the X2200s (and in turn means that you have to remember any particular install's history when figuring out which network device is which). What you want to do is to force Solaris to renumber the Ethernet devices from scratch, giving them their natural numbers.

(As far as I can see, there is no documented way of doing this; a reconfiguration reboot doesn't do it, at least on Solaris 10 U4 and U5.)

It turns out that Solaris stores this mapping information in /etc/path_to_inst in the relatively obvious form of:

"PATH" INSTANCE "DRIVER"

The PATH is relative to /devices, the DRIVER is what you'd expect, and the INSTANCE, for network devices, is the N in bgeN. So what you want to do is edit /etc/path_to_inst to remove all mention of your network devices and do a reconfiguration reboot, which will recreate them from scratch and should give them the numbers you want. It's possible that you can just directly assign instance numbers to network devices without the reconfiguration reboot, but I haven't tested this.

(Important: make a backup of the file first, just in case.)
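If you would rather script the edit than do it by hand in an editor, here is a minimal Python 3 sketch. It assumes the file consists of '#' comment lines plus entries in the "PATH" INSTANCE "DRIVER" form above, and that bge and nge are the only network drivers you care about. NET_DRIVERS, strip_drivers() and edit_path_to_inst() are hypothetical names of my own, and a stock Solaris 10 machine may not have Python 3 at all, so treat this as an illustration of the edit rather than a tested tool.

```python
import re
import shutil

# Hypothetical: the network drivers whose instance numbers should be
# forgotten (bge and nge, as in the X2100/X2200 example). Adjust to taste.
NET_DRIVERS = {"bge", "nge"}

# One path_to_inst entry: "PATH" INSTANCE "DRIVER"
ENTRY = re.compile(r'^\s*"([^"]+)"\s+(\d+)\s+"([^"]+)"\s*$')


def strip_drivers(text: str, drivers: set[str]) -> str:
    """Return path_to_inst text with every entry for the given drivers
    removed; comments and entries for other drivers are kept as-is."""
    kept = []
    for line in text.splitlines(keepends=True):
        m = ENTRY.match(line)
        if m and m.group(3) in drivers:
            continue  # drop this device's instance-number assignment
        kept.append(line)
    return "".join(kept)


def edit_path_to_inst(path: str = "/etc/path_to_inst") -> None:
    """Back the file up to PATH.orig, then rewrite it in place."""
    shutil.copy2(path, path + ".orig")
    with open(path) as f:
        text = f.read()
    with open(path, "w") as f:
        f.write(strip_drivers(text, NET_DRIVERS))
```

Calling strip_drivers() on the file's contents first lets you inspect the result before committing to it. If you are working from the rescue environment with the system's root mounted on /a, you would pass path="/a/etc/path_to_inst". The reconfiguration reboot (for instance 'reboot -- -r') is still up to you.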

When I did this it took two go-arounds to get the nVidia devices correct; you might be able to get it down to one by doing the editing in the rescue environment instead of booting the system into normal single-user mode. If you go this route, you'll need to rebuild the boot archive by hand (with 'bootadm update-archive -R /a').

Oh yes, and here's an important safety tip that I learned the hard way: under no circumstances should you use rem_drv (what can I say, it looked like a tempting way to force a full reconfiguration of the driver from scratch). Doing so removes the information about which PCI devices are handled by the driver, which is hard to recover from unless you have a spare Solaris 10 machine (ideally with the same hardware) around to consult as reference.

(This mapping information is stored in /etc/driver_aliases; the base version of driver_aliases comes from the SUNWcsd package but it's then modified by the install scripts of various driver packages.)
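Given that lesson, it may be worth saving a copy of a driver's alias entries before experimenting with driver configuration at all. The following Python sketch assumes /etc/driver_aliases holds one DRIVER "ALIAS" pair per line (for example something like bge "pci14e4,1648"; the alias values used here are illustrative only). aliases_by_driver() and save_aliases() are hypothetical helpers, not Solaris tools; simply copying the whole file somewhere safe works just as well.

```python
import re
from collections import defaultdict

# Assumed /etc/driver_aliases line format:  DRIVER "ALIAS"
# (the quotes around ALIAS are tolerated if absent)
ALIAS_LINE = re.compile(r'^\s*(\S+)\s+"?([^"\s]+)"?\s*$')


def aliases_by_driver(text: str) -> dict[str, list[str]]:
    """Map each driver name to the device aliases bound to it."""
    table: dict[str, list[str]] = defaultdict(list)
    for line in text.splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blank lines and any comments
        m = ALIAS_LINE.match(line)
        if m:
            table[m.group(1)].append(m.group(2))
    return dict(table)


def save_aliases(drivers, src="/etc/driver_aliases",
                 dest="driver_aliases.saved"):
    """Write the current entries for the given drivers to a separate file,
    in the same DRIVER "ALIAS" format."""
    with open(src) as f:
        table = aliases_by_driver(f.read())
    with open(dest, "w") as out:
        for drv in drivers:
            for alias in table.get(drv, []):
                out.write(f'{drv} "{alias}"\n')
```

For example, save_aliases(["bge", "nge"]) would record the PCI identifiers currently bound to those two drivers, which is exactly the information rem_drv throws away.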

RenumberingNetworkDevices written at 00:09:30


This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.