Wandering Thoughts archives

2007-12-27

Why I am not entirely fond of Solaris 10 x86's boot archive

I am not a fan of initial ramdisks (initrds in Linux jargon), but ever since I discovered Solaris 10's version of the concept I've thought it was the most sensible approach to the whole issue. Just having all the drivers in the boot archive, rather than trying to pick out the ones the system thinks it will need, eliminates one entire set of annoying problems.

(The other Linux initrd problem is where the kernel version doesn't match the version of the modules in the initrd; Solaris gets around it by not doing wholesale replacements of kernel modules when Sun updates the kernel, presumably because of its stable kernel ABI.)

My problem with boot archives is that they are alarmingly fragile. I seem to have a small gift for crashing Solaris, and quite often when I manage this the system reboots complaining about an out-of-date boot archive and requires annoyingly tedious manual intervention to fix up. It seems that a great many kinds of changes can make the boot archive out of date, for example adding and removing iSCSI disks.

(The list of files and directories that get copied into the boot archive is in /boot/solaris/filelist.ramdisk, but it is not obvious which of them gets updated by what.)
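One rough way to see which entries have changed is to compare modification times against the archive itself. The following is only a sketch under assumptions: it takes the file list and the archive path as arguments (on Solaris 10 x86 the archive is usually /platform/i86pc/boot_archive, but treat that as an assumption), it assumes the file list has one path per line relative to the root, and it is not necessarily the check that Solaris itself performs. The function name `stale_entries` is made up for illustration.

```python
import os

def stale_entries(filelist_path, archive_path, root="/"):
    """Return file list entries that contain a file newer than the archive.

    Hypothetical helper: compares mtimes only, which may differ from the
    check Solaris really does at boot.
    """
    archive_mtime = os.stat(archive_path).st_mtime
    stale = []
    with open(filelist_path) as f:
        for line in f:
            entry = line.strip()
            if not entry:
                continue
            # Entries are assumed to be relative to the root directory.
            top = os.path.join(root, entry.lstrip("/"))
            if not os.path.lexists(top):
                continue  # listed but not present on this machine
            if os.path.isdir(top):
                # Only regular directory contents are checked, not the
                # directories' own timestamps.
                paths = [os.path.join(d, name)
                         for d, _, names in os.walk(top)
                         for name in names]
            else:
                paths = [top]
            if any(os.lstat(p).st_mtime > archive_mtime for p in paths):
                stale.append(entry)
    return stale
```

Calling `stale_entries("/boot/solaris/filelist.ramdisk", "/platform/i86pc/boot_archive")` would at least name the entries that changed since the archive was built, although it still won't say what changed them.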

While the tediousness of fixing an out of date boot archive is bad enough, the real problem is that it means that Solaris 10 x86 machines will not reliably automatically reboot after unexpected events (eg, power failures). If we have done something that makes their boot archives 'out of date', they'll require manual intervention to do the rough equivalent of patting them on the head.

This would not be half as annoying if the boot archive were rebuilt periodically, but instead rebuilds seem to be done only when you take the machine down. You could have last made a change six months ago and still get hit by this because you haven't rebooted since. (Our fileservers, for example, are rebooted extremely infrequently.)
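A periodic rebuild could in principle be approximated by hand, for instance by running `bootadm update-archive` from cron. The sketch below wraps that command; it assumes `bootadm` lives at /sbin/bootadm and that a Python 3 with `subprocess.run` is available, and the function name `refresh_boot_archive` and its `run` parameter are invented for illustration. Whether this covers every way the archive goes stale is not something I can promise.

```python
import subprocess

def refresh_boot_archive(run=subprocess.run):
    """Run 'bootadm update-archive' and report whether it succeeded.

    The 'run' parameter exists only so the command runner can be swapped
    out (for example in tests); it defaults to subprocess.run.
    """
    # Assumed location of bootadm; adjust if it lives elsewhere.
    cmd = ["/sbin/bootadm", "update-archive"]
    result = run(cmd, check=False)
    return result.returncode == 0
```

Run from a root cron job, something like this would at least narrow the window in which a crash or power failure finds a stale archive.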

(The boot archive check is done by svc:/system/boot-archive, but I don't know what breaks if you bring a machine up without an up to date boot archive so I can't suggest just disabling it entirely.)

solaris/BootArchiveProblem written at 23:28:48

