Wandering Thoughts archives

2007-12-27

Why I am not entirely fond of Solaris 10 x86's boot archive

I am not a fan of initial ramdisks (initrds in Linux jargon), but ever since I discovered Solaris 10's version of the concept I've thought it was the most sensible approach to the whole issue. Just having all the drivers in the boot archive, rather than trying to pick out the ones the system thinks it will need, eliminates one entire set of annoying problems.

(The other Linux initrd problem is where the kernel version doesn't match the version of modules in the initrd; Solaris gets around it by not doing wholesale replacements of kernel modules when the kernel is updated, presumably because of Sun's stable kernel ABI.)

My problem with boot archives is that they are alarmingly fragile. I seem to have a small gift for crashing Solaris, and quite often when I manage this the system reboots complaining about an out of date boot archive and requires annoyingly tedious manual intervention to fix up. It seems that a great many kinds of changes can make the boot archive out of date, for example adding and removing iSCSI disks.

(The list of files and directories that get copied into the boot archive is in /boot/solaris/filelist.ramdisk, but it is not obvious which of them gets updated by what.)
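As a rough way to see which entries have changed, here is a minimal Python sketch that compares modification times of everything named in the file list against the boot archive itself. This is my approximation, not how Solaris actually decides the archive is out of date; the real check may well use different criteria. The function name `stale_entries` is made up for illustration, and I am assuming that the list holds one path per line, relative to the root directory.

```python
import os


def stale_entries(filelist_path, archive_path, root="/"):
    """Return the file list entries that contain something newer than the archive.

    Assumption: one path per line, relative to `root`. A directory entry
    counts as stale if any file underneath it is newer than the archive.
    """
    archive_mtime = os.lstat(archive_path).st_mtime

    with open(filelist_path) as f:
        entries = [line.strip() for line in f if line.strip()]

    stale = []
    for entry in entries:
        top = os.path.join(root, entry.lstrip("/"))
        if os.path.isdir(top):
            # Check every file under a listed directory.
            candidates = [
                os.path.join(dirpath, name)
                for dirpath, _dirs, names in os.walk(top)
                for name in names
            ]
        elif os.path.lexists(top):
            candidates = [top]
        else:
            # Listed but not present on this system; skipped in this sketch.
            candidates = []

        # lstat() is used so that dangling symlinks do not raise an error.
        if any(os.lstat(p).st_mtime > archive_mtime for p in candidates):
            stale.append(entry)
    return stale
```

On a real machine one would call something like `stale_entries("/boot/solaris/filelist.ramdisk", "/platform/i86pc/boot_archive")`, where the archive path is my assumption about where the archive lives. An empty result only means that nothing looks newer by timestamp, not that Solaris will agree.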

While the tediousness of fixing an out of date boot archive is bad enough, the real problem is that it means that Solaris 10 x86 machines will not reliably automatically reboot after unexpected events (eg, power failures). If we have done something that makes their boot archives 'out of date', they'll require manual intervention to do the rough equivalent of patting them on the head.

This would not be half as annoying if the boot archive were rebuilt periodically, but instead rebuilds seem to be done only when you take the machine down. You could have last made a change six months ago and still get hit by this because you haven't rebooted since. (Our fileservers, for example, are rebooted extremely infrequently.)

(The boot archive check is done by svc:/system/boot-archive, but I don't know what breaks if you bring a machine up without an up to date boot archive so I can't suggest just disabling it entirely.)

BootArchiveProblem written at 23:28:48

2007-12-21

A thought about Solaris 10 x86's boot process

In succinct form:

The problem with Solaris 10 x86's boot process is not so much that it is trying to pretend that it has OpenBoot; the problem is that it is doing such a clumsy and awkward job of faking it.

I will admit that I am not a fan of the Solaris 10 x86 boot process. I understand why it is trying very hard to pretend that it is just like SPARC hardware with OpenBoot, and I can even sympathize with Sun's motivations for this. But the contortions that Sun forces the natural x86 boot process into to do this are both painful and unnecessary.

(Yes, they are painful; also inconvenient, complex, and so on. Modifying GRUB menu entries on the fly is not a great way to do anything, plus it is nothing like an actual OpenBoot environment.)

The contortions are unnecessary because Sun is already putting a program between GRUB and the actual Solaris kernel, namely /platform/i86pc/multiboot, which even has the job of faking various bits of OpenBoot like EEPROM parameters. This makes it the perfect place to go the rest of the way and just build an OpenBoot-like command line environment that would let the user interrupt the boot process to supply kernel boot arguments and so on. (GRUB would be demoted to a small implementation detail and the way to get to the built-in emergency rescue environment.)

Since this is such an obvious idea, it must already have occurred to Sun. Presumably there is some good reason that they have not already done this.

(Looking at the OpenSolaris code online suggests that multiboot is already having to deal with PC hardware issues, including the console, and is not just purely playing around with stuff already in memory.)

Solaris10X86BootThought written at 23:57:47
