Wandering Thoughts archives

2019-09-12

The mystery of why my Fedora 30 office workstation was booting fine

The other day, I upgraded the kernel on my office workstation, much as I have any number of times before, and rebooted. Things did not go well:

So the latest Fedora 30 updates (including a kernel update) build an initramfs that refuses to bring up software RAID devices, including the one that my root filesystem is on. Things do not go well afterwards.

Then I said:

Fedora's systemd, Dracut and kernel parameters setup have now silently changed to require either rd.md.uuid for your root filesystem or rd.auto. The same kernel command line booted previous kernels with previous initramfs's.

The first part of this is wrong, and that leads to the mystery.

In Fedora 29, my kernel command line was specifying both the root filesystem device by name ('root=/dev/md20') and the software RAID arrays for the initramfs to bring up (as 'rd.md.uuid=...'). When I upgraded to Fedora 30 in mid-August, various things happened and I wound up removing both of those from the kernel command line, specifying the root filesystem device only by UUID ('root=UUID=...'). This kernel command line booted a series of Fedora 30 kernels, most recently 5.2.11 on September 4th, right up until yesterday.

However, it shouldn't have. As the dracut.cmdline manpage says, the default since Dracut 024 has been to not auto-assemble software RAID arrays in the absence of either rd.auto or rd.md.uuid. And the initramfs for older kernels (at least 5.2.11) was theoretically enforcing that; the journal for that September 4th boot contains a report of:

dracut-pre-trigger[492]: rd.md=0: removing MD RAID activation

But then a few lines later, md/raid1:md20 is activated:

kernel: md/raid1:md20: active with 2 out of 2 mirrors

(The boot log for the new kernel for a failed boot also had the dracut-pre-trigger line, but obviously no mention of the RAID being activated.)
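As a rough illustration of the comparison I was doing by eye, here is a minimal sketch that scans saved boot log lines for the two messages above. The function name and the regular expressions are my own assumptions, written to match only the two log lines quoted here; other systems may word these messages differently.

```python
import re

# Patterns written to match the two journal lines quoted above (an assumption;
# the exact wording may differ across dracut and kernel versions).
MD_REMOVED = re.compile(r"dracut-pre-trigger\[\d+\]: rd\.md=0: removing MD RAID activation")
MD_ACTIVE = re.compile(r"kernel: md/raid\d+:(md\d+): active with \d+ out of \d+ mirrors")


def summarize_boot_log(lines):
    """Return (dracut_disabled_md, activated_arrays) for an iterable of log lines."""
    dracut_disabled_md = False
    activated_arrays = []
    for line in lines:
        if MD_REMOVED.search(line):
            dracut_disabled_md = True
        match = MD_ACTIVE.search(line)
        if match:
            activated_arrays.append(match.group(1))
    return dracut_disabled_md, activated_arrays


old_boot = [
    "dracut-pre-trigger[492]: rd.md=0: removing MD RAID activation",
    "kernel: md/raid1:md20: active with 2 out of 2 mirrors",
]
print(summarize_boot_log(old_boot))
```

For the old kernel's boot this reports that Dracut claimed to disable MD RAID activation and that md20 came up anyway, which is exactly the contradiction. For the failed boot, the list of activated arrays would be empty.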

I unpacked the initramfs for both kernels and as far as I can tell they're identical in terms of the kernel modules included and the configuration files and scripts (there are differences in some binaries, which is expected since systemd and some other things got upgraded between September 4th and now). Nor has the kernel configuration changed between the two kernels according to the config-* files in /boot.

So by all evidence, the old kernel and initramfs should not auto-assemble my root filesystem's software RAID and thus shouldn't boot. But they do. In fact they did yesterday, because when the new kernel failed to boot, the first thing I did was boot with the old one. I just don't know why, and that's the mystery.

My fix for my boot issue is straightforward; I've updated my kernel command line to have the 'rd.md.uuid=...' that it should have had all along. This works fine.
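To make the rule concrete, here is a small sketch that checks whether a kernel command line carries either of the parameters that Dracut wants before it will assemble software RAID. The function names are hypothetical, the UUID values in the example are placeholders rather than my real ones, and treating only a bare 'rd.auto' or 'rd.auto=1' as "on" is a simplifying assumption.

```python
def md_assembly_params(cmdline):
    """Pick out the root= and software RAID related parameters from a kernel command line."""
    params = cmdline.split()
    root = next((p.split("=", 1)[1] for p in params if p.startswith("root=")), None)
    md_uuids = [p.split("=", 1)[1] for p in params if p.startswith("rd.md.uuid=")]
    # Simplifying assumption: only a bare 'rd.auto' or 'rd.auto=1' counts as enabled.
    rd_auto = any(p in ("rd.auto", "rd.auto=1") for p in params)
    return {"root": root, "rd_auto": rd_auto, "md_uuids": md_uuids}


def should_assemble_md(cmdline):
    """True if the command line asks the initramfs to assemble software RAID."""
    info = md_assembly_params(cmdline)
    return info["rd_auto"] or bool(info["md_uuids"])


# Placeholder UUIDs, for illustration only.
broken = "root=UUID=PLACEHOLDER-FS-UUID ro"
fixed = "root=UUID=PLACEHOLDER-FS-UUID ro rd.md.uuid=PLACEHOLDER-MD-UUID"

print(should_assemble_md(broken))
print(should_assemble_md(fixed))
```

On a running system the input could be the contents of /proc/cmdline. By this check, my Fedora 30 command line before the fix falls into the "broken" case, which is why it should never have booted.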

(My initial recovery from the boot failure was to use 'rd.auto', but I've decided that I don't want to auto-assemble anything and everything that the initramfs finds. I'll have the initramfs only assemble the bare minimum, just in case. While my swap is also on software RAID, I specifically decided to not assemble it in the initramfs; I don't really need it until later.)

linux/Fedora30BootMystery written at 23:02:06

