Wandering Thoughts archives

2019-08-16

A gotcha with Fedora 30's switch of Grub to BootLoaderSpec based configuration

I upgraded my office workstation from Fedora 29 to Fedora 30 yesterday. In the past, such upgrades have been problem free, but this time around things went fairly badly. The first and largest problem was that after the upgrade, booting any kernel gave me a brief burst of kernel messages, then a blank screen, and after a few minutes a return to the BIOS and the Grub main menu. To get my desktop to boot at all, I had to add 'nomodeset' to the kernel command line; among other consequences, this made my desktop a single display machine instead of a dual display one.

(It was remarkably disorienting to have my screen mirrored across both displays. I kept trying to change to the 'other' display and having things not work.)

The short version of the root cause is that my grub.cfg was rebuilt using outdated kernel command line arguments that came from /etc/default/grub, instead of the current command line arguments that had previously been used in my original grub.cfg. Because of how the Fedora 30 grub.cfg is implemented, these wrong command line arguments were then remarkably sticky and it wasn't clear how to change them.

In Fedora 29 and earlier, your grub.cfg is probably being maintained through grubby, Fedora's program for this. When grubby adds a menu entry for a new kernel, it more or less copies the kernel command line arguments from your current one. While there is a GRUB_CMDLINE_LINUX setting in /etc/default/grub, its contents are ignored until and unless you rebuild your grub.cfg from scratch, and there's nothing that tries to update it from what your current kernels in your current grub.cfg are actually using. This means that your /etc/default/grub version can wind up being very different from what you're currently using and actually need to make your kernels work.
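
One way to spot this sort of drift before it bites is to compare the two command lines. What follows is a minimal sketch, not anything Fedora ships: the function names are made up for illustration, and it assumes that /etc/default/grub is a plain shell fragment that is safe to source.

```shell
# Print each argument that appears in the first kernel command line ($1)
# but not in the second ($2), one per line.
cmdline_missing() (
    set -f    # do not glob-expand arguments while word splitting
    for arg in $1; do
        found=no
        for other in $2; do
            if [ "$arg" = "$other" ]; then
                found=yes
                break
            fi
        done
        [ "$found" = yes ] || printf '%s\n' "$arg"
    done
)

# Print the value of GRUB_CMDLINE_LINUX from a file in the format of
# /etc/default/grub. This sources (executes) the file in a subshell.
grub_default_cmdline() (
    unset GRUB_CMDLINE_LINUX
    . "$1"
    printf '%s\n' "$GRUB_CMDLINE_LINUX"
)

# Example use on a real system (not run here):
#   cmdline_missing "$(grub_default_cmdline /etc/default/grub)" \
#                   "$(cat /proc/cmdline)"
```

Anything this prints is an argument that a from-scratch rebuild of grub.cfg would add to kernels that are currently booting without it.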

One of the things that usually happens by default when you upgrade to Fedora 30 is that Fedora switches how grub.cfg is created and updated, from the old way of editing it directly via grubby to a Boot Loader Specification (BLS) based scheme; you can read about this switch in the Fedora wiki. This switch regenerates your grub.cfg using a shell script called (in Fedora) grub2-switch-to-blscfg, and this shell script of course uses /etc/default/grub's GRUB_CMDLINE_LINUX as the source of the kernel arguments.

(This is controlled by whether GRUB_ENABLE_BLSCFG is set to true or false in your /etc/default/grub. If it's not set at all, grub2-switch-to-blscfg adds a 'GRUB_ENABLE_BLSCFG=true' setting to /etc/default/grub for you, and of course goes on to regenerate your grub.cfg. grub2-switch-to-blscfg itself is run from the Fedora 30 grub2-tools RPM posttrans scriptlet if GRUB_ENABLE_BLSCFG is not already set to something in your /etc/default/grub.)
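
To see ahead of time whether the switch will happen on a given machine, you can look at how the setting currently stands. This is a small sketch with a hypothetical function name, again assuming /etc/default/grub can safely be sourced as shell.

```shell
# Print the value of GRUB_ENABLE_BLSCFG from a file in the format of
# /etc/default/grub, or "unset" if the file does not set it at all.
blscfg_state() (
    unset GRUB_ENABLE_BLSCFG
    . "$1"
    printf '%s\n' "${GRUB_ENABLE_BLSCFG:-unset}"
)

# Example use on a real system (not run here):
#   blscfg_state /etc/default/grub
```

Going by the description above, "unset" is the case where the posttrans scriptlet will run grub2-switch-to-blscfg for you.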

A regenerated grub.cfg has a default_kernelopts setting, and that looks like it should be what you want to change. However, it is not. The real kernel command line for normal BLS entries is actually in the Grub2 $kernelopts environment variable, which is loaded from the grubenv file, normally /boot/grub2/grubenv (which may be a symlink to /boot/efi/EFI/fedora/grubenv, even if you're not actually using EFI boot). The best way to change this is to use 'grub2-editenv - list' and 'grub2-editenv - set kernelopts="..."'. I assume that default_kernelopts is magically used by the blscfg Grub2 module if $kernelopts is unset, and possibly gets written back to grubenv by Grub2 in that case.
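
As a hedged sketch of how that edit might go, here are two helpers for dropping a single bad argument from kernelopts. The function names are invented for illustration, and the sketch assumes that 'grub2-editenv - list' prints one name=value pair per line.

```shell
# Print the kernel command line $1 with every exact occurrence of the
# argument $2 removed.
cmdline_remove() (
    set -f    # do not glob-expand arguments while word splitting
    out=
    for arg in $1; do
        [ "$arg" = "$2" ] && continue
        out="${out:+$out }$arg"
    done
    printf '%s\n' "$out"
)

# Print the value of kernelopts from 'grub2-editenv - list' style output
# (assumed to be name=value lines) read on standard input.
kernelopts_from_list() {
    sed -n 's/^kernelopts=//p'
}

# Example use on a real system (not run here):
#   current=$(grub2-editenv - list | kernelopts_from_list)
#   grub2-editenv - set kernelopts="$(cmdline_remove "$current" amdgpu.dpm=0)"
```

Printing the new value and looking at it before running the 'set' step is cheap insurance, since a wrong kernelopts affects every BLS entry that uses it.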

(You can check that your kernels are using $kernelopts by inspecting an entry in /boot/loader/entries and seeing that it has 'options $kernelopts' instead of anything else. You can manually change that for a specific entry if you want to.)
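
That inspection can be done for all entries at once. The following is a sketch under assumptions: the function name is hypothetical, entries are taken to be the *.conf files in the directory, and an options line is accepted as long as it starts with $kernelopts.

```shell
# List the *.conf files in a BLS entries directory ($1, defaulting to
# /boot/loader/entries) that have no 'options' line beginning with a
# literal $kernelopts. Entries with no options line are listed too.
entries_not_using_kernelopts() (
    dir=${1:-/boot/loader/entries}
    for entry in "$dir"/*.conf; do
        [ -f "$entry" ] || continue
        if ! grep -Eq '^options[[:space:]]+\$kernelopts([[:space:]]|$)' "$entry"; then
            printf '%s\n' "$entry"
        fi
    done
)

# Example use on a real system (not run here):
#   entries_not_using_kernelopts /boot/loader/entries
```

No output means every entry takes its command line from $kernelopts, so changing the grubenv value changes all of them.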

This is going to make it more interesting (by which I mean annoying) if and when I need to change my standard kernel options. I think I'm going to have to change all of /etc/default/grub, the kernelopts in grubenv, and the default_kernelopts in grub.cfg, just to be sure. If I was happy with the auto-generated grub.cfg, I could just change /etc/default/grub and force a regeneration, but I'm not and I have not yet worked out how to make its handling of the video modes and the menus agree with what I want (which is a basic text experience).

(While I was initially tempted to leave my system as a non-BLS system, I changed my mind because of long term issues. Fedora will probably drop support for grubby based setups sooner or later, so I might as well get on the BLS train now.)

To give credit where it's due, one (lucky) reason that I was able to eventually work out all of this is that I'd already heard about problems with the BLS transition in Fedora 30 in things like Fedora 30: When grub2-mkconfig Doesn’t Work, and My experiences upgrading to Fedora 30. Without that initial awareness of the existence of the BLS transition in Fedora 30 (and the problems it caused people), I might have been flailing around for even longer than I was.

PS: As a result of all of this, I've discovered that you no longer need to specify the root device in the kernel command line arguments. I assume the necessary information for that is in the dracut-built initramfs. As far as the blank screen and kernel panics go, I suspect that the cause is either or both of 'amdgpu.dpm=0' and 'logo.nologo', which were still present in the /etc/default/grub arguments but which I'd long since removed from my actual kernel command lines.

(I could conduct more experiments to try to find out which kernel argument is the fatal one, but my interest in more reboots is rather low.)

Update, August 21st: I needed to reboot my machine to apply a Fedora kernel update, so I did some experiments and the fatal kernel command line argument is amdgpu.dpm=0, which I needed when the machine was new but had turned off since then.

linux/Fedora30GrubBLSGotcha written at 20:58:09

Systemd and waiting until network interfaces or addresses are configured

One of the things that systemd is very down on is the idea of running services after 'the network is up', whatever that means; the systemd people have an entire web page on the subject. This is all well and good in theory, but in practice there are plenty of situations where I need to only start certain things after either a named network interface is present or an IP address exists. For a concrete example, you can't set up various pieces of policy based routing for an interface until the interface actually exists. If you're configuring this on boot in a systemd based system (especially one using networkd), you need some way to ensure the ordering. Similarly, sometimes you need to listen only on some specific IP addresses and the software you're using doesn't have Linux specific hacks to do that when the IP address doesn't exist yet.

(As a grumpy sysadmin, I actually don't like the behavior of binding to an IP address that doesn't exist, because it means that daemons will start and run even if the system will never have the IP address. I would much rather delay daemon startup until the IP address exists.)

Systemd does not have direct native support for any of this, of course. There's no way to directly say that you depend on an interface or an IP address, and in general the dependency structure has long been under-documented. The closest you can get to waiting until a named network interface exists is to specify an After= and perhaps a Wants= or a Requires= on the pseudo-unit for the network interface, 'sys-subsystem-net-devices-<iface>.device'. However, as I found out, the lack of a .device unit doesn't always mean that the interface doesn't exist.
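
As a sketch of what such a dependency looks like in practice, here is a small generator for a drop-in file. The function names and the service in the usage comment are hypothetical. To my understanding of systemd's unit name escaping, interface names made up only of ASCII letters, digits, ':', '_' and non-leading '.' can be used as-is, so the sketch refuses anything else instead of guessing at the escaped form.

```shell
# Print the .device unit name for a network interface whose name needs
# no escaping. Fails for other names (for example ones containing '-'),
# which would have to be escaped first.
net_device_unit() {
    case $1 in
        ''|.*|*[!A-Za-z0-9:_.]*) return 1 ;;
    esac
    printf 'sys-subsystem-net-devices-%s.device\n' "$1"
}

# Print a drop-in that orders a unit after the interface's .device unit
# and pulls that unit in with Wants=.
net_device_dropin() (
    unit=$(net_device_unit "$1") || exit 1
    printf '[Unit]\nAfter=%s\nWants=%s\n' "$unit" "$unit"
)

# Example use (hypothetical service name, not run here):
#   net_device_dropin eth0 \
#       > /etc/systemd/system/myservice.service.d/wait-iface.conf
```

Using Wants= instead of Requires= here is a deliberate choice for the sketch: the service is still started if the .device unit never appears, which may or may not be what you want.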

You might think that in order to wait for an IP address to exist, you could specify an After= for the .device unit of the interface that the address is configured on. However, this has historically had issues for me; under at least some versions of systemd, the .device unit would be created before the IP address was configured. In my particular situation, what worked at the time was to wait for a VLAN interface .device that was on top of the real interface that had the IP address (and yes, I mix tagged VLANs with an untagged network). By the time the VLAN .device existed, the IP address had relatively reliably been set up.

If you're using systemd-networkd and care about network interfaces, the easiest approach is probably to rely on systemd-networkd-wait-online.service; how it works and what it waits for is probably about as good as you can get. For IP addresses, as far as I know there's no native thing that specifically waits until some or all of your static IP addresses are present. Waiting for systemd-networkd-wait-online is probably going to be good enough for most circumstances, but if I needed better I would probably write a shell script (and a .service unit for it) that simply waited until the IP addresses I needed were present.
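
A minimal sketch of such a script might look like the following. The function names and the LIST_ADDRS override are made up for illustration, and it assumes the one-line-per-address output of 'ip -o addr show', where the third field is 'inet' or 'inet6' and the fourth is the address with its prefix length.

```shell
# Succeed if the address $1 appears in 'ip -o addr show' style output
# read on standard input.
addr_in_listing() {
    awk -v want="$1" '
        ($3 == "inet" || $3 == "inet6") {
            split($4, parts, "/")
            if (parts[1] == want) found = 1
        }
        END { exit(found ? 0 : 1) }
    '
}

# Wait up to $1 seconds for every remaining argument to be a configured
# address. LIST_ADDRS (hypothetical) may name a replacement command for
# 'ip -o addr show', which is mostly useful for testing.
wait_for_addrs() (
    timeout=$1
    shift
    waited=0
    while :; do
        listing=$(${LIST_ADDRS:-ip -o addr show})
        missing=no
        for addr in "$@"; do
            printf '%s\n' "$listing" | addr_in_listing "$addr" || missing=yes
        done
        [ "$missing" = no ] && exit 0
        [ "$waited" -ge "$timeout" ] && exit 1
        sleep 1
        waited=$((waited + 1))
    done
)

# Example use (documentation addresses, not run here):
#   wait_for_addrs 60 192.0.2.10 2001:db8::10
```

The .service unit for it would presumably be a oneshot one that the real daemons are ordered after. Note that this only checks that the address is listed; it does not check whether an IPv6 address is still tentative.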

(I continue to think that it's a real pity that you can't configure networkd .network files to have 'network up' and 'network down' scripts, especially since their stuff for routing and policy based routing is really very verbose.)

PS: One of the unfortunate effects of the under-documented dependency structure and the lack of clarity of what to wait on is a certain amount of what I will call 'superstitious dependencies', things that you've put into your systemd units without fully understanding whether or not you needed them, and why (often also without fully documenting them). This is fine most of the time, but then one day an unnecessary dependency fails to start or perhaps exist and then you're unhappy. That's part of why I would like explicit and reliable ways to do all of this.

linux/SystemdNetworkThereIssue written at 00:26:26


