2024-04-29
Our likely long term future (not) with Ubuntu (as of early 2024)
Over on the Fediverse I said something that's probably not particularly surprising:
In re Canonical and Ubuntu: at work we are still using Ubuntu LTS (and we're going to start using 24.04), but this is on servers where we don't have to deal with snaps (we turn them off, they don't work in our environment). But the Canonical monetization drive is obvious and the end point is inevitable, so I expect we'll wind up on Debian before too many more years (depending on what Canonical does to LTS releases). 2026? 2028? Who knows.
(This was with regard to a post by @feoh.)
(Work is a university department where we use physical servers in our own machine rooms and don't have the funding to pay for commercial support for anywhere near all of those servers.)
The 2026 and 2028 dates come from the expected next Ubuntu LTS release dates (which since 2008 have been every two years toward the end of April). It's always possible that Canonical could do something that unexpectedly forces us off Ubuntu LTS 22.04 and 24.04 before 2026 comes around and we have to make a decision again, but it seems somewhat unlikely (the obvious change would be to lock a lot of security updates behind 'Ubuntu Pro', effectively making the non-paid versions of Ubuntu LTS unsupported for most security fixes).
One potential and seemingly likely change that would force us to move away from Ubuntu would be Canonical changing important non-GUI packages to be Snaps instead of .debs that can be installed through apt (they've already moved important GUI packages to Snaps, but so far we're living without those). Snaps simply don't work in our environment, and if Canonical forced the issue we would rather move to Debian than try to hack up Ubuntu and our NFS-based environment to make them work (for the moment, until Canonical changes something that breaks our hacks). Another potential change that I keep expecting is for Canonical to more or less break the server installer in non-cloud environments, or to require non-cloud environments to provide emulations of cloud facilities (such as something to supply system metadata).
But in the long term I don't think the specific breaking changes are worth trying to predict. The general situation is that Canonical is a commercial company that is out to make money (lots of money), and free Ubuntu LTS for servers (or for anything) is a loss leader. The arc of loss leaders bends towards death, whether it be through obvious discontinuation, deliberate crippling, or simply slow strangulation from lack of resources. Sooner or later we'll have to move off Ubuntu; the only big questions are how soon and how much notice we'll have.
Should we jump before we have to? That may be a question we'll be asking ourselves in 2026, or maybe 2025 when the next Debian release will probably come out.
(Our answer for Ubuntu 24.04 LTS is that there's nothing in 24.04 so far that forces us to think about it so we're going to roll on with the default of continuing with Ubuntu LTS releases.)
2024-04-23
Libvirt's virt-viewer and (guest) screen blanking
One of the things that I sometimes need to do with my libvirt-based virtual machines is connect to their 'graphical' consoles. There are a variety of ways to do this, but generally the most convenient way for me has been virt-viewer, followed by virt-manager. Virt-viewer is typically pretty great, but it has one little drawback that surfaces with some of my VMs, especially the Fedora ones that boot into graphics mode. The standard behavior for Fedora machines sitting idle in graphics mode, especially on the login screen, is that after a while they'll blank the screen, which winds up turning off video output.
On a physical machine, the way to un-blank the display is to send it some keyboard or mouse input. Unfortunately, once a virtual machine blanks its screen, all virt-viewer will do is display a 'Connected to graphics server' message in an otherwise inactive display window. Typing, clicking the mouse buttons, or moving the mouse does nothing in my environment; virt-viewer seems to be unwilling to send keyboard input to a virtual machine with a powered-down display.
(Virt-viewer has a menu that will let you send various keystrokes to the virtual machine, but this menu is greyed out when virt-viewer sees the screen as blanked.)
My traditional way to fix this was to briefly bring up the virtual machine's console in virt-manager, which would reliably unblank it. With the console display now active, I could switch back to virt-viewer. Recently I discovered a better way. One of the things that virsh can do is directly send keystrokes to a virtual machine guest with 'virsh send-key'. Sending any keys to the guest will cause it to un-blank the screen, which is the result I want.
(The process for sending keys is a little bit arcane; you'll want to consult either virtkeyname-linux or maybe virtkeycode-linux, both of which are part of the libvirt manual pages.)
What key or keys you want to send are up to you. Right now I'm sending an ESC, which feels harmless and is easy to remember. If I was clever I'd write an 'unblank' script that just took the virtual machine's name and did all of the necessary magic for it.
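For the record, here's a minimal sketch of what such an 'unblank' script could look like (the script itself and its choice of ESC are my own invention, not something shipped with libvirt):

#!/bin/sh
# Hypothetical 'unblank' helper: send a harmless ESC keypress to a
# libvirt guest so that it un-blanks its console. Takes the guest
# (domain) name as its only argument.
if [ "$#" -ne 1 ]; then
    echo "usage: $0 <domain>" 1>&2
    exit 1
fi
# KEY_ESC is the key name from virtkeyname-linux; 'linux' is the
# default codeset for 'virsh send-key'.
exec virsh send-key "$1" --codeset linux KEY_ESC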
(And someday hopefully this will all be unnecessary because virt-viewer will learn how to do this itself. Possibly I'm missing something in virt-viewer that would fix this, or something in libvirt machine configuration that would disable screen blanking.)
2024-04-22
Making virtual machine network interfaces inactive in Linux libvirt
Today, for reasons beyond the scope of this entry, I was interested in arranging to boot a libvirt-based virtual machine with a network interface that had no link signal, or at least lacked the virtual equivalent of it. It was not entirely obvious how to do this, and some of the ways I tried didn't work. So let's start with the easier thing to do, which is to set up a network interface that exists but doesn't talk to anything.
The easiest way I know of to do this is to create an 'isolated' libvirt network. An isolated libvirt network is essentially a virtual switch (technically a bridge) that is not connected to the outside world in any way. If your virtual machine's network interface is the only thing connected to this isolated network, it will have link signal but nothing out there to talk to. You can create such a network either through explicitly writing and loading the network XML yourself or through a GUI such as virt-manager (I recommend the GUI).
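(If you do want the XML route, a minimal sketch looks something like the following; the 'isolated0' and bridge names are arbitrary examples of mine, and what makes the network isolated is simply the absence of a <forward> element.)

# Define, start, and autostart a hypothetical isolated network.
cat >isolated0.xml <<'EOF'
<network>
  <name>isolated0</name>
  <bridge name='virbr-iso0'/>
</network>
EOF
virsh net-define isolated0.xml
virsh net-start isolated0
virsh net-autostart isolated0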
However, what I wanted was a network interface (a link) that was down, not up but connected to a non-functioning network. This is possible in several different ways through libvirt's various interfaces.
If a virtual machine is running, there are 'virsh' commands that will let you see the virtual machine's interfaces and manipulate their state. 'virsh domiflist <domain>' will give you the interface names, then 'domif-getlink <domain> <interface>' will get its current state and 'domif-setlink <domain> <interface> <state>' will change it. If the virtual machine is not running, you'll need to get the interface's MAC from 'domiflist', then use 'domif-setlink <domain> <iface> <state> --config' to affect the link state when the virtual machine starts up. However, you'll need to remember to later reset things with 'domif-setlink ... up --config' to make the interface be active on future boots.
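To make that concrete, here's a hypothetical transcript for a guest I'll call 'testvm', with made-up interface and MAC names:

# For a running guest: list interfaces, check link state, take the link down.
virsh domiflist testvm
virsh domif-getlink testvm vnet0
virsh domif-setlink testvm vnet0 down
# For a guest that isn't running, use the MAC from 'domiflist' plus
# --config, and remember to put the link back up for later boots.
virsh domif-setlink testvm 52:54:00:aa:bb:cc down --config
virsh domif-setlink testvm 52:54:00:aa:bb:cc up --config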
If you like virt-manager's GUI (which I do), the easier approach for a powered down virtual machine is to go into its hardware list, pick the network device, and untick the 'Link state: active' tickbox (then Apply this change). You can then start the VM, which will come up with the interface behaving as if it had no network cable connected. Later you can tick the box again (and apply it) to reconnect the interface. The same thing can be done by editing the domain XML for the virtual machine to modify virtual link state. I believe this is what 'domif-setlink ... --config' does behind the scenes, although I haven't dumped the XML after such a change to see.
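(Based on libvirt's domain XML documentation, I believe what winds up in the domain XML is a <link state='...'/> element inside the relevant <interface> block, something like the sketch below; the interface details here are invented.)

<interface type='network'>
  <mac address='52:54:00:aa:bb:cc'/>
  <source network='default'/>
  <!-- the link state element; setting it to 'up' or removing it
       restores normal behavior -->
  <link state='down'/>
  <model type='virtio'/>
</interface>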
(In general there are a fair number of interesting things lurking in the virsh manual page. For instance, until today I didn't know about 'virsh console' to connect to the serial console of a virtual machine.)
2024-04-19
Modern Linux mounts a lot of different types of virtual filesystems
For reasons that don't fit in the margins of this entry, I was recently looking at the filesystems that we have mounted on our Ubuntu machines and their types. Some of these filesystems are expected and predictable, such as the root filesystem (ext4), any NFS mounts we have on a particular machine (which can be of two different filesystem types), maybe a tmpfs mount with a size limit that we've set up, and ZFS filesystems on our fileservers. But a modern Linux system doesn't stop there, and in fact has a dizzying variety of virtual filesystems on various mount points, with various different virtual filesystem types.
As of Ubuntu 22.04 on a system that boots using UEFI (and runs some eBPF stuff), here is what you get:
sysfs        /sys
proc         /proc
devtmpfs     /dev
devpts       /dev/pts
tmpfs        /run
securityfs   /sys/kernel/security
tmpfs        /dev/shm
tmpfs        /run/lock
cgroup2      /sys/fs/cgroup
pstore       /sys/fs/pstore
efivarfs     /sys/firmware/efi/efivars
bpf          /sys/fs/bpf
autofs       /proc/sys/fs/binfmt_misc
hugetlbfs    /dev/hugepages
mqueue       /dev/mqueue
debugfs      /sys/kernel/debug
tracefs      /sys/kernel/tracing
fusectl      /sys/fs/fuse/connections
configfs     /sys/kernel/config
ramfs        /run/credentials/systemd-sysusers.service
binfmt_misc  /proc/sys/fs/binfmt_misc
rpc_pipefs   /run/rpc_pipefs
tracefs      /sys/kernel/debug/tracing
tmpfs        /run/user/<UID>
That amounts to 20 different virtual filesystem types, or 19 if you don't count systemd's autofs /proc/sys/fs/binfmt_misc mount.
On the one hand, I'm sure all of these different virtual filesystem types exist for good reason, and it makes the life of kernel code simpler to have so many different ones. On the other hand, it makes it more difficult for people who want to exclude all virtual filesystems and only list 'real' ones. For many virtual filesystems and their mounts, the third field of /proc/self/mounts (the type) is the same as the first field (the theoretical 'source' of the mount), but there are exceptions:
udev       devtmpfs
systemd-1  autofs
none       ramfs
sunrpc     rpc_pipefs
(On another system there is 'gvfsd-fuse' versus 'fuse.gvfsd-fuse' and 'portal' versus 'fuse.portal'.)
As a pragmatic matter, for things like our metrics system we're probably better off excluding by mount point; anything at or under /run, /sys, /proc, and /dev is most likely to be a virtual filesystem of some sort. Alternately, you can rely on things like metrics host agents to have a sensible default list, although if you have to modify that list yourself you're going to need to keep an eye on its defaults.
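As a rough sketch of the 'exclude by mount point' approach (the awk invocation and the prefix list here are mine and certainly not complete):

# Print the mount point and filesystem type of everything that isn't
# mounted at or under /run, /sys, /proc, or /dev.
awk '$2 !~ "^/(run|sys|proc|dev)(/|$)" { print $2, $3 }' /proc/self/mounts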
PS: technically this is a mix of true virtual filesystems, which materialize their files and directories on demand from other information, and 'virtual' filesystem types that are merely RAM-based but do store inodes, files, and directories that you create and manipulate normally. Since the latter are ephemeral, they usually get lumped together with the former, but there is a core difference. And a virtual filesystem isn't necessarily volatile; both 'efivarfs' and 'pstore' are non-volatile.
2024-04-17
Limiting the maximum size of Amanda debug logs with a Linux tmpfs mount
Recently we had a little incident with our Amanda backup system, where the Amanda daemon on both the backup server and one particular backup client got into a state where they kept running and, more importantly, kept writing out plaintive debugging log messages. We discovered this when first the Amanda server and then an important Amanda client entirely filled up their root filesystems with what was, by that time, a several hundred gigabyte debug log file, which they each wrote into their /var/log/amanda directory tree. Afterward, we wanted to limit the size of Amanda debugging logs so that they couldn't fill up the root filesystem any more, especially on Amanda clients (which are our normal servers, especially our fileservers).
All of our root filesystems are ext4, which supports quotas for users, groups, and "projects", as sort of covered in the ext4 manual page. In theory we could have added a size limit on /var/log/amanda with project quotas. In practice this would have required updating the root filesystem's mount options in order to get it to take effect (and that means editing /etc/fstab too), plus we have no experience with ext4 quotas in general and especially with project quotas. Instead we realized that there was a simpler solution.
(We can't use user quotas on the user that Amanda runs as because Amanda also has to write and update various things outside of /var/log/amanda. We don't want those to be damaged if /var/log/amanda gets too big.)
The easiest way to get a small, size limited filesystem on Linux is with a tmpfs mount. Of course the contents of a tmpfs mount are ephemeral, but we almost never look at Amanda's debug logs and so we decided that it was okay to lose the past few days of them on a reboot or other event (Amanda defaults to only keeping four days of them). Better yet, with systemd you can add a tmpfs mount with a systemd unit and some systemd commands, without having to modify /etc/fstab in any way. Some quick checking showed that our /var/log/amanda directories were all normally quite small, with the largest ones being 25 Mbytes or so, so the extra memory needed for a tmpfs for them is fine.
The resulting systemd var-log-amanda.mount file is:
[Unit]
Description=Temporary Amanda directory /var/log/amanda
# I am not 100% sure about this. It's copied from other
# tmpfs mount units.
DefaultDependencies=no
Conflicts=umount.target
Before=local-fs.target umount.target
After=swap.target

[Mount]
What=tmpfs
Where=/var/log/amanda
Type=tmpfs
Options=size=128m,nr_inodes=100000,\
nosuid,nodev,strictatime,\
mode=0770,uid=34,gid=34

[Install]
RequiredBy=local-fs.target
(The UID and GID are those of the standard fixed Ubuntu 'backup' user and group. Possibly we can specify these by name instead; I haven't experimented to see if that's supported by mount and tmpfs. The real Options= line isn't split across multiple lines this way; I did it here to not break web page layout.)
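(For completeness, the commands to activate such a unit are roughly the following, assuming the file has been put in /etc/systemd/system. The 'enable' works because the unit has an [Install] section, and --now also mounts it immediately.)

systemctl daemon-reload
systemctl enable --now var-log-amanda.mount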
In theory it would be better to use zram for this, since Amanda's debug logs are all text and should compress nicely. In practice, setting up a zram device and a filesystem on it and getting it all mounted has more moving parts than a tmpfs mount, which can be done as a single .mount systemd unit.
If we wanted persistence, another option could be a loopback device that used an appropriately sized file on the root filesystem as its backing store. I suspect that the actual mounting can be set up in a single systemd mount unit with appropriate options (since mount has options for setting up the loop device for you given the backing file).
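A quick, untested sketch of that alternative is below; the backing file name and size are arbitrary, and in a systemd .mount unit the equivalent would presumably be a What= that points at the image file plus 'loop' in Options=.

# Create and format a fixed-size backing file, then loop-mount it.
fallocate -l 128M /var/lib/amanda-logs.img
mkfs.ext4 -q -F /var/lib/amanda-logs.img
mount -o loop /var/lib/amanda-logs.img /var/log/amanda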
2024-04-11
Getting the underlying disks of a Linux software RAID array
Due to the pre-beta Ubuntu 24.04 issue I found with grub updates on systems with software RAID root filesystems and BIOS MBR booting, for a while I thought we'd need something that rewrote a debconf key to change it from naming the software RAID of the root filesystem to naming the devices it was on. So I spent a bit of time working out how best to do that, which I'm going to write down for any future use.
At one level this question seems silly, because the devices are right there in /proc/mdstat (once we get which software RAID the root filesystem is mounted from). However, you have to parse them out and be careful to get it right, so we'd ideally like an easier way, which is to use lsblk:
# lsblk -n -p --list --output TYPE,NAME -s /dev/md0
raid1 /dev/md0
part  /dev/sda2
disk  /dev/sda
part  /dev/sdb2
disk  /dev/sdb
We want the 'disk' type devices. Having the basic /dev names is good enough for some purposes (for example, directly invoking grub-install), but we may want to use /dev/disk/by-id names in things like debconf keys for greater stability if our system has additional data disks and their 'sdX' names may get renumbered at some point.
To get the by-id names, you have two options, depending on how old your lsblk is. Sufficiently recent versions of lsblk support an 'ID-LINK' field, so you can use it to directly get the name you want (just add it as an output field in the lsblk invocation above). Otherwise, the easiest way to do this is with udevadm:
udevadm info -q symlink /dev/sda | fmt -1 | sort
Since there are a bunch of /dev/disk/by-id names, you'll need to decide which one you pick and which ones you exclude. For our systems, it looks like we'd exclude 'wwn-' and 'nvme-eui.' names, probably exclude any 'scsi-' name that was all hex digits, and then take the alphabetically first option. Since lsblk's 'ID-LINK' field basically does this sort of thing for you, it's the better option if you can use it.
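Here's a sketch of both approaches with our local exclusion rules baked in; the patterns are specific to our hardware and aren't meant to be general.

# With a sufficiently recent lsblk, just ask for the ID-LINK field.
lsblk -n --output ID-LINK /dev/sda

# Otherwise, filter the udevadm symlink list ourselves.
udevadm info -q symlink /dev/sda | fmt -1 | sort |
  grep '^disk/by-id/' |
  grep -v -e '/wwn-' -e '/nvme-eui\.' |
  grep -Ev '/scsi-[0-9a-f]+$' |
  head -1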
Going from a software RAID to the EFI System Partitions (ESPs) on its component disks is possible but harder (and you may need to do this if the relevant debconf settings have gotten scrambled). Given a disk, lsblk can report all of the components of it and what their partition type is:
# lsblk --list --output FSTYPE,PARTTYPE,NAME -n -p /dev/nvme0n1
ext4                                                     /dev/md0
                                                         /dev/nvme0n1
vfat               c12a7328-f81f-11d2-ba4b-00a0c93ec93b  /dev/nvme0n1p1
linux_raid_member  0fc63daf-8483-4772-8e79-3d69d8477de4  /dev/nvme0n1p2
If a disk has an ESP, it will be a 'vfat' filesystem with the partition GUID shown here, which is the one assigned to indicate an ESP. In many Linux environments you can skip checking for the GUID and simply assume that any 'vfat' filesystem on your servers is there because it's the ESP. If you see this partition GUID but lsblk doesn't say that this is a vfat filesystem, what you have is a potential ESP that was set up during partitioning but then never formatted as a (vfat) filesystem. To do this completely properly you need to mount these filesystems to see if they have the right contents, but here we'd just assume that a vfat filesystem with the right partition GUID had been set up properly by the installer (or by whoever did the disk replacement).
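A corresponding rough sketch of picking out probable ESPs on a disk, using the 'any vfat with the ESP partition GUID' shortcut (the awk handling is mine):

# Print the names of partitions that look like formatted ESPs.
lsblk --list -n -p --output NAME,FSTYPE,PARTTYPE /dev/nvme0n1 |
  awk '$2 == "vfat" && $3 == "c12a7328-f81f-11d2-ba4b-00a0c93ec93b" { print $1 }'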
(A partition GUID of '21686148-6449-6e6f-744e-656564454649' is a BIOS boot partition, which is often present on modern installs that use BIOS MBR booting.)
2024-04-10
It's far from clear how grub package updates work on Ubuntu
Recently I ran across (and eventually reported) an issue on pre-beta Ubuntu 24.04 where a grub package update would fail for systems with software RAID root disks and BIOS MBR booting. The specific error was that grub-install could not install the new version of GRUB's boot-time code on '/dev/md0', the (nominal) device of the root filesystem, reporting an error to the effect of:
grub-install: warning: File system `ext2' doesn't support embedding.
grub-install: warning: Embedding is not possible. GRUB can only be installed in this setup by using blocklists. However, blocklists are UNRELIABLE and their use is discouraged..
grub-install: error: diskfilter writes are not supported.
grub-install failure for /dev/md0
(You can work around this by reconfiguring the grub package to use the underlying disk devices, either by doing 'dpkg-reconfigure grub-pc' or by installing package updates in a manner where dpkg is allowed to ask you questions. Also, this is another case of grub-install having unclear error messages.)
One of the puzzling things about this entire situation is that the exact same configuration works on Ubuntu 22.04 and there are no obvious differences between 22.04 and 24.04 here. For instance, there are debconf keys for what the root filesystem device is and they are exactly the same between 22.04 and 24.04:
; debconf-show grub-pc
[...]
* grub-pc/install_devices: /dev/disk/by-id/md-name-ubuntu-server:0
At this point you might guess (as I did) that 'grub-install /dev/md0' works on Ubuntu 22.04. However, it does not; it fails with the same error as in 24.04. So presumably how grub-install is invoked during package updates is different between 22.04 and 24.04.
As far as I can tell, the 'grub-pc' package runs grub-install from its 'postinst' script, which you can find in /var/lib/dpkg/info/grub-pc.postinst. If you take a look at this script, you can see that it's a rather complex script that is quite embedded into the general Debian package update and debconf framework. If there are ways to run it as a standalone script so that you can understand what it's doing, those ways aren't at all obvious. It's also not obvious how the script is making or not making decisions, and the 22.04 and 24.04 versions seem pretty similar. Nor does scanning and searching either version of the script provide any smoking guns in the form of, for example, mentions of 'md-'.
(You have to know a reasonable amount about dpkg to even find /var/lib/dpkg/info and know that the 'grub-pc.postinst' file is what you're looking for. The dpkg manual page does mention that packages can have various scripts associated with them.)
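For reference, a low-tech way to see what maintainer scripts a package has is to look in /var/lib/dpkg/info directly:

# List grub-pc's dpkg control files, then read the postinst itself.
ls /var/lib/dpkg/info/grub-pc.*
less /var/lib/dpkg/info/grub-pc.postinst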
All of this adds up to something that's almost impossible for ordinary people to troubleshoot or debug. All we can readily determine is that this worked in Ubuntu 20.04 LTS and 22.04 LTS, and doesn't work in the pre-beta 24.04 (and probably not in the beta 24.04, and most likely not in the released 24.04). The mechanisms of it working and not working are opaque, buried inside several layers of black boxes.
Part of this opacity is that it's not even clear what Ubuntu's grub package does or is supposed to do on package update. If you run a UEFI system with mirrored system disks, for example, you may be a little bit surprised to find out that Ubuntu's grub is probably quietly updating all your EFI system partitions when it does package updates.
PS: after much delving into things using various tools and the fact that I have various scratch virtual machines available, I now believe that the answer is that Ubuntu 20.04 and 22.04 don't run grub-install at all when the grub package (for MBR booting) is updated. This fact is casually semi-disguised in the 20.04 and 22.04 grub-pc postinst script. Presumably the 20.04 and 22.04 server installer should have set 'grub-pc/install_devices' to a different value, but that problem was being covered up by grub-install normally not running and using that value.