Wandering Thoughts

2022-09-28

How I've set up my libvirt based virtual machines (in late 2022)

I moved from VMWare Workstation to using Linux's libvirt and its native virtualization facilities earlier this year, and I've been happy with that move although I still would like to be able to take good snapshots of UEFI based virtual machines. Over time I've wound up with a setup that I'm happy with for the work that I do, one that's similar but not quite the same as my VMWare setup.

I have two groups of VMs. One group is Fedora VMs (and one Windows VM) that I use for testing Fedora upgrades, Windows things, and so on. All of these machines are NAT'd through a somewhat customized NAT setup, and I basically just run them. I make little to no usage of VM snapshots; about my only use is to snapshot them before I do something I consider unusually risky (or that I may want to re-try), and then generally to delete the snapshot later after it worked.

The second group of VMs is the ones I use to test various things in our Ubuntu server environment. Our environment expects machines to have real IPs, so all of these VMs use 'macvtap' bridged networking (on a second, completely unused port) and have their own IPs. Our standard Ubuntu install setup has a two stage install process, where we first install from ISO image (which sets the machine's IP address, among other things) and then run through a large postinstall step to customize machines. With most of the testing I do, I want to start from scratch in a fresh install (which most closely mimics real servers) rather than try to shuffle around software and setups on an already installed machine.

To make this more convenient, I've made snapshots for each machine with one or more Ubuntu versions installed from our ISO image (with the machine's IP address set) but no further post-install setup done. At the very start of a VM's life, I've also made an initial snapshot of it with an empty disk. These snapshots are given imaginative yet obvious names like 'empty-initial' and '2204-initial', and when I want to use a VM for something based on 22.04 or 20.04 or whatever, I pick one that's not currently in use and do 'virsh snapshot-revert <what> 2204-initial; virsh start <what>' and proceed from there. Sometimes, if I'm going to be working with a particular setup of a machine, I will also create a snapshot for it in that state and then potentially keep returning to it, but that's more chancy because of configuration and package version drift.
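
As a concrete sketch of the workflow (with a hypothetical VM name; our real names differ), creating one of these baseline snapshots and later returning to it looks roughly like this:

# after installing 22.04 from our ISO and shutting the VM down:
virsh snapshot-create-as utest1 2204-initial "22.04 installed, no postinstall"
# later, to start a round of testing from that baseline:
virsh snapshot-revert utest1 2204-initial
virsh start utest1
# to see what snapshots a VM has:
virsh snapshot-list utest1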

(VMWare Workstation visualized the equivalent snapshots as a tree. I don't know if the underlying QEMU disk snapshots or the libvirt versions have any sort of tree to them, but I don't miss the tree view; it's generally straightforward enough to keep track of them from their names.)

I liked libvirt well enough to set up a general virtualization server for tests, scratch machines, and so on, which also used macvtap bridged networking on a host port that's dedicated to this. The virtual machines on this server follow the same general pattern, although this time I set up enough VMs that each VM can be more or less dedicated to a specific Ubuntu version instead of switching back and forth between different ones. At the moment, some of these VMs have specific ongoing testing purposes, but in general what's on them is supposed to be completely expendable. In their role for ongoing testing, some of these machines are normally running all the time. On my desktop I don't normally leave VMs running, although sometimes I lose track of that and don't notice one for a while.

(I should make snapshots of the current state of these ongoing test VMs just in case.)

Although libvirt can (I believe) snapshot a live and running machine, I always make snapshots with the machine off. Partly this is habit from my days of using VMWare Workstation, but partly it's because our server setups don't expect to be (effectively) suspended and then woken again some time later. It seems more reliable to shut them down and then have them boot up (much) later, which are things we expect to have happen and so design our scripts and so on for.

(Although PXE based network installs might simplify some of this, we don't use them and they would still take more time than starting from a partially prepared image.)

LibvirtMySetup2022 written at 21:52:16; Add Comment

2022-09-26

The lsb_release program and the /etc/os-release file

Every so often I want to know what Ubuntu release a particular machine is running, and when I do I've become accustomed to using 'lsb_release -r' to get this information. Sometimes I also want to remember which version of Fedora I'm currently running on my desktops (it changes much more often); depending on my memory at the moment I look at /etc/fedora-release, or /etc/redhat-release, or remember that I actually installed lsb_release on my Fedora machines and use it. Today, I found myself wondering if there was a central file that lsb_release consulted on everything so I could use it and cut out the middleman. The answer turns out to be no and yes, in the traditional Linux way.

(Fedora also has /etc/system-release.)
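
Concretely, the quick checks look something like this (the exact output obviously varies with the release):

lsb_release -r           # eg 'Release:  22.04' on an Ubuntu 22.04 machine
cat /etc/fedora-release  # eg 'Fedora release 36 (Thirty Six)'
cat /etc/os-release      # the systemd-standard file, present on both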

The 'no' part is that the implementation of lsb_release isn't standardized; in fact, Fedora and Ubuntu have two entirely different versions. Fedora's version is a shell script originally from the 'Free Standards Group' that has been made to check /etc/redhat-release first. Ubuntu's version is a Python program that seems to ultimately wind up reading /usr/lib/os-release, supplemented by interesting information that's found in /usr/share/distro-info. The Ubuntu lsb_release seems to have originally been written for Debian, but I don't know its history beyond that. While the arguments and behavior of lsb_release may be standardized, that's the limit.

As part of this, I also wondered where the Prometheus host agent got its information on OS versions from. The answer turns out to be /usr/lib/os-release (or /etc/os-release, which is a symlink to the /usr/lib version). The os-release file itself is a systemd innovation (cf) which was created to unify the morass of different implementations that Linux distributions used here prior to it. Of course, Linux distributions such as Fedora have carried on supporting their existing files too, because there's probably still code and people out there who look at them. At least Ubuntu's lsb_release now uses os-release and so in theory you could use its code on more or less any systemd-using Linux distribution.

Unfortunately, /etc/os-release is a bit too large and verbose to just dump out, so I expect I'll just keep using 'lsb_release -r' for casual situations. If I have programs that need to know, though, they can read it directly if they want to. However, in shell scripts I think that using lsb_release is probably simpler (and it's also what our scripts currently do in the rare case when they need to know the Ubuntu version they're running on).
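
If you do want to read os-release directly from a shell script, the file is deliberately written as 'KEY=value' shell assignments, so a minimal sketch is simply to source it (error handling omitted):

#!/bin/sh
# /etc/os-release (a symlink to /usr/lib/os-release) is sourceable shell.
. /etc/os-release
echo "Running on $NAME $VERSION_ID ($ID)"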

LsbReleaseAndOsRelease written at 22:59:18; Add Comment

2022-09-24

Needing xdg-desktop-portal may be in my future (even without Wayland)

I generally haven't had particularly positive experiences with xdg-desktop-portal on my custom desktop. At best it appears to do nothing; at worst, it's been a core element in mysterious problems. Having xdg-desktop-portal installed effectively isn't optional these days on Fedora (if you try to deinstall it, it takes out a bunch of other things, although in Fedora 36 there are fewer of them than I expected). However, letting the program and its whole portal environment do anything so far seems to be optional unless you use flatpaks.

(To my surprise, our Ubuntu 22.04 systems don't seem to have xdg-desktop-portal installed at all, despite having a bunch of graphical things. The clue may be in what dnf lists as depending on xdg-desktop-portal on my Fedora 36 system, which is mostly gdm, gnome-shell, and gnome session things.)
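
If you want to see this on your own systems, the checks look something like the following (both commands exist on stock installs, but treat the exact options as something to verify):

# Fedora: what installed packages require xdg-desktop-portal
dnf repoquery --installed --whatrequires xdg-desktop-portal
# Ubuntu/Debian: reverse dependencies among installed packages
apt-cache rdepends --installed xdg-desktop-portal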

Recently I read Artemis Everfree's An X11 Apologist Tries Wayland (via). One of the things I learned from the article is that on Wayland, a variety of ordinary programs (running outside flatpaks) rely on xdg-desktop-portal and its friends to do a variety of desktop operations; screen sharing is the example mentioned in the article. This makes a lot of sense, since in Wayland this sort of thing requires additional, non-standard interaction with whatever Wayland compositor you're using, and as I understand it GNOME and KDE have gone their own way on compositors.

One of my personal rules of thumb for open source is that open source programs tend to converge on having only one way of doing annoying things. Following this, if you have to support xdg-desktop-portal's DBus based stuff to do things on Wayland anyway, it's probably going to be tempting to only do these things through x-d-p even when running on X11 (provided that x-d-p supports them on X11, and that the environments you want to run in have a recent enough version of x-d-p). As a result, I suspect that sooner or later, getting a functional x-d-p environment may be something I need even on X11. Hopefully by that time there will be a 'generic X11 window manager environment' backend of xdg-desktop-portal (which would be sort of analogous to the wlroots one for Wayland).

PS: of course another option is that programs migrate to requiring x-d-p and no one writes a generic X11 x-d-p backend, because no one cares enough and everyone is theoretically moving to Wayland anyway (the practical side is another issue). But I can hope that that outcome doesn't happen.

XdgDesktopPortalMayBeInMyFuture written at 21:58:02; Add Comment

2022-09-18

I believe SELinux needs active support from your distribution

We have a single machine that uses SELinux, because it has a need for an unusually thorough level of security. This machine runs CentOS 7, because at the time we built this machine (several years ago), CentOS 7 was the obvious long term support Linux to use to get a high security, SELinux based environment. Since CentOS has effectively imploded, we are going to need to replace that machine with some other distribution before the middle of 2024, and the default choice is Ubuntu.

If we're going to build a new Ubuntu based machine for this role, one question is whether or not we want to try using SELinux. I've been thinking about this, and my answer so far is no. I don't think we want to try to make SELinux work on Ubuntu. If you do a little bit of Internet searching, there are obvious warning signs, such as the minimal state of the Ubuntu wiki SELinux page and its warnings (and relatedly the various cautions on the Debian pages). However, I feel there is a more general reason, which is that in practice, SELinux needs active support from your distribution and Canonical is not interested in doing this because they expect you to use AppArmor (in fact they try to make you use it).

My strong impression is that the real work of SELinux is in SELinux policies and related to them, all of the labeling that you need to make policies work. These policies and labeling interact with choices that distributions make, such as what programs are called (is it 'exim' or 'exim4') and where programs put and expect their files. As a result, SELinux requires a certain amount of distribution specific work and development, and if a distribution doesn't invest in that, you'll have issues using SELinux on it as SELinux either blocks things that you want to happen or allows things to happen that you don't want to (if the distribution sets very broad policies and labels in order to just get stuff going).

Red Hat and thus CentOS was (and is, as far as I know) quite committed to SELinux. I'm not sure if there are any other Linux distributions that are, especially distributions with releases that get long term support. Debian's wiki pages suggest that it's not one of them.

(I expect that AppArmor can be used to meet our needs, once we re-analyze them for a replacement system. We've long since lost our knowledge of how exactly SELinux is set up on that system and what our security goals are, since we touch it as little as possible.)

SELinuxNeedsDistroBuyin written at 22:41:54; Add Comment

2022-09-12

What's lost when running the Prometheus host agent as a non-root user on Linux

If you start up the Prometheus host agent as root, it will nag at you about this:

caller=node_exporter.go:185 level=warn msg="Node Exporter is running as root user. This exporter is designed to run as unprivileged user, root is not required."

This is not quite true, although how much it is and isn't true has varied over time and across kernel versions, and also depends on what host agent information collectors you have enabled. Today, for my own reasons, I decided to get current information on what metrics you lose when you run the current version of the host agent as a non-root user, primarily on an Ubuntu 22.04 server.

(As I write this, the current version of the host agent is 1.3.1, released December 1st 2021. The host agent doesn't see much change.)

With the default collectors, it turns out that all you lose access to is the CPU frequency (which is only available for AMD processors, not x86 Intel ones) and RAPL (Running Average Power Limit) information. In non-default collectors, you also lose access to the 'perf' collector, which by default would give you metrics on various low level CPU performance statistics, such as the number of CPU branch misses and the number of instructions executed.

(On my home desktop, these perf stats reveal that apparently two of my twelve CPUs execute a vastly disproportionate number of instructions and have unbalanced numbers for various other things.)

The latest development version of the host agent also has a slabinfo collector. Since /proc/slabinfo itself is only readable by root, this collector also only works if you run the host agent as root.

In general the host agent collects most of its information through reading things from /proc and sysfs. If some source of information in them is only accessible by root, normally the host agent won't be able to get that information as a non-root user.
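
You can see this directly for /proc/slabinfo, for example (a quick check; everything besides the permissions will differ):

ls -l /proc/slabinfo
# -r-------- 1 root root 0 ... /proc/slabinfo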

This is fewer missing metrics than I expected. We currently run the host agent as root and we'll probably continue to do so, but if we wanted to switch we wouldn't lose very much. However, results in different environments may vary (especially with different kernels), so you probably should check it yourself.

(It's easy enough to start a copy of the host agent on an alternate port, then query localhost:<port>/metrics manually with curl to see what differed. I generally grep for the '# HELP' lines and diff the root and non-root versions.)
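
A minimal sketch of this comparison (the port number and file names are just illustrative; your existing host agent is presumably already answering on its normal port as root):

# what the root-run host agent exposes
curl -s localhost:9100/metrics | grep '^# HELP' | sort > root.txt
# a second copy run as an unprivileged user on another port
./node_exporter --web.listen-address=:9101 &
curl -s localhost:9101/metrics | grep '^# HELP' | sort > nonroot.txt
diff root.txt nonroot.txt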

PrometheusHostAgentNonRootLosses written at 22:28:02; Add Comment

2022-09-07

What systemd timer directives seem to be used in practice

Systemd .timer units have a bunch of different On<Thing> directives to define the time when the timer(s) trigger, including OnCalendar=. As I discovered when looking into using timers to run things very frequently, there can be more than one way to get what you want with all of these directives. This variety of options raises a straightforward question, namely what do people seem to do in practice.

I'm not energetic enough to download every Ubuntu or Fedora package that has a timer and look at them all. Instead, I'm looking (only) at the packages installed on the Fedora and Ubuntu systems I have ready access to, and especially the timer units that are actually enabled (things that aren't enabled can have weird things lurking in their depths). Widely installed and enabled timer units sort of set the standard for what people expect.

By far the most popular option is OnCalendar. Unsurprisingly there's a bunch of packages that use 'daily' or 'weekly' as basically a replacement for cron.daily and cron.weekly. Even the Certbot timer unit (on both Ubuntu and Fedora) uses OnCalendar, although it has an interesting trick; it sets itself to run at 00:00 and 12:00 but also has a 12 hour randomized delay, so the actual activation time of all of those Certbot timers is (hopefully) randomized very broadly across the day. This same trick is used by fwupd-refresh.timer, motd-news.timer (in Ubuntu), man-db.timer, and plocate-updatedb.timer (although it only activates once a day so it's not quite the same).
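
The timer section of the Certbot unit itself looks roughly like this (quoted from memory, so treat the exact values as approximate):

[Timer]
OnCalendar=*-*-* 00,12:00:00
RandomizedDelaySec=43200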

There are a certain number of periodic timer units that don't use OnCalendar, such as update-notifier-download.timer (Ubuntu), systemd-tmpfiles-clean.timer (Fedora), and dnf-makecache.timer (Fedora), along with some non-enabled timers from other things. These three timer units vary both in how they start the timer (OnStartupSec or OnBootSec) and how they repeat it (dnf-makecache uses OnUnitInactiveSec while the other two use OnUnitActiveSec).

(In things we don't have enabled but other people probably do, there's also Ubuntu's apport-autoreport.timer and ua-timer.timer.)

Some timer units combine OnCalendar with a system startup based trigger, such as motd-news.timer (Ubuntu) using OnCalendar and also OnStartupSec. Presumably Ubuntu really wanted the motd news to be refreshed soon after system boot.

On the whole it seems that OnCalendar is the consensus way of doing almost everything, to the point where I'm actually surprised by how few timer units use the other approach. And if you want to run N times a day at random times, you use OnCalendar and then a random delay of your time interval.
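
For instance, a sketch of 'four times a day, at a random point within each six hour window' is something like:

[Timer]
OnCalendar=*-*-* 00/6:00:00
RandomizedDelaySec=6h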

SystemdTimerMethodsUsed written at 22:29:53; Add Comment

2022-08-28

Getting USB TEMPer2 temperature sensor readings into Prometheus (on Linux)

For reasons outside of the scope of this entry, we recently decided to get some inexpensive USB temperature sensors (we already have a number of old, industrial style temperature sensor boxes). What we wound up getting is the PCsensor TEMPer2; this model and PCsensor's USB temperature sensors in general seem to be a quite common choice (often resold under some other name). Getting our model going on Linux and getting metrics into our Prometheus setup took some work and head scratching, which I'd like to save other people.

The various PCsensor modules have various features and options (see eg here, also, also). The TEMPer2 is a white plastic USB stick with an additional probe wire that you plug in at the end; it has temperature sensors both in the USB stick and at the end of the probe wire, and so provides two readings if the probe wire is (fully and firmly) plugged in. Our particular copies show up in 'lsusb' as:

Bus 001 Device 005: ID 1a86:e025 QinHeng Electronics TEMPer2

In kernel messages, we see:

input: PCsensor TEMPer2 as /devices/pci0000:00/0000:00:14.0/usb1/1-4/1-4:1.0/0003:1A86:E025.0001/input/input3
hid-generic 0003:1A86:E025.0001: input,hidraw0: USB HID v1.11 Keyboard [PCsensor TEMPer2] on usb-0000:00:14.0-4/input0
input: PCsensor TEMPer2 as /devices/pci0000:00/0000:00:14.0/usb1/1-4/1-4:1.1/0003:1A86:E025.0002/input/input4
hid-generic 0003:1A86:E025.0002: input,hidraw1: USB HID v1.10 Device [PCsensor TEMPer2] on usb-0000:00:14.0-4/input1

You might wonder why a temperature sensor is claiming to be a keyboard. The reason is the same reason that some USB security keys do, which is that the TEMPer2 can optionally simply type its readings into the system. This mode is triggered by pushing the 'TXT' in the red circle for long enough (this is actually a button), and the output looks like this:

www.pcsensor.com
temper2 v3.9
caps lock:on/off/++
num lock:off/on/-- 
type:inner-tx;outer-tx
inner-temp      outer-temp      interval
23.81 [c]       23.31 [c]       1s
23.81 [c]       23.31 [c]       1s
[... repeat ...]

(The 'inner' temperature here is the USB stick temperature, the 'outer' temperature is the probe.)

This mode can be convenient if you're trying to get a sensor reading program to work and you want a check on what the TEMPer2 is seeing and reporting. The easiest way to get the TEMPer2 out of this mode is probably to unplug it and plug it back in again.

There are a variety of programs out on the Internet to read data from the TEMPer series of USB temperature (and sometimes humidity) sensors. However, the whole sensor series has had various versions over time, so any particular program may not support your copy of a 'TEMPer2' even if it supports some (older) TEMPer2s. For instance, older TEMPer2s apparently report a different USB identifier (based on looking at various source code).

The code I used is this version of 'temper.py', with a modification to support our specific TEMPer2s. PCsensor (or the actual maker) revises the firmware periodically, and temper.py is cautious about trying to talk to newer versions. Our sensors report they have firmware 'TEMPer2_V3.9', instead of the V3.7 firmware that temper.py expected. Fortunately it seems to work to just treat V3.9 firmware as V3.7; a quickly modified version of temper.py worked.

(Conveniently temper.py tells you if it found what it thinks is a TEMPer2 but it has an unsupported firmware version.)

It appears that the TEMPer2 is pretty fast to read and can be read quite often (in its 'type things at you' mode it will go as fast as once a second). Depending on how you've automated reading your TEMPer2, you may want to use a systemd timer unit to read it more often than once a minute.

To create Prometheus metrics, I opted to have temper.py output JSON, use jq to extract the particular pieces of information we care about, and then use a shell script (that wraps all of this) to create the actual Prometheus metrics using printf. The metrics are written to the Prometheus host agent's 'textfile' directory so the textfile collector can expose them as part of the regular host metrics. This is not necessarily the completely correct Prometheus approach, but it's pretty much the easiest.
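
To illustrate the shape of this, here is a sketch of such a wrapper script; the temper.py JSON option and field names, the paths, and the metric names are all assumptions for illustration, not our actual script:

#!/bin/sh
# Read the TEMPer2 via temper.py's JSON output, extract the two
# temperatures with jq, and write Prometheus textfile metrics atomically.
OUT=/var/lib/node_exporter/textfile/temper.prom

json="$(temper.py --json)" || exit 1
# field names are assumptions; check what your temper.py actually emits
inner="$(printf '%s' "$json" | jq -r '.[0]["internal temperature"]')"
outer="$(printf '%s' "$json" | jq -r '.[0]["external temperature"]')"

printf 'temper_internal_celsius %s\ntemper_external_celsius %s\n' \
  "$inner" "$outer" > "$OUT.new" && mv "$OUT.new" "$OUT"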

The resulting readings and metrics seem to broadly correspond to reality under the right circumstances (where reality here is what our existing machine room temperature sensors report). There are some oddities and anomalies that I'm still looking into, which is for a future entry (once I understand or at least know more). For now, I will just say that if you want to use the non-probe temperature sensor, you probably want to get a USB extender cable along with your TEMPer2.

USBTemper2SensorToPrometheus written at 23:21:12; Add Comment

2022-08-26

Using systemd timers to run things frequently (some early notes)

If you're satisfied with running something no more often than once a minute, /etc/cron.d entries are the easiest approach and what I use. But today I wound up wanting to run something more frequently. While there are various ways to do this with various degrees of hackery, it seemed like a good time to try out systemd timer units.

First we will need our .service unit, which is straightforward:

[Unit]
Description=Read sensors

[Service]
Type=oneshot
# Run your script
ExecStart=/opt/cslab/generate-sensor-metrics

The skeleton of the corresponding .timer unit is:

[Unit]
Description=Timer for reading sensors

[Install]
WantedBy=timers.target

[Timer]
AccuracySec=5s
... we need some magic here ...

(Setting AccuracySec= insures that systemd runs our service more or less when we ask it to, instead of delaying. You may want to be narrower here, for example saying '1s' as the required accuracy.)

The remaining thing we need is something to tell systemd that, for example, we want this to run every 30 seconds. There are at least two options. First, we can use OnCalendar= to set a systemd time specification to run, say, on :05 and :35 of every minute:

OnCalendar=*:*:05,35

(Another option for how to specify this sort of time is '*:*:0/30', which means more or less what it would in cron.)

This is much like cron time specifications except it allows us to go down to seconds instead of stopping at minutes the way that cron does. After looking at things and writing this entry, I've come to feel that using OnCalendar is your best and simplest option if the time intervals you want divide evenly into a minute (eg things like 'every 30 seconds', 'every 15 seconds', or 'every 10 seconds') and you're not allergic to always running your script at specific seconds.

Alternately we can try to say 'activate this every 30 seconds', or whatever number of seconds you want (including numbers that deliberately don't evenly divide into minutes, such as 31 seconds). Systemd doesn't have a straightforward way of expressing this that I can see; instead, you get to say 'do this 30 seconds after <some events>', and then you pick the events. You need at least two events; one to start things initially, when the system boots or you start the timer, and one to make it repeat. I currently feel that the best single option for a starting event is OnActiveSec=, because that covers all of the various cases where the timer gets activated, not just when the system boots. If you stop the timer for a while and then do 'systemctl start <thing>.timer', this will kick it off.

There are two options for getting things to repeat, OnUnitActiveSec and OnUnitInactiveSec, which define delays from when the unit started and when the unit stopped respectively. I believe that if you want there to be a 30 second delay between successive runs of your script, then you want to use OnUnitInactiveSec. On the flipside, if you want things to tick every 30 seconds no matter how long the script took, you want OnUnitActiveSec.

So a plausible [Timer] section is:

[Timer]
AccuracySec=5s
OnActiveSec=30s
OnUnitActiveSec=30s

You could set 'OnActiveSec' to a smaller value, since all it is there for is to trigger the initial service unit activation that then starts the every 30 second ticker. Generally, the timer will be activated as the system boots, so you'll start ticking 30 seconds after that.
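
Once both units are in place (under a hypothetical 'sensors' name here), you enable and start the timer rather than the service, and you can check on it afterward:

systemctl enable --now sensors.timer
systemctl list-timers sensors.timer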

In all cases, I believe that using a systemd timer and service unit means that only one copy of your script will ever be running at one time. Timer units have the effect of activating your service unit, so if your service unit is already (or still) active, systemd does nothing rather than activating a second copy.

One of the downsides of using systemd timers for this is that you get a certain amount of log spam from it. Every time the timer unit starts your service unit, you'll most likely get three log lines:

Starting Read sensors
<thing>.service: Deactivated successfully.
Finished Read sensors

Using crontab generally gets you only one log line per invocation. On the other hand, your (systemd) logs may already be getting flooded from other things; this is definitely the case on some of our machines.

PS: One reason to pick non-divisible numbers of seconds, such as 31, is to insure that you never synchronize with something else that happens either on fixed seconds or at some other fixed interval, like 'every 15 seconds'. However, see the RandomizedDelaySec= and FixedRandomDelay= timer unit settings for other potential options here.

SystemdFastTimersEarlyNotes written at 22:52:29; Add Comment

2022-08-23

On Ubuntu, AppArmor is quite persistent and likes to reappear on you

We don't like AppArmor, in large part because it doesn't work in our environment; the net effect of allowing AppArmor to do anything is that periodically various things break mysteriously (for instance, Evince stops working because your $XAUTHORITY is on an NFS mounted filesystem). We do our best to not install AppArmor at all, and if it gets dragged in by package dependencies, we try to disable it with a heaping helping of systemd manipulation:

systemctl stop apparmor.service
systemctl disable apparmor.service
systemctl mask apparmor.service

If you boot an Ubuntu system this way, everything will look fine. The aa-status command will report that nothing is loaded or active, and nothing goes wrong due to AppArmor getting in the way. So you go on with life, leaving your systems up (as one does if possible), and then one day you run aa-status again (or try to use Evince) and discover, to your (our) surprise, that AppArmor is reporting things like:

apparmor module is loaded.
5 profiles are loaded.
5 profiles are in enforce mode.
   /usr/lib/cups/backend/cups-pdf
   /usr/lib/snapd/snap-confine
   /usr/lib/snapd/snap-confine//mount-namespace-capture-helper
   /usr/sbin/cupsd
   /usr/sbin/cupsd//third_party
0 profiles are in complain mode.
1 processes have profiles defined.
1 processes are in enforce mode.
   /usr/sbin/cupsd (15098) 
0 processes are in complain mode.
0 processes are unconfined but have a profile defined.

What seems to be going on is that Ubuntu package updates for packages with AppArmor profiles activate those profiles whether or not AppArmor is supposed to be running. As packages get updated over time on your systems, a steadily increasing number of profiles will get silently turned on and then possibly shoot you in the foot.

The specific mechanism for some packages is that they have postinst scripts that check to see if AppArmor is enabled only with 'aa-enabled', which apparently only cares if AppArmor is enabled in the kernel, not if the AppArmor service has been masked, stopped, or whatever. When aa-enabled reports that yes, your Ubuntu kernel has AppArmor enabled because that's the normal condition, the package's postinst script enables its profile and suddenly you have potential problems.

It's possible that some AppArmor profiles get enabled through other mechanisms as well. Even if that's not the case in current Ubuntu LTS releases, I don't think you can count on it to stay that way in the future, and I'm certainly not expecting Canonical to fix aa-enabled or their postinst usage of it. It seems pretty clear that Canonical has not exactly devoted much effort to insuring that their systems still work correctly for people who dare to deviate from the Ubuntu way by turning off AppArmor.

In Ubuntu versions that have it available, all I can suggest is running 'aa-teardown' periodically, perhaps from cron. You might think that disabling AppArmor on the kernel command line is what you want, but the signs suggest otherwise (also).
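
A minimal /etc/cron.d entry for this might look like the following; the schedule is arbitrary and the path is what I'd expect, so verify both on your own systems:

# /etc/cron.d/aa-teardown: flatten AppArmor profiles that package
# updates have silently re-enabled behind our backs.
30 5 * * *   root   /usr/sbin/aa-teardown >/dev/null 2>&1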

UbuntuAppArmorPersistence written at 22:49:57; Add Comment

2022-08-20

The Ubuntu 22.04 server installer wants you to scrub reused disks first

Suppose that you're installing the Ubuntu 22.04 server version on physical hardware, reusing disks that you were previously using (either on that server or another one). If so, I have a safety suggestion for you: as things stand today, you really want to start out by blanking out the disks you're reusing in some way. This is especially the case if one or more of the disks you're reusing may ever have been part of a software RAID array.

One way to do this is the blkdiscard program, but if that doesn't work for some reason another way is just 'dd if=/dev/zero of=/dev/<whatever> bs=1024k count=64; sync'. You can do this from within the 22.04 server installer by using a function key to get a regular shell session on another virtual terminal and then 'sudo -i' to become root. For safety you may wish to reboot afterward in order to restart the installer from scratch (although on some hardware the BIOS may then get confused enough that you need to power cycle it).

In theory the Ubuntu 22.04 server installer will cope with reused disks that already have partitions on them, perhaps partly because some vendors ship server systems with pre-partitioned drives. In my experience, this can often work fine in practice. But every so often, things go wrong, and the installer will give you an opaque message that it crashed after you did network setup (sometimes, after you picked the Ubuntu mirror to use), before getting to disk selection and partitioning. These crashes seem especially common if you have disks with previous software RAIDs on them, either complete or merely one disk out of several.

The installer will assemble any partial or complete software RAID arrays that it can find. Sometimes I've been able to get a crashed installer to work by using mdadm to stop and then erase those unwanted assembled software RAID arrays; other times, nothing has worked and I've had to go to the big hammer.
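
When it does work, the commands involved are roughly these (run from the installer's shell session; the device names are obviously just examples):

# see what arrays the installer auto-assembled
cat /proc/mdstat
# stop an unwanted assembled array, then wipe its RAID superblocks
mdadm --stop /dev/md127
mdadm --zero-superblock /dev/sda1 /dev/sdb1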

I'm not terribly surprised by this. Canonical's focus for the server installer is clearly on cloud installs, where you don't have reused (system) disks. At this point I can only be thankful that the experience isn't even more broken on physical hardware.

PS: My other 22.04 installer hot tip is that if you have a virtual machine with a single small disk, it's okay to take the default partitioning but you should turn off LVM. If you leave LVM on in the installer, you tend to wind up with an absurdly small root filesystem and the rest of your precious disk space gone to some other filesystems that you probably don't care about, like a separate /home. Turning off LVM puts all of the space into a single root filesystem.

Ubuntu2204InstallerScrubDisks written at 21:58:13; Add Comment
