Wandering Thoughts archives

2020-05-27

My various settings in X to get programs working on my HiDPI display

Back when I got my HiDPI display (a 27" Dell P2715Q), I wrote an entry about what the core practical problems with HiDPI seemed to be on Linux and talked in general terms about what HiDPI related settings were available, but I never wrote about what specific things I was setting and where. Today I'm going to remedy this, partly for my own use on the hopeful future day when I need to duplicate this at work. Since I'm doing this two years after the fact, there will be an exciting element of software archaeology involved, because now I have to find all of those settings from the clues I left behind in earlier entries.

As mentioned in my old entry, the Dell P2715Q is a 163 DPI display. To make the X server itself know the correct DPI, I run it with a '-dpi 163' command line argument. I don't use XDM or any other graphical login manager; I start the X server from a text console with a nest of shell scripts, so I can supply custom arguments this way. I don't do anything with xrandr, which came up with plausible reported screen dimensions of 597mm x 336mm and didn't appear to need any changes.
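As a sanity check on those xrandr dimensions, here is a small Python sketch that turns a pixel count and a physical size into DPI. It assumes the P2715Q's native 3840x2160 resolution; the function name is just for illustration.

```python
MM_PER_INCH = 25.4

def dpi(pixels: int, millimetres: float) -> float:
    """Dots per inch along one axis, from a pixel count and a size in mm."""
    return pixels / (millimetres / MM_PER_INCH)

# Assumed native resolution of the P2715Q, plus the sizes xrandr reported.
horizontal = dpi(3840, 597)
vertical = dpi(2160, 336)
print(f"horizontal: {horizontal:.1f} DPI, vertical: {vertical:.1f} DPI")
```

Both axes come out at about 163 DPI, consistent with the '-dpi 163' argument.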

I use xsettingsd as my XSettings daemon, and set two DPI related properties in .xsettingsd:

Gdk/UnscaledDPI 166912
Xft/DPI 166912

Both of these values are my 163 DPI multiplied by 1024. For Xft/DPI, this is documented in the Xsettings registry. I'm not sure if I found documentation for Gdk/UnscaledDPI or just assumed it would be in the same units as Xft/DPI.
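The conversion is simple enough to script. This Python sketch generates the two .xsettingsd lines from a DPI value; like the text, it assumes Gdk/UnscaledDPI uses the same 1/1024 units as Xft/DPI.

```python
def xsettings_dpi(dpi: int) -> int:
    """XSettings DPI values are expressed in 1/1024ths of a dot per inch."""
    return dpi * 1024

def xsettingsd_lines(dpi: int) -> str:
    value = xsettings_dpi(dpi)
    # Assumption: Gdk/UnscaledDPI uses the same units as Xft/DPI.
    return f"Gdk/UnscaledDPI {value}\nXft/DPI {value}\n"

print(xsettingsd_lines(163), end="")
```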

There is also an X resource setting:

Xft.dpi: 163

As we can see, this is just the DPI.

Then I set some environment variables, which (in 2018) came from Arch's HiDPI page, the Gnome wiki, and the GTK3+ X page. First there is a setting to tell Qt apps to honor the screen DPI:

export QT_AUTO_SCREEN_SCALE_FACTOR=1

Then there is a pair of GTK settings to force GTK+ applications to scale their UI elements up to HiDPI but not scale the text, as explained in more depth in my original entry:

export GDK_SCALE=2
export GDK_DPI_SCALE=0.5

These three environment variables are only necessary for Qt and GTK+ applications, not basic X applications. Basic X applications seem to work fine with some combination of the Xft.dpi X resource and the XSettings system.
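One way to see why the GTK pair works is to multiply the factors out. The Python sketch below assumes, as described above, that GDK_SCALE scales both UI elements and text while GDK_DPI_SCALE applies only to text; the function names are hypothetical. It also renders the three exports for a session startup script.

```python
from typing import Dict, Tuple

def gtk_effective_scales(gdk_scale: int, gdk_dpi_scale: float) -> Tuple[float, float]:
    """Return (UI scale, text scale) under the stated assumptions."""
    ui_scale = float(gdk_scale)
    # GDK_DPI_SCALE undoes the text part of GDK_SCALE's doubling.
    text_scale = gdk_scale * gdk_dpi_scale
    return ui_scale, text_scale

def hidpi_env(gdk_scale: int = 2, gdk_dpi_scale: float = 0.5) -> Dict[str, str]:
    return {
        "QT_AUTO_SCREEN_SCALE_FACTOR": "1",
        "GDK_SCALE": str(gdk_scale),
        "GDK_DPI_SCALE": str(gdk_dpi_scale),
    }

def export_lines(env: Dict[str, str]) -> str:
    return "".join(f"export {name}={value}\n" for name, value in env.items())

print(gtk_effective_scales(2, 0.5))
print(export_lines(hidpi_env()), end="")
```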

If you're running remote X applications from your HiDPI X session, as I am these days, they will automatically see your Xft.dpi X resource and your XSettings settings. They won't normally see your (my) specially set environment variables. Fortunately I mostly run basic X applications that only seem to use X resources and perhaps XSettings, and so basically just work the same as their local versions.

(At least after you fix any problems you have with X cursors on the remote machines.)

At the moment I'm not sure if setting the environment variables for remote X programs (for instance by logging in with 'ssh -X', setting them by hand, and then running the relevant program) works just the same as setting them locally. Some testing suggests that it probably does; while I see some visual differences, this is probably partly just because I haven't adjusted the remote programs I'm testing with the way I've adjusted my regularly used local ones (after all, I normally use them on my regular DPI displays at work, and hopefully some day I'll be doing that again).

The final setting I make is in Firefox. As mentioned in passing in this entry, I manually set the about:config setting layout.css.devPixelsPerPx to 1.7, which is down from what would be the default of '2' based on my overall settings. I found that if I left Firefox alone with these other settings, its font sizes looked too big to me. A devPixelsPerPx setting of 1.7 is about right for what the Arch Wiki Firefox Tweaks page suggests should be correct here, and it looks good to me which is what I care about most.
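For reference, the usual rule of thumb (assumed here to be what the Arch Wiki page boils down to) is the screen DPI divided by the 96 DPI that CSS uses as its reference for one inch. A tiny Python sketch:

```python
CSS_REFERENCE_DPI = 96  # CSS defines 1in as 96 CSS pixels

def dev_pixels_per_px(screen_dpi: float) -> float:
    """Suggested layout.css.devPixelsPerPx for a given screen DPI."""
    return screen_dpi / CSS_REFERENCE_DPI

print(f"{dev_pixels_per_px(163):.2f}")
```

For 163 DPI this gives about 1.70, right where the manual setting ended up.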

Sidebar: X resources tweaks to specific applications

Xterm sizes the width of the scrollbar in pixels, which isn't ideal on a HiDPI display. It is normally 14 pixels, so I increased it to:

XTerm*VT100.scrollbar.width: 24

Urxvt needs the same tweak but it's called something different:

URxvt*thickness: 24
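If you assume that xterm's default 14 pixel scrollbar was sized with a display of roughly 96 DPI in mind (an assumption, not something xterm documents here), scaling it to 163 DPI gives almost exactly 24. A Python sketch of that arithmetic that also prints both resources:

```python
def scale_pixels(pixels: int, from_dpi: float, to_dpi: float) -> int:
    """Scale a pixel measurement from one DPI to another, rounding to whole pixels."""
    return round(pixels * to_dpi / from_dpi)

# Assumes the 14 pixel default was chosen for a ~96 DPI display.
width = scale_pixels(14, 96, 163)
for resource in ("XTerm*VT100.scrollbar.width", "URxvt*thickness"):
    print(f"{resource}: {width}")
```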

I think I also tried to scale up XTerm's menu fonts but I'm not sure it actually worked, and I seem to have the same X resource settings (with the same comments) in my work X resource file.

HiDPIMyXSettings written at 00:43:43

2020-05-20

Switching to the new in-kernel WireGuard module was easy (on Fedora 31)

One of the quietly exciting bits of recent kernel news for me is that WireGuard is now built into the Linux kernel from kernel 5.6 onward. I've been using a private WireGuard tunnel on my Fedora machines for several years now, but it's been through the additional COPR repository with an additional DKMS based kernel module package, wireguard-dkms. Among other things, this contributed to my multi-step process for updating Fedora kernels.

When I first updated to a Fedora 5.6 kernel, I wondered if I was going to have to manually use DKMS to remove the DKMS installed WireGuard module in favour of the one from the kernel itself. As it turned out, I didn't have to do anything; current versions of the COPR wireguard-dkms package have a dkms.conf that tells DKMS not to build the module on 5.6+ kernels. Updating to a 5.6 kernel got me a warning from DKMS that the WireGuard DKMS module couldn't be built for this kernel, but that was actually good news. After a reboot, my WireGuard tunnel was back up just like normal. As far as I can tell there is no difference in operation between the DKMS WireGuard version and the now in-kernel version, except that I have one fewer DKMS module to rebuild on kernel updates.
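The decision that dkms.conf encodes amounts to a kernel version comparison. This isn't DKMS's actual mechanism, just a hypothetical Python sketch of the same logic; note that it compares version numbers as integers, since a plain string comparison would get '5.10' wrong.

```python
import re
from typing import Tuple

def kernel_major_minor(release: str) -> Tuple[int, int]:
    """Extract (major, minor) from a release string like '5.6.13-300.fc32.x86_64'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))

def needs_wireguard_dkms(release: str) -> bool:
    """WireGuard is part of the mainline kernel from 5.6 onward."""
    return kernel_major_minor(release) < (5, 6)
```

On a running system you could feed it the result of platform.release().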

(The one precaution I took with the COPR wireguard-dkms package was to not install any further updates to it once I'd updated to a 5.6 kernel, because that was the easiest way to keep a WireGuard module in my last 5.5 kernel in case I wanted to fall back.)

After I'd gone through enough 5.6.x Fedora kernel updates to be sure that I wasn't going back to a 5.5 kernel that would need a WireGuard DKMS, I removed the WireGuard DKMS package with 'dnf remove wireguard-dkms'. Then I let things sit until today, when I did two more cleanup steps; I disabled the WireGuard COPR repository and switched over to the official Fedora package for WireGuard tools with 'dnf distro-sync wireguard-tools'. Somewhat to my surprise, this actually installed an updated version (going from 1.0.20200102 to 1.0.20200319).

(I believe that dnf hadn't previously recognized this as an upgrade because of a difference in RPM epoch number between the two package sources. This may be deliberate so that COPR packages override regular Fedora packages at all times.)
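To illustrate why an epoch difference would have this effect: RPM compares the epoch before it looks at the version at all. The Python below is a deliberately simplified sketch (real RPM uses rpmvercmp and also compares the release), and the epoch of 1 for the COPR package is hypothetical.

```python
from typing import Tuple

def evr_key(epoch: int, version: str) -> Tuple[int, Tuple[int, ...]]:
    """Simplified sort key: epoch first, then dotted numeric version parts."""
    return epoch, tuple(int(part) for part in version.split("."))

copr_installed = evr_key(1, "1.0.20200102")  # hypothetical COPR epoch
fedora_candidate = evr_key(0, "1.0.20200319")

# With a higher epoch, the older COPR build still sorts as 'newer'.
print(copr_installed > fedora_candidate)
```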

PS: Now that WireGuard is an official part of the Fedora kernel, I feel that I should do something to set up a WireGuard VPN on my work laptop. Unfortunately this really needs a WireGuard VPN server (or touchdown point) of some sort at work. We don't currently have one and the state of the world makes it unlikely we'll deploy one in the near future, even for private sysadmin use.

WireGuardKernelEasySwitch written at 00:30:30

2020-05-08

Linux software RAID resync speed limits are too low for SSDs

When you add or replace a disk in Linux's software RAID, it has to be resynchronized with the rest of the RAID array. As very briefly covered in the RAID wiki's page on resync, this resync process has speed limits that are controlled by the kernel sysctls dev.raid.speed_limit_min and dev.raid.speed_limit_max (in KBytes a second). As covered in md(4), if there's no other relevant IO activity, resync will run up to the maximum speed; if there is other relevant IO activity, the resync speed will throttle down to the minimum (which many people would raise on the fly in order to make resyncs go faster).

(In current kernels, it appears that relevant IO activity is any IO activity to the underlying disks of the software RAID, whether or not it's through the array being resynced.)

If you look at your system, you will very likely see that the values for minimum and maximum speeds are 1,000 KB/sec and 200,000 KB/sec respectively; these have been the kernel defaults since at least 2.6.12-rc2 in 2005, when the Linux kernel git repository was started. These were fine defaults in 2005, in the era of hard drives that were relatively small and relatively slow, and in particular you were very unlikely to approach the maximum speed even on fast hard drives. Even fast hard drives generally only managed around 160 MBytes/sec of sustained write bandwidth, comfortably under the default speed_limit_max.

This is no longer true in a world where SSDs are increasingly common (for example, all of our modern Linux servers with mirrored disks use SSDs). In theory SSDs can write at data rates well over 200 MBytes/sec; claimed data rates are typically around 500 Mbytes/sec for sustained writes. In this world, the default software RAID speed_limit_max value is less than half the speed that you might be able to get, and so you should strongly consider raising dev.raid.speed_limit_max if you have SSDs.

You should probably also raise speed_limit_min, whether or not you have SSDs, because the current minimum is effectively 'stop the resync when there's enough other IO activity' since modern disks are big enough that they will often take more than a week to resync at 1,000 KB/sec. You probably don't want to wait that long. If you have SSDs, you should probably raise it a lot, since SSDs don't really suffer from random IO slowing everything down the way HDs do.
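Some quick arithmetic shows how lopsided the defaults are. This Python sketch estimates an idealized resync time from a device size and a speed limit; it assumes the limits are in KiB/sec and that the resync runs at exactly the limit the whole time, which real resyncs won't.

```python
def resync_hours(device_bytes: int, limit_kib_per_sec: int) -> float:
    """Idealized time to resync device_bytes at a constant speed limit."""
    return device_bytes / (limit_kib_per_sec * 1024) / 3600

ONE_TB = 10**12
for label, limit in [("speed_limit_min default", 1_000),
                     ("speed_limit_max default", 200_000),
                     ("~500 MB/s SSD", 500_000)]:
    print(f"{label:>24}: {resync_hours(ONE_TB, limit):8.1f} hours")
```

At the 1,000 KB/sec minimum, a 1 TB disk takes over 270 hours, which is more than eleven days.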

(Raising both of these significantly will probably become part of our standard server install, now that this has occurred to me.)
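As a sketch of what that might look like, here is Python that reads and raises the two limits through /proc/sys/dev/raid (the same knobs as the dev.raid.* sysctls). The values in the example are placeholders, not recommendations; writing them needs root and doesn't persist across reboots, so a real install would more likely put the equivalent settings in a file in /etc/sysctl.d.

```python
from pathlib import Path
from typing import Dict

RAID_SYSCTL_DIR = Path("/proc/sys/dev/raid")
NAMES = ("speed_limit_min", "speed_limit_max")

def read_raid_limits(procdir: Path = RAID_SYSCTL_DIR) -> Dict[str, int]:
    """Current limits in KiB/sec, keyed by sysctl name."""
    return {name: int((procdir / name).read_text()) for name in NAMES}

def set_raid_limits(min_kib: int, max_kib: int, procdir: Path = RAID_SYSCTL_DIR) -> None:
    if not 0 < min_kib <= max_kib:
        raise ValueError("need 0 < speed_limit_min <= speed_limit_max")
    # When raising both limits, writing the maximum first avoids a
    # moment where the minimum exceeds the maximum.
    (procdir / "speed_limit_max").write_text(f"{max_kib}\n")
    (procdir / "speed_limit_min").write_text(f"{min_kib}\n")

# Example (as root, placeholder values):
# set_raid_limits(100_000, 1_000_000)
```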

Unfortunately, depending on what SSDs you use, this may not do you as much good as you would like, because it seems that some SSDs can have very unimpressive sustained write speeds in practice over a large resync. We have a bunch of basic SanDisk 64 GB SSDs (the 'SDSSDP06') that we use in servers, and we lost one recently and had to do a resync on that machine. Despite basically no other IO load at the time (and 100% utilization of the new disk), the eventual sustained write rate we got was decidedly unimpressive (after an initial amount of quite good performance). The replacement SSD had been used before, so perhaps the poor SSD was busy frantically erasing flash blocks and so on as we were trying to push data down its throat.

(Our metrics system makes for interesting viewing during the resync. It appears that we wrote about 43 GB of the almost 64 GB to the new SSD at probably the software RAID speed limit before write bandwidth fell off a cliff. It's just that the remaining portion of about 16 GB of writes took several times as long as the first portion.)

SoftwareRaidResyncOnSSDs written at 00:20:57

2020-05-06

Modern versions of systemd can cause an unmount storm during shutdowns

One of my discoveries about Ubuntu 20.04 is that my test machine can trigger the kernel's out of memory killing during shutdown. My test virtual machine has 4 GB of RAM and 1 GB of swap, but it also has 347 NFS mounts, and after some investigation, what appears to be happening is that in the 20.04 version of systemd (systemd 245 plus whatever changes Ubuntu has made), systemd now seems to try to run umount for all of those filesystems all at once (which also starts a umount.nfs process for each one). On 20.04, this is apparently enough to OOM my test machine.

(My test machine has the same amount of RAM and swap as some of our production machines, although we're not running 20.04 on any of them.)

On the one hand, this is exactly what systemd said it was going to do in general. Systemd will do as much in parallel as possible and these NFS mounts are not nested inside each other, so they can all be unmounted at once. On the other hand, this doesn't scale; there's a certain point where running too many processes at once just thrashes the machine to death even if it doesn't drive it out of memory. And on the third hand, this doesn't happen to us on earlier versions of Ubuntu LTS; either their version of systemd doesn't start as many unmounts at once or their version of umount and umount.nfs requires enough fewer resources that we can get away with it.

Unfortunately, so far I haven't found a way to control this in systemd. There appears to be no way to set limits on how many unmounts systemd will try to do at once (or in general how many units it will try to stop at once, even if that requires running programs). Nor can we readily modify the mount units, because all of our NFS mounts are done through shell scripts by directly calling mount; they don't exist in /etc/fstab or as actual .mount units.

(One workaround would be to set up a new systemd unit that acts before filesystems are unmounted and runs a 'umount -a -t nfs', because that doesn't try to do all of the unmounts at once. Getting the ordering right may be a little bit tricky.)
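The property that makes running umount yourself attractive is bounded concurrency. Here is a hypothetical Python sketch of the same idea with a small worker pool; it reads NFS mount points from /proc/mounts-style text and, as with our mounts, assumes none of them are nested inside each other (it makes no attempt to unmount children before parents). This is not something systemd provides, and the pool size of 4 is arbitrary.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

NFS_TYPES = ("nfs", "nfs4")

def nfs_mountpoints(mounts_text: str) -> List[str]:
    """Mount points of NFS filesystems in /proc/mounts-format text."""
    # Note: spaces in mount points appear octal-escaped in /proc/mounts
    # and are not decoded here.
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[2] in NFS_TYPES:
            points.append(fields[1])
    return points

def run_umount(mountpoint: str) -> int:
    return subprocess.run(["umount", mountpoint]).returncode

def unmount_bounded(mountpoints: List[str], max_parallel: int = 4,
                    umount: Callable[[str], int] = run_umount) -> List[int]:
    """Unmount with at most max_parallel umount processes running at once."""
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(umount, mountpoints))
```

In use you would read /proc/mounts, pass its text to nfs_mountpoints(), and hand the result to unmount_bounded().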

SystemdShutdownUnmountStorm written at 21:46:24

How to set up an Ubuntu 20.04 ISO image to auto-install a server

In Ubuntu 20.04 LTS, Canonical has switched to an all new and not yet fully finished system for automated server installs. Yesterday I wrote some notes about the autoinstall configuration file format, but creating a generally functional configuration file is only the first step; now you need to set up something to install it. Around here we use DVDs, or at least ISO images, in our install setup, so that's what I've focused on.

The first thing you need (besides your autoinstall configuration file) is a suitable ISO image. At the moment, the only x86 server image that's available for Ubuntu 20.04 is the 'live server' image, so that's what I used (see here for the 18.04 differences between the plain server image and the 'live server' one, but then Ubuntu 20.04 is all in on the 'live' version). To make this ISO into a self-contained ISO that will boot with your autoinstall configuration, we need to add some data files to the ISO and then modify the isolinux boot configuration.

The obvious data file we have to add to the ISO is our autoinstall configuration file. However, it has to go in a directory together with a companion file, and both have to have specific names. Let's say that the directory within the ISO that we're going to use for this is called /cslab/inst. Then our autoinstall configuration file must be called /cslab/inst/user-data, and we need an empty /cslab/inst/meta-data file beside it. At install time, the path to this directory is /cdrom/cslab/inst, because the ISO is mounted on /cdrom.

(I put our configuration in a subdirectory here because we put additional bootstrap files under /cslab that are copied onto the system as part of the autoinstall.)
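Since the names and layout are easy to get wrong, here is a small Python sketch that drops a configuration into an unpacked ISO scratch directory. The function names are hypothetical; 'cslab/inst' is just our directory from above.

```python
from pathlib import Path

def add_autoinstall_config(scratch: Path, user_data: str,
                           subdir: str = "cslab/inst") -> Path:
    """Write user-data and an empty meta-data into scratch/subdir."""
    target = scratch / subdir
    target.mkdir(parents=True, exist_ok=True)
    (target / "user-data").write_text(user_data)
    # meta-data has to exist beside user-data, but it can be empty.
    (target / "meta-data").write_text("")
    return target

def install_time_path(subdir: str = "cslab/inst") -> str:
    """Where the directory appears during the install (the ISO is on /cdrom)."""
    return f"/cdrom/{subdir}/"
```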

The isolinux configuration file we need to modify in the ISO is /isolinux/txt.cfg. We want to modify the kernel command line to add a new argument, 'ds=nocloud;s=/cdrom/cslab/inst/'. So:

default live
label live
  menu label ^Install Ubuntu Server
  kernel /casper/vmlinuz
  append   initrd=/casper/initrd quiet ds=nocloud;s=/cdrom/cslab/inst/ ---
[...]

(You can modify the 'safe graphics' version of the boot entry as well if you think you may need it. I probably should do that to our isolinux txt.cfg.)

The purpose and parameters of the 'ds=' argument are described here. This particular set of parameters tells the autoinstaller to find our configuration file in /cslab/inst/ on the ISO, where it will automatically look for both 'user-data' and 'meta-data'.
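If you'd rather not hand-edit txt.cfg every time, a Python sketch like this adds the argument to every 'append' line that doesn't already have a 'ds=' argument, putting it before the trailing '---' when there is one. Touching every append line (including the 'safe graphics' entry) is an assumption about what you want; the function name is hypothetical.

```python
DS_ARG = "ds=nocloud;s=/cdrom/cslab/inst/"

def add_ds_argument(txt_cfg: str, ds_arg: str = DS_ARG) -> str:
    """Add ds_arg to each isolinux 'append' line that lacks a ds= argument."""
    out = []
    for line in txt_cfg.splitlines(keepends=True):
        stripped = line.strip()
        if stripped.startswith("append") and "ds=" not in stripped:
            body = line.rstrip("\n").rstrip()
            if body.endswith("---"):
                # Keep the '---' separator last on the command line.
                body = f"{body[:-3].rstrip()} {ds_arg} ---"
            else:
                body = f"{body} {ds_arg}"
            line = body + ("\n" if line.endswith("\n") else "")
        out.append(line)
    return "".join(out)
```

Because lines that already contain 'ds=' are skipped, running it twice is harmless.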

Some sources will tell you to also add an 'autoinstall' parameter to the kernel command line. You probably don't want to do this, and it's only necessary if you want a completely noninteractive install that doesn't even stop to ask you if you're sure you want to erase your disks. If you have some 'interactive-sections' specified in your autoinstall configuration file, this is not applicable; you're already having the autoinstall stop to ask you some questions.

For actually modifying the ISO image, what I do is prepare a scratch directory, unpack the pristine ISO image into it with 7z (because we have 7z installed and it will unpack ISOs, among many other things), modify the scratch directory, and then build a new ISO image with:

mkisofs -o cslab_ubuntu_20.04.iso \
  -ldots -allow-multidot -d -r -l -J \
  -no-emul-boot -boot-load-size 4 -boot-info-table \
  -b isolinux/isolinux.bin -c isolinux/boot.cat \
  SCRATCH-DIRECTORY

isohybrid cslab_ubuntu_20.04.iso

(isohybrid makes this ISO bootable as a USB stick. Well, theoretically bootable. I haven't actually tried this for 20.04.)

You can automate all of this with some shell scripts that take an ISO image and a directory tree of things to merge into it (overwriting existing files) and generate a new image.
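Here is roughly what such a script could look like in Python. It's a sketch under assumptions: 7z, mkisofs and isohybrid are on $PATH, the scratch directory starts out empty, and the function names are hypothetical. The mkisofs arguments are the ones above.

```python
import shutil
import stat
import subprocess
from pathlib import Path
from typing import List

def mkisofs_command(output: str, scratch: str) -> List[str]:
    return ["mkisofs", "-o", output,
            "-ldots", "-allow-multidot", "-d", "-r", "-l", "-J",
            "-no-emul-boot", "-boot-load-size", "4", "-boot-info-table",
            "-b", "isolinux/isolinux.bin", "-c", "isolinux/boot.cat",
            scratch]

def build_iso(pristine_iso: str, overlay: str, scratch: str, output: str) -> None:
    # Unpack the pristine ISO into the scratch directory.
    subprocess.run(["7z", "x", pristine_iso, f"-o{scratch}"], check=True)
    # Files unpacked from an ISO may be read-only; make them writable so
    # the overlay can overwrite them.
    for path in Path(scratch).rglob("*"):
        if not path.is_symlink():
            path.chmod(path.stat().st_mode | stat.S_IWUSR)
    # Merge our tree over the unpacked one, overwriting existing files.
    shutil.copytree(overlay, scratch, dirs_exist_ok=True)
    subprocess.run(mkisofs_command(output, scratch), check=True)
    # Make the image (theoretically) bootable from a USB stick.
    subprocess.run(["isohybrid", output], check=True)
```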

Ubuntu2004ISOAutoinst written at 00:54:54

2020-05-04

Notes on the autoinstall configuration file format for Ubuntu 20.04

Up through Ubuntu 18.04 LTS, Ubuntu machines (usually servers) could be partially or completely automatically installed through the Debian 'debian-installer' system, which the Internet has copious documentation on. It was not always perfect, but it worked pretty well to handle the very initial phase of server installation in our Ubuntu install system. In Ubuntu 20.04, Canonical has replaced all of that with an all new system for automated server installs (you will very much want to also read at least this forum thread). The new system is strongly opinionated, rather limited in several ways, not entirely well documented, at least somewhat buggy, clearly not widely tested, and appears to be primarily focused on cloud and virtual machine installs to the detriment of bare metal server installs. I am not a fan, but we have to use it anyway, so here are some notes on the file format and data that the autoinstaller uses, to supplement the official documentation on its format.

(How to use this data file to install a server from an ISO image is a topic for another entry.)

If you install a server by hand, the install writes a data file, /var/log/installer/autoinstall-user-data, that in theory can be used to automatically reproduce your install. If you're testing how to do auto-installs, one obvious first step is to install a system by hand, take the file, and use it to attempt to spin up an automated install. Unfortunately this will not work. The file the installer writes has multiple errors, so it won't be accepted by the auto-install system if you feed it back in.

There are three minimum changes you need to make. First, set a global version:

#cloud-config
autoinstall:
  version: 1
  [...]

Then, in the keyboard: section, change 'toggle: null' to use '' instead of null:

  keyboard: {layout: us, toggle: '', variant: ''}

Finally, change the 'network:' section to have an extra level of 'network:' in it. This changes from:

 network:
   ethernets:
     [...]

To:

 network:
   network:
     ethernets:
       [...]

Given that this is YAML, spaces count and you cannot use tabs.
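Since the same three edits are needed every time, they can be applied mechanically once the file has been loaded with a YAML library (PyYAML, for instance, which isn't shown). This Python sketch works on the loaded dictionary and assumes the installer's file has the top-level 'autoinstall' key; it only adds the extra 'network' level when that hasn't already been done.

```python
from typing import Any, Dict

def fix_installer_user_data(doc: Dict[str, Any]) -> Dict[str, Any]:
    """Apply the three fixes to a loaded autoinstall-user-data document, in place."""
    ai = doc["autoinstall"]
    # 1. A global version is required.
    ai.setdefault("version", 1)
    # 2. keyboard 'toggle: null' must become an empty string.
    keyboard = ai.get("keyboard")
    if isinstance(keyboard, dict) and "toggle" in keyboard and keyboard["toggle"] is None:
        keyboard["toggle"] = ""
    # 3. The network configuration needs an extra 'network:' level.
    network = ai.get("network")
    if isinstance(network, dict) and "network" not in network:
        ai["network"] = {"network": network}
    return doc
```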

If you want to interact with some portions of the installer but not all of it, these are specified in the 'interactive-sections' YAML section. For example:

 interactive-sections:
    - network
    - storage
    - identity

In theory you can supply default answers for various things in your configuration file for these sections, which show up when you get prompted interactively. In practice this does not entirely work; some default answers in your configuration file are ignored.

In network configuration, there currently appears to be no way to either completely automatically configure a static IP address setup or to supply default answers for configuring that. If you supply a complete set of static IP information and do not set the network section to be interactive, your configuration will be used during the install, but after the system boots, your configuration will be lost and the system will be trying to do DHCP. If you provide a straightforward configuration and set 'network' to interactive, the system will attempt to do DHCP during the install, probably fail, and when you set things manually your defaults will be gone (for example, for your DNS servers). The best you can do is skip having the system try to do DHCP entirely, with a valid configuration that the installer throws up its hands on:

    network:
      version: 2
      renderer: networkd
      ethernets:
        mainif:
          match:
            name: en*
          [...]

Then you get to set up everything by hand (in a setup that's a regression from what debian-installer could do in 18.04).

One of the opinionated aspects of the new Ubuntu installer is that you absolutely must create a regular user and give it a password (even if you're going to immediately wipe out local users to drop in your own general authentication system), and you cannot give a password to root; your only access to root is through 'sudo' from this regular user. The installer will give this user a home directory in /home; you will likely need to remove this afterward. You could skip making this 'identity' section an interactive section, except for the problem that the system hostname is specified in the 'identity' section and has no useful default if unset (unlike in debian-installer, where it defaults to the results of a reverse DNS lookup). Unfortunately once you make 'identity' an interactive section, the installer throws away your preset encrypted password and makes you re-enter it.

So you want something like this:

  identity: {hostname: '', password: [...],
    realname: Dummy Account, username: cs-dummy}

With the initial hostname forced to be blank (and 'identity' included in the interactive sections), the installer won't let people proceed until they enter some value, hopefully an appropriate one.
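A hypothetical Python check for the identity gotchas described above, run against the loaded 'autoinstall' dictionary before building an ISO. The warnings reflect the installer behaviour described here; they are not messages the installer itself produces.

```python
from typing import Any, Dict, List

def identity_warnings(autoinstall: Dict[str, Any]) -> List[str]:
    warnings = []
    interactive = autoinstall.get("interactive-sections", [])
    identity = autoinstall.get("identity") or {}
    # A regular user with a password is mandatory.
    for field in ("username", "password"):
        if not identity.get(field):
            warnings.append(f"identity.{field} is missing")
    if "identity" in interactive:
        # A blank hostname forces people to type one in.
        if identity.get("hostname"):
            warnings.append("hostname is preset, so it can be accepted unchanged")
    elif not identity.get("hostname"):
        warnings.append("hostname is unset and 'identity' isn't interactive")
    return warnings
```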

As sort of covered in the documentation, you can run post-install commands by specifying them in a 'late-commands:' section; they're run in order. When they're run, the installed system is mounted at /target and the ISO image you're installing from is at /cdrom (if you're installing from an ISO image or a real CD/DVD). If you want to run commands inside the installed system, you can use 'chroot' or 'curtin', but the latter requires special usage:

  late-commands:
    - curtin in-target --target=/target -- usermod -p [...] root

(The --target is the special underdocumented bit.)

There is no curtin program on the current server install CD; the installer handles running 'curtin' magically. This means that you can't interactively test 'curtin in-target' commands during the install on an alternate virtual console (you can get one with Alt-F2).
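To avoid getting the --target part or the quoting wrong, late-commands entries can be generated. A Python sketch, assuming that string entries in late-commands are run through a shell (which is why the arguments are quoted with shlex.join); the helper name and the placeholder hash are hypothetical.

```python
import shlex
from typing import List

def in_target(*argv: str, target: str = "/target") -> str:
    """Build a late-commands entry that runs argv inside the installed system."""
    return shlex.join(["curtin", "in-target", f"--target={target}", "--", *argv])

late_commands: List[str] = [
    in_target("usermod", "-p", "HASH-GOES-HERE", "root"),  # placeholder hash
]
```

A real crypt hash contains '$' characters, which shlex.join will single-quote so the shell doesn't expand them.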

Initially I was going to say that the installer has no way to set the timezone. This is technically correct but not practically true, because the installer assumes you're using cloud-init, so you set the timezone by passing a 'timezone' key to cloud-init for its 'timezone' module through the 'user-data:' section:

  user-data:
    timezone: America/Toronto

If you don't set this data, you get UTC. This includes doing a manual installation with no configuration file, as you might if you're just starting with Ubuntu 20.04. In that case, you want to set it with 'timedatectl set-timezone America/Toronto' after the system is up.

I haven't yet attempted to play around with the 'storage' section, although I have observed that it now wants to always use GPT partitioning. We always want disk partitioning to require our approval and allow intervention, but it would be handy if I could set it up so that the default partitioning that you can just select is our standard two disk mirrored configuration. As an important safety tip, when doing mirrored partitioning you need to explicitly make your second disk bootable (this applies both interactively and if you configure this in the 'storage' section). If you don't make a second disk bootable, the installer doesn't create an EFI boot partition on it. In the configuration file, this is done by setting 'grub_device: true' in the disk's configuration (which is different from partition configurations) and also including a 'bios_grub' partition:

storage:
  config:
  - {ptable: gpt, path: /dev/sda, wipe: superblock-recursive, preserve: false, name: '',
    grub_device: true, type: disk, id: disk-sda}
  - {device: disk-sda, size: 1048576, flag: bios_grub, number: 1, preserve: false,
    type: partition, id: partition-0}
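With two disks, it's easy to forget the second one, so the bootable entries can be generated per disk. This Python sketch only covers the boot-related entries shown above (the RAID and filesystem entries aren't shown), and its partition id naming differs from 'partition-0' and is hypothetical.

```python
from typing import Any, Dict, List

def bootable_disk(disk: str) -> List[Dict[str, Any]]:
    """Disk entry with grub_device set, plus its 1 MiB bios_grub partition."""
    disk_id = f"disk-{disk}"
    return [
        {"ptable": "gpt", "path": f"/dev/{disk}", "wipe": "superblock-recursive",
         "preserve": False, "name": "", "grub_device": True,
         "type": "disk", "id": disk_id},
        {"device": disk_id, "size": 1048576, "flag": "bios_grub", "number": 1,
         "preserve": False, "type": "partition", "id": f"partition-{disk}-bios"},
    ]

# Both halves of the mirror get the same treatment.
storage = {"storage": {"config": bootable_disk("sda") + bootable_disk("sdb")}}
```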

Reading the documentation, it unfortunately appears that you can't specify the size of partitions as a percentage or 'all the remaining space'. This probably makes any sort of 'storage:' section in a generic autoinstall configuration not very useful, unless your systems all have the same size disks. I now think you might as well leave it out (and set 'storage' as an interactive section).

PS: It's possible that there are better ways to deal with several of these issues. If so, they are not documented in a way that can be readily discovered by people arriving from Ubuntu 18.04 who just want to autoinstall their bare metal servers, and who have no experience with Canonical's new cloud systems because they don't use cloud stuff.

PPS: It's possible that an Ubuntu 20.04 server ISO image will some day be made available that uses the debian-installer or doesn't behave in all of these ways. Unfortunately, the only currently available 20.04 server ISO image is the 'live' image, which is apparently cloud-focused or at least includes and uses cloud focused tools by default.

Ubuntu2004AutoinstFormat written at 22:52:41

2020-05-01

What problems Snaps and Flatpaks are solving

One of the reactions to things like my problems with 'snaps' on Ubuntu 20.04 and the push for Snaps and Flatpak is to ask why they exist at all (for example, this lobste.rs comment). Despite my bad experience with Canonical's Chromium snap and my desire not to use Flatpaks at all, I do see why people are pushing them, or at least I think I do. The necessary disclaimer here is that I'm an outsider and somewhat cynical, so this is my perception of how things really are, not necessarily what people are saying out loud.

Snaps and things like them solve problems both for software developers and for people like Canonical. For software developers, Snaps promise to deliver a 'build once, run on every Linux' experience, rather than the current mess of trying to build for a variety of Linuxes with a variety of system setups, shared library versions, package formats and standards and so on. Although it's not the only thing you need, such a low hassle and reliable experience is pretty necessary if you want to attract more people to the Linux platform, especially commercial software. This is the straightforward pitch for snaps, flatpaks, and so on that you generally hear (for instance, it's almost all of the pitch on the Flatpak site).

For Canonical and people like them, that Snaps and Flatpaks are sandboxed theoretically allows them to reconcile three desires that are otherwise in tension. Canonical wants to run a single 'app store' that users get everything from (because that makes it attractive to both users and developers), for it to be pretty safe to install software from their app store (because otherwise people won't and then developers go away), and to not have to spend a lot of resources and money on vetting developers and auditing submitted Snaps. Canonical will have seen how many resources Apple and Google put into auditing submitted apps, in environments with much better fundamental security than standard Linux has, and they almost certainly want none of that. Sandboxing applications and carefully limiting their power over the system both at install time and at run time is necessary to square the circle here.

(Standard operating system packages are extremely dangerous because they have almost unlimited power at install time and thus often at runtime. This can easily lead third party packages to do undesirable things, or simply to have bugs or badly thought out ideas in their installation. And of course the installed software has free run of the system once you run it, even if it installs nothing ostensibly dangerous. If you're running a distribution, you really can't just put packages prepared by third parties into your distribution repositories; you have to audit both the packaging and the contents. Otherwise, things will blow up in your face sooner or later.)

Sandboxing and other safety measures would not be essential to Snaps if you didn't have the central 'app store' and got your Snaps directly from the software vendor. Then you would already have a direct trust relationship (including possibly deciding to give them money), and if they proved to be untrustworthy after all that would be on you alone. But then neither you nor the software vendors would have the benefits of a central app store, and there are benefits for both parties (in addition to benefits to Canonical).

SnapsFlatpaksReasonsWhy written at 23:07:19
