2020-05-27
My various settings in X to get programs working on my HiDPI display
Back when I got my HiDPI display (a 27" Dell P2715Q), I wrote an entry about what the core practical problems with HiDPI seemed to be on Linux and talked in general terms about what HiDPI-related settings were available, but I never wrote about what specific things I was setting and where. Today I'm going to remedy this, partly for my own future use on the hoped-for day when I need to duplicate this at work. Since I'm doing this two years after the fact, there's an exciting element of software archaeology involved, because now I have to find all of those settings from the clues I left behind in earlier entries.
As mentioned in my old entry, the Dell P2715Q is a 163 DPI display. To make the X server itself know the correct DPI, I run it with a '-dpi 163' command line argument. I don't use XDM or any other graphical login manager; I start the X server from a text console with a nest of shell scripts, so I can supply custom arguments this way. I don't do anything with xrandr, which came up with plausible reported screen dimensions of 597mm x 336mm and didn't appear to need any changes.
I use xsettingsd as my XSettings daemon, and set two DPI related properties in .xsettingsd:
Gdk/UnscaledDPI 166912
Xft/DPI 166912
Both of these values are my 163 DPI multiplied by 1024. For Xft/DPI, this is documented in the Xsettings registry. I'm not sure if I found documentation for Gdk/UnscaledDPI or just assumed it would be in the same units as Xft/DPI.
There is also an X resource setting:
Xft.dpi: 163
As we can see, this is just the DPI.
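If you're adapting this for a display with a different DPI, the arithmetic is easy to script. Here's a small illustrative sketch (the variable names are mine, not from any standard tool):

```shell
# Compute the XSettings Gdk/UnscaledDPI and Xft/DPI values for a
# given display DPI; both are the DPI multiplied by 1024.
dpi=163
xsettings_dpi=$((dpi * 1024))

# Lines suitable for .xsettingsd and for X resources respectively.
echo "Gdk/UnscaledDPI $xsettings_dpi"
echo "Xft/DPI $xsettings_dpi"
echo "Xft.dpi: $dpi"
```

You still have to paste the output into .xsettingsd and your X resources by hand, but at least the multiplication is done for you.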
Then I set some environment variables, which (in 2018) came from Arch's HiDPI page, the Gnome wiki, and the GTK3+ X page. First there is a setting to tell Qt apps to honor the screen DPI:
export QT_AUTO_SCREEN_SCALE_FACTOR=1
Then there is a pair of GTK settings to force GTK+ applications to scale their UI elements up to HiDPI but not scale the text, as explained in more depth in my original entry:
export GDK_SCALE=2
export GDK_DPI_SCALE=0.5
These three environment variables are only necessary for Qt and GTK+ applications, not basic X applications. Basic X applications seem to work fine with some combination of the Xft.dpi X resource and the XSettings system.
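In my setup these are exported from the shell scripts that start my X session; a minimal sketch of that fragment looks like this:

```shell
# HiDPI environment for Qt and GTK+ applications; basic X applications
# pick up Xft.dpi and XSettings instead and don't need these.
export QT_AUTO_SCREEN_SCALE_FACTOR=1   # Qt: honor the screen DPI
export GDK_SCALE=2                     # GTK+: scale UI elements up 2x
export GDK_DPI_SCALE=0.5               # GTK+: but don't also double the text
```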
If you're running remote X applications from your HiDPI X session, as I am these days, they will automatically see your Xft.dpi X resource and your XSettings settings. They won't normally see your (my) specially set environment variables. Fortunately I mostly run basic X applications that only seem to use X resources and perhaps XSettings, and so basically just work the same as your local versions.
(At least after you fix any problems you have with X cursors on the remote machines.)
At the moment I'm not sure if setting the environment variables for remote X programs (for instance by logging in with 'ssh -X', setting them by hand, and then running the relevant program) works just the same as setting them locally. Some testing suggests that it probably does; while I see some visual differences, this is probably partly just because I haven't adjusted the remote programs that I'm testing with the way I have my regularly used local ones (after all, I normally use them on my regular DPI displays at work, and hopefully some day I'll be doing that again).
The final setting I make is in Firefox. As mentioned in passing in this entry, I manually set the about:config setting layout.css.devPixelsPerPx to 1.7, which is down from what would be the default of '2' based on my overall settings. I found that if I left Firefox alone with these other settings, its font sizes looked too big to me. A devPixelsPerPx setting of 1.7 is about what the Arch Wiki Firefox Tweaks page suggests should be correct here, and it looks good to me, which is what I care about most.
Sidebar: X resources tweaks to specific applications
Xterm sizes the width of the scrollbar in pixels, which isn't ideal on a HiDPI display. It is normally 14 pixels, so I increased it to:
XTerm*VT100.scrollbar.width: 24
Urxvt needs the same tweak but it's called something different:
URxvt*thickness: 24
I think I also tried to scale up XTerm's menu fonts but I'm not sure it actually worked, and I seem to have the same X resource settings (with the same comments) in my work X resource file.
2020-05-20
Switching to the new in-kernel WireGuard module was easy (on Fedora 31)
One of the quietly exciting bits of recent kernel news for me is that WireGuard is now built in to the Linux kernel from kernel 5.6 onward. I've been using a private WireGuard tunnel on my Fedora machines for several years now, but it's been through the additional COPR repository with an additional DKMS based kernel module package, wireguard-dkms. Among other things, this contributed to my multi-step process for updating Fedora kernels.
When I first updated to a Fedora 5.6 kernel, I wondered if I was going to have to manually use DKMS to remove the DKMS installed WireGuard module in favour of the one from the kernel itself. As it turned out, I didn't have to do anything; current versions of the COPR wireguard-dkms package have a dkms.conf that tells DKMS not to build the module on 5.6+ kernels. Updating to a 5.6 kernel got me a warning from DKMS that the WireGuard DKMS module couldn't build on this kernel, but that was actually good news. After a reboot, my WireGuard tunnel was back up just like normal. As far as I can tell there is no difference in operation between the DKMS WireGuard version and the now in-kernel version, except that I have one fewer DKMS module to rebuild on kernel updates.
(The one precaution I took with the COPR wireguard-dkms package was to not install any further updates to it once I'd updated to a 5.6 kernel, because that was the easiest way to keep a WireGuard module in my last 5.5 kernel in case I wanted to fall back.)
After I'd gone through enough 5.6.x Fedora kernel updates to be sure that I wasn't going back to a 5.5 kernel that would need a WireGuard DKMS module, I removed the WireGuard DKMS package with 'dnf remove wireguard-dkms'. Then I let things sit until today, when I did two more cleanup steps; I disabled the WireGuard COPR repository and switched over to the official Fedora package for WireGuard tools with 'dnf distro-sync wireguard-tools'. Somewhat to my surprise, this actually installed an updated version (going from 1.0.20200102 to 1.0.20200319).
(I believe that dnf hadn't previously recognized this as an upgrade because of a difference in RPM epoch number between the two package sources. This may be deliberate so that COPR packages override regular Fedora packages at all times.)
PS: Now that WireGuard is an official part of the Fedora kernel, I feel that I should do something to set up a WireGuard VPN on my work laptop. Unfortunately this really needs a WireGuard VPN server (or touchdown point) of some sort at work. We don't currently have one and the state of the world makes it unlikely we'll deploy one in the near future, even for private sysadmin use.
2020-05-08
Linux software RAID resync speed limits are too low for SSDs
When you add or replace a disk in Linux's software RAID, it has to be resynchronized with the rest of the RAID array. As very briefly covered in the RAID wiki's page on resync, this resync process has speed limits that are controlled by the kernel sysctls dev.raid.speed_limit_min and dev.raid.speed_limit_max (in KBytes a second). As covered in md(4), if there's no other relevant IO activity, resync will run up to the maximum speed; if there is other relevant IO activity, the resync speed will throttle down to the minimum (which many people raise on the fly in order to make resyncs go faster).
(In current kernels, it appears that relevant IO activity is any IO activity to the underlying disks of the software RAID, whether or not it's through the array being resynced.)
If you look at your system, you will very likely see that the values for minimum and maximum speeds are 1,000 KB/sec and 200,000 KB/sec respectively; these have been the kernel defaults since at least 2.6.12-rc2 in 2005, when the Linux kernel git repository was started. These were fine defaults in 2005 in the era of hard drives that were relatively small and relatively slow, and in particular you were very unlikely to approach the maximum speed even on fast hard drives. Even fast hard drives generally only went at 160 MBytes/sec of sustained write bandwidth, comfortably under the default and normal speed_limit_max.
This is no longer true in a world where SSDs are increasingly common (for example, all of our modern Linux servers with mirrored disks use SSDs). In theory SSDs can write at data rates well over 200 MBytes/sec; claimed data rates are typically around 500 MBytes/sec for sustained writes. In this world, the default software RAID speed_limit_max value is less than half the speed that you might be able to get, and so you should strongly consider raising dev.raid.speed_limit_max if you have SSDs.
You should probably also raise speed_limit_min, whether or not you have SSDs, because the current minimum is effectively 'stop the resync when there's enough other IO activity' since modern disks are big enough that they will often take more than a week to resync at 1,000 KB/sec. You probably don't want to wait that long. If you have SSDs, you should probably raise it a lot, since SSDs don't really suffer from random IO slowing everything down the way HDs do.
(Raising both of these significantly will probably become part of our standard server install, now that this has occurred to me.)
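If we do fold this into our standard install, the natural place for it is a sysctl.d file. A sketch of what that might look like (the file name and the specific numbers here are illustrative guesses, not tested values):

```
# /etc/sysctl.d/90-raid-speed.conf (hypothetical file name)
# Raise software RAID resync speed limits for the SSD era.
# Both values are in KBytes/sec; tune them for your hardware.
dev.raid.speed_limit_min = 50000
dev.raid.speed_limit_max = 1000000
```

You can apply the same values on the fly with 'sysctl -w' to speed up a resync that's already running.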
Unfortunately, depending on what SSDs you use, this may not do you as much good as you would like, because it seems that some SSDs can have very unimpressive sustained write speeds in practice over a large resync. We have a bunch of basic SanDisk 64 GB SSDs (the 'SDSSDP06') that we use in servers, and we lost one recently and had to do a resync on that machine. Despite basically no other IO load at the time (and 100% utilization of the new disk), the eventual sustained write rate we got was decidedly unimpressive (after an initial amount of quite good performance). The replacement SSD had been used before, so perhaps the poor SSD was busy frantically erasing flash blocks and so on as we were trying to push data down its throat.
(Our metrics system makes for interesting viewing during the resync. It appears that we wrote about 43 GB of the almost 64 GB to the new SSD at probably the software RAID speed limit before write bandwidth fell off a cliff. It's just that the remaining portion of about 16 GB of writes took several times as long as the first portion.)
2020-05-06
Modern versions of systemd can cause an unmount storm during shutdowns
One of my discoveries about Ubuntu 20.04 is that my test machine can trigger the kernel's out of memory killing during shutdown. My test virtual machine has 4 GB of RAM and 1 GB of swap, but it also has 347 NFS mounts, and after some investigation, what appears to be happening is that in the 20.04 version of systemd (systemd 245 plus whatever changes Ubuntu has made), systemd now seems to try to run umount for all of those filesystems all at once (which also starts a umount.nfs process for each one). On 20.04, this is apparently enough to OOM my test machine.
(My test machine has the same amount of RAM and swap as some of our production machines, although we're not running 20.04 on any of them.)
On the one hand, this is exactly what systemd said it was going to do in general. Systemd will do as much in parallel as possible, and these NFS mounts are not nested inside each other, so they can all be unmounted at once. On the other hand, this doesn't scale; there's a certain point where running too many processes at once just thrashes the machine to death even if it doesn't drive it out of memory. And on the third hand, this doesn't happen to us on earlier versions of Ubuntu LTS; either their version of systemd doesn't start as many unmounts at once or their versions of umount and umount.nfs require enough fewer resources that we can get away with it.
Unfortunately, so far I haven't found a way to control this in systemd. There appears to be no way to set limits on how many unmounts systemd will try to do at once (or in general how many units it will try to stop at once, even if that requires running programs). Nor can we readily modify the mount units, because all of our NFS mounts are done through shell scripts by directly calling mount; they don't exist in /etc/fstab or as actual .mount units.
(One workaround would be to set up a new systemd unit that acts before filesystems are unmounted and runs a 'umount -t nfs', because that doesn't try to do all of the unmounts at once. Getting the ordering right may be a little bit tricky.)
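A concrete sketch of that workaround might look like the following unit (the unit name, the ordering dependencies, and the exact umount invocation are all my untested guesses; as noted, getting the ordering right is the tricky part):

```
# /etc/systemd/system/nfs-preunmount.service (hypothetical)
[Unit]
Description=Unmount NFS filesystems before general unmounting
# Started late at boot so that at shutdown its ExecStop runs early,
# before systemd starts stopping mount units in parallel.
After=network-online.target remote-fs.target

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/bin/true
# 'umount -a -t nfs' unmounts all NFS filesystems in one process
# instead of one umount/umount.nfs pair per mount.
ExecStop=/bin/umount -a -t nfs

[Install]
WantedBy=multi-user.target
```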
How to set up an Ubuntu 20.04 ISO image to auto-install a server
In Ubuntu 20.04 LTS, Canonical has switched to an all new and not yet fully finished system for automated server installs. Yesterday I wrote some notes about the autoinstall configuration file format, but creating a generally functional configuration file is only the first step; now you need to set up something to install it. Around here we use DVDs, or at least ISO images, in our install setup, so that's what I've focused on.
The first thing you need (besides your autoinstall configuration file) is a suitable ISO image. At the moment, the only x86 server image that's available for Ubuntu 20.04 is the 'live server' image, so that's what I used (see here for the 18.04 differences between the plain server image and the 'live server' one, but then Ubuntu 20.04 is all in on the 'live' version). To make this ISO into a self-contained ISO that will boot with your autoinstall configuration, we need to add some data files to the ISO and then modify the isolinux boot configuration.
The obvious data file we have to add to the ISO is our autoinstall configuration file. However, it has to be set up in a directory by itself along with a companion file, and each file has to have a special name. Let's say that the directory within the ISO that we're going to use for this is called /cslab/inst. Then our autoinstall configuration file must be called /cslab/inst/user-data, and we need an empty /cslab/inst/meta-data file beside it. At install time, the path to this directory is /cdrom/cslab/inst, because the ISO is mounted on /cdrom.
(I put our configuration in a subdirectory here because we put additional bootstrap files under /cslab that are copied onto the system as part of the autoinstall.)
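Creating this layout in a scratch copy of the ISO contents is only a few commands; here's a sketch (the directory names follow the /cslab/inst convention above, and the user-data contents are just a stub):

```shell
# Create the autoinstall data directory inside the unpacked ISO tree.
scratch=scratch-iso
mkdir -p "$scratch/cslab/inst"

# user-data holds the autoinstall configuration; this is a placeholder.
cat > "$scratch/cslab/inst/user-data" <<'EOF'
#cloud-config
autoinstall:
  version: 1
EOF

# meta-data must exist but can be empty.
: > "$scratch/cslab/inst/meta-data"
```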
The isolinux configuration file we need to modify in the ISO is /isolinux/txt.cfg. We want to modify the kernel command line to add a new argument, 'ds=nocloud;s=/cdrom/cslab/inst/'. So:
default live
label live
  menu label ^Install Ubuntu Server
  kernel /casper/vmlinuz
  append   initrd=/casper/initrd quiet ds=nocloud;s=/cdrom/cslab/inst/ ---
[...]
(You can modify the 'safe graphics' version of the boot entry as well if you think you may need it. I probably should do that to our isolinux txt.cfg.)
The purpose and parameters of the 'ds=' argument are described here. This particular set of parameters tells the autoinstaller to find our configuration file in /cslab/inst/ on the ISO, where it will automatically look for both 'user-data' and 'meta-data'.
Some sources will tell you to also add an 'autoinstall' parameter to the kernel command line. You probably don't want to do this; it's only necessary if you want a completely noninteractive install that doesn't even stop to ask you if you're sure you want to erase your disks. If you have some 'interactive-sections' specified in your autoinstall configuration file, this is not applicable; you're already having the autoinstall stop to ask you some questions.
For actually modifying the ISO image, what I do is prepare a scratch directory, unpack the pristine ISO image into it with 7z (because we have 7z installed and it will unpack ISOs, among many other things), modify the scratch directory, and then build a new ISO image with:
mkisofs -o cslab_ubuntu_20.04.iso \
    -ldots -allow-multidot -d -r -l -J \
    -no-emul-boot -boot-load-size 4 -boot-info-table \
    -b isolinux/isolinux.bin -c isolinux/boot.cat \
    SCRATCH-DIRECTORY
isohybrid cslab_ubuntu_20.04.iso
(isohybrid makes this ISO bootable as a USB stick. Well, theoretically bootable. I haven't actually tried this for 20.04.)
You can automate all of this with some shell scripts that take an ISO image and a directory tree of things to merge into it (overwriting existing files) and generate a new image.
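A sketch of such a wrapper is below (the function name and structure are my own; it assumes 7z, mkisofs, and isohybrid are installed, and the error handling is minimal):

```shell
# Rebuild an Ubuntu ISO with a directory tree merged over its contents,
# overwriting any existing files in the ISO.
build_iso() {
    if [ "$#" -ne 3 ]; then
        echo "usage: build_iso PRISTINE.iso OVERLAY-DIR OUTPUT.iso" >&2
        return 1
    fi
    iso="$1"; overlay="$2"; out="$3"
    scratch="$(mktemp -d)"

    7z x -o"$scratch" "$iso" >/dev/null    # unpack the pristine ISO
    cp -a "$overlay"/. "$scratch"/         # merge in our files

    mkisofs -o "$out" \
        -ldots -allow-multidot -d -r -l -J \
        -no-emul-boot -boot-load-size 4 -boot-info-table \
        -b isolinux/isolinux.bin -c isolinux/boot.cat \
        "$scratch"
    isohybrid "$out"                       # make it USB-stick bootable too
    rm -rf "$scratch"
}
```

Invoked as, say, 'build_iso ubuntu-20.04-live-server-amd64.iso overlay/ cslab_ubuntu_20.04.iso'.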
2020-05-04
Notes on the autoinstall configuration file format for Ubuntu 20.04
Up through Ubuntu 18.04 LTS, Ubuntu machines (usually servers) could be partially or completely automatically installed through the Debian 'debian-installer' system, which the Internet has copious documentation on. It was not always perfect, but it worked pretty well to handle the very initial phase of server installation in our Ubuntu install system. In Ubuntu 20.04, Canonical has replaced all of that with an all new system for automated server installs (you will very much want to also read at least this forum thread). The new system is strongly opinionated, rather limited in several ways, not entirely well documented, at least somewhat buggy, clearly not widely tested, and appears to be primarily focused on cloud and virtual machine installs to the detriment of bare metal server installs. I am not a fan, but we have to use it anyway, so here are some notes on the file format and data that the autoinstaller uses, to supplement the official documentation on its format.
(How to use this data file to install a server from an ISO image is a topic for another entry.)
If you install a server by hand, the install writes a data file, /var/log/installer/autoinstall-user-data, that in theory can be used to automatically reproduce your install. If you're testing how to do auto-installs, one obvious first step is to install a system by hand, take the file, and use it to attempt to spin up an automated install. Unfortunately this will not work. The file the installer writes has multiple errors, so it won't be accepted by the auto-install system if you feed it back in.
There are three minimum changes you need to make. First, set a global version:
#cloud-config
autoinstall:
  version: 1
  [...]
Then, in the keyboard: section, change 'toggle: null' to use '' instead of null:
keyboard: {layout: us, toggle: '', variant: ''}
Finally, change the 'network:' section to have an extra level of 'network:' in it. This changes from:
network:
  ethernets:
    [...]
To:
network:
  network:
    ethernets:
      [...]
Given that this is YAML, spaces count and you cannot use tabs.
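Since a stray tab will make the installer reject your file, a quick sanity check before you build the ISO is cheap (this is just a grep, not real YAML validation; the function name is my own):

```shell
# Fail if an autoinstall user-data file contains tab characters,
# which YAML does not accept for indentation.
check_no_tabs() {
    if grep -q "$(printf '\t')" "$1"; then
        echo "$1: contains tabs, YAML will not accept them" >&2
        return 1
    fi
    return 0
}
```

Used as 'check_no_tabs user-data || exit 1' in whatever script assembles your ISO.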
If you want to interact with some portions of the installer but not all of it, these are specified in the 'interactive-sections' YAML section. For example:
interactive-sections:
  - network
  - storage
  - identity
In theory you can supply default answers for various things in your configuration file for these sections, which show up when you get prompted interactively. In practice this does not entirely work; some default answers in your configuration file are ignored.
In network configuration, there currently appears to be no way to either completely automatically configure a static IP address setup or to supply default answers for configuring that. If you supply a complete set of static IP information and do not set the network section to be interactive, your configuration will be used during the install, but after the system boots, your configuration will be lost and the system will be trying to do DHCP. If you provide a straightforward configuration and set 'network' to interactive, the system will attempt to do DHCP during the install, probably fail, and when you set things manually your defaults will be gone (for example, for your DNS servers). The best you can do is skip having the system try to do DHCP entirely, with a valid configuration that the installer throws up its hands on:
network:
  version: 2
  renderer: networkd
  ethernets:
    mainif:
      match:
        name: en*
      [...]
Then you get to set up everything by hand (in a setup that's a regression from what debian-installer could do in 18.04).
One of the opinionated aspects of the new Ubuntu installer is that you absolutely must create a regular user and give it a password (even if you're going to immediately wipe out local users to drop in your own general authentication system), and you cannot give a password to root; your only access to root is through 'sudo' from this regular user. The installer will give this user a home directory in /home; you will likely need to remove this afterward. You could skip making this 'identity' section an interactive section, except for the problem that the system hostname is specified in the 'identity' section and has no useful default if unset (unlike in debian-installer, where it defaults to the results of a reverse DNS lookup). Unfortunately, once you make 'identity' an interactive section, the installer throws away your preset encrypted password and makes you re-enter it.
So you want something like this:
identity: {hostname: '', password: [...],
           realname: Dummy Account, username: cs-dummy}
With the initial hostname forced to be blank (and 'identity' included in the interactive sections), the installer won't let people proceed until they enter some value, hopefully an appropriate one.
As sort of covered in the documentation, you can run post-install commands by specifying them in a 'late-commands:' section; they're run in order. When they're run, the installed system is mounted at /target and the ISO image you're installing from is at /cdrom (if you're installing from an ISO image or a real CD/DVD). If you want to run commands inside the installed system, you can use 'chroot' or 'curtin', but the latter requires special usage:
late-commands:
  - curtin in-target --target=/target -- usermod -p [...] root
(The --target is the special underdocumented bit.)
There is no curtin program on the current server install CD; the installer handles running 'curtin' magically. This means that you can't interactively test things during the install on an alternate video console (you can get one with Alt-F2).
Initially I was going to say that the installer has no way to set the timezone. This is technically correct but not practically, because the installer assumes you're using cloud-init, so you set the timezone by passing a 'timezone' key to cloud-init for its 'timezone' module through the 'user-data:' section:
user-data:
  timezone: America/Toronto
If you don't set this data, you get UTC. This includes if you do a manual installation with no configuration file, as you might if you're just starting with Ubuntu 20.04. In that case, you want to set it with 'timedatectl set-timezone America/Toronto' after the system is up.
I haven't yet attempted to play around with the 'storage' section, although I have observed that it now wants to always use GPT partitioning. We always want disk partitioning to require our approval and allow intervention, but it would be handy if I can set it up so that the default partitioning that you can just select is our standard two disk mirrored configuration. As an important safety tip, when doing mirrored partitioning you need to explicitly make your second disk bootable (this applies both interactively and if you configure this in the 'storage' section). If you don't make a second disk bootable, the installer doesn't create an EFI boot partition on it. In the configuration file, this is done by setting 'grub_device: true' in the disk's configuration (which is different from partition configurations) and also including a 'bios_grub' partition:
storage:
  config:
    - {ptable: gpt, path: /dev/sda, wipe: superblock-recursive,
       preserve: false, name: '', grub_device: true, type: disk,
       id: disk-sda}
    - {device: disk-sda, size: 1048576, flag: bios_grub, number: 1,
       preserve: false, type: partition, id: partition-0}
Reading the documentation, it unfortunately appears that you can't specify the size of partitions as a percentage or 'all the remaining space'. This probably makes any sort of 'storage:' section in a generic autoinstall configuration not very useful, unless your systems all have the same size disks. I now think you might as well leave it out (and set 'storage' as an interactive section).
PS: It's possible that there are better ways to deal with several of these issues. If so, they are not documented in a way that can be readily discovered by people arriving from Ubuntu 18.04 who just want to autoinstall their bare metal servers, and who have no experience with Canonical's new cloud systems because they don't use cloud stuff.
PPS: It's possible that an Ubuntu 20.04 server ISO image will some day be made available that uses the debian-installer or doesn't behave in all of these ways. Unfortunately, the only currently available 20.04 server ISO image is the 'live' image, which is apparently cloud-focused or at least includes and uses cloud focused tools by default.
2020-05-01
What problems Snaps and Flatpaks are solving
One of the reactions to things like my problems with 'snaps' on Ubuntu 20.04 and the push for Snaps and Flatpak is to ask why they exist at all (for example, this lobste.rs comment). Despite my bad experience with Canonical's Chromium snap and my desire not to use Flatpaks at all, I do see why people are pushing them, or at least I think I do. The necessary disclaimer here is that I'm an outsider and somewhat cynical, so this is my perception of how things really are, not necessarily what people are saying out loud.
Snaps and things like them solve problems both for software developers and for people like Canonical. For software developers, Snaps promise to deliver a 'build once, run on every Linux' experience, rather than the current mess of trying to build for a variety of Linuxes with a variety of system setups, shared library versions, package formats and standards and so on. Although it's not the only thing you need, such a low hassle and reliable experience is pretty necessary if you want to attract more people to the Linux platform, especially commercial software. This is the straightforward pitch for snaps, flatpaks, and so on that you generally hear (for instance, it's almost all of the pitch on the Flatpak site).
For Canonical and people like them, that Snaps and Flatpaks are sandboxed theoretically allows them to reconcile three desires that are otherwise in tension. Canonical wants to run a single 'app store' that users get everything from (because that makes it attractive to both users and developers), for it to be pretty safe to install software from their app store (because otherwise people won't and then developers go away), and to not have to spend a lot of resources and money on vetting developers and auditing submitted Snaps. Canonical will have seen how much resources Apple and Google put into auditing submitted apps, in environments with much better fundamental security than standard Linux has, and they almost certainly want none of that. Sandboxing applications and carefully limiting their power over the system both at install time and at run time is necessary to square the circle here.
(Standard operating system packages are extremely dangerous because they have almost unlimited power at install time and thus often at runtime. This can easily lead third party packages to do undesirable things, or simply to have bugs or badly thought out ideas in their installation. And of course the installed software has free run of the system once you run it, even if it installs nothing ostensibly dangerous. If you're running a distribution, you really can't just put packages prepared by third parties into your distribution repositories; you have to audit both the packaging and the contents. Otherwise, things will blow up in your face sooner or later.)
Sandboxing and other safety measures would not be essential to Snaps if you didn't have the central 'app store' and got your Snaps directly from the software vendor. Then you would already have a direct trust relationship (including possibly deciding to give them money), and if they proved to be untrustworthy after all that would be on you alone. But then neither you nor the software vendors would have the benefits of a central app store, and there are benefits for both parties (in addition to benefits to Canonical).