Errors during SMTP conversations aren't trustworthy, illustrated
Recently we had a mail problem where we could not deliver email to a particular remote destination for a while. A major Australian ISP spent six days telling us:
421 4.7.25 Temporarily rejected. Reverse DNS for <our-IP> failed. IB108
(Based on Exim log messages, this happened during the initial SMTP connection, before we even EHLO'd.)
Then later the ISP was fine again, sadly after the person trying to send mail had their attempts time out and contacted us to see if we could do anything about it. The ISP was fine before this incident, and they've been fine ever since, and no other destination reported anything like this message to us.
We did not have malfunctioning nameservers or missing reverse DNS for six days. We did not, as far as we can tell, have DNS servers that the outside world had problems reaching for six days. I suppose it's possible that this large ISP had some internal problem that prevented their DNS servers from talking to our DNS servers for six days, but not so big that they noticed it and dealt with it right away. Alternately, perhaps this ISP was not being honest with us about why they decided not to accept connections from our outgoing email server. We can't tell.
(During the six day problem period, our user was able to reach their recipient on this ISP from two other places, both of which are big email heavyweights, so it was not an issue with the recipient or with the ISP's mail system in general.)
It's not really news or a new thing that the messages you get from other people's mail servers are not necessarily telling you the real reason that your messages aren't being accepted. Many of the major mail providers seem to do it; it's been a long time since I really believed GMail's SMTP time messages, for example. We have many cases where GMail will give temporary 4xx SMTP error codes for an email for a while with various claims in the SMTP error messages, then wind up accepting it. In other cases the 'temporary' 4xx error codes stick for as long as we want to keep retrying and we eventually time out the message.
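As a small illustration of what these replies formally claim, here is a minimal Python sketch that splits an SMTP reply line into its three-digit code, the optional enhanced status code, and the free text. The function name and the returned dictionary layout are my own assumptions for illustration. The classification only reports what the remote server says, which (as above) is not necessarily the real reason.

```python
import re

# A reply line is a 3-digit code, an optional enhanced status code
# such as 4.7.25, and then free-form text.
REPLY_RE = re.compile(r"^(\d{3})[ -](?:(\d\.\d{1,3}\.\d{1,3}) )?(.*)$")

KINDS = {
    "2": "success",
    "3": "intermediate",
    "4": "temporary failure",
    "5": "permanent failure",
}


def classify_reply(line):
    """Split one SMTP reply line and report the class it claims to be."""
    m = REPLY_RE.match(line.strip())
    if m is None:
        raise ValueError("not an SMTP reply line: %r" % line)
    code, enhanced, text = m.groups()
    return {
        "code": int(code),
        "enhanced": enhanced,  # None if the server sent no enhanced code
        "kind": KINDS.get(code[0], "unknown"),
        "text": text,
    }


if __name__ == "__main__":
    reply = classify_reply(
        "421 4.7.25 Temporarily rejected. Reverse DNS for <our-IP> failed. IB108"
    )
    print(reply["kind"], "-", reply["text"])
```

A 'temporary failure' here means only that the sender is expected to retry; it says nothing about whether retrying will ever work.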
(My personal lesson learned from this incident was that I should pay more attention to our queued email, then look into things that seemed odd. At the very least I might have been able to reproduce this outside of Exim, and test it from other IPs on the same subnet and elsewhere within the university.)
DKMS built one of my kernel modules for the wrong kernel
Today I discovered that DKMS has spent some time silently (re)building one of my modules for the wrong kernel because DKMS. Naturally it didn't work. Since it was my sensor monitoring, I didn't notice for a while.
There is a complicated story here. This happened on my office workstation, which needs a very out of tree version of the it87 kernel module in order to read the motherboard sensors. Because of ongoing problems in the 5.11 kernel series with my Radeon RX 550 card, I've been repeatedly upgrading my kernel to the latest Fedora 5.11.x, finding out that the kernel is no good, and falling back to the last-good kernel, which is Fedora's 5.10.23.
Recently (while in the default state of running 5.10.23), I noticed that I didn't have my usual motherboard sensor readings. Examining kernel messages for it87 problems, I immediately found the smoking gun:
[Fri Apr 23 15:17:51 2021] it87: version magic '5.11.15-200.fc33.x86_64 SMP mod_unload ' should be '5.10.23-200.fc33.x86_64 SMP mod_unload '
At that point I had 5.11.15 installed, making it the highest-version kernel, but I was running 5.10.23. This should be a supported system configuration but apparently DKMS somehow installed the 5.11.15 version of it87 (built when I installed that version) into 5.10.23's module area as well as 5.11.15's. So I told DKMS to remove the module and rebuild it. Surprise:
[Sat May 8 17:43:45 2021] it87: version magic '5.11.15-200.fc33.x86_64 SMP mod_unload ' should be '5.10.23-200.fc33.x86_64 SMP mod_unload '
'dkms status' showed that the it87 module was removed before I had DKMS rebuild it, and DKMS knew what the correct kernel version was (because it installed the new it87 module there), but it rebuilt it for the wrong kernel version. What I wound up doing was removing the 5.11.15 RPMs entirely. This finally made DKMS build the module right.
Possibly I could also have made DKMS work right by explicitly specifying the kernel version in 'dkms build' and 'dkms install'.
In the future I'm probably going to explicitly specify the kernel version for every DKMS build and install, even if it's the current running kernel. I'm also going to have to check that DKMS installed modules are for the right kernel, especially if I'm building them in some unusual situation. And obviously I now have a mental note to check that all my sensors still work after every reboot.
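One way to do the 'is this module for the right kernel' check is to compare the first field of a module's version magic against the kernel release you care about. The following is a minimal Python sketch of that idea; the helper names are hypothetical, and it assumes the standard modinfo program with its -k (kernel version) and -F (field) options.

```python
import subprocess


def vermagic_kernel(vermagic):
    """Return the kernel release that a vermagic string names (its first field)."""
    return vermagic.split()[0]


def module_matches(vermagic, kernel_release):
    """True if the module's version magic is for the given kernel release."""
    return vermagic_kernel(vermagic) == kernel_release


def installed_vermagic(module, kernel_release):
    """Ask modinfo for the vermagic of the module installed for one kernel.

    Assumes modinfo is on $PATH and the module is installed for that kernel.
    """
    out = subprocess.run(
        ["modinfo", "-k", kernel_release, "-F", "vermagic", module],
        check=True, capture_output=True, text=True,
    )
    return out.stdout.strip()


if __name__ == "__main__":
    # The version magic from the kernel message above, checked against
    # the kernel that was actually running.
    logged = "5.11.15-200.fc33.x86_64 SMP mod_unload "
    print(module_matches(logged, "5.10.23-200.fc33.x86_64"))  # prints False
```

On a live system you would feed installed_vermagic("it87", <kernel release>) into module_matches() for each kernel you expect to boot.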
Somewhat to my surprise, DKMS is actively maintained in the DKMS git repository. But it is still a 3,935 line Bash script (which is up slightly from 2016). It's really a marvel that it works as well as it does, but on the other hand it's somewhat terrifying that so many Linux systems depend on it working reliably.
(One of the fun things about using DKMS on Fedora is that it rebuilds the initramfs for every installed kernel every time you install a DKMS module, regardless of which kernel you're installing the module into and whether or not the module would be included in the initramfs. This takes a substantial amount of time and there's no way to turn it off.)
Storing ZFS send streams is not a good backup method
One of the eternally popular ideas for people using ZFS is doing backups by using 'zfs send' and storing the resulting send streams. Although appealing, this idea is a mistake, because ZFS send streams do not have the properties you want for a backup format.
A good backup format is designed for availability. No matter what happens, it should let you extract as much from it as possible, from both full backups and incremental backups. If your backup stream is damaged, you should still be able to find and restore as much as possible, both before and after the damage. If a full backup is missing or destroyed, you should still be able to recover something from whatever incrementals you have. This requires incremental backups to have more information in them than they specifically need, but that's a tradeoff you make for availability.
A good backup format should also be convenient to operate, and one big aspect of this is selective restores. A lot of the time you don't need to restore absolutely everything; you just want to get back one file or some files that you need because they got removed, damaged, or whatever. If you have to do a complete restore (both full and incremental) in order to get back a single file, you don't have a convenient backup format. Other nice things are, for example, being able to readily get an index of what is captured in any particular backup stream (full or incremental).
Incremental ZFS send streams do not have any of these properties and full ZFS send streams only have a few of them. Neither full nor incremental streams have any resilience against damage to the stream; a stream is either entirely intact or it's useless. Neither has selective restores or readily available indexes. Incremental streams are completely useless without everything they're based on. All of these issues will sooner or later cause you pain if you use ZFS streams as a backup format.
ZFS send streams are great at what they're for, which is replicating ZFS filesystems from one ZFS pool to another in an environment where you can immediately deal with any problems that come up (whether by retrying the send of a corrupted stream, changing what it's based on, or whatever you need to do). The further you pull 'zfs send' away from this happy path, the more problems you're going to have.
(The design decisions of ZFS send streams make a great deal of sense for this purpose. As a replication format they're designed to be easy to generate, easy to receive, and compact, especially for incremental send streams. They have no internal redundancy or recovery from corruption because the best recovery is 'resend the stream to get a completely good one'.)
It's pleasantly easy to install PyPy yourself (from their binaries)
The Python language server is the most substantial Python program I run on our servers, making it an obvious candidate to try running under PyPy for a speedup. The last time around, I optimistically assumed that I would use the Ubuntu packaged version of PyPy. Unfortunately, all of our login servers are still running Ubuntu 18.04 and 18.04 has no packaged version of PyPy 3. Since Python 3 is what I use for much of both my personal and our work code and you have to run pyls under the same version of Python as the code you're working on, this is a bit of a problem. So I decided to try out the PyPy procedures for installing a pre-built PyPy, with the latest release binaries.
This turned out to be just as easy and as pleasant (on Linux) as the documentation presented it. The tarball could be unpacked to put its directory tree anywhere (I put it in my $HOME/lib), and it ran fine on Ubuntu 18.04 and 20.04. I needed pip to install pyls, so I followed their directions to run './pypy-xxx/bin/pypy -m ensurepip', which downloaded everything needed into PyPy's tree and created a ./pypy-xxx/bin/pip program that I could use for everything else. As with virtualenvs, once I installed pyls through pip I could run $HOME/lib/pypy-xxx/bin/pyls and it all just worked.
In theory I think I could go on to use my $HOME/lib versions of PyPy3 and PyPy to create virtualenvs and then install things into those virtualenvs. In practice this is an extra step that I don't need for my purposes. Installing pyls and anything else I want to run under PyPy with './pypy-xxx/bin/pip install ...' already neatly isolates it in a directory hierarchy, just like a virtualenv does.
(Installing PyPy3 was so easy and straightforward that I decided I might as well also install the standard pre-built PyPy2, just so I had a known and up to date quantity instead of whatever Ubuntu had in their PyPy packages. Plus, even if I used the system version, I would have had to make a virtualenv for it. It took almost no extra effort to go all the way to using the pre-built binaries.)
All of this is really how installing pre-built software should work (and certainly how it's documented for PyPy). But I date from an era where it was usually much more difficult and pre-built software was often rather picky about where you put it or wanted to spray bits of itself all over your $HOME (or elsewhere). Right now it's still a bit of a pleasant shock when a pre-built program actually works this easily, whether it's PyPy or Rust.
Understanding OpenSSH's various options around keys and key algorithms
OpenSSH has quite a lot of things involving keys, key types, and key algorithms, with options to control them and ways to report on them and so on. It can be confusing when you read manpages for ssh, ssh_config, sshd, and so on (and it has regularly confused me). It turns out that OpenSSH has a great explanation in their OpenSSH Legacy Options documentation, so great that rather than paraphrase it I am just going to quote it (with some additional commentary):
When an SSH client connects to a server, each side offers lists of connection parameters to the other. These are, with the corresponding ssh_config keyword:
KexAlgorithms: the key exchange methods that are used to generate per-connection [symmetric encryption] keys
HostkeyAlgorithms: the public key algorithms accepted for an SSH server to authenticate itself to an SSH client
Ciphers: the ciphers to encrypt the connection
MACs: the message authentication codes used to detect traffic modification
When an SSH connection is established, the client and server use
the SSH transport protocol
to create the initial set of symmetric encryption keys that will
encrypt the entire conversation going forward, and in the process
the client verifies the server's host key (only one key out of
the keys the server offers).
If a server supports multiple host key types, in theory the client controls which one will be used based on the order of its HostkeyAlgorithms setting.
KexAlgorithms 'key exchange methods' have nothing to do with the public key algorithms used for host key verification, although they do use related cryptographic techniques. How they work specifically is covered in various RFCs and other documentation linked from OpenSSH's Specifications page.
This can be initially confusing since some of the KEX algorithm names look a bit similar to key type names (eg 'curve25519-sha256').
PubkeyAcceptedKeyTypes(ssh/sshd): the public key algorithms that will be attempted by the client, and accepted by the server for public-key authentication (e.g. via
HostbasedAcceptedKeyTypes(sshd): the key types that will be attempted by the client, and accepted by the server for host-based authentication (e.g via
The ssh command supports 'ssh -Q <thing>' to query various cryptography related things, and you can also use ssh_config and sshd_config option names. As far as I know, this doesn't look at your actual SSH configuration files; instead, it reports what your OpenSSH could support if you enabled everything. By extension it doesn't necessarily list public key algorithms in your preference order. As far as I know, there's no way to get OpenSSH to tell you the state of client or server configurations; you get to read your configuration files and anything else necessary on your systems.
If I'm right, this means that 'ssh -Q HostkeyAlgorithms', 'ssh -Q HostbasedKeyTypes', and so on will always give you the same list, even if you have them configured differently. I believe they're all aliases for 'ssh -Q key-sig'. Not all of the 'ssh -Q' features have ssh and sshd configuration option aliases, either; I believe 'ssh -Q key' (and its kin) give you the list of key types instead of key signature algorithms.
Note that there really are three different types of ECDSA keys,
contrary to what I thought yesterday (see the
comments on that entry).
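If you want to check the 'these are all the same list' belief on your own systems, a minimal Python sketch is to run 'ssh -Q' for two query names and compare the results. The function names here are hypothetical and it assumes an OpenSSH ssh binary on your $PATH; I'm deliberately not claiming what the comparison will print, since that depends on your OpenSSH version.

```python
import subprocess


def parse_query_output(text):
    """Turn 'ssh -Q' output (one name per line) into a list of names."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def ssh_query(thing):
    """Run 'ssh -Q <thing>' and return the names it reports."""
    out = subprocess.run(
        ["ssh", "-Q", thing], check=True, capture_output=True, text=True
    )
    return parse_query_output(out.stdout)


def same_list(first, second):
    """True if two 'ssh -Q' queries report identical lists, order included."""
    return ssh_query(first) == ssh_query(second)


if __name__ == "__main__":
    # Compare a configuration option name against the generic query
    # it is believed to be an alias for.
    print(same_list("HostkeyAlgorithms", "key-sig"))
```

Because 'ssh -Q' ignores your configuration files, this tells you about your OpenSSH build, not about what a given client or server is actually configured to use.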
PS: You can set HostkeyAlgorithms in sshd_config on the server
as well as in the client, and I guess you could use this to turn
off offering "ssh-rsa" to clients right now (and at some point in
the future you may need to use it to turn "ssh-rsa" back on, when
OpenSSH deprecates this key signature scheme). Generally you control what
host key algorithms you offer to clients by what keys you generate
for sshd, since most keys have only one key signature algorithm
(including ECDSA keys, as mentioned).
The different types of modern (2021) SSH keys (and some opinions)
Back in 2014 I wrote about what I knew about the then-current different types of SSH keys. Things have changed around a bit since then, so it's time for an update.
Modern versions of SSH support three different types of public key cryptography for common use: RSA, ECDSA, and Ed25519. Both ECDSA and Ed25519 use elliptic curve cryptography, while RSA is based on integer factorization. SSH once supported DSA public key cryptography, but it has been deprecated since the 7.0 release of OpenSSH in 2015 (search for 'ssh-dss'). OpenSSH has supported FIDO/U2F hardware authenticators with ECDSA and Ed25519 keys since OpenSSH 8.2, and supports SSH key certificates for all key types.
To actually use SSH host and user keys, OpenSSH must also pick a
signature scheme. Ed25519 keys only have a single signature scheme,
but ECDSA and RSA keys have several different ones. OpenSSH is on the
path to deprecate the "ssh-rsa" signature scheme, but this doesn't
deprecate RSA keys in general.
OpenSSH lists RSA keys in your
file in a scheme independent way, but lists ECDSA keys in a
scheme-dependent one. There is probably a cryptographic reason for this.
OpenSSH has supported ECDSA keys since OpenSSH 5.7, released at the start of 2011, and Ed25519 keys since OpenSSH 6.5, released at the start of 2014. The stronger RSA key signature algorithms that OpenSSH now wants you to use have been supported since OpenSSH 7.2, released in February of 2016; however, these were only officially standardized in RFC 8332, released in March 2018. By now, support for these key types and signature schemes has propagated to every operating system release that uses OpenSSH and isn't a complete and utter zombie. However, support for them has definitely not propagated into all sorts of SSH servers and clients that are not using OpenSSH, and what support has propagated may be partial (some environments support using but not generating Ed25519 keys, for example, and may not yet support the additional RSA signature schemes).
(Unlike in the past, the Gnome Keyring SSH agent implementation apparently does now support Ed25519 keys, apparently since some time in 2018.)
I think that OpenSSH can have different preferred key signature
algorithms for user keys and for host keys, and the preference order
for them can differ between different OpenSSH versions and different
people's builds of them. If I'm reading the official OpenSSH
ssh_config manpage correctly,
the current upstream OpenSSH preference is Ed25519, ECDSA, and then
RSA. The current preferences on Linux distributions can be opaque, but I think that they generally prefer ECDSA over Ed25519 for host keys and Ed25519 over ECDSA for user keys. Don't ask me.
Here today in 2021, I think the consensus is definitely that Ed25519 is the best OpenSSH key type, probably with ECDSA as your second choice. See, for example, the Arch Wiki on choosing the key type, which has a long discussion with links for further reading. I don't know if non-OpenSSH support for ECDSA keys is much different than non-OpenSSH support for Ed25519 keys, although ECDSA in OpenSSH was standardized much earlier (in RFC 5656, from 2009; see OpenSSH Specifications).
As a pragmatic matter, I think that most devices today that don't support Ed25519 keys will probably be using SSH implementations that only support very basic things, like RSA keys with the old "ssh-rsa" key signature algorithm. They may also only support old key exchange algorithms and ciphers. Fortunately OpenSSH has not yet actually removed the code to support things like 'diffie-hellman-group1-sha1', and you can re-enable them if necessary following the information in OpenSSH Legacy Options.
It's possible for Firefox to forget about:config preferences you've set
Firefox has a user preferences system, exposed through its 'Settings' or 'Preferences' system (also known as about:preferences) and also through the more low-level configuration editor (aka about:config). As is mentioned there and covered in somewhat more detail in what information is in your profile, these configuration settings (and also your preferences settings) are stored in your profile's prefs.js file.
You might think that once you manually set something in about:config,
your setting will be in prefs.js for all time until you go back
into about:config and change or reset it. However, there's a way
that Firefox can quietly drop your setting. If you've set something
in about:config and your setting later becomes Firefox's default,
Firefox will normally omit your manual setting from your prefs.js
at some point. For example, if you manually enable HTTP/3 by setting its preference to
true and then Firefox later makes
enabling HTTP/3 the default (as it plans to), your prefs.js will
wind up with no setting for it.
(You can guess that this is going to happen because Firefox will un-bold an about:config value that you manually change (back) to its initial default value. There's no UI in about:config for a preference that you've manually set to the same value as the default.)
For the most part this is what you want. It certainly acts to clean up old settings that are now no longer necessary so your prefs.js doesn't explode. However it can be confusing in one situation, which is if Firefox later changes its mind about the default. Going back to the HTTP/3 situation, if Mozilla decides that turning on HTTP/3 was actually a mistake and defaults it to off again, your Firefox will wind up with HTTP/3 off even though you explicitly enabled it. You may remember that you explicitly turned HTTP/3 on, so why is it off now?
HTTP/3 is a big ticket item so you might have heard about Mozilla going back and forth, but Mozilla also changes the defaults for lots of other preferences over time. For instance, I've tweaked my media autoplay preferences repeatedly over time, and I suspect I've had Firefox updates default to some of them (removing my prefs.js settings) and then possibly change later.
If you have any settings that are really important to always be there, I think you may be able to manually create a user.js with them. Otherwise, this is mostly something to remember if you ever wind up wondering how something you remember explicitly setting has changed.
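If you keep an old copy of prefs.js around, one way to notice settings that have quietly vanished is to compare the preference names in the old and new copies. This is a minimal Python sketch of that; it assumes the usual one-per-line user_pref("name", value); format, and the function names and the example preference names are made up for illustration.

```python
import json
import re

# Matches lines of the form: user_pref("some.name", value);
USER_PREF_RE = re.compile(r'^user_pref\("([^"]+)",\s*(.*)\);\s*$')


def parse_prefs(text):
    """Return a dict of preference name -> value from prefs.js style text."""
    prefs = {}
    for line in text.splitlines():
        m = USER_PREF_RE.match(line.strip())
        if m is None:
            continue  # comments, blank lines, anything unexpected
        name, raw = m.groups()
        try:
            prefs[name] = json.loads(raw)  # true/false, numbers, "strings"
        except ValueError:
            prefs[name] = raw  # keep the raw text if it isn't JSON-like
    return prefs


def dropped_prefs(old_text, new_text):
    """Names that were set in the old prefs.js but are absent from the new one."""
    return sorted(set(parse_prefs(old_text)) - set(parse_prefs(new_text)))


if __name__ == "__main__":
    # Hypothetical preference names, purely for illustration.
    old = 'user_pref("example.feature.enabled", true);\nuser_pref("example.limit", 5);\n'
    new = 'user_pref("example.limit", 5);\n'
    print(dropped_prefs(old, new))
```

A preference showing up in the 'dropped' list doesn't tell you why it went away, only that Firefox no longer records an explicit setting for it.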
PS: To be clear, I think that Firefox is making a sensible decision (and probably the right decision) in not having a special state for 'manually set but to the same value as the default'. That would need a more complicated UI and more code for something that we almost never care about.
Our future upgrade wave of Ubuntu 18.04 machines
We have long had a mix of Ubuntu versions. The short explanation is that most machines users log in to (our login servers and compute servers) get upgraded every LTS version, while other machines that are less accessible only get upgraded every other LTS (the longer version is How we handle Ubuntu LTS versions). Under normal circumstances, this would currently give us a relatively even mix of 20.04 machines and 18.04 machines. These aren't normal times.
The result of these abnormal times is that we have a lot more 18.04 machines and a lot fewer 20.04 machines than we normally would. None of our user login machines have been upgraded, our mail servers had to move to 18.04 instead of 20.04, and until late last year we lacked much experience with 20.04, so the path of least resistance was using 18.04 for new or upgraded machines because it was a known quantity. This is fine by itself, as Ubuntu 18.04 LTS is a perfectly good Ubuntu release.
Our future issue is that having a lot of 18.04 machines (some of them very critical ones) means that when Ubuntu 22.04 comes out next April, we'll have a lot of machines to upgrade in less than a year (since 18.04 will stop being supported at the end of April 2023). This is probably more unique machines than we've ever had to upgrade in one cycle, even if we assume that the machines users log in to are mostly simple to rebuild. Some of the machines, such as our fileservers, will take extensive testing all on their own.
If we get enough time in the office this summer we may try to upgrade our user login machines to 20.04, even though it's a year behind the usual schedule. We have a test user login machine built, although it hasn't seen much use, and that would let us slip their upgrade to 22.04 until late, perhaps even the summer of 2023.
Beyond that, we could upgrade some machines early, moving machines we normally wouldn't touch from 18.04 to 20.04 just so we have fewer to move to 22.04 later. This would also give us more of a spread between LTS versions for the long term; otherwise, if we just upgrade all of our 18.04 machines to 22.04 we'll have much the same problem in 2026, with a lot of 22.04 machines to suddenly move to 26.04.
I have no conclusions, but at least this is now an issue I'm going to be thinking about.
(I'm aware that in some places, planning for 2026 would be a laughable idea. It may be optimistic even for us, but I've had long term planning pay off before and in general we exist in an environment with long term stability, although there are somewhat more clouds on the horizon than usual.)
Understanding OpenSSH's future deprecation of the 'ssh-rsa' signature scheme
OpenSSH 8.6 was recently released, and its release notes have a 'future deprecation notice' as has every release since OpenSSH 8.2:
Future deprecation notice
It is now possible to perform chosen-prefix attacks against the SHA-1 algorithm for less than USD$50K.
In the SSH protocol, the "ssh-rsa" signature scheme uses the SHA-1 hash algorithm in conjunction with the RSA public key algorithm. OpenSSH will disable this signature scheme by default in the near future.
More or less a year ago I flailed around about what this meant. Now I think that I understand more about what is going on, enough so to talk about what is really affected and why. Helping this out is that since the OpenSSH 8.5 release notes, OpenSSH has had the current, more explicit wording above about the situation.
When we use public key cryptography to sign or encrypt something, we generally don't directly sign or encrypt the object itself. As covered in Soatok's Please Stop Encrypting with RSA Directly, for encryption we normally use public key encryption on a symmetric key that the message itself is encrypted with. For signing, we normally hash the message and then sign the hash (see, for instance, where cryptographic hashes come into TLS certificates). OpenSSH is no exception to this; it has both key types and key signature schemes (or algorithms), the latter of which specify the hash type to be used.
(OpenSSH's underlying key types are documented best in ssh-keygen's manpage for the -t option. The -sk keytypes are FIDO/U2F keys, as mentioned in the OpenSSH 8.2 release notes. The supported key signature algorithms can be seen with 'ssh -Q key-sig'.)
What OpenSSH is working to deprecate is the (sole) key signature algorithm that hashes messages to be signed with SHA-1, on the grounds that SHA-1 hashing is looking increasingly weak. For historical reasons, this key signature algorithm has the same name ('ssh-rsa') as a key type, which creates exciting grounds for misunderstandings, such as I had last year. Even after this deprecation, OpenSSH RSA keys will be usable as user and host keys, because OpenSSH has provided other key signature algorithms using RSA keys and stronger hashes (specifically SHA2-256 and SHA2-512, which are also known as just 'SHA-256' and 'SHA-512', see Wikipedia on SHA-2).
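To make the 'same RSA key, different hash' point concrete, here is a minimal Python sketch that maps the three RSA key signature algorithm names to the hash each one uses and computes the digest that would then be signed. The function name is my own, and the actual RSA signing step (padding and the private key operation) is deliberately left out.

```python
import hashlib

# Hash used by each RSA key signature algorithm. 'ssh-rsa' is the
# SHA-1 one being deprecated; the rsa-sha2-* names are from RFC 8332.
RSA_SIG_HASHES = {
    "ssh-rsa": "sha1",
    "rsa-sha2-256": "sha256",
    "rsa-sha2-512": "sha512",
}


def digest_to_sign(sig_alg, message):
    """Hash a message the way the named signature algorithm would."""
    return hashlib.new(RSA_SIG_HASHES[sig_alg], message).digest()


if __name__ == "__main__":
    message = b"example data to be signed"
    for alg in RSA_SIG_HASHES:
        bits = len(digest_to_sign(alg, message)) * 8
        print(alg, "signs a", bits, "bit digest")
```

All three names apply to the same RSA key; only the hash step in front of the signature changes.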
Most relatively modern systems support RSA-based key signature schemes other than just ssh-rsa. Older systems may not, especially if they're small or embedded systems using more minimal SSH implementations. Even if things like routers from big companies support key signature schemes beyond ssh-rsa, you may have to update their firmware, which is something that not everyone does and which may require support contracts and the like. Unfortunately, anything you want to connect to has to have a key signature scheme that you support, because otherwise you can't authenticate their host key.
(OpenSSH Ed25519 keys also have a single key signature scheme associated with them, if you ignore SSH certificates; they are both 'ssh-ed25519'. Hopefully we will never run into a similar hash weakness issue with them. Since I just looked it up in RFC 8709 and RFC 8032, ed25519 signatures use SHA2-512.)
Realizing one general way to construct symmetric ciphers
One of the areas of cryptography that's always seemed magical to me is symmetric ciphers. I believed that they worked, but it felt amazing that people were able to construct functions that produced random-looking output but that could be inverted if and only if you had the key (and perhaps some other information, like a nonce or IV). I recently read Soatok's Understanding Extended-Nonce Constructions, which set off a sudden understanding of a general, straightforward way to construct symmetric ciphers (although not all ciphers are built this way).
A provably secure general encryption technique is the one-time pad. One way to do one-time pad encryption on computers is to have your OTP be a big collection of random bytes (known by both sides) and then use the fact that 'A xor B xor A' is just B. The sender XORs their message with the next section of their OTP, and the receiver just XORs it again with the same section, recovering the original message (this is a form of XOR cipher). However, one-time pads are too big for practical use. What we would like is for each side to generate the one-time pad from a smaller, easier to handle seed.
What we need is a keystream, or more exactly a way to generate a keystream from an encryption key and probably some other values like a nonce (a one-time pad is a keystream that requires no generation). The keystream we generate needs to have a number of security properties like randomness and unpredictability, but the important thing is that our keystream generation function doesn't have to be invertible; in fact, it shouldn't be invertible. There are a lot of ways to do this, especially since it's sort of what cryptographic hashes do, and it's easy for me to see how you could possibly create keystream generation functions.
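As a toy illustration of this idea (and only that), here is a minimal Python sketch that generates a keystream by hashing the key, a nonce, and a block counter with SHA-256, then XORs it with the message. This is my own construction for illustration, not a vetted cipher and not how any particular real stream cipher works; it assumes fixed-length keys and nonces and should not be used to protect anything.

```python
import hashlib


def keystream(key, nonce, length):
    """Generate 'length' keystream bytes from a key and nonce.

    Toy construction: SHA-256 over key + nonce + a block counter.
    """
    out = bytearray()
    counter = 0
    while len(out) < length:
        block = hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        out.extend(block)
        counter += 1
    return bytes(out[:length])


def xor_cipher(key, nonce, data):
    """Encrypt or decrypt by XORing data with the keystream.

    The same function does both, because 'A xor B xor A' is just B.
    """
    ks = keystream(key, nonce, len(data))
    return bytes(d ^ k for d, k in zip(data, ks))


if __name__ == "__main__":
    key = b"0123456789abcdef0123456789abcdef"  # example 32-byte key
    nonce = b"unique-nonce"                    # must never repeat for one key
    secret = xor_cipher(key, nonce, b"attack at dawn")
    print(xor_cipher(key, nonce, secret))
```

Note that nothing in keystream() is ever inverted; the receiver simply regenerates the same keystream from the same key and nonce.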
What I've realized and described is a stream cipher, as opposed to a block cipher. While I'd heard the two terms before, I hadn't understood the cryptographic nature of this distinction, vaguely thinking it was only about whether you had to feed in a fixed size input block or could use more flexible variable-sized inputs. Now I've learned better in the process of writing this entry and learned something more about cryptography.
(I could probably learn and understand more about how it's possible to construct block ciphers if I read more about them, but there's only so far I'm willing to go into cryptography.)