My failure to arrange a graceful TLS root certificate rollover with OpenVPN
Generally, what I write about here is discoveries, questions, and successes. This presents a somewhat misleading picture of what my sysadmin work is like, so today I'm going to talk about a TLS issue that I spent a day or two failing at recently.
(I wouldn't say that failure is a routine event in system administration, but sometimes you can't solve a problem, and it can happen to anyone.)
We have some OpenVPN servers for our users, running on OpenBSD using the OpenBSD packaged version of OpenVPN. When you run OpenVPN, you normally establish a private Certificate Authority with your own root certificate. This CA authenticates your OpenVPN server to users: their clients verify that your server presents a host certificate that's ultimately signed by your CA. It can also be used to sign the user certificates that authenticate users. Of course, to do this your users have to manually tell their OpenVPN client about your root CA. We do this by providing a copy of our local CA root certificate that they download and install in their client.
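As a rough illustration of what the client-side check amounts to, here's a minimal sketch in Go using only the standard library's crypto/x509 package. It creates a private root CA, signs a host certificate with it, and checks the host certificate against a pool that contains only that root, which is roughly what a client does with the CA certificate a user installed. The CA and host names are hypothetical, ECDSA P-256 keys are used for brevity, and OpenVPN itself does this verification through its TLS library (typically OpenSSL), not anything like this code.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"time"
)

// newCA creates a self-signed root CA certificate and its private key.
func newCA(name string) (*x509.Certificate, *ecdsa.PrivateKey, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber:          big.NewInt(1),
		Subject:               pkix.Name{CommonName: name},
		NotBefore:             time.Now().Add(-time.Hour),
		NotAfter:              time.Now().AddDate(10, 0, 0), // ten-year lifetime
		IsCA:                  true,
		BasicConstraintsValid: true,
		KeyUsage:              x509.KeyUsageCertSign | x509.KeyUsageCRLSign,
	}
	// Self-signed: the template is its own parent.
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, nil, err
	}
	cert, err := x509.ParseCertificate(der)
	return cert, key, err
}

// newHost issues a server certificate for hostname, signed by the CA's key.
func newHost(hostname string, ca *x509.Certificate, caKey *ecdsa.PrivateKey) (*x509.Certificate, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(2),
		Subject:      pkix.Name{CommonName: hostname},
		DNSNames:     []string{hostname},
		NotBefore:    time.Now().Add(-time.Hour),
		NotAfter:     time.Now().AddDate(1, 0, 0),
		KeyUsage:     x509.KeyUsageDigitalSignature,
		ExtKeyUsage:  []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth},
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, ca, &key.PublicKey, caKey)
	if err != nil {
		return nil, err
	}
	return x509.ParseCertificate(der)
}

// verifiesAgainst reports whether host chains up to root, the only trusted CA.
func verifiesAgainst(host, root *x509.Certificate) bool {
	roots := x509.NewCertPool()
	roots.AddCert(root)
	_, err := host.Verify(x509.VerifyOptions{
		Roots:     roots,
		KeyUsages: []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth},
	})
	return err == nil
}

func main() {
	ca, caKey, err := newCA("Example OpenVPN CA")
	if err != nil {
		panic(err)
	}
	other, _, err := newCA("Some Other CA")
	if err != nil {
		panic(err)
	}
	host, err := newHost("vpn.example.org", ca, caKey)
	if err != nil {
		panic(err)
	}
	fmt.Println("trusted with our CA:    ", verifiesAgainst(host, ca))
	fmt.Println("trusted with another CA:", verifiesAgainst(host, other))
}
```

Run as-is, it should report true for the CA that signed the host certificate and false for the unrelated one, since nothing in the second pool can have issued it.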
Almost ten years ago, in August of 2011, we set up the first instance of our modern OpenVPN server infrastructure, and generated its root CA certificate. The default expiry time on this CA certificate was ten years, and so it runs out at the end of August of 2021, which is to say in a couple of months. Since we can't assume that all OpenVPN clients will ignore the expiry time of the CA root certificate, we need to do something about this. The simple thing to do is to generate a new CA root certificate (with a long expiry time) and a new host certificate and start using them. However, this creates a flag day: all of our OpenVPN users have to download the new CA certificate from us and switch to it, and if they don't switch in time, they stop being able to connect to our OpenVPN servers.
We would like to do better, and I wound up with two ideas for how to do it. My first attempt was to create a new cross-signed CA root certificate (and a new host certificate signed by it). One version of the new root certificate was signed by the current root; our OpenVPN servers would provide this in a certificate chain until the old CA root expired. The other version was self-signed, and would be downloaded by people who'd switch to it in advance. The server host certificate would verify through either certificate chain.
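The cross-signing plan can be sketched the same way, again as a minimal Go example using only crypto/x509, with hypothetical names and lifetimes rather than what we actually deployed. One new keypair and Subject Name is issued twice: once self-signed, and once signed by the old root with an expiry matching the old root's. The host certificate is signed by the new key, so a client that only has the old root can verify it through the cross-signed certificate the server sends, while a client that has switched verifies it directly against the new self-signed root. `VerifyOptions.CurrentTime` simulates checking after the old root expires.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"time"
)

var now = time.Now()

func mustKey() *ecdsa.PrivateKey {
	k, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		panic(err)
	}
	return k
}

// caTemplate describes a CA certificate with the given subject, serial and expiry.
func caTemplate(name string, serial int64, notAfter time.Time) *x509.Certificate {
	return &x509.Certificate{
		SerialNumber:          big.NewInt(serial),
		Subject:               pkix.Name{CommonName: name},
		NotBefore:             now.Add(-time.Hour),
		NotAfter:              notAfter,
		IsCA:                  true,
		BasicConstraintsValid: true,
		KeyUsage:              x509.KeyUsageCertSign | x509.KeyUsageCRLSign,
	}
}

// mustSign issues tmpl for pub, signed by signer; parent supplies the issuer
// name (pass tmpl itself as parent for a self-signed certificate).
func mustSign(tmpl, parent *x509.Certificate, pub *ecdsa.PublicKey, signer *ecdsa.PrivateKey) *x509.Certificate {
	der, err := x509.CreateCertificate(rand.Reader, tmpl, parent, pub, signer)
	if err != nil {
		panic(err)
	}
	cert, err := x509.ParseCertificate(der)
	if err != nil {
		panic(err)
	}
	return cert
}

// verifiesAt reports whether host chains to root at time t, using any
// intermediates the server would send along with its own certificate.
func verifiesAt(host, root *x509.Certificate, inter []*x509.Certificate, t time.Time) bool {
	roots, pool := x509.NewCertPool(), x509.NewCertPool()
	roots.AddCert(root)
	for _, c := range inter {
		pool.AddCert(c)
	}
	_, err := host.Verify(x509.VerifyOptions{Roots: roots, Intermediates: pool, CurrentTime: t})
	return err == nil
}

// rollover reports whether the new host certificate verifies via the old root
// (plus the cross-signed cert) and via the new self-signed root, now and later.
func rollover() (oldNow, newNow, oldLater, newLater bool) {
	oldExpiry := now.AddDate(0, 2, 0) // hypothetical: old root expires in two months
	oldKey := mustKey()
	oldTmpl := caTemplate("Example OpenVPN CA", 1, oldExpiry)
	oldRoot := mustSign(oldTmpl, oldTmpl, &oldKey.PublicKey, oldKey)

	// One new keypair and Subject Name, issued twice.
	newKey := mustKey()
	selfTmpl := caTemplate("Example OpenVPN CA 2", 2, now.AddDate(20, 0, 0))
	newSelf := mustSign(selfTmpl, selfTmpl, &newKey.PublicKey, newKey)
	newCross := mustSign(caTemplate("Example OpenVPN CA 2", 3, oldExpiry), oldRoot, &newKey.PublicKey, oldKey)

	hostKey := mustKey()
	host := mustSign(&x509.Certificate{
		SerialNumber: big.NewInt(4),
		Subject:      pkix.Name{CommonName: "vpn.example.org"},
		NotBefore:    now.Add(-time.Hour),
		NotAfter:     now.AddDate(1, 0, 0),
		KeyUsage:     x509.KeyUsageDigitalSignature,
		ExtKeyUsage:  []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth},
	}, newSelf, &hostKey.PublicKey, newKey)

	chain := []*x509.Certificate{newCross}
	later := oldExpiry.AddDate(0, 1, 0)
	return verifiesAt(host, oldRoot, chain, now), verifiesAt(host, newSelf, nil, now),
		verifiesAt(host, oldRoot, chain, later), verifiesAt(host, newSelf, nil, later)
}

func main() {
	oldNow, newNow, oldLater, newLater := rollover()
	fmt.Println("old root + cross-signed cert, now:  ", oldNow)
	fmt.Println("new self-signed root, now:          ", newNow)
	fmt.Println("old root + cross-signed cert, later:", oldLater)
	fmt.Println("new self-signed root, later:        ", newLater)
}
```

Every check should print true except the old-root path after the simulated expiry, which is the intended shape of the rollover. Go's verifier accepting this says nothing about what a particular OpenVPN client will accept.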
Cross-signed root certificates are a reasonably common thing in web TLS, and once I fumbled my way through some things, the resulting certificate chains passed verification in OpenSSL and in another tool I tend to use. But I couldn't get my test OpenVPN client to validate the host certificate using the new CA certificate.
My second attempt was more brute force. I took the keypair from our existing CA root certificate and used it to create a new version of the CA root certificate with the same keypair, the same Subject Name, and a longer expiry. Since this used the same keypair and Subject Name as our existing root certificate, in normal TLS certificate verification it's a perfect substitute for our current expiring CA. My verification tools said it was the same and would verify the current host certificate, and after some work, 'openssl asn1parse' said that the two certificates had the same low-level content except the serial number, the validity dates, and the signature. But my test OpenVPN client would not accept the new CA certificate no matter what I did. I even generated and signed a new (test) server host certificate using this new version of the CA certificate and had my test OpenVPN server provide the new CA certificate and the new host certificate while my client was using the new CA certificate. It didn't work.
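To make the reissue idea concrete, here's a minimal sketch in Go using only crypto/x509, not the OpenSSL tooling involved here; the names and lifetimes are hypothetical. It reissues a root with the same keypair and Subject Name but a new serial number and expiry, then checks whether a host certificate signed by the original root still verifies after the original expires. Go's verifier accepts the reissued root, much as the verification tools did. It sheds no light on why the OpenVPN client refused, since OpenVPN verifies through its TLS library (typically OpenSSL).

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"time"
)

var now = time.Now()

func mustKey() *ecdsa.PrivateKey {
	k, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		panic(err)
	}
	return k
}

// mustSign issues tmpl for pub, signed by signer, with parent as the issuer.
func mustSign(tmpl, parent *x509.Certificate, pub *ecdsa.PublicKey, signer *ecdsa.PrivateKey) *x509.Certificate {
	der, err := x509.CreateCertificate(rand.Reader, tmpl, parent, pub, signer)
	if err != nil {
		panic(err)
	}
	cert, err := x509.ParseCertificate(der)
	if err != nil {
		panic(err)
	}
	return cert
}

// rootTemplate always uses the same Subject Name; only serial and expiry vary.
func rootTemplate(serial int64, notAfter time.Time) *x509.Certificate {
	return &x509.Certificate{
		SerialNumber:          big.NewInt(serial),
		Subject:               pkix.Name{CommonName: "Example OpenVPN CA"},
		NotBefore:             now.Add(-time.Hour),
		NotAfter:              notAfter,
		IsCA:                  true,
		BasicConstraintsValid: true,
		KeyUsage:              x509.KeyUsageCertSign | x509.KeyUsageCRLSign,
	}
}

// verifiesAt reports whether host chains to root at time t.
func verifiesAt(host, root *x509.Certificate, t time.Time) bool {
	roots := x509.NewCertPool()
	roots.AddCert(root)
	_, err := host.Verify(x509.VerifyOptions{Roots: roots, CurrentTime: t})
	return err == nil
}

// reissueCheck reports whether an existing host certificate verifies, after
// the original root's expiry, against the original and the reissued root.
func reissueCheck() (viaOriginal, viaReissued bool) {
	caKey := mustKey()
	origTmpl := rootTemplate(1, now.AddDate(0, 2, 0)) // hypothetical: expires in two months
	original := mustSign(origTmpl, origTmpl, &caKey.PublicKey, caKey)

	hostKey := mustKey()
	host := mustSign(&x509.Certificate{
		SerialNumber: big.NewInt(10),
		Subject:      pkix.Name{CommonName: "vpn.example.org"},
		NotBefore:    now.Add(-time.Hour),
		NotAfter:     now.AddDate(1, 0, 0),
		KeyUsage:     x509.KeyUsageDigitalSignature,
		ExtKeyUsage:  []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth},
	}, original, &hostKey.PublicKey, caKey)

	// Same keypair and subject; new serial number, validity dates and signature.
	newTmpl := rootTemplate(2, now.AddDate(20, 0, 0))
	reissued := mustSign(newTmpl, newTmpl, &caKey.PublicKey, caKey)

	later := now.AddDate(0, 3, 0)
	return verifiesAt(host, original, later), verifiesAt(host, reissued, later)
}

func main() {
	viaOriginal, viaReissued := reissueCheck()
	fmt.Println("after expiry, original root:", viaOriginal)
	fmt.Println("after expiry, reissued root:", viaReissued)
}
```

The existing host certificate should fail against the expired original and verify against the reissued root, because the issuer name and key identifier it carries match both.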
At this point I'm out of clever ideas to avoid significant pain for our users. Unless something changes in the situation, the best we can do for people is avoid a flag day as much as possible.
(This sort of elaborates on some tweets of mine. My test OpenVPN client was my Fedora 33 laptop; Fedora's OpenVPN client may be a bit atypical, but we have both Fedora and Ubuntu OpenVPN users, so if our work-around doesn't work with them, some of our users will have a bad time.)
PS: Official TLS certificates for our OpenVPN servers weren't really an option back in 2011, and they're probably still not one for various reasons. I ran some tests to see if I could make them work in a test setup (hence my use of Let's Encrypt on OpenBSD) but failed there too, although I didn't investigate very deeply.