Watching the recent AddTrust root CA certificate expiry has been humbling
The news of the recent past is that the old 'AddTrust External CA Root' root certificate expired on May 30th (at 10:48:48 UTC). Before this happened, I confidently told more than one person that one reason I was confident that our TLS certificate environment wasn't affected by this was that our Prometheus based monitoring system specifically looks at all of the TLS certificates in a certificate validation chain, not just only the first ('leaf') certificate for the server itself, and reports the lowest expiry time. Since our alerts had not been going off, we didn't have the AddTrust CA root in our certificate chains. Although we had no problems ourselves, in retrospect this looks naive and has exposed a real issue to think about with TLS certificate monitoring.
Generally what broke because of the AddTrust root expiring is (and was) not current browsers or even current monitoring things like Prometheus (at least when run on reasonably current systems). Instead, it was older software, such as OpenSSL 1.0.1 (via), and older systems using old root certificate bundles. These systems either only had the AddTrust root to rely on, without the modern roots that have supplanted it, or had programming issues (ie, bugs) that caused them to not fall back to try additional certificate chains when they hit the expired AddTrust root CA certificate.
What this points out is that whether TLS validation works can depend on the client (and the client's environment), especially in the face of expired or invalidated certificates somewhere in the chain. So far, we haven't been considering this in monitoring and testing. Our monitoring has been tacitly assuming that if Prometheus' Blackbox checks liked our TLS certificate chains, everything was good for all clients everywhere. So has our testing, more or less; if we're testing a new HTTPS web server or whatever, we'll point a browser at it, see if the browser is happy, and then call it done.
(This is especially questionable because browsers go way out of their way to try to make TLS certificate chains work; they'll use cached intermediate certificates and sometimes even fetch them on the fly.)
This monitoring and testing is very likely safe for all modern client software (browsers, IMAP clients, and also programming tools and environments). But it's likely not universally safe for us. We can have old programs on old operating systems, and we can have client programs where we've needed to specifically configure a certificate chain for some reason. Those and similar things may well fail in the face of an issue similar to this AddTrust one, and without our monitoring and testing flagging it.
I don't have any particular answers for this. For web servers, we often use the SSL Server Test, which reports results for a variety of older browsers as well as current ones (although I'm not certain that that covers certificate chain issues for them or if it's just ciphers). For IMAP servers or the like, well, we'd have to wait for problem reports from people with old clients or something.
(Since we're using Let's Encrypt for everything today, with automatically built certificate chains, it's probably not worth setting up monitoring to look for unnecessary intermediate certificates or badly formed certificate chains. Neither should happen short of a catastrophic malfunction in Certbot or Let's Encrypt.)