The fundamental practical problem with the Certificate Authority model

February 10, 2016

Let's start with my tweet:

This is my sad face when people sing the praises of SSH certificates and a SSH CA as a replacement for personal SSH keypairs.

There is nothing specifically wrong with the OpenSSH CA model. Instead, it simply has the fundamental problem of the basic CA model.

The basic Certificate Authority model is straightforward: you have a CA, it signs things, and you accept that the CA's signature on those things is by itself an authorization. TLS is the most widely known protocol with CAs, but as we see here the CA model is used elsewhere as well. This is because it's an attractive model, since it means you can distribute a single trusted object instead of many of them (such as TLS certificates or SSH personal public keys).

The fundamental weakness of the CA model in practice is that keeping the basic CA model secure requires that you have perfect knowledge of all keys issued. This is provably false in the case of breaches; in the case of TLS CAs, we have repeatedly seen CAs that do not know all the certificates they mis-issued. Let me repeat that louder:

The fundamental security requirement of the basic CA model is false in practice.

In general, at the limits, you don't know all of the certificates that your CA system has signed nor do you know whether any unauthorized certificates exist. Any belief otherwise is merely mostly or usually true.

Making a secure system that uses the CA model means dealing with this. Since TLS is the best developed and most attacked CA-based protocol, it's no surprise that it has confronted this problem straight on in the form of OCSP. Simplified, OCSP creates an additional affirmative check that the CA actually knows about a particular certificate being used. You can argue about whether or not it's a good idea for the web and it does have some issues, but it undeniably deals with the fundamental problem; a certificate that's unknown to the CA can be made to fail.
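To make the difference concrete, here is a minimal Python sketch of trusting a signature alone versus adding an affirmative check. Everything in it is hypothetical: `ToyCA`, `sign`, `signature_only_check` and `affirmative_check` are made-up names, and an HMAC stands in for a real public-key signature (so the verifier shares the CA's key, which no real CA system would do). It is not the OCSP protocol; it only illustrates the idea of asking the CA whether it knows about a certificate.

```python
import hashlib
import hmac


def sign(key, subject):
    # Stand-in for a real CA signature (assumption: HMAC instead of public-key crypto).
    return hmac.new(key, subject.encode(), hashlib.sha256).hexdigest()


class ToyCA:
    """Hypothetical CA that records every certificate it knowingly issues."""

    def __init__(self, key):
        self.key = key
        self.issued = set()

    def issue(self, subject):
        cert = (subject, sign(self.key, subject))
        self.issued.add(cert)
        return cert

    def knows(self, cert):
        # The affirmative check: does the CA have a record of this certificate?
        return cert in self.issued


def signature_only_check(key, cert):
    # The basic CA model: a valid signature is by itself an authorization.
    subject, signature = cert
    return hmac.compare_digest(sign(key, subject), signature)


def affirmative_check(key, cert, ca):
    # Valid signature AND the CA confirms it knows this certificate.
    return signature_only_check(key, cert) and ca.knows(cert)


ca_key = b"toy signing key"
ca = ToyCA(ca_key)
good = ca.issue("alice")
# An attacker with the stolen key signs a certificate the CA never recorded.
rogue = ("mallory", sign(ca_key, "mallory"))

print(signature_only_check(ca_key, rogue))   # signature alone: accepted
print(affirmative_check(ca_key, rogue, ca))  # with the extra check: refused
```

The rogue certificate has a perfectly good signature, so only the second check can make it fail.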

Any serious CA-based system needs to either deal with this fundamental practical problem or be able to explain why it is not a significant security exposure in the system's particular environment. Far too many of them instead ignore it, handwave the issue, and assume that you have perfect knowledge of all of the certificates your CA system has signed.

(Some people say 'we will keep our CA safe'. No you won't. TLS CAs have at least ten times your budget for this and know that failure is an organization-ending risk, and they still fail.)

(I last wrote about this broad issue back in 2011, but I feel the need to bang the drum some more and spell things out more strongly this time around. And this time around SSL/TLS CAs actually have a relatively real fix in OCSP.)

Sidebar: Why after the fact revocation is no fix

One not uncommon answer is 'we'll capture the identifiers of all certificates that get used and when we detect a bad one, we'll revoke it'. The problem with this is that it is fundamentally reactive; by the time you see the identifier of a new bad certificate, the attacker has already been able to use it at least once. After all, until you see the certificate, identify it as bad, and revoke it, the system trusts it.
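The timing problem can be sketched in a few lines of Python. The `ReactiveVerifier` class and its `looks_bad` detector are hypothetical; the only point is that the review happens after the accept decision has already been made.

```python
class ReactiveVerifier:
    """Hypothetical verifier that trusts any unrevoked certificate and reviews later."""

    def __init__(self, looks_bad):
        self.looks_bad = looks_bad  # assumed after-the-fact detector
        self.revoked = set()
        self.pending = []

    def accept(self, cert_id):
        if cert_id in self.revoked:
            return False
        self.pending.append(cert_id)  # capture the identifier for later review
        return True                   # trusted until someone revokes it

    def review(self):
        # Runs some time after the certificates were already used.
        for cert_id in self.pending:
            if self.looks_bad(cert_id):
                self.revoked.add(cert_id)
        self.pending.clear()


verifier = ReactiveVerifier(looks_bad=lambda cert_id: cert_id == "rogue-1")
first_use = verifier.accept("rogue-1")   # attacker gets in before any review
verifier.review()
second_use = verifier.accept("rogue-1")  # refused only from here on
print(first_use, second_use)
```

However fast `review` runs, the first `accept` has already returned.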

Comments on this page:

This is when someone would propose using Certificate Transparency, but except for Let's Encrypt there isn't any major CA publishing their full list. And it's entirely optional thus far -- the CA/Browser Forum Baseline Requirements do not require CT. Nor does submitting to one CT log submit to any other; the onus is on the CA to submit to every CT log (infeasible) or for auditing groups/programs to track every CT log that comes up (a little more feasible, but still ridiculously hard).

So people have come up with alternatives, like the blockchain! I feel like that's throwing the baby out with the bath water, though.

By cks at 2016-02-10 08:51:14:

Certificate transparency is solving a different problem, one that is relatively specific to TLS. Most other uses of the CA model have a single CA or at least a single CA for each trust area, which makes other parties monitoring their output somewhat less urgent. TLS is quite special in that it has multiple independent CAs with full signing powers and in practice those CAs are not trusted by the people who actually use the certificates.

(Of course, a CT-like system is useful because it can enable early detection of bad or unexpected certificates. But within an organization you can achieve it by sending your CA signing logs into your regular log monitoring infrastructure and then building alerts on various criteria. TLS needs the whole CT infrastructure because its CAs live in different organizations.)

Also, I believe that CT is not currently part of the TLS trust model in the way OCSP is, in that you don't make an 'is it in the CT' check before trusting a certificate. It's just used for (outside) monitoring of issued certificates. As TLS has shown, you need a proactive pre-trust check; otherwise you are back in the after-the-fact revocation world where attackers get at least one success before you can stop them.
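As a rough sketch of the 'signing logs into your log monitoring' idea, here is what alerting on a couple of criteria might look like in Python. The record fields (`serial`, `principal`, `valid_hours`), the `EXPECTED_PRINCIPALS` set and the 24-hour limit are all invented for illustration; a real CA's log format and your own alert criteria will differ.

```python
EXPECTED_PRINCIPALS = {"alice", "bob"}  # assumption: who may be issued certificates
MAX_VALID_HOURS = 24                    # assumption: local policy limit


def alerts_for(record):
    """Return a list of alert strings for one (hypothetical) signing log record."""
    problems = []
    if record["principal"] not in EXPECTED_PRINCIPALS:
        problems.append("unexpected principal %r" % record["principal"])
    if record["valid_hours"] > MAX_VALID_HOURS:
        problems.append("validity of %d hours exceeds policy" % record["valid_hours"])
    return ["serial %s: %s" % (record["serial"], p) for p in problems]


signing_log = [
    {"serial": "1001", "principal": "alice", "valid_hours": 8},
    {"serial": "1002", "principal": "root", "valid_hours": 8760},
]

for record in signing_log:
    for alert in alerts_for(record):
        print(alert)
```

This only sees what the CA system logged, so it is early detection of unexpected issuance and not a pre-trust check.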

I like the idea of FreeIPA (or something similar) keeping a record of known host keys and serving them as SSHFP records (over DNSSEC, naturally) or via LDAPS (which sss_ssh_knownhostsproxy makes use of). But that only shifts the problem with CAs to DNSSEC and TLS.

By Ewen McNeill at 2016-02-11 16:39:53:

Unfortunately in practice OCSP doesn't even work very well for the web as traditionally deployed -- the OCSP lookups often take so long that the rest of the web transaction could have happened first, so, eg, Google Chrome does not check OCSP. And what do you do if you can't reach the OCSP server? Fail closed? But the OCSP servers traditionally haven't been very reachable even in ideal circumstances. Fail open? In that case, an active attacker just blocks the OCSP traffic too, you fail open, and you're no better off than if you hadn't checked.
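The fail-open versus fail-closed choice can be shown with a small Python sketch. The names `check_status`, `query_responder` and `blocked_responder` are hypothetical, and the responder here is just a function, not a real OCSP client.

```python
def check_status(cert_id, query_responder, fail_open):
    """Decide whether to trust cert_id using a (hypothetical) status responder.

    query_responder returns True (good) or False (revoked or unknown),
    or raises ConnectionError if the responder can't be reached.
    """
    try:
        return query_responder(cert_id)
    except ConnectionError:
        # No answer: the policy alone decides the outcome.
        return fail_open


def blocked_responder(cert_id):
    # Models an active attacker dropping the status traffic.
    raise ConnectionError("responder unreachable")


print(check_status("rogue-1", blocked_responder, fail_open=True))
print(check_status("rogue-1", blocked_responder, fail_open=False))
```

With `fail_open=True` the blocked lookup yields the same answer as not checking at all; with `fail_open=False` an unreachable responder locks out everyone, good certificates included.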

The only practical workaround to this seems to be "short duration". Which is roughly what OCSP stapling aims to do (attach a short duration OCSP response to a longer duration certificate; eventually refuse certificates without it attached). And is partly what Let's Encrypt's shorter duration aims to do -- plus something that knew of the policy could, eg, refuse to accept a certificate claiming to be valid for any longer.

If the duration is made short enough, then it approximates the benefits of real-time OCSP checks, without the latency.
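A verifier that knows the policy might enforce "short duration" roughly like this Python sketch. The `acceptable` function and the 12-hour `MAX_LIFETIME` are assumptions made for illustration, not values from any standard.

```python
from datetime import datetime, timedelta, timezone

MAX_LIFETIME = timedelta(hours=12)  # assumption: local policy, not from any standard


def acceptable(not_before, not_after, now):
    """Accept only currently valid certificates with a short total lifetime."""
    if not (not_before <= now <= not_after):
        return False
    return (not_after - not_before) <= MAX_LIFETIME


now = datetime(2016, 2, 11, 12, 0, tzinfo=timezone.utc)
short_lived = (now - timedelta(hours=1), now + timedelta(hours=7))
long_lived = (now - timedelta(hours=1), now + timedelta(days=365))

print(acceptable(*short_lived, now))
print(acceptable(*long_lived, now))
```

The check needs no network round trip, which is where the latency saving comes from; the cost is that a bad certificate stays usable until it expires.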


By cks at 2016-02-11 17:44:05:

My view is that while OCSP has real issues on the web, some form of an 'explicit check' protocol is much more feasible inside an organization (and I'd like to see it supported by protocols at least as an option). Many of the network issues are far less severe inside most organizations and this makes fail-closed much more viable. At least some of the time the organization already has a fail-closed authentication system that it's okay with for most purposes.

(Eg, if you're willing to use LDAP or some other centralized system for authentication you're clearly willing to have a central point of authentication failure.)

By Greg A. Woods at 2016-02-12 16:32:36:

As you've mentioned before, a breach of CA security requires immediately revoking the CA itself and thus all of the certificates it has signed (and ideally it would also mandate some way to audit all prior uses of those certificates back to a time when the CA was known to be secure, which might be day 0, before the first certificate was ever signed).

I would suggest that when signed certificates are used for identification, authentication, and authorization, they really must be only one of the factors used, one with a weight no greater than at least one other factor, such as a passphrase, a one-time key, or maybe a crypto challenge processed by some separate physical device (which one might consider is effectively a separate private key).

I don't think it really matters whether the certificate is presented by the user, the server, or both.

Written on 10 February 2016.

Last modified: Wed Feb 10 02:12:58 2016