Looking at DKIM information for our 'good' email (September 2020 edition)

September 28, 2020

I was recently asked (in another context) if DKIM (Domain Keys Identified Mail) was sufficiently common that one could start to require it on email, or at least use it as a strong part of spam scoring. Well, let's get some statistics from our mail system. This has a different focus than my October 2018 statistics, where I was looking at total overall stats; here I'm going to try to look only at email that seems good, based on being accepted by us with a sufficiently low rspamd score.

All of the following statistics are from the past ten days of full logs. Over that period we accepted 82,878 email messages at our external MX, and about 12,400 of these messages got high enough rspamd scores that I'm going to call them 'not good' email, leaving 70,473 messages in my sample set. Out of these, only 52,984 had any DKIM information at all. 25% of my sample set of probably good email has no DKIM information.

So the first answer is that we definitely cannot use the lack of DKIM signatures as a spam scoring signature with any real weight. A quarter of our presumed good incoming email would be marked down by such a check, which is very likely a bad false positive rate (if it's not, rspamd is doing a terrible job, passing more spam messages than it recognizes).

However, of those messages with DKIM signatures, almost all of them passed at least one DKIM signature check (remember that messages can have more than one DKIM signature), 50,754 out of the 52,984. Speaking of signatures, 9,875 of the messages had more than one, which is an appreciable quantity. 538 messages with multiple DKIM signatures had different results from the different signatures, as reported by Exim. I cannot readily generate numbers on how many messages with multiple DKIM signatures failed all of them, but there were 2,737 messages with one or more DKIM signature failures.

(Two messages had three different results. The more amusing one had three signatures for the same domain; one passed, one failed with a body hash mismatch, and one failed signature verification, with Exim suggesting that the headers had probably been modified. It looks like that the message came from a mailing list.)

Given that this amounts to 4% of messages with DKIM failing all checks, for presumed good email, I think that using 'has DKIM but nothing verifies' as a strong spam signal is probably a mistake. Manual inspection suggests that a lot of such email comes from mailing lists. Some of it certainly appears to come directly from the domain generating the DKIM signature, so probably some people have mail systems that are screwing things up, too.

(Some of these 'comes directly from the domain of the DKIM signature but it doesn't verify' are DKIM public key lookups, but some of them are signature verification failures. Apparently some places are generating DKIM signatures and then having their mail system mangle their messages.)

You might think that we could at least rely on email claiming to have our own DKIM signatures to either verify or be not legitimate. I'm afraid to tell you otherwise; there's an appreciable number of incoming messages that we generated, sent somewhere else, and that came back mangled so they failed DKIM checks. These are not all from mailing lists, either, which makes me sigh. We see this happening to email from other University of Toronto subdomains as well.

We saw a total of 522 DKIM failures for 'public key record unavailable', from 89 different DKIM 'd=' domain values, including a number of places that really should know better. It's quite possible that some of these are transient DNS failures. I haven't investigated this further because I don't care that much.

My overall conclusion is that for us, DKIM is not a useful spam scoring signal, even if we restrict our scope to scoring on DKIM results. Your results may vary depending on your mail patterns, and that's really the other conclusion; right now, you need to evaluate the DKIM results for your own specific incoming email patterns before making any decisions here.

Comments on this page:

By Jukka at 2020-09-28 05:31:25:

Difficult questions.

Regarding spam-scoring generally: I wonder whether it would make sense to have two buckets: "valid DKIM" and everything else, and, then, more stringent scoring would be done in the latter bucket.

Another classical question: "how much is much"? One could argue that the 4% you noted actually seems pretty promising?

By cks at 2020-09-28 14:30:59:

Without having checked our current numbers to be sure, I feel that it is very unlikely that the presence of a valid DKIM signature is particularly correlated with the message not being spam. Certainly spammers are perfectly capable of DKIM signing their email messages with the same junk domains that they are already registering and using for the MAIL FROMs, and we know that they do. Spam sent through major providers will also have valid DKIM signatures; this includes various forms of semi-manual spam sent through big email places like GMail, Google Groups mailing list spam, and commercial mail sending services.

My view is that a 4% false positive rate is far too high. That's almost one message in 20 that a hypothetical typical person would be getting, and I believe that most people are far more sensitive to false positives than to false negatives.

Written on 28 September 2020.
« Remote power control for your machines comes in two flavours
Making product names of what you use visible to people is generally a mistake »

Page tools: View Source, View Normal, Add Comment.
Login: Password:
Atom Syndication: Recent Comments.

Last modified: Mon Sep 28 01:19:52 2020
This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.