Wandering Thoughts


Getting (and capturing) spam can sometimes be useful to see what's in it

We have what is now a long standing system for logging email attachment type information (everyone should have one). For more than a year we've been receiving .iso attachments that caused our program to log cryptic reports claiming that we sniffed these as tar archives that were oddly empty:

attachment application/x-iso9660-image; MIME file ext: .iso; tar no files?!

(This one is unusual in that it had a correct MIME type. The more common MIME type these come with is application/octet-stream.)

Our commercial anti-spam system (Sophos PureMessage) consistently identifies these as CXmail/IsoDl-A.

I've been vaguely wanting to figure out why these messages cause our program to do this and what was actually in these file attachments for some time, but I've been hampered by the fact that I didn't actually have an example file. Our email system consistently rejects these for being malware (and anyway they weren't sent to me), and for various reasons we don't try to have our attachment type logging system save copies of things under any circumstances. I added some extra logging to the system, but it didn't produce anything.

(In some environments, an attachment logging and filtering system would be critical enough that you should be able to capture copies of things that either cause it problems or that seem questionable. In our environment, it's not and making it capture things would raise both operational issues (like managing what it captures and not running out of disk space) and policy ones (around privacy and so on).)

However, I also run a sinkhole SMTP server on another machine. Recently it got a boring spam message which I almost ignored, except that I noticed it had a suspicious attachment that claimed to be an ISO file in the MIME type information (although it had a .img extension). Out of a spirit of curiosity, I extracted the attachment and poked around in it, discovering that it really was an ISO image (well, a UDF filesystem) and contained a single .EXE. Out of more curiosity, I fed it to our attachment logger program to see if it would reproduce the 'tar no files?!' issue. Lo and behold, it did. Now armed with a reproduction case that I could poke around in, I was soon able to narrow this down to a long standing issue in the Python tarfile module.
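This entry doesn't spell out what the tarfile issue is. What is certain is that ISO 9660 and UDF images reserve their first 16 sectors as a system area that is frequently all zeros, and that tar uses all-zero 512-byte blocks as its end-of-archive marker. One plausible (unconfirmed) reading is that a tar sniffer sees those zeros and concludes 'valid but empty archive'. A minimal sketch of sniffing for disc images before trying tar follows; the function names are hypothetical and this is not our actual logger.

```python
import io
import tarfile

SECTOR = 2048  # logical sector size used by ISO 9660 and UDF
# Standard identifiers found at byte offset 1 of the volume descriptors
# that start at sector 16 (ISO 9660: CD001; UDF: BEA01, NSR02, NSR03).
VOLUME_IDS = (b"CD001", b"BEA01", b"NSR02", b"NSR03")


def looks_like_optical_image(data, max_descriptors=8):
    """Return True if data carries an ISO 9660 or UDF volume identifier."""
    for n in range(16, 16 + max_descriptors):
        start = n * SECTOR + 1
        if data[start:start + 5] in VOLUME_IDS:
            return True
    return False


def tar_member_count(data):
    """Return how many members tarfile finds, or None if it rejects the data."""
    try:
        with tarfile.open(fileobj=io.BytesIO(data), mode="r:") as tf:
            return len(tf.getnames())
    except tarfile.TarError:
        return None


def classify(data):
    # Check for disc images first, so a zero-filled system area is never
    # handed to the tar sniffer at all.
    if looks_like_optical_image(data):
        return "iso/udf image"
    count = tar_member_count(data)
    if count:
        return f"tar, {count} files"
    return "unknown"
```

Running `tar_member_count()` on a captured sample is a quick way to see whether your Python version reports zero members or rejects the data outright.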

So, every so often it's useful to get (and capture) spam. Provided that it's interesting and useful spam, at least.

SpamCapturingCanBeUseful written at 00:20:38


What sorts of good email attachments our users get (March 2019 edition)

Yesterday I looked at the types of attachments we see in malware email. Of course if we're considering blocking some of them, it's not enough to consider just what types we see in malware; we also care about what types we see in legitimate email (or at least in email that is as close to legitimate as we can manage). I did some stats for this a year ago, in the April 2018 edition, but this time around I'm going to be doing the stats slightly differently since I want to compare relatively directly to yesterday's data. Like yesterday, this is over the previous ten weeks, but a slightly different ten weeks (the relevant systems roll their weekly logs at different times).

Over the past ten weeks, we had 54,076 file attachments in 39,607 email messages that were not from DNSBL-listed sources, not identified as spam or virus-laden, and not rejected for other reasons. This is about ten times as many as we had malware attachments, which is either good or bad depending on your perspective. 98.5% of them had MIME filename information, and out of those the most popular file extensions were:

 30462  .pdf
  4210  .jpg
  3688  .docx
  1939  .png
  1773  .ics
  1339  .xlsx
  1009  .txt
   725  .html
   682  .doc
   640  .zip

If I reprocess the data to count how many messages had any particular type of file attachment, the data breaks down this way:

 23789  .pdf
  3177  .docx
  3075  .jpg
  1757  .ics
  1221  .png
  1172  .xlsx
   744  .txt
   690  .html
   629  .asc
   602  .zip
   595  .doc

It is probably not surprising that the image formats drop in this re-ranking, since it's likely common to attach several images to a single message. To my surprise, a number of messages had multiple .zip file attachments, which is why the .zip numbers drop. Multiple .doc and .docx attachments are relatively common.
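The difference between the two rankings is just whether an extension is counted once per attachment or once per message. A minimal sketch, assuming the log data has already been reduced to hypothetical (message id, MIME filename) pairs:

```python
from collections import Counter
from os.path import splitext


def extension_counts(attachments):
    """Count extensions per attachment and per message.

    attachments is an iterable of (message_id, filename) pairs.
    """
    per_attachment = Counter()
    per_message = Counter()
    seen = set()
    for msg_id, filename in attachments:
        ext = splitext(filename)[1].lower() or "none"
        per_attachment[ext] += 1
        # Only the first attachment of a given type in a message counts
        # towards the per-message total.
        if (msg_id, ext) not in seen:
            seen.add((msg_id, ext))
            per_message[ext] += 1
    return per_attachment, per_message
```

A message with five .jpg attachments adds five to the first counter but only one to the second, which is why image formats fall in the per-message ranking.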

(In the 'things that make me raise my eyebrows now that I'm looking at them' category, there was one message with 24 .wmz attachments. It came from a 'marketing@<domain>' address, so maybe it was genuine and just, well, marketing.)

Basically all of these file types are unsurprising in our environment (academic computer science). All of the .asc files are PGP stuff (and have appropriate MIME types); I'm a bit surprised that we see so much of it in our email, but then some of this email is things like update notifications from Ubuntu and other sources that are PGP signed. Use of .p7s is not too much below the use of .asc, at 588 attachments. I am a bit surprised to see so many .html attachments, but perhaps some of that is mail sending programs improperly marking HTML parts as attachments instead of inline content.

Nothing particularly stands out about the contents of .zip files and ZIP archives in general, so I'm going to skip any extensive analysis or discussion of them.

At this point it's useful to cross-compare some suspicious file types from yesterday that haven't already been mentioned to see how many legitimate versions of them we see:

   444  .xls
    18  .rar
     1  .iso
     1  .docm

We clearly can't reject .xls file attachments, but it seems likely we could reject .docm and .iso attachments. I was going to say that we could probably reject .rar file attachments as well, but then I took a second look at our data. We could read the RAR file list for all but four of those .rar attachments, and all of the file types in them look legitimate; on closer inspection (eg of source and destination information), even the remaining four look good. It looks like some people just prefer RAR to ZIP, which I can't blame them for.

(The good news version of this finding is that our commercial anti-spam system is apparently very good at finding bad stuff in .rars, since no bad ones seem to have slipped past it.)

GoodAttachmentTypes-2019-03 written at 20:25:48


The types of attachments we see in malware email (March 2019 edition)

Back in mid 2017 I wrote about the types of attachments we saw then in malware-laden email. Today, for reasons beyond the scope of this entry, I feel like looking at our current numbers on this, based on the previous ten weeks of activity. This does not include the slowly but steadily growing collection of attachment types we reject immediately, but it does include 'malware' that is a phish spam in an actual attachment, because that's what our commercial anti-spam system does. As we will see, this is actually a large category of what we detect as 'malware'.

Over 99% of the detected malware attachments had MIME filenames. Out of the 5622 attachments with filenames, the most common file extensions were:

  3008  .html
  1134  .doc
   536  .xlsx
   246  .rar
   245  .iso
    60  .docm
    58  .txt
    57  .docx
    44  .zip
    36  .xls

More than half of these attachments were in messages detected as phish (more or less 55%, as it turns out). However, not all of the phish spam used .html attachments, or at least not directly; instead, it breaks down like this:

  3008 MIME file ext: .html
    58 MIME file ext: .txt
    23 MIME file ext: .zip
     6 MIME file ext: .jpg
     3 MIME file ext: .png
     1 MIME file ext: .htm

All of those .zip attachments actually contain a single .html file. We've seen this sort of single file ZIP smuggling before (1, 2) and now reject it outright for certain file types. We probably don't want to extend that to .html files, but it's slightly tempting.

Out of all of the various things that detect as ZIP archives (which is a lot more than .zip file attachments), there is no particularly dominating set of contents. We do see a certain number of ZIP archives that contain just a single .jar or a .jar plus a .txt, but the absolute numbers are too low to consider a 'reject on sight' policy for them (especially as our users may actually want to get .jars every so often).

My overall conclusion from this is that we don't really have any additional smoking gun file attachment types that we could argue for automatically rejecting on sight. We could raise the argument for .rar and .iso, but each is only 4% or so of the attachments in general. Anyway, this is only half the story; to really ask this question, we need to look at what sort of legitimate attachments our users get and that's another entry.

(Some but not very many messages detected with malware had multiple attachments. I'm not currently interested enough to do a breakdown of what types those messages use. For our purposes, any 'bad' file type that's commonly seen in malware laden email is suspect regardless of whether or not it actually contained the malware.)

MalwareAttachmentTypes-2019-03 written at 19:47:06


A piece of email malware that wanted to make sure we rejected it

Recently our system for logging email attachment type information recorded an interesting attachment:

attachment application/octet-stream; MIME file ext: .ace; zip exts: .exe

The .ace extension is for an old archive file format and today is mostly used by malware, possibly because tools to look inside ACE archives are less common for reasons you can read about on the Wikipedia page (see eg here). We see a certain amount of .ace attachments all of the time, and we've been rejecting them all for some time. However, this attachment is not actually an ACE archive; instead it's a ZIP archive with a single .exe inside it. Single .exes inside ZIP archives are also a pattern we see frequently and we've been rejecting them for even longer than we've been rejecting .ace attachments.

(We knew it was a ZIP archive because it had the right magic signature to be one; we look at basically everything just to see, because ZIP archives can be hiding out under all sorts of extensions. Real ACE archives don't get detected as ZIP archives, especially ones that we can analyze.)
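As a rough illustration of checking content independently of the claimed extension, here is a sketch that looks for the ZIP local file header signature and then lists member extensions. The function names and the exact rejection rules are assumptions for the example, not our real program.

```python
import io
import zipfile
from os.path import splitext

ZIP_MAGIC = b"PK\x03\x04"  # local file header signature at the start of most ZIPs


def zip_exts(data):
    """Return sorted member extensions, or None if data is not a readable ZIP."""
    if not data.startswith(ZIP_MAGIC):
        return None
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            names = [n for n in zf.namelist() if not n.endswith("/")]
    except zipfile.BadZipFile:
        return None
    return sorted(splitext(n)[1].lower() or "none" for n in names)


def reject_reasons(mime_filename, data):
    """Collect every independent reason to reject an attachment."""
    reasons = []
    if splitext(mime_filename)[1].lower() == ".ace":
        reasons.append("ace extension")
    if zip_exts(data) == [".exe"]:
        reasons.append("single .exe in zip")
    return reasons
```

For an attachment like the one above, both checks fire, so `reject_reasons()` returns two entries.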

The net result is that regardless of how we interpreted this attachment, we were going to reject it (and we did). I've got to be amused by a spammer who gives us multiple reasons to reject their work, not just a single one.

My obvious theory for what happened here is that the malware spammer got some spam campaigns and processes confused, effectively crossing the wires between an ACE-based campaign and a ZIP-based one. Maybe they run the same campaign with both archive formats to cover all the bases, or maybe they have different campaigns going on at once. Or maybe this is the fault of some spam infrastructure provider. Whatever the cause is, it amuses me.

PS: This turns out to not be the only case of this we've seen in the past year or so. Some of the old ones even had the MIME type of application/zip, so something in the sending infrastructure clearly knew they actually were ZIP archives.

Sidebar: Some details on the message, with an interesting DKIM failure

The message has the usual sort of sender and subject, and a MIME filename of 'Payment Slip.ace'. These days, fake invoices seem to be the going thing. The sending IP is a Digital Ocean server. The message had a DKIM signature but the signature failed validation for the interesting reason of 'invalid - syntax error in public key record'.

You see, the domain the spammers picked to forge is a parked domain, and it has a wildcard TXT record of 'v=spf1 a -all' (with a five minute TTL, which is polite of the domain parker). Wildcard 'nothing is an acceptable sending source' SPF records are not valid DKIM records, but then this domain clearly isn't supposed to generate any email to start with. The domain parker could have been even more thorough by also providing a null MX record, but I'll give them points for trying at least the SPF record.
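To see why a wildcard SPF TXT record produces a DKIM syntax error, consider what a verifier does with whatever TXT data it gets back for the selector name. A DKIM key record (RFC 6376) is a semicolon-separated list of tag=value pairs, where any v= tag must be 'DKIM1' and a p= tag is required. The following is a simplified sketch of that check, not Exim's actual validation, and the key data in the usage example is a made-up placeholder.

```python
def parse_dkim_key_record(txt):
    """Parse a DKIM key record into a dict of tags.

    Raises ValueError if the text is not a syntactically plausible record.
    """
    tags = {}
    for item in txt.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing ';'
        name, sep, value = item.partition("=")
        name = name.strip()
        if not sep or not name.replace("_", "").isalnum():
            raise ValueError(f"malformed tag: {item!r}")
        if name in tags:
            raise ValueError(f"duplicate tag: {name}")
        tags[name] = value.strip()
    if tags.get("v", "DKIM1") != "DKIM1":
        raise ValueError("version is not DKIM1")
    if "p" not in tags:
        raise ValueError("missing p= public key tag")
    return tags
```

Feeding it 'v=spf1 a -all' fails on the version check, while something shaped like 'v=DKIM1; k=rsa; p=MIGfMA0G' parses.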

The malware adding a DKIM signature that could not possibly validate is an interesting touch. Perhaps this is the inevitable end result of Bayesian filtering being applied to spam and then spammers figuring out what people's Bayesian filters are really basing their decisions on.

MalwareACEReallyZIP written at 22:03:50


Even thinking about spam makes me angry

It isn't news to me that dealing with spam makes me irritated and angry. I resent the intrusion into my email, and then I resent the time I spend dealing with it, and in fact I resent its very existence. This is not a rational irritation and hatred; I viscerally dislike spam and people and organizations who spam me. Sensible people would resent spammers only for the time and effort they take to deal with, but I am angry all out of proportion with that.

(This anger is part of what pushes me to think about and try to design elaborate potential anti-spam measures, even when this isn't necessarily wise. It is not that I enjoy the challenge of it all or the like, it is that I want to frustrate spammers.)

What I've recently clued in to is that even thinking about spam often makes me angry, not merely dealing with it. Perhaps this shouldn't surprise me, since I know my reaction is a visceral one and just being reminded of things will set off that sort of reaction, but it kind of does. I am a happier person when I can spend as long as possible paying as little attention as possible to all things involving spam; the less I think of it at all, the better it is for me.

That sounds awfully abstract, so let me make it concrete. I have yet another case of Google being a spammer mailing list provider, and I considered writing it up for Wandering Thoughts. Then I realized that even thinking about it was making me grumpy and soaking in the situation for long enough to write an entry would be even worse, since I can't write an entry about a spam incident without having the spam incident on my mind for the entire time I write.

So, I have decided that I will probably not write that entry. I am angry about the spam and angry at Google and I would like to hold them up to the light (again), but it is not worth it. I would rather be non-angry. Since any reminder about Google's culpability will probably not help, it would also be sensible for me to entirely block email from Google to my spamtrap addresses so I'm completely unaware of any future cases.

It's possible that this will cause me to write less about spam in general on Wandering Thoughts, although I'm going to have to see about that. I lump sort of spam-related issues like DKIM and so on into my spam category, and I likely still have things to talk about there.

(DMARC as a whole is not necessarily an anti-spam feature. As commonly used, it may be more of an anti-phish one, although I'm not sure that works as well as you'd like. That's another entry, though.)

SpamThinkingAnger written at 02:22:21


An odd MIME Content-Disposition or two

One of the things that our system for recording email attachment type information logs is the MIME Content-Disposition header, if it exists. In theory there should be only three cases for this header; if it exists, it should be either inline or attachment, and it might not exist if the message doesn't have multiple MIME parts (because then the implicit disposition is 'inline'). In practice, well, you can guess what happens here.

The first thing that happens is that some number of MIME parts just omit having a Content-Disposition. This is probably legitimate these days (I would have to read the MIME RFCs to know for sure, and I'm not that interested). The more interesting thing is that rarely, people put other values into their C-D headers.

The most normal alternate thing we've seen in C-D headers over the past 60 weeks is the value 'csv'; all of the cases we've seen are for .csv files with the claimed MIME type of application/vnd.ms-excel. Spot-checking a couple of such messages shows that they come from ncbi.nlm.nih.gov, so I suspect that there's some system there for sending out CSV files that does this.

We saw one case of 'attachement' (with an extra 'e' in there), for a PDF file. It's possible this was malware, but it's also possible it's some automated PDF-sending system that manually constructs MIME messages and has gotten the spelling slightly off. We also saw one case of 'related', for a .ico file; again I don't have clear enough signs to guess on malware versus not.

However the case that drove me to write this entry is that last week we had a burst of 14 messages, all with the very special Content-Disposition of:

=?utf-8?b?yxr0ywnobwvuddsgzmlszw5hbwu9ius7moasvuwhrq==?= =?utf-8?b?6k+bsfncqza1nta1lnhsc3gi?=

(I've broken this into two parts for this entry, but in the original it was all one line. This is an RFC 2047 encoded-word thing, per here.)

All 14 of these were identified by our commercial anti-spam system as Exp/20180802-B, which we've seen before. The base-64 Content-Disposition decodes into something that ends in .xlsx, and indeed the attachment was an application/xml ZIP archive with the same cluster of internal file extensions:

zip exts: .bin .png .rels[3] .vml .xml[10] none

Contrary to what I sort of expected, it turns out that these messages are not single MIME parts but are instead multipart/mixed. Presumably they were directly crafted by something that made a little mistake with what went into the Content-Disposition field, but still managed to sort of properly encode it.

Looking back, over the past 60 weeks we've also seen what look like some other coding mistakes, for example some Content-Dispositions of:

=?utf-8?q?attachment=3b_filename=3d=22payment_instruc?= =?utf-8?q?=e2=80=a6n_-6782_invoce=2etar=22?=

(These two messages were detected as CXmail/MalPE-AC.)

This looks like someone passed the disposition plus the MIME filename to a function designed to encode the disposition alone, which did the best it could under the circumstances. We also saw a third that did the same but with a different filename.
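The quoted-printable example can be decoded with Python's standard email.header module to check this reading. This is a small sketch; the helper name is made up. The base64 example is a different matter, since the logged value appears to have been lowercased (it starts with what looks like the lowercased base64 of 'attachment; filename='), and base64 doesn't survive that.

```python
from email.header import decode_header


def decode_encoded_words(value):
    """Decode RFC 2047 encoded-words in a header value into plain text."""
    parts = []
    for chunk, charset in decode_header(value):
        # decode_header() returns bytes for encoded parts and may return
        # str when there are no encoded-words at all.
        if isinstance(chunk, bytes):
            chunk = chunk.decode(charset or "ascii", errors="replace")
        parts.append(chunk)
    return "".join(parts)


logged = ("=?utf-8?q?attachment=3b_filename=3d=22payment_instruc?= "
          "=?utf-8?q?=e2=80=a6n_-6782_invoce=2etar=22?=")
print(decode_encoded_words(logged))
```

The decoded value is an ordinary 'attachment; filename="..."' disposition with a filename ending in 'invoce.tar', which fits the idea that the whole header value, parameters and all, went through the encoder.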

As a side note, 'attachment' is by far the most common Content-Disposition over the past 60 weeks, amounting to about 96.3% of the MIME parts we see. In second place is 'inline', with about 2.3%, and then no Content-Disposition header, at 1.3%. Interestingly, the most common 'inline' file type is PDFs, at 73%, followed by .jpg at 6.7%. I'm surprised that PDFs are so high here, because I wouldn't have thought that they were things mail sending programs ask to be viewed inline.

(A random check on some PDFs I've been sent in email didn't turn up any marked as 'inline'.)

OddMimeContentDisposition written at 23:52:42


Plaintext parts of email are fading away (in spam and non-spam)

One of the things that I've been noticing these days is how much plaintext parts of emails are fading away. I'm not talking here about HTML-only emails (which have been on the rise here for years); instead, this is about MIME multipart/alternative email which theoretically has both a plaintext and a HTML portion. For years I've had my mail system set to show me the plaintext version instead of the HTML version. For a long time that worked reasonably well, but increasingly it's not; when there is a plaintext version that isn't just 'get a HTML capable client', more and more often the plaintext version is incomplete or otherwise not really functional.

This happens in regular email and it also happens in spam email. For instance, my spamtraps recently captured some email where the plaintext portion started:

To view it online, please go here: %%webversion%%

That's the literal text, and it comes from a spam operation that's clearly organized and using dedicated software (and servers) for their spamming.

Of course, plenty of spammers still use plaintext or functional multipart messages; it seems to be especially common with advance fee fraud spammers, who generally have plain text messages anyway and who may be using well implemented webmail software that does this right. But if spammers (and significant mailing list operations) cannot be bothered to even look at their plaintext versions and get them functional, I have to conclude that plaintext versions are becoming vestigial remnants in the modern email ecosystem.

This isn't surprising, really. If anything it's sort of surprising that it hasn't happened before now. Apparently inertia is a thing.

Unfortunately, since this is done by both spam software and legitimate senders, a significant mismatch between the plaintext version and the HTML version is probably not a useful sign of spam. Depending on your tastes and who you get email from, it may still be a useful sign of email you don't want to read.
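If you did want to flag such messages for yourself, a rough sketch with Python's standard email package might look like the following. The placeholder pattern is only a guess based on the single '%%webversion%%' example above, and the function name is hypothetical.

```python
import re
from email import message_from_string, policy

# Unexpanded template variables of the form %%name%% (an assumption based
# on the one example seen; other mailers will use other syntaxes).
PLACEHOLDER = re.compile(r"%%\w+%%")


def plaintext_problems(raw):
    """Return a list of complaints about a message's text/plain part."""
    msg = message_from_string(raw, policy=policy.default)
    plain = msg.get_body(preferencelist=("plain",))
    if plain is None:
        return ["no text/plain part"]
    text = plain.get_content()
    problems = []
    if not text.strip():
        problems.append("empty text/plain part")
    for found in PLACEHOLDER.findall(text):
        problems.append("unexpanded placeholder " + found)
    return problems
```

This only catches the crudest failures; a plaintext part that is merely incomplete needs a human to notice.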

FadingPlaintextParts written at 02:40:17


What email messages to not send autoreplies to (late 2018 edition)

Our mail system is very old. Much of the current implementation dates back about ten years, when we moved it to be based on Exim, but the features and in some cases the programs involved go back much further than that. One part of it is that we have a local version of the venerable Unix vacation program, and this local version goes back a very long time (some comments say it is the 4.3 BSD-Reno version, which would date it to 1990). By now our version is ancient and creaky, and in general we're no longer enthused about maintaining locally hacked versions of software, so we need to move to using the standard Ubuntu version. Unfortunately, our local version has some differences from the standard one; it supports an additional command line option that's used by an unknown number of people, and we long since made it not autoreply to some additional things over what the standard vacation already ignored. To deal with both problems we're using the standard computer science solution of adding another layer of indirection, in the form of a cover script. One of the jobs of this cover script is knowing what not to autoreply to (beyond extremely obvious things like messages that we detect as spam).

When I started out writing the cover script, I thought this would be simple. This is not the case, as what not to autoreply to has gotten a little bit more complicated since 1990 or so; for instance, there is now an actual RFC for this, RFC 3834. Based on Internet searches and this very helpful Superuser answer, the current list appears to start with:

  • a Precedence: header value of 'bulk', 'list', or 'junk'; this is the old standard way.

  • an Auto-submitted: header value of anything but 'no', which is the RFC 3834 standard way. In practice, this is effectively 'if there is an Auto-submitted header'; I searched through a multi-year collection of email and couldn't find anything that used it with a 'no' value.

  • an X-Auto-Response-Suppress: header with effectively any value, although Microsoft's official documentation says that a value of 'none' means that you can auto-reply. In practice that multi-year collection of email contains no cases with the 'none' value.

    (Energetic people can look for only 'All' or 'OOF', but matching this is annoying and, again, my mail collection shows no hits for anything without one or the other of those.)

  • Any of the various headers that indicate a mailing list message, such as List-Id: or List-Unsubscribe:. In a sane world you would only need to look for one of them, but this is not a sane world (especially once spammers get involved); I have seen at least one message with only a List-Unsubscribe:.

  • A null (envelope) sender address, although of course any autoreplies to that aren't going to get very far. Generally you'll want to not autoreply to postmaster@ or mailer-daemon@, although it's not clear how much stuff gets sent out with such envelope senders.
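The header and envelope checks in this list can be sketched with Python's standard email package. This is an illustration under assumptions, not our actual cover script: the function name is made up, and the list headers checked are the ones from RFC 2369 and RFC 2919.

```python
from email import message_from_string

# Mailing list headers from RFC 2369 and RFC 2919.
LIST_HEADERS = (
    "List-Id", "List-Unsubscribe", "List-Subscribe", "List-Post",
    "List-Help", "List-Owner", "List-Archive",
)


def header_suppression_reason(msg, envelope_sender):
    """Return why this message should get no autoreply, or None if it may."""
    sender = envelope_sender.strip().strip("<>").lower()
    if not sender:
        return "null envelope sender"
    if sender.partition("@")[0] in ("postmaster", "mailer-daemon"):
        return "system sender"

    precedence = (msg.get("Precedence") or "").strip().lower()
    if precedence in ("bulk", "list", "junk"):
        return "Precedence: " + precedence

    # RFC 3834: anything other than 'no' means an automatic message.
    auto = msg.get("Auto-Submitted")
    if auto is not None and auto.split(";")[0].strip().lower() != "no":
        return "Auto-Submitted"

    # Treated as 'any value suppresses', which matches what we see in practice.
    if msg.get("X-Auto-Response-Suppress") is not None:
        return "X-Auto-Response-Suppress"

    for name in LIST_HEADERS:
        if name in msg:
            return name
    return None
```

Returning a reason string instead of a bare boolean makes it easier to log why a particular autoreply was skipped.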

In theory you could stop here and be nominally correct, more or less. In practice it seems clear that you want to do some additional matching on the sender address, to not auto-reply to at least:

  • Definitely various variations on 'noreply' and 'donotreply' sender addresses. You might think that people sending emails with these sender addresses would tag them in various ways to avoid auto-replies, but it is not so; for example, just yesterday Flickr sent me a notification email about some important upcoming changes that came from 'donotreply@flickr.com' and had none of those 'please do not reply' header markers.

  • Probably anything that appears to be an address that exists to collect bounces, especially tagged sender addresses. There are a bunch of patterns for these, where they start with 'bounce-' or 'bounce.' or 'bounce+' or 'bounces+', or come from a domain that is 'bounce.<something>' or 'bounces.<something>'. Just to be different, Google uses '@<something>.bounces.google.com'.

    Some of these 'bounces' addresses are also tagged with various 'do not autoreply' headers, but not all of them. Since tagged bounce addresses are always unique, they'll generally always bypass vacation's attempts to only send an autoreply notification every so often, which is one reason I think one should suppress autoreplies to them.

  • Perhaps all detectable tagged sender addresses, especially repeated sources of them. The one that we've already seen in our logs is AmazonSES ones, some of which don't have any 'don't autoreply' headers. Perhaps there are some AmazonSES senders who should get vacation autoreplies, but I suspect that there are not that many.

(I'm sure that there are some senders who would like to get vacation autoreplies so they know that their email is sort of getting through. It's less clear that our users want those senders to know that, given some of the uses of AmazonSES.)
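The sender address matching can be sketched as a few patterns. The regular expressions are slightly broader than the prefixes listed above (for example they also accept 'bounces-'), the 'amazonses.com' domain match is an assumption about what AmazonSES envelope senders look like, and all of the names here are hypothetical.

```python
import re

# noreply, no-reply, donotreply, do-not-reply, do_not_reply and so on.
NOREPLY = re.compile(r"^(no|do[-_.]?not)[-_.]?reply")
# bounce-, bounce., bounce+, bounces+ and similar tagged local parts.
BOUNCE_LOCAL = re.compile(r"^bounces?[-.+]")
GENERIC_LOCALS = {"root", "www-data", "apache"}


def sender_suppression_reason(sender, skip_generic=False, skip_amazonses=False):
    """Return why this sender address should get no autoreply, or None."""
    addr = sender.strip().strip("<>").lower()
    local, _, domain = addr.rpartition("@")
    if not local:  # no usable '@': treat the whole thing as a local part
        local, domain = addr, ""

    if NOREPLY.match(local):
        return "noreply-style address"
    if BOUNCE_LOCAL.match(local):
        return "bounce-collecting local part"
    if (domain.startswith(("bounce.", "bounces."))
            or domain.endswith(".bounces.google.com")):
        return "bounce-collecting domain"
    if skip_amazonses and (domain == "amazonses.com"
                           or domain.endswith(".amazonses.com")):
        return "AmazonSES tagged sender"
    if skip_generic and local in GENERIC_LOCALS:
        return "generic local part"
    return None
```

The more debatable rules are off by default, so turning them on is a deliberate policy choice rather than an accident.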

Possibly you also want to not autoreply to sender addresses with various generic local parts, such as 'root', 'www-data', 'apache', and so on. Perhaps you also want to include 'info', but that feels more potentially questionable; there might actually be a human who reads replies to that and cares about out of office things and so on.

(In general my view is that it's only useful to send autoreplies to actual people, and in some cases sending autoreplies to non-people addresses is at least potentially harmful. If we can establish fairly confidently that a given sender address is not a person, not sending vacation and out of office and so on autoreplies to it is harmless and perhaps beneficial. At the same time it's important not to be too aggressive, because our users do count on their autoreplies reliably telling people about their status.)

PS: In an extremely cautious world, you would not autoreply to anything that hadn't passed either strict SPF checks or strict DMARC policies. You can use DKIM too, but I think only if you carefully check that you're verifying a DKIM signature for the sender domain, because only then have you verified attribution to the domain. I rather expect that this is too strict to make users happy today, because it would exclude too many real people that send them email and so should get their autoreply messages.
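The DKIM part of this amounts to only trusting verified signatures whose d= domain matches the sender's domain. Here is a minimal sketch, assuming that 'sender domain' means the From: header domain and that the list of verified d= values comes from your MTA's own DKIM verification; both are assumptions, as is the function name.

```python
from email.utils import parseaddr


def attributed_to_sender(from_header, verified_dkim_domains, allow_subdomains=False):
    """Return True if some verified DKIM d= domain matches the From: domain."""
    addr = parseaddr(from_header)[1]
    if "@" not in addr:
        return False
    sender_domain = addr.rpartition("@")[2].lower()
    for d in verified_dkim_domains:
        d = d.lower()
        if sender_domain == d:
            return True
        # Optional looser match: From: is a subdomain of the signing domain.
        if allow_subdomains and sender_domain.endswith("." + d):
            return True
    return False
```

A valid signature from, say, a mailing list provider's own domain doesn't count here, because it attributes the message to the provider and not to the claimed sender.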

Sidebar: My guess about non-human email that lacks these markers

One might wonder why email notifications and other similar large scale messages don't have some version of 'please do not autoreply' tags. My suspicion is that people have found that email without such tags is more likely to appear in people's inboxes on large providers like GMail and so on, while email with those tags is more likely to get dumped into a less frequently examined location.

If you're someone like Flickr (well, SmugMug, who bought Flickr) and really do have an important message that many Flickr members need to read, this leaves you with an unfortunate dilemma. On the whole I can't blame SmugMug for making the email choice that they did; with data at future risk, it is better to err on the side of getting more autoreplies than having people not see your message.

(In this view, the 'donotreply' email sender address is mostly there in the hopes that actual people will not hit 'reply' and send email back, email that will not have the desired effect.)

AutorepliesWhatNot written at 22:31:05


DKIM provides sender attribution (for both spam and not necessarily spam)

The presence of a valid DKIM signature on incoming email doesn't mean anything much about whether or not it's spam, or even if it comes from dedicated spam senders. Spammers can and do add proper DKIM signatures to their messages, and many legitimate senders don't use DKIM or don't have valid DKIM signatures, as our recent DKIM stats demonstrate. For that matter, some spam comes from legitimate places which DKIM sign all of their outgoing email (such as GMail). However, it has recently struck me that what a valid DKIM signature does provide is attribution.

If we receive a piece of email with a valid DKIM signature, the DKIM signature means that we can confidently attribute it to the signing domain. Either it was really sent by that domain or that domain has lost control over either or both of their DNS and their DKIM signing keys, and one of these is far more likely than the other. With a valid DKIM signature, all arguments related to the real sender and backscatter and so on are swept away; it was real email from the sending domain, period. In fact the sending domain went out of its way to make their email attributable to them.

This doesn't mean that the sending domain will accept replies and bounces to that email; far from it. But it does mean that the sending domain can't argue that they didn't send out the email and so are not socially obliged to accept replies. They really sent that email, in a way that provides undeniable attribution. Any refusal to accept replies is just a middle finger extended to other mail systems on the Internet (a fairly common middle finger, of course, because a lot of the modern Internet is defined by not caring about other people).

PS: It strikes me that this attribution may be one reason that large email providers such as GMail increasingly want DKIM signatures these days, because once you have definite attribution for incoming email you can do a number of things based on that with much higher certainty. And people sure can't argue with you about email 'not really coming from them'; they signed it.

(This realization was sparked by a discussion with Aneurin Price in comments in this recent entry. In a sense it's an obvious one, since DKIM's entire purpose is to validate email as coming from a specific source and the flipside of such validation is necessarily attribution.)

DKIMProvidesAttribution written at 21:30:51


Some DKIM usage statistics from our recent inbound email (October 2018 edition)

By this point in time, DKIM (DomainKeys Identified Mail) has been around for long enough and enough large providers like GMail have been pushing for it that it has a certain decent amount of usage. In particular, a surprising number of sources of undesirable email seem to have adopted DKIM, or at least they add DKIM headers to their email. Our Exim setup logs the DKIM status of incoming email on our external MX gateway and for reasons beyond the scope of today's entry I have become interested in gathering some statistics about what sort of DKIM usage we see, who from, and how many of those DKIM signatures actually verify.

All of the following statistics are from the past ten days of full logs. Over that time we received 105,000 messages, or about 10,000 messages a day, which is broadly typical volume for us from what I remember. Over this ten day period, we saw 69,400 DKIM signatures, of which 55 were so mangled that Exim only reported:

DKIM: Error while running this message through validation, disabling signature verification.

(Later versions of Exim appear to log details about what went wrong, but the Ubuntu 16.04 version we're currently using doesn't.)

Now things get interesting, because it turns out that a surprising number of messages have more than one DKIM signature. Specifically, roughly 7,600 have two or more (and the three grand champions have six); in total we actually have only 61,000 unique messages with DKIM signatures (which still means that more than half of our incoming email had DKIM signatures). On top of that, 297 of those messages were actually rejected at SMTP time during DATA checks; it turns out that if you get as far as post-DATA checks, Exim is happy to verify the DKIM signature before it rejects the message.

The DKIM signatures break down as follows (all figures rounded down):

 62240  verification succeeded
  3340  verification failed - signature did not verify (headers probably modified in transit)
  2660  invalid - public key record (currently?) unavailable
   790  verification failed - body hash mismatch (body probably modified in transit)
   310  invalid - syntax error in public key record

Of the DKIM signatures on the messages we rejected at SMTP time, 250 had successful verification, 45 had no public key record available, 5 had probably modified headers, and two were mangled. The 250 DKIM verifications for messages rejected at SMTP time had signatures from around 100 different domains, but a number of them were major places:

    41  d=yahoo.com 
    18  d=facebookmail.com 
    13  d=gmail.com 

(I see that Yahoo is not quite dead yet.)

There were 5,090 different domains with successful DKIM verifications, of which 2,170 had only one DKIM signature and 990 had two. The top eight domains each had at least 1,000 DKIM signatures, and the very top one had over 6,100. That very top one is part of the university, so it's not really surprising that it sent us a lot of signed email.

Overall, between duplicate signatures and whatnot, 55,780 or so of the incoming email messages that we accepted at SMTP time had verified DKIM signatures, or just over half of them. On the one hand, that's a lot more than I expected. On the other hand, that strongly suggests that no one should expect to be able to insist on valid DKIM signatures any time soon; there are clearly a lot of mail senders that either don't do DKIM at all, don't have it set up right, or are having their messages mangled in transit (perhaps by mailing list software).

Among valid signatures, 46,270 were rsa-sha256 and 15,960 were rsa-sha1. The DKIM canonicalization (the 'c=' value reported by Exim) breaks down as follows:

 51470  c=relaxed/relaxed 
  9440  c=relaxed/simple 
  1290  c=simple/simple 
    20  c=simple/relaxed

I don't know if this means anything, but I figured I might as well note it. Simple/simple is apparently the default.
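Tallies like these can be produced with a short script over the Exim main log. The sketch below assumes DKIM log lines of the general shape 'DKIM: d=<domain> s=<selector> c=<canon> a=<algorithm> ... [<status>]'; that layout is an assumption that should be checked against your own Exim version's logs, and the function name is made up.

```python
import re
from collections import Counter

# Assumed line shape; adjust to match what your Exim actually logs.
DKIM_LINE = re.compile(
    r"DKIM: d=(?P<domain>\S+) s=\S+ c=(?P<canon>\S+) a=(?P<algo>\S+)"
    r".*?\[(?P<status>[^\]]*)\]"
)


def tally_dkim(lines):
    """Count DKIM statuses, plus per-signature details for verified ones."""
    stats = {
        "status": Counter(),
        "canon": Counter(),
        "algo": Counter(),
        "domain": Counter(),
    }
    for line in lines:
        m = DKIM_LINE.search(line)
        if not m:
            continue
        status = m["status"]
        stats["status"][status] += 1
        # Canonicalization, algorithm and domain are only tallied for
        # signatures that actually verified.
        if status == "verification succeeded":
            for key in ("canon", "algo", "domain"):
                stats[key][m[key]] += 1
    return stats
```

Calling `tally_dkim(open("mainlog"))` and then `.most_common()` on each counter gives tables in the same spirit as the ones above; counting unique messages instead of signatures would additionally need the Exim message id from each line.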

DKIMIncomingMailStats-2018-10 written at 23:15:41


This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.