Google Groups entirely ignores SMTP time rejections
I have a very old mail address that I have made into a complete
spamtrap through my sinkhole SMTP server. Simplifying only slightly,
my SMTP server rejects all email to this address no later than after
the end of transferring the message (at the message termination after
DATA). When it rejects email this late, I capture the full
On September 5th of last year (2018), this spamtrap email address rejected a message from Google Groups informing it that it had been added to a spam mailing list:
Subject: You have been added to Equity Buyers Network
From: Equity Buyers Network <email@example.com>
Google Groups ignored this rejection and began sending email messages
from the group/mailing list to my spamtrap address. Each of these
messages was rejected at SMTP time, and each of them contained a
MAIL FROM address (a VERP),
which good mailing list software uses to notice delivery failures
and unsubscribe addresses. Google Groups is, of course, not good
mailing list software, since it entirely ignored the rejections.
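As a rough illustration of why a VERP makes rejections easy to act on, here is a minimal sketch of encoding a subscriber into the envelope sender and decoding it again. The address layout, the list name and hosts, and the function names are all made up for the example; this is not Google Groups' actual format.

```python
# Sketch of VERP (Variable Envelope Return Path) handling.
# The "-bounces+user=domain" layout and all names are illustrative assumptions.
from typing import Optional, Set


def verp_encode(list_local: str, list_domain: str, subscriber: str) -> str:
    """Build a per-recipient MAIL FROM that embeds the subscriber address."""
    user, _, domain = subscriber.rpartition("@")
    return f"{list_local}-bounces+{user}={domain}@{list_domain}"


def verp_decode(mail_from: str) -> Optional[str]:
    """Recover the subscriber address from a VERP MAIL FROM, or None."""
    local, _, _ = mail_from.rpartition("@")
    _, sep, encoded = local.partition("-bounces+")
    if not sep or "=" not in encoded:
        return None
    # The domain cannot contain '=', so split on the last one.
    user, _, domain = encoded.rpartition("=")
    return f"{user}@{domain}"


def handle_rejection(mail_from: str, subscribers: Set[str]) -> None:
    """Drop the subscriber whose delivery was rejected at SMTP time."""
    addr = verp_decode(mail_from)
    if addr is not None:
        subscribers.discard(addr)
```

With something like this, a rejection of a message sent with MAIL FROM `equity-bounces+trap=example.com@lists.example.org` identifies `trap@example.com` as the address to unsubscribe, without needing to parse any bounce message.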
I expect that this increases the metrics of things like 'subscribers
to Google Groups' and 'number of active Google Groups' and others
that the department responsible for Google Groups is rewarded for.
Such is the toxic nature of rewarding and requiring 'engagement',
especially without any care for the details.
Since September 5th of 2018 up until yesterday, the spamtrap address has rejected ten email messages. These came September 10th, September 21st, November 20th and 23rd, December 22nd, 24th, and 31st (the last of which prompted this entry), February 4th, and now June 12th and 21st. Many of them were stock touting spam (or in general asset touting spam); some were touting the 'message broadcasting' services of the company that established the mailing list. A couple were both at once, and if you guessed that this involves cryptocurrency (or at least something that sounds like it), you would be correct.
Due to writing this entry and thinking about the issue, I've changed my spamtrap system so that it will no longer accept any email from Google Groups and will reject such email attempts immediately as 'uninteresting spam' that is not even specifically logged. I'm not interested in being part of Google's outsourced abuse department, and this ensures that I don't have to think about this any more.
(At one point I might have been interested in just what spammers set up on Google Groups, but I no longer am. That Google Groups is a de facto spam service is no longer news and I am not interested in the specifics, any more than I am with, say, Yahoo Groups.)
Sidebar: Who the responsible party is
The claimed information is a cluster of corporate names and two listed addresses. I will let you consult search engines for their websites, since I have no desire to contribute anything even approaching links. They are:
Questrust Ventures Inc
Equity Buyers Group
1875 Avenue of the Stars Ste 2115
Century City CA 90067
iAstra Canada Richmond BC, Canada, v7y-3j5
That this organization claims a Canadian address (which may or may not be real) means that, in theory, they fall within the reach of Canada's anti-spam laws, which they are pretty definitely acting in violation of. Of course the odds that they will ever be held to account for that are probably low.
We get a certain amount of SMTP MAIL FROM's in UTF-8 with odd characters
On Twitter, I said:
There sure are a surprising number of places that are trying to send us SMTP MAIL with a MAIL FROM that contains the Unicode character U+FEFF (either 'zero width no-break space' or a byte order mark, apparently, although it's never at the start of the address).
I was looking at the logs on our external mail gateway machine
because we use Exim and I was interested to see if we had been poked
by anyone trying to exploit CVE-2019-10149.
I didn't find anyone trying, but I did turn up these SMTP
A typical example from today is:
H=(luxuryclass.it) [22.214.171.124] rejected MAIL <Antonio<U+FEFF>Smith@luxuryclass.it>
<U+FEFF>' bit is me cutting and pasting from
Unicode codepoints this way.)
These and similar hijinks have been going on for some time. We have logs going back more than a year, and the earliest hit I can casually turn up is in late May of 2018:
H=(03216a51.newslatest.bid) [126.96.36.199] rejected MAIL <NaturalHairCare<U+200B>@newslatest.bid>
(U+200B is a zero width space, so this feels like something similar to the use of U+FEFF.)
In October of 2018, we saw a few uses of U+200E 'left to right mark':
H=(0008ceef.livetofrez.us) [188.8.131.52] rejected MAIL <TinnitusRelief<U+200E>@livetofrez.us>
Then at the start of November of 2018 we started seeing U+FEFF, which has taken over as the Unicode codepoint of choice to (ab)use:
H=(office365zakelijk.nl) [184.108.40.206] rejected MAIL <Howard<U+FEFF>Smith@office365zakelijk.nl>
We have seen a flood of these since then; they're pervasive in our logs
based purely on looking at things in
less (someday I will work out how
grep for Unicode codepoints by codepoint value, but that day is not
On a quick check, the most recent ones come from IP addresses that are
listed in the SBL CSS, as well as any
number of other DNS blocklists. I don't really care, since as long as
they're helpful enough to put UTF-8 bytes into their
MAIL FROM, we'll
reject all of their email.
PS: I checked the raw bytes of some of the U+FEFF
MAIL FROMs, and
they really have the byte sequence 0xEF 0xBB 0xBF that is a true
UTF-8 encoded U+FEFF. I'm relatively confident that Exim isn't doing
any character mangling on the way through, either, so we're almost
certainly seeing what was really on the wire.
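One way to look for these codepoints by value is to scan the raw log bytes with a few lines of Python. This is only a sketch; the set of codepoints is just the ones mentioned in this entry, and the function names are made up.

```python
# Sketch: find "invisible" Unicode codepoints in raw log lines by value.
# SUSPECT lists only the codepoints mentioned above; extend it as needed.
from typing import List

SUSPECT = {0xFEFF: "U+FEFF", 0x200B: "U+200B", 0x200E: "U+200E"}


def has_non_ascii(raw: bytes) -> bool:
    """True if the line contains any byte outside 7-bit ASCII."""
    return any(b > 0x7F for b in raw)


def find_suspects(raw: bytes) -> List[str]:
    """Return the suspect codepoints found in a raw (bytes) log line."""
    # errors="replace" keeps going on bytes that are not valid UTF-8.
    text = raw.decode("utf-8", errors="replace")
    return [SUSPECT[ord(ch)] for ch in text if ord(ch) in SUSPECT]
```

Run over a log file opened in binary mode, this reports a U+FEFF hit only when the bytes 0xEF 0xBB 0xBF are really present in the line.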
An interesting report on newly used domain names and their usage in spam
One of the interesting things from Geoff Huston's DNS-OARC 30: Bad news for DANE (via, which has useful comments, especially from tptacek, and see also Against DNSSEC) is some information about the churn in new domain names over the time span of a week, in a section called "The modality of mortality of domain names". I'm just going to quote the end summary, but the whole section is well worth reading. The summary:
The majority of the short-lived names were observed in the gTLD space, and here blacklisting is the primary cause of name death. This was also observed in those ccTLDs that are used as generic TLDs. Overall, some 8% of new names die within seven days.
The observation from this study is that we appear to be spending a huge set of resources to remove names that should never have existed in the first place. If further rounds of new gTLD rounds turn out to be little more than an exercise to offer more choices for spammers, then why are we doing this to ourselves?
(Geoff Huston's article has the wrong link for the presentation materials; the correct link is The Modality of Mortality in Domain Names. Also, 'name death' here does not mean that the DNS records are removed; merely being listed on a domain blacklist is enough. From Paul Vixie's slides, the domain blacklists used are Spamhaus, Swinog URIBL, and SURBL.)
The cynical observation is that people pay a lot of money to register as operators for new gTLDs, and who is going to turn down that money? The operators may not make much money (but maybe they do, from some spammers), but the people who approve new gTLDs and get money for them sure do.
Another striking thing from the slides is that almost 1/5th of new gTLD domains die within a week, and it is usually due to blacklists. This is a much higher rate of death than the overall numbers, which backs up what I suspect will be most people's intuition that random gTLD domain names are most likely to be involved in spam. Some gTLDs have dramatic death rates in the study; the slides suggest that 65% of new domains in .date get blacklisted within a week, for example.
This is for 'newly observed domains', which means that this is the first time the domain names have been used. They may or may not have been registered recently, although the speculation that some fast removals from the DNS result from credit card chargebacks and other charging failures suggests that they were perhaps also registered recently.
Since blacklisting is apparently often so fast, there is an obvious approach in an anti-spam system that wanted to do the work. You could keep track of domain names that you've seen in email and then temporarily defer all messages with new domain names for six hours or so. This is a clear extension of IP-based or sender-based greylisting, with part of the same goal of hoping that any bad actors appear on blocklists before you reach your timeout period and accept the email.
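The idea can be sketched in a few lines. This is a toy in-memory version under stated assumptions: the class and method names are invented, `is_blocklisted` stands in for whatever domain blocklist lookup you would really use, and a real system would need persistent storage and expiry.

```python
# Sketch of domain-based greylisting: defer mail mentioning domains we
# have not seen before, in the hope they get blocklisted before we accept.
import time
from typing import Callable, Dict, Optional

DEFER_SECONDS = 6 * 60 * 60  # the "six hours or so" from the text


class DomainGreylist:
    def __init__(self, defer_seconds: float = DEFER_SECONDS) -> None:
        self.defer_seconds = defer_seconds
        self.first_seen: Dict[str, float] = {}

    def check(self, domain: str, is_blocklisted: Callable[[str], bool],
              now: Optional[float] = None) -> str:
        """Return 'reject', 'defer', or 'accept' for one domain."""
        now = time.time() if now is None else now
        domain = domain.lower().rstrip(".")
        if is_blocklisted(domain):
            return "reject"
        # Remember when we first saw the domain; later checks reuse it.
        first = self.first_seen.setdefault(domain, now)
        if now - first < self.defer_seconds:
            return "defer"
        return "accept"
```

A message would be deferred (with an SMTP 4xx response) if any domain in it came back as 'defer', and the sender's retry after the timeout would then be checked against the blocklists again.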
DKIM signed email as a signal (of something)
Over on Twitter, I said:
So [mimecast.com] bills itself as a cyber-security firm that makes email more secure, and of course the only email I get from them is a spam email, with most of the contents being an brag ad footer for Mimecast itself.
Also, @Mimecast does not DKIM sign outgoing email, so I suspect that it is not going to be so good at being delivered to the increasing number of places that care about that.
(Not our mail server, obviously, for reasons. Mind you, spammers often DKIM sign their email too.)
These days, I have a bias. That bias is that if you are in any sort of 'email as a service' business and you don't sign your outgoing email with DKIM, you are probably not someone I want email from.
This is not because a (valid) DKIM signature means that your email isn't spam. Plenty of spammers put valid DKIM signatures on email these days. Instead it is because increasingly, large email places like GMail and so on are more or less insisting on DKIM signatures from many places. Not DKIM signing your email in the face of this speaks to a certain ignorance of or indifference to the modern practicalities of email, and I don't consider that a good sign. To put it one way, if you're not doing something as pragmatically important as DKIM signatures, what else are you not doing? The answer is unlikely to make me happy.
So far, for me this only applies to places that do email as their business in some way. If email is not part of your product, I don't think it's as much of a gap to leave out DKIM, and anyway there may be complex policy issues around things like DKIM and DMARC.
(Of course you may want to use DKIM, SPF, and DMARC, because the 800-pound gorillas of the email world may more or less require you to do so. But that's a different thing, and also a pragmatic decision for you to make.)
PS: In case you're curious, we
don't currently DKIM sign any of our outgoing email, and we will
probably never have a strong DMARC policy (this is subject to change,
especially if the 800 pound gorillas start insisting; being able
to deliver email to GMail is not optional for us). We do have a
hand-waving SPF record, which we put in for partially superstitious
reasons years ago. I have no idea if it does any good or any harm
in general, although I'm sure that there are crazy people using the
presence of '?all' in our record as an excuse to reject some
email. My view on that is that sooner or later, crazy people will
use anything at all to reject email.
(There are probably people today who refuse to accept email unless it's from domains with published strict DMARC policies. That's their choice and hopefully it works for them.)
Getting (and capturing) spam can sometimes be useful to see what's in it
We have what is now a long standing system for logging email
attachment type information (everyone should have one). For more than a year we've been receiving
that caused our program to log cryptic
reports claiming that we sniffed these as tar archives that were
attachment application/x-iso9660-image; MIME file ext: .iso; tar no files?!
(This one is unusual in that it had a correct MIME type. The more common MIME type these come with is application/octet-stream.)
Our commercial anti-spam system (Sophos PureMessage) consistently identifies these as CXmail/IsoDl-A.
I've been vaguely wanting to figure out why these messages cause our program to do this and what was actually in these file attachments for some time, but I've been hampered by the fact that I didn't actually have an example file. Our email system consistently rejects these for being malware (and anyway they weren't sent to me), and for various reasons we don't try to have our attachment type logging system save copies of things under any circumstances. I added some extra logging to the system, but it didn't produce anything.
(In some environments, an attachment logging and filtering system would be critical enough that you should be able to capture copies of things that either cause it problems or that seem questionable. In our environment, it's not and making it capture things would raise both operational issues (like managing what it captures and not running out of disk space) and policy ones (around privacy and so on).)
However, I also run a sinkhole SMTP server on another machine.
Recently it got a boring spam message which I
almost ignored, except that I noticed it had a suspicious attachment
that claimed to be an ISO file in the MIME type information (although
it had a
.img extension). Out of a spirit of curiosity, I extracted
the attachment and poked around in it, discovering that it really
was an ISO image (well, a UDF filesystem) and contained a single
.EXE. Out of more curiosity, I fed it to our attachment logger
program to see if it would reproduce the 'tar no files?!' issue.
Lo and behold, it did. Now armed with a reproduction case that I
could poke around in, I was soon able to narrow this down to a
long standing issue in the Python
So, every so often it's useful to get (and capture) spam. Provided that it's interesting and useful spam, at least.
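For what it's worth, ISO and UDF images can be recognized directly from their volume descriptor identifiers, which sit at fixed offsets starting at sector 16. The following is a hedged sketch of such a check, not what our attachment logger actually does; the function name and the limit on how many sectors to look at are arbitrary choices.

```python
# Sketch: recognize ISO 9660 / UDF images by their volume descriptor IDs.
# Descriptors start at sector 16; each has a 1-byte type and a 5-byte ID.
from typing import Optional

SECTOR = 2048
FIRST_DESCRIPTOR = 16 * SECTOR
ISO_IDS = {b"CD001"}
UDF_IDS = {b"BEA01", b"NSR02", b"NSR03", b"TEA01"}


def sniff_optical_image(data: bytes, max_sectors: int = 8) -> Optional[str]:
    """Return 'iso9660', 'udf', 'iso9660+udf', or None if neither is seen."""
    found = set()
    for i in range(max_sectors):
        offset = FIRST_DESCRIPTOR + i * SECTOR
        ident = data[offset + 1:offset + 6]
        if ident in ISO_IDS:
            found.add("iso9660")
        elif ident in UDF_IDS:
            found.add("udf")
        else:
            break  # first unrecognized (or missing) descriptor ends the scan
    return "+".join(sorted(found)) if found else None
```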
What sorts of good email attachments our users get (March 2019 edition)
Yesterday I looked at the types of attachments we see in malware email. Of course if we're considering blocking some of them, it's not enough to consider just what types we see in malware; we also care about what types we see in legitimate email (or at least in email that is as close to legitimate as we can manage). I did some stats for this a year ago, in the April 2018 edition, but this time around I'm going to be doing the stats slightly differently since I want to compare relatively directly to yesterday's data. Like yesterday, this is over the previous ten weeks, but a slightly different ten weeks (the relevant systems roll their weekly logs at different times).
Over the past ten weeks, we had 54,076 file attachments in 39,607 email messages that were not from DNSBL-listed sources, not identified as spam or virus-laden, and not rejected for other reasons. This is about ten times as many as we had malware attachments, which is either good or bad depending on your perspective. 98.5% of them had MIME filename information, and out of those the most popular file extensions were:
30462 .pdf
 4210 .jpg
 3688 .docx
 1939 .png
 1773 .ics
 1339 .xlsx
 1009 .txt
  725 .html
  682 .doc
  640 .zip
If I reprocess the data to count how many messages had any particular type of file attachment, the data breaks down this way:
23789 .pdf
 3177 .docx
 3075 .jpg
 1757 .ics
 1221 .png
 1172 .xlsx
  744 .txt
  690 .html
  629 .asc
  602 .zip
  595 .doc
It is probably not surprising that the image formats drop in this re-ranking, since it's likely common to attach several images to a single message. To my surprise, a number of messages had multiple .zip file attachments, which is why the .zip numbers drop. Multiple .doc and .docx attachments are relatively common.
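The difference between the two rankings is just whether you count every attachment or count each extension once per message. A small sketch of both counts, assuming (purely for illustration) that the log data has already been reduced to one list of extensions per message:

```python
# Sketch: per-attachment versus per-message counts of file extensions.
# The input shape (one list of extensions per message) is an assumption.
from collections import Counter
from typing import Iterable, List, Tuple


def count_extensions(messages: Iterable[List[str]]) -> Tuple[Counter, Counter]:
    per_attachment: Counter = Counter()
    per_message: Counter = Counter()
    for exts in messages:
        per_attachment.update(exts)    # every attachment counts
        per_message.update(set(exts))  # each type counts once per message
    return per_attachment, per_message
```

A message with several .jpg attachments adds several to the first count but only one to the second, which is why image formats drop in the re-ranking.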
(In the 'things that make me raise my eyebrows now that I'm looking at them' category, there was one message with 24 .wmz attachments. It came from a 'marketing@<domain>' address, so maybe it was genuine and just, well, marketing.)
Basically all of these file types are unsurprising in our environment (academic computer science). All of the .asc files are PGP stuff (and have appropriate MIME types); I'm a bit surprised that we see so much of it in our email, but then some of this email is things like update notifications from Ubuntu and other sources that's PGP signed. Use of .p7s is not too much below the use of .asc, at 588 attachments. I am a bit surprised to see so many .html attachments, but perhaps some of that is mail sending programs improperly marking HTML parts as attachments instead of inline content.
Nothing particularly stands out about the contents of .zip files and ZIP archives in general, so I'm going to skip any extensive analysis or discussion of them.
At this point it's useful to cross-compare some suspicious file types from yesterday that haven't already been mentioned to see how many legitimate versions of them we see:
444 .xls
 18 .rar
  1 .iso
  1 .docm
We clearly can't reject .xls file attachments, but it seems likely we could reject .docm and .iso attachments. I was going to say that we could probably reject .rar file attachments as well, but then I took a second look at our data. We could read the RAR file list for all but four of those .rar attachments, and all of the file types in them look legitimate; on closer inspection (eg of source and destination information), even the remaining four look good. It looks like some people just prefer RAR to ZIP, which I can't blame them for.
(The good news version of this finding is that our commercial anti-spam system is apparently very good at finding bad stuff in .rars, since no bad ones seem to have slipped past it.)
The types of attachments we see in malware email (March 2019 edition)
Back in mid 2017 I wrote about the types of attachments we saw then in malware-laden email. Today, for reasons beyond the scope of this entry, I feel like looking at our current numbers on this, based on the previous ten weeks of activity. This does not include the slowly but steadily growing collection of attachment types we reject immediately, but it does include 'malware' that is a phish spam in an actual attachment, because that's what our commercial anti-spam system does. As we will see, this is actually a large category of what we detect as 'malware'.
Over 99% of the detected malware attachments had MIME filenames. Out of the 5622 attachments with filenames, the most common file extensions were:
3008 .html
1134 .doc
 536 .xlsx
 246 .rar
 245 .iso
  60 .docm
  58 .txt
  57 .docx
  44 .zip
  36 .xls
More than half of these attachments were in messages detected as phish (more or less 55%, as it turns out). However, not all of the phish spam used .html attachments, or at least not directly; instead, it breaks down like this:
3008 MIME file ext: .html
  58 MIME file ext: .txt
  23 MIME file ext: .zip
   6 MIME file ext: .jpg
   3 MIME file ext: .png
   1 MIME file ext: .htm
All of those .zip attachments actually contain a single .html file. We've seen this sort of single file ZIP smuggling before (1, 2) and now reject it outright for certain file types. We probably don't want to extend that to .html files, but it's slightly tempting.
Out of all of the various things that detect as ZIP archives (which is a lot more than .zip file attachments), there is no particularly dominating set of contents. We do see a certain number of ZIP archives that contain just a single .jar or a .jar plus a .txt, but the absolute numbers are too low to consider a 'reject on sight' policy for them (especially as our users may actually want to get .jars every so often).
My overall conclusion from this is that we don't really have any additional smoking gun file attachment types that we could argue for automatically rejecting on sight. We could raise the argument for .rar and .iso, but they are each only 4% or so of the attachments in general. Anyway, this is only half the story; to really ask this question, we need to look at what sort of legitimate attachments our users get and that's another entry.
(Some but not very many messages detected with malware had multiple attachments. I'm not currently interested enough to do a breakdown of what types those messages use. For our purposes, any 'bad' file type that's commonly seen in malware laden email is suspect regardless of whether or not it actually contained the malware.)
A piece of email malware that wanted to make sure we rejected it
Recently our system for logging email attachment type information recorded an interesting attachment:
attachment application/octet-stream; MIME file ext: .ace; zip exts: .exe
The .ace extension is for an old archive file format and today is mostly used by malware, possibly because tools to look inside ACE archives are less common for reasons you can read about on the Wikipedia page (see eg here). We see a certain amount of .ace attachments all of the time, and we've been rejecting them all for some time. However, this attachment is not actually an ACE archive; instead it's a ZIP archive with a single .exe inside it. Single .exes inside ZIP archives are also a pattern we see frequently and we've been rejecting them for even longer than we've been rejecting .ace attachments.
(We knew it was a ZIP archive because it had the right magic signature to be one; we look at basically everything just to see, because ZIP archives can be hiding out under all sorts of extensions. Real ACE archives don't get detected as ZIP archives, especially ones that we can analyze.)
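The general approach of sniffing for ZIP archives regardless of the claimed extension can be sketched with Python's standard zipfile module. This is an illustration rather than our actual program; the function name is made up, and it only looks for the ordinary local file header signature at the start of the data.

```python
# Sketch: detect ZIP archives by magic number and list the extensions inside.
import io
import os
import zipfile
from typing import List, Optional


def sniff_zip_exts(data: bytes) -> Optional[List[str]]:
    """Return sorted extensions inside a ZIP, or None if it isn't one."""
    # 'PK\x03\x04' is the local file header signature of a non-empty ZIP.
    if not data.startswith(b"PK\x03\x04"):
        return None
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            names = [n for n in zf.namelist() if not n.endswith("/")]
    except zipfile.BadZipFile:
        return None
    # Files without an extension are reported as 'none'.
    return sorted({os.path.splitext(n)[1].lower() or "none" for n in names})
```

An attachment named 'Payment Slip.ace' whose bytes pass this check and come back as just `['.exe']` is a ZIP with a single executable in it, whatever its MIME filename says.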
The net result is that regardless of how we interpreted this attachment, we were going to reject it (and we did). I've got to be amused by a spammer who gives us multiple reasons to reject their work, not just a single one.
My obvious theory for what happened here is that the malware spammer got some spam campaigns and processes confused, effectively crossing the wires between an ACE-based campaign and a ZIP-based one. Maybe they run the same campaign with both archive formats to cover all the bases, or maybe they have different campaigns going on at once. Or maybe this is the fault of some spam infrastructure provider. Whatever the cause is, it amuses me.
PS: This turns out to not be the only case of this we've seen in the past year or so. Some of the old ones even had the MIME type of application/zip, so something in the sending infrastructure clearly knew they actually were ZIP archives.
Sidebar: Some details on the message, with an interesting DKIM failure
The message has the usual sort of sender and subject, and a MIME filename of 'Payment Slip.ace'. These days, fake invoices seem to be the going thing. The sending IP is a Digital Ocean server. The message had a DKIM signature but the signature failed validation for the interesting reason of 'invalid - syntax error in public key record'.
You see, the domain the spammers picked to forge is a parked domain, and it has a wildcard TXT record of 'v=spf1 a -all' (with a five minute TTL, which is polite of the domain parker). Wildcard 'nothing is an acceptable sending source' SPF records are not valid DKIM records, but then this domain clearly isn't supposed to generate any email to start with. The domain parker could have been even more thorough by also providing a null MX record, but I'll give them points for trying at least the SPF record.
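To see why an SPF record can't pass as a DKIM key record, here is a minimal sketch of parsing a TXT record as a DKIM key. It checks only a couple of the rules from the DKIM specification (a 'tag=value' list separated by semicolons, a 'v=' tag that must be 'DKIM1' if present, and a required 'p=' tag); the function name is made up and a real validator does much more.

```python
# Sketch: minimal sanity check of a TXT record as a DKIM key record.
from typing import Dict


def parse_dkim_key_record(txt: str) -> Dict[str, str]:
    """Parse 'tag=value; tag=value' and apply a few basic DKIM key rules."""
    tags: Dict[str, str] = {}
    for part in txt.split(";"):
        part = part.strip()
        if not part:
            continue
        name, sep, value = part.partition("=")
        name = name.strip()
        if not sep or not name.replace("_", "").isalnum():
            raise ValueError(f"not a tag=value pair: {part!r}")
        tags[name] = value.strip()
    if "v" in tags and tags["v"] != "DKIM1":
        raise ValueError(f"unexpected version: {tags['v']!r}")
    if "p" not in tags:
        raise ValueError("missing p= (public key) tag")
    return tags
```

Given the wildcard 'v=spf1 a -all' record, this sees a single 'v' tag with the value 'spf1 a -all' and no 'p=' tag at all, so it fails on both counts.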
The malware adding a DKIM signature that could not possibly validate is an interesting touch. Perhaps this is the inevitable end result of Bayesian filtering being applied to spam and then spammers figuring out what people's Bayesian filters are really basing their decisions on.
Even thinking about spam makes me angry
It isn't news to me that dealing with spam makes me irritated and angry. I resent the intrusion into my email, and then I resent the time I spend dealing with it, and in fact I resent its very existence. This is not a rational irritation and hatred; I viscerally dislike spam and people and organizations who spam me. Sensible people would resent spammers only for the time and effort they take to deal with, but I am angry all out of proportion with that.
(This anger is part of what pushes me to think about and try to design elaborate potential anti-spam measures, even when this isn't necessarily wise. It is not that I enjoy the challenge of it all or the like, it is that I want to frustrate spammers.)
What I've recently clued in to is that even thinking about spam often makes me angry, not merely dealing with it. Perhaps this shouldn't surprise me, since I know my reaction is a visceral one and just being reminded of things will set off that sort of reaction, but it kind of does. I am a happier person when I can spend as long as possible paying as little attention as possible to all things involving spam; the less I think of it at all, the better it is for me.
That sounds awfully abstract, so let me make it concrete. I have yet another case of Google being a spammer mailing list provider, and I considered writing it up for Wandering Thoughts. Then I realized that even thinking about it was making me grumpy and soaking in the situation for long enough to write an entry would be even worse, since I can't write an entry about a spam incident without having the spam incident on my mind for the entire time I write.
So, I have decided that I will probably not write that entry. I am angry about the spam and angry at Google and I would like to hold them up to the light (again), but it is not worth it. I would rather be non-angry. Since any reminder about Google's culpability will probably not help, it would also be sensible for me to entirely block email from Google to my spamtrap addresses so I'm completely unaware of any future cases.
It's possible that this will cause me to write less about spam in general on Wandering Thoughts, although I'm going to have to see about that. I lump sort of spam-related issues like DKIM and so on into my spam category, and I likely still have things to talk about there.
(DMARC as a whole is not necessarily an anti-spam feature. As commonly used, it may be more of an anti-phish one, although I'm not sure that works as well as you'd like. That's another entry, though.)
An odd MIME Content-Disposition or two
One of the things that our system for recording email attachment
type information logs is the MIME Content-Disposition header, if it
exists. In theory there should be only three cases for this header;
if it exists, it should be either 'inline' or 'attachment', and it might
not exist if the message doesn't have multiple MIME parts (because
then the implicit disposition is 'inline'). In practice, well, you
can guess what happens here.
The first thing that happens is that some number of MIME parts just
omit having a
Content-Disposition. This is probably legitimate
these days (I would have to read the MIME RFCs to know for sure,
and I'm not that interested). The more interesting thing is that
rarely, people put other values into their C-D headers.
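A tally like this can be sketched with Python's standard email package, whose `get_content_disposition()` method returns the lowercased disposition value without its parameters (or None if there is no header). This is an illustrative sketch and not our actual logging program; the function name and the labels it uses are made up.

```python
# Sketch: tally the Content-Disposition values of a message's MIME parts.
from collections import Counter
from email import message_from_bytes


def disposition_counts(raw_message: bytes) -> Counter:
    """Count dispositions of all non-container MIME parts in one message."""
    msg = message_from_bytes(raw_message)
    counts: Counter = Counter()
    for part in msg.walk():
        if part.is_multipart():
            continue  # skip the multipart containers themselves
        disp = part.get_content_disposition()
        if disp is None:
            key = "(none)"
        elif disp in ("inline", "attachment"):
            key = disp
        else:
            key = "other:" + disp  # anything else is an oddity worth logging
        counts[key] += 1
    return counts
```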
The most normal alternate thing we've seen in C-D headers over the
past 60 weeks is the value 'csv'; all of the cases we've seen are
.csv files with the claimed MIME type of application/vnd.ms-excel.
Spot-checking a couple of such messages shows that they come from
ncbi.nlm.nih.gov, so I suspect that there's some system there for
sending out CSV files that does this.
We saw one case of 'attachement' (with an extra 'e' in there),
for a PDF file. It's possible this was malware, but it's also
possible it's some automated PDF-sending system that manually
constructs MIME messages and has gotten the spelling slightly off.
We also saw one case of 'related', for a .ico file; again I
don't have clear enough signs to guess on malware versus not.
However the case that drove me to write this entry is that last week we had a burst of 14 messages, all with the very special Content-Disposition of:
All 14 of these were identified by our commercial anti-spam system as Exp/20180802-B, which we've seen before. The base-64 Content-Disposition decodes into something that ends in .xlsx, and indeed the attachment was an application/xml ZIP archive with the same cluster of internal file extensions:
zip exts: .bin .png .rels .vml .xml none
Contrary to what I sort of expected, it turns out that these messages are not single MIME parts but are instead multipart/mixed. Presumably they were directly crafted by something that made a little mistake with what went into the Content-Disposition field, but still managed to sort of properly encode it.
Looking back, over the past 60 weeks we've also seen what look like some other coding mistakes, for example some Content-Dispositions of:
(These two messages were detected as CXmail/MalPE-AC.)
This looks like someone passed the disposition plus the MIME filename to a function designed to encode the disposition alone, which did the best it could under the circumstances. We also saw a third that did the same but with a different filename.
As a side note, 'attachment' is by far the most common
Content-Disposition over the past 60 weeks, amounting to about 96.3%
of the MIME parts we see. In second place is 'inline', with about
2.3%, and then no Content-Disposition header, at 1.3%. Interestingly,
the most common 'inline' file type is PDFs, at 73%, followed by
.jpg at 6.7%. I'm surprised that PDFs are so high here, because
I wouldn't have thought that they were things mail sending programs
ask to be viewed inline.
(A random check on some PDFs I've been sent in email didn't turn up
any marked as '