(Maybe) copying email anti-spam measures from Google and company

November 1, 2022

For a while now, Google has been rejecting some messages we try to forward to GMail with an SMTP error message like this:

Messages missing a valid messageId header are not accepted.

You can have a number of reactions to this. One of them is to be grumpy that Google is rejecting email that's otherwise (probably) perfectly valid and perhaps not even spam. Well, let's be honest here; all competent modern mail system operators reject email at SMTP time for all sorts of peculiar reasons, so I can hardly pick on GMail for not liking messages without message IDs when we will reject your messages if they have an attachment type we don't like or ClamAV matches a signature.

Another reaction, one that I'm more and more leaning toward, is to consider making our email system reject external email at SMTP time for the same reason. Why? Because if GMail is doing it, a missing (or invalid) message ID is probably a good sign of spam. The people running GMail don't just roll out of bed one day, pick an RFC header requirement at random, and start rejecting email that violates it. Instead it seems very likely that they have a bunch of data that shows that rejecting email this way is a good idea.

(Of course we don't actually know if GMail is rejecting the email for this reason alone. There could be other signals involved that GMail isn't putting in the SMTP rejection message for various reasons.)

More broadly, I'm increasingly coming to think that major email providers have a lot more data on spam signs than we do, so we might as well take advantage of their work when possible. If they give us a relatively clear signal that they consider something a spam signature, maybe we should use that signal ourselves. At the very least it's probably worth investigating, for example to see how many messages have invalid or outright missing message IDs, and what happens to them.
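As a rough illustration of that kind of investigation, here is a minimal Python sketch that sorts messages into those with a missing, invalid, or plausible-looking Message-ID. The names classify_message_id and tally are hypothetical, and the regular expression is only a loose approximation of the RFC 5322 msg-id syntax. It is an assumption about what 'valid' might mean, not a description of what GMail actually checks.

```python
import re
from collections import Counter
from email import message_from_string
from email.message import Message

# Loose approximation of an RFC 5322 msg-id: "<left@right>" with no
# whitespace, angle brackets, or extra "@" inside. This is an assumption
# for illustration; GMail's real validity test is not public.
_MSGID_RE = re.compile(r"^<[^\s<>@]+@[^\s<>@]+>$")


def classify_message_id(msg: Message) -> str:
    """Return 'missing', 'multiple', 'invalid', or 'ok' for one message."""
    values = msg.get_all("Message-ID", [])  # header lookup is case-insensitive
    if not values:
        return "missing"
    if len(values) > 1:
        return "multiple"
    value = str(values[0]).strip()
    return "ok" if _MSGID_RE.match(value) else "invalid"


def tally(raw_messages) -> Counter:
    """Count classifications over an iterable of raw message strings."""
    return Counter(
        classify_message_id(message_from_string(raw)) for raw in raw_messages
    )


# Hypothetical sample messages, not taken from real logs.
samples = [
    "From: a@example.org\nMessage-ID: <abc123@example.org>\n\nbody\n",
    "From: a@example.org\nSubject: no id here\n\nbody\n",
    "From: a@example.org\nMessage-ID: <abc123>\n\nbody\n",
]
counts = tally(samples)
```

Running something like this over a spool of already-classified mail would show whether the 'missing' and 'invalid' buckets are dominated by spam or by legitimate senders, which is the question that matters before rejecting at SMTP time.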

(It's possible that rspamd can already recognize and log bad or missing message-ids, but if so I can't find it in the documentation on a casual search.)


Comments on this page:

I would note two things here

1) I've seen many MTAs in the past that inserted a Message-ID header if it was missing (e.g. postfix < 2.6). Gmail used to do that, but that's obviously over now.

2) Gmail used to accept everything no matter how bad it was; it never rejected anything at the SMTP level. I suppose that they've now matured enough and gathered all the data they need, and feel it's no longer useful to them to store these messages. They've probably reached a tipping point where rejecting these has more benefits to them than accepting them.

By sam at 2022-11-02 05:28:21:

I would think that if Google are explicitly saying why they're rejecting a message, then that reason isn't a significant spam signal. It's true that spam is a very low-margin 'business' and that spammers will put in the absolute least effort that they possibly can, but it's also true that Google et al go to great lengths to avoid disclosing what causes a message to be marked as spam so that spammers can't learn that and adapt.

By Andy Balholm at 2022-11-02 13:34:25:

Rspamd has the MISSING_MID rule that matches messages with no Message-ID. It has a score of 2.5 points by default.

By Opk at 2022-11-05 19:06:37:

I came across this recently on a mailing list server I look after for an open source project. When I went through the logs, invalid message IDs did not correspond to spammers so much as to a couple of e-mail self-hosters who had configured their message IDs without an @. I increased the rspamd scoring on it mainly because gmail was bouncing them and I prefer to protect our IP reputation. Gmail accounts for nearly half the subscribers anyway.

We'll never know but my assumption was that gmail were rejecting it because they do want to effect better RFC compliance. There seems to be a wider trend in the industry away from the traditional robustness principle because writing code that is liberal in what it accepts is hard to get right and opens you up to mistakes that leave security holes.

Written on 01 November 2022.

Last modified: Tue Nov 1 22:42:25 2022