(Maybe) copying email anti-spam measures from Google and company
For a while now, Google has been rejecting some messages we try to forward to GMail with an SMTP error message like this:
Messages missing a valid messageId header are not accepted.
You can have a number of reactions to this. One of them is to be grumpy that Google is rejecting email that's otherwise (probably) perfectly valid and perhaps not even spam. Well, let's be honest here; all competent modern mail system operators reject email at SMTP time for all sorts of peculiar reasons, so I can hardly pick on GMail for not liking messages without message IDs when we will reject your messages if they have an attachment type we don't like or if ClamAV matches a signature.
Another reaction, one that I'm more and more leaning toward, is to consider making our email system reject external email at SMTP time for the same reason. Why? Because if GMail is doing it, a missing (or invalid) message ID is probably a good sign of spam. The people running GMail don't just roll out of bed one day, pick an RFC header requirement at random, and start rejecting email that violates it. Instead it seems very likely that they have a bunch of data that shows that rejecting email this way is a good idea.
(Of course we don't actually know if GMail is rejecting the email for this reason alone. There could be other signals involved that GMail isn't putting in the SMTP rejection message for various reasons.)
More broadly, I'm increasingly coming to think that major email providers have a lot more data on spam signs than we do, so we might as well take advantage of their work when possible. If they give us a relatively clear signal that they consider something a spam signature, maybe we should use that signal ourselves. At the very least it's probably worth investigating, for example to see how many messages have invalid or outright missing message IDs, and what happens to them.
(It's possible that rspamd can already recognize and log bad or missing message-ids, but if so I can't find it in the documentation on a casual search.)
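As a possible starting point for that kind of investigation, here is a minimal offline Python sketch that sorts stored messages by the state of their Message-ID header: missing, duplicated, syntactically invalid, or ok. It assumes that "valid" means a simplified form of the RFC 5322 msg-id syntax (no comments and none of the obsolete forms). GMail doesn't say what it actually checks, so this is only a guess at a similar rule. The function names `classify_message_id` and `tally_message_ids` are hypothetical, not part of any existing tool.

```python
import re
from collections import Counter
from email import message_from_bytes
from typing import Iterable

# Assumption: a simplified RFC 5322 msg-id grammar,
#   "<" dot-atom-text "@" (dot-atom-text / no-fold-literal) ">".
# Comments (CFWS) and the obsolete obs-id-left/obs-id-right forms are not accepted.
_ATEXT = r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]"
_DOT_ATOM_TEXT = rf"{_ATEXT}+(?:\.{_ATEXT}+)*"
# no-fold-literal is "[" *dtext "]"; dtext is printable ASCII except "[", "]" and "\".
_NO_FOLD_LITERAL = r"\[[!-Z^-~]*\]"
_MSG_ID_RE = re.compile(rf"<{_DOT_ATOM_TEXT}@(?:{_DOT_ATOM_TEXT}|{_NO_FOLD_LITERAL})>")


def classify_message_id(raw: bytes) -> str:
    """Return 'missing', 'duplicate', 'invalid', or 'ok' for one raw message (hypothetical helper)."""
    msg = message_from_bytes(raw)  # default compat32 policy
    values = msg.get_all("Message-ID")  # header name lookup is case-insensitive
    if not values:
        return "missing"
    if len(values) > 1:
        # RFC 5322 allows at most one Message-ID field per message.
        return "duplicate"
    # str() because compat32 may return a Header object for 8-bit values;
    # strip() removes surrounding whitespace, including a leading fold.
    value = str(values[0]).strip()
    return "ok" if _MSG_ID_RE.fullmatch(value) else "invalid"


def tally_message_ids(paths: Iterable[str]) -> Counter:
    """Count classifications over messages stored one per file (hypothetical helper)."""
    counts: Counter = Counter()
    for path in paths:
        with open(path, "rb") as f:
            counts[classify_message_id(f.read())] += 1
    return counts
```

Run over, say, the files in a Maildir's `cur/` directory, `tally_message_ids` gives rough counts that could then be compared against whatever the message later turned out to be (spam or not). Note that this only checks header syntax; it says nothing about which other signals GMail might be combining with it.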