Errors during SMTP conversations aren't trustworthy, illustrated

May 10, 2021

Recently we had a mail problem where we could not deliver email to a particular remote destination for a while. A major Australian ISP spent six days telling us:

421 4.7.25 Temporarily rejected. Reverse DNS for <our-IP> failed. IB108

(Based on Exim log messages, this happened during the initial SMTP connection, before we even EHLO'd.)

Then later the ISP was fine again, sadly after the person trying to send mail had their attempts time out and contacted us to see if we could do anything about it. The ISP was fine before this incident, and they've been fine ever since, and no other destination reported anything like this message to us.

We did not have malfunctioning nameservers or missing reverse DNS for six days. We did not, as far as we can tell, have DNS servers that the outside world had problems reaching for six days. I suppose it's possible that this large ISP had some internal problem that prevented their DNS servers from talking to our DNS servers for six days, but not so big that they noticed it and dealt with it right away. Alternately, perhaps this ISP was not being honest with us about why they decided not to accept connections from our outgoing email server. We can't tell.

(During the six day problem period, our user was able to reach their recipient on this ISP from some other places, both of which are big email heavyweights, so it was not an issue with the recipient or with the ISP's mail system in general.)

It's not really news or a new thing that the messages you get from other people's mail servers are not necessarily telling you the real reason that your messages aren't being accepted. Many of the major mail providers seem to do it; it's been a long time since I really believed GMail's SMTP time messages, for example. We have many cases where GMail will give temporary 4xx SMTP error codes for an email for a while with various claims in the SMTP error messages, then wind up accepting it. In other cases the 'temporary' 4xx error codes stick for as long as we want to keep retrying and we eventually time out the message.

(My personal lesson learned from this incident was that I should pay more attention to our queued email, then look into things that seemed odd. At the very least I might have been able to reproduce this outside of Exim, and test it from other IPs on the same subnet and elsewhere within the university.)

Written on 10 May 2021.
« DKMS built one of my kernel modules for the wrong kernel
Firefox and the challenge of trying to make visited links clearly visible »

Page tools: View Source, Add Comment.
Search:
Login: Password:
Atom Syndication: Recent Comments.

Last modified: Mon May 10 22:49:09 2021
This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.