Wandering Thoughts archives

2010-10-16

The attraction of the milter protocol

I've already touched on my interest in milter programs, but one might sensibly wonder why the milter protocol is of so much interest to a lot of people. Many mailers have had support for plugins and user hooks for some time, and if you want general filtering you can always write it using SMTP (either as a proxy or as a general relaying agent; a proxy is simpler, since you don't have to worry about persisting anything on the disk).

At one level, the milter protocol was just in the right place at the right time; any roughly equivalent protocol could have caught on. But that raises the question: what about the milter protocol makes it the right solution to the problem, especially given SMTP proxying?

My answer is that the milter protocol is explicitly synchronous. Why this matters requires some elaboration.

If you want to reject during the initial SMTP conversation with the sender (and you do), you need to be doing filtering during SMTP. Because you need go/no-go answers from your filter before you reply to the sender's SMTP commands, both your communication with the filter and the filter's own processing must be synchronous. In theory you can do this with SMTP-based communication with a filter that acts as a filtering SMTP proxy. In practice there are at least two complications: it makes your environment a lot more complex, and basically all MTAs are SMTP relays and are not designed to also act as SMTP proxies (or to sit behind them).

(The alternative is for the filter to be an SMTP proxy in front of your real MTA and be directly exposed to the Internet, but this is generally undesirable for various reasons and it certainly makes the filter itself more complicated.)
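To make the timing concrete, here is a minimal sketch (in Python, with entirely made-up names; it is not taken from any real MTA) of what the MTA side has to do: it cannot answer the sender's RCPT TO until the filter has handed back a verdict.

    def handle_rcpt(conn, recipient, filt):
        # Ask the filter and wait; we cannot reply to RCPT TO until we
        # have its verdict, which is why the filter must be synchronous.
        verdict = filt.rcpt_to(recipient)
        if verdict == "reject":
            conn.reply(550, "5.7.1 Recipient rejected")
        else:
            conn.reply(250, "2.1.5 Ok")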

The milter API and protocol were created from the start to be simple and synchronous (and to map well onto SMTP commands). It intrinsically works the way you want, both for a filter author and for an MTA author, and in the process it gives you clear answers about how both sides are supposed to work with each other. Yes, any MTA that wants to use milters needs to add some code, but the MTA would almost certainly need more code to adapt it to work as an SMTP proxy for synchronous rejections using an SMTP-based filter.
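For illustration, this is roughly the shape of things from the filter author's side: one callback per SMTP phase, each returning a verdict that the MTA acts on before it replies to the client. The class, method, and constant names here are invented; the real libmilter API (and bindings to it, such as pymilter) differ in the details.

    CONTINUE, REJECT, TEMPFAIL = range(3)

    class MyFilter:
        def connect(self, hostname, address):
            # A real filter might consult DNS blocklists here, synchronously.
            return CONTINUE

        def mail_from(self, sender):
            return CONTINUE

        def rcpt_to(self, recipient):
            # Returning REJECT here makes the MTA answer the RCPT TO
            # command with a 5xx, before the message body is ever sent.
            if recipient.lower().endswith("@spamtrap.example.org"):
                return REJECT
            return CONTINUE

        def end_of_message(self, headers, body):
            return CONTINUE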

So the simple answer is that the milter protocol is the right solution to the problem of detecting and rejecting spam (instead of bouncing it or discarding it); in fact, it's practically the only general solution. Although SMTP proxying can be used to filter mail, it doesn't work very well (if at all) for rejecting email during SMTP conversations.

(From one perspective the milter protocol is a terrible protocol. It's not officially documented, it's full of odd binary structures, and its messages are inconsistently formatted; for example, there is a message where some message fields are present or absent based on the value of earlier message fields. You practically need a parser to decode it.)
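(For the curious, here is a minimal sketch of reading milter packets off a socket, based on the informally documented wire format: each packet is a four-byte big-endian length, then a one-byte command code, then a payload whose layout depends on the command. Since there is no official specification, treat the details as my best understanding rather than as authoritative.)

    import struct

    def read_exact(sock, n):
        data = b""
        while len(data) < n:
            chunk = sock.recv(n - len(data))
            if not chunk:
                raise ConnectionError("milter peer closed the connection")
            data += chunk
        return data

    def read_packet(sock):
        # Four-byte length (network byte order) covering the command byte
        # plus the payload, then the command byte itself.
        (length,) = struct.unpack("!I", read_exact(sock, 4))
        body = read_exact(sock, length)
        return body[:1], body[1:]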

Sidebar: the attractions of filtering in external programs

Working in a separate process has a number of obvious advantages for both MTA authors and filter authors. On the MTA side, it means that you don't have to dynamically load code, figure out how to embed some scripting language interpreter, create your own language (not that this has stopped some people), design and provide an API (or even an ABI), or worry about what happens to the main MTA when some unimportant filter code has bugs or explodes. On the filter author side, it means that you don't have to deal with any of that stuff either, development and debugging are easier, and you get to write in your choice of language without huge amounts of work.

Once you have a generic communication protocol it also means that you get a write once, run on many MTAs environment. (This is true just as much for SMTP-based filters as for milter-based ones.)

WhyMilters written at 03:06:03

2010-10-09

An illustration of how careful and clever spammers are today

I recently found an interesting illustration of how clever and dedicated modern blog spammers are. The spammer in question had (it appears) found a vulnerable WordPress-based blog and compromised it, but not in the usual blatant way; instead, they opted to be much less obvious about it.

The website acted like this:

  • If you directly visited the site with Firefox (and likely Chrome, IE, Safari, Opera, or any other mainline browser) everything you saw was normal and just what you expected. I assume that the non-public portions (eg the WordPress admin interfaces) are also completely normal.

  • If you visited the website with something else, such as wget or lynx or, crucially, search engine crawlers, what you saw had typical pharmacy spam terms stuffed in as page titles, article titles, and HTML meta-keywords. The actual article text seemed unaffected.

    This substitution (and de-substitution for mainline browsers) was sufficiently thorough to also get the syndication feed; if you pulled it with wget it was pharmacy-ied, and if you looked at it in Firefox it wasn't.

    As a result of being visible to search engine crawlers, the pharmacy spam terms turned up in Google's and Bing's search result summaries of the site and were visible in their cached pages. (A quick way to check a page for this sort of cloaking is sketched just after this list.)

  • If you came to the site from a search engine and you were using Firefox (and probably any of the mainline browsers), you immediately got redirected to an online pharmacy site. I assume that sending visitors to this is the spammer's real, ultimate goal.
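If you want to check a site for this sort of cloaking yourself, the simplest approach is to fetch the same page with a browser-like User-Agent and a crawler-like one and compare what comes back. Here is a rough sketch; the URL and User-Agent strings are placeholders, not the actual site involved.

    import urllib.request

    def fetch_title(url, user_agent):
        req = urllib.request.Request(url, headers={"User-Agent": user_agent})
        with urllib.request.urlopen(req, timeout=30) as resp:
            html = resp.read().decode("utf-8", errors="replace")
        start = html.find("<title>")
        end = html.find("</title>")
        if start == -1 or end == -1:
            return "(no title)"
        return html[start + len("<title>"):end].strip()

    url = "http://blog.example.com/"
    print("browser:", fetch_title(url, "Mozilla/5.0 (X11; Linux x86_64; rv:1.9.2) Firefox/3.6"))
    print("crawler:", fetch_title(url, "Wget/1.12"))

A page that returns visibly different titles or keywords to the two fetches is worth a much closer look.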

This seems clearly designed to avoid tipping off the blog's owner and its regular visitors and users to the compromise; they would see everything normally and it would all look like business as usual. Only people from search engines would be redirected, ie the people least likely to have regular contact with the blog and be in a position to report things (or even to have any interest in reporting things, as opposed to thinking that they'd been taken by a scam site that had fooled Google).

The only reason that I discovered this is that I was using Google to find the site again. Because of how I manage browser history I knew that Google had found the right site for me, so I was very disconcerted to find myself abruptly on a spam pharmacy site and knew that something had gone badly wrong somewhere. Without the positive knowledge that this was the right site, I'd have written this off as a spammer hijacking Google search terms or the like.

(Because of the specific circumstances, I'm sure that this is a legitimate site and almost completely sure that the blog's owner is not in on this. For obvious reasons I'm not linking to the site or giving you enough information to find it in search engines; the compromise is ongoing as I write this entry, and for all I know the pharmacy site is also loaded with malware.)

I find it both interesting and disturbing that spammers are doing compromises that are this sophisticated. Since this is a WordPress blog, this is probably a canned exploit and payload, but still, someone had to develop it, fully weaponize it, and probably make it easy for people to use. (And I imagine that there is a marketplace involved, too, with people selling compromised blogs that are ready to host the content of your choice and so on.)

CarefulBlogCompromise written at 01:45:24


