Why I want something like Procmail with a dedicated mail filtering language

September 2, 2020

A couple of years ago I wrote about discovering that procmail development is basically dead and wondering out loud what I might switch to. In some comments on that entry, Aristotle Pagaltzis suggested that in an environment (such as MH) with one message per file, well, let me quote:

[...], then you can write yourself one or more programs in your favourite language that kick the mail from there to wherever you want it to end up. The entirety of the job of such code is opening and reading files and then moving them, for which any language whatsoever will do, so the only concern is how far you want to library up your mail parsing.

My reply (in another comment on that entry) was that I wanted a system where I directly wrote mail filtering rules, as is the case in procmail, not a system where I wrote filtering rules in some general purpose programming language. But I never explained why I wanted a special purpose language for this.

My reason is that writing mail filtering in a special purpose language removes (or rather hides) all of the plumbing that is otherwise necessary. The result may have obscure syntax (procmail certainly does), but almost everything it says is about what mail filtering is happening, not the mechanics of getting it to happen (both at the large scale level of opening files, parsing them, and moving them around, and at the small scale level of executing or otherwise matching rules). This makes it much easier to come back later and work out 'what is this filtering' from the system; the configuration file you read is all about that. With a general purpose programming language, coming back in six months or a year essentially requires reverse engineering your entire program, because you have to find the filtering rules in the rest of the code (and understand how they're executed).

(In theory you can avoid some of this if you write good enough documentation for your personal filtering setup. In practice it's pretty unlikely that you will, or that this documentation will be well tested enough (because you need to test documentation). An open source mail filtering system with a dedicated filtering language is much more likely to have good documentation that lets you drop right into understanding your filtering rules again.)

This is a subtle advantage of DSLs (Domain Specific Languages) in general. In a good DSL, much like with wikitext, almost everything you write is real 'content' (here, real filtering rules), and very little of it is scaffolding. A general purpose language necessarily isn't that focused on your specific problem area, and so making it focus that way requires a bunch of scaffolding. At the extreme, you wind up building your own language that's implemented in the general purpose language.

(This may be literal, with a parser and everything, or it may be in the form of a set of stylized and standard function calls or method calls you make to embody your real work.)
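
As a rough illustration of the 'stylized function calls' form, here is a minimal Python sketch. Everything in it is hypothetical and invented for this example (the names header_contains, file_into, RULES and classify, the folder names, and the header values); it is not how procmail, Sieve, or any existing tool works. It only classifies a parsed message and leaves out reading and moving files, which is exactly the plumbing discussed above.

```python
from dataclasses import dataclass
from email.message import EmailMessage, Message
from typing import Callable, List

# A predicate takes a parsed message and says whether a rule applies.
Predicate = Callable[[Message], bool]


@dataclass(frozen=True)
class Rule:
    """One filtering rule: a condition plus a destination folder."""
    matches: Predicate
    folder: str


def header_contains(name: str, needle: str) -> Predicate:
    """Hypothetical helper: true if the header contains needle, ignoring case."""
    def check(msg: Message) -> bool:
        return needle.lower() in str(msg.get(name, "")).lower()
    return check


def file_into(folder: str, when: Predicate) -> Rule:
    """Hypothetical helper: build a rule in a form that reads like a filter line."""
    return Rule(matches=when, folder=folder)


# The 'content': the only part you should need to reread in six months.
# Folder names and header values are made up for illustration.
RULES: List[Rule] = [
    file_into("lists/example", when=header_contains("List-Id", "example.org")),
    file_into("alerts", when=header_contains("Subject", "[alert]")),
]


# The scaffolding: how rules get executed (first match wins).
def classify(msg: Message, rules: List[Rule] = RULES, default: str = "inbox") -> str:
    for rule in rules:
        if rule.matches(msg):
            return rule.folder
    return default


if __name__ == "__main__":
    msg = EmailMessage()
    msg["Subject"] = "[ALERT] disk full"
    print(classify(msg))
```

Even in this small sketch, the rules themselves are two lines and the rest is scaffolding. A real version would also need the code that finds message files, parses them, and moves them, which pushes the ratio further toward scaffolding.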


Comments on this page:

I use maildrop (https://www.courier-mta.org/maildropfilter.html), which is reasonably unobscure and powerful enough, and can be run just like procmail upon delivery.

By mario at 2020-09-03 11:02:55:

I may be missing some of your context, but... what's wrong with using Sieve?

To echo mario's comment, I've never understood why Sieve, which as I understand it is the RFC standard for mail filtering, is not the de facto standard. Everybody rolls their own. My hosting company, which also provides my email (since I won't touch what my ISP provides for various reasons), insists on building filter rules by hand in their web interface; to describe that as "clunky" given the number of filter rules I have would be an insult to clunkiness.

By Andrew at 2020-09-03 21:42:24:

Sieve is deliberately kind of low-powered because it's the standard for "giving your ISP some rules to run on their mailserver" (even if depressingly few of them support it). procmail assumes you're running it on your own machine, or at least a machine where you've been given a shell account (and the level of trust that entailed in 1990, when procmail was written). So it allows pretty much arbitrary nested logic, allows forwarding rules, has an option that sends the message down multiple processing paths by actually forking procmail, and, as Chris mentioned in the article he linked, allows filtering through and delivering to external programs. Basically it's tailored to unix nerds.

It may not have been evident from my follow-up comment on that entry, but I agreed with this point even then. I did in fact create what you then called a canned environment, whereby I write mail filtering rules in a configuration language I designed for myself. My contemporary feelings about my invention are… nuanced, shall we say.

But my feelings about procmail have not changed. Firstly (though not really relevant to this entry), I never wanted to use another mail filtering system that processes mail in flight – and I still feel that way. Secondly (and actually on topic), my dislike for procmail was and is not because it offers a dedicated language for filtering mail, but because for everything but the simplest filtering tasks, it is essentially a Turing tarpit. And I don’t want to be dealing with those, except maybe for fun.

The bottom line is that I’m still convinced of the approach, but so far also haven’t seen an implementation that I like or would advocate, my own included. As for the point you made here, I don’t disagree or quibble at all.

By James (trs80) at 2020-09-10 09:55:56:

I read your linked entry, and sure enough I mentioned Sieve then, because it is an email-filtering-specific DSL. There's decent client support, including, most relevantly for you, GNU mailutils, which supports MH.

The main disadvantage of Sieve is that it's flexible enough to be write-only in practice, e.g. RoundCube and SOGo both support it and generate it from a web interface, but each will ignore/disable the other's Sieve scripts when encountered since they can't parse them.

Written on 02 September 2020.
