Wandering Thoughts archives

2012-05-24

Today's Mercurial command alias: a short-form hg incoming

Like other modern VCSes, Mercurial lets you define command aliases. It's a feature that I don't use as much as I could, but every so often I work out one that's really handy. Today's alias is something I've called 'hg pending', a short-form summary version of 'hg incoming'.

From my .hgrc:

[alias]
pending = incoming --template '* {desc|firstline}\n'

This shows something that looks like:

* exp/html: detect "integration points" in SVG and MathML content
* runtime: faster GC mark phase
* cmd/cc: fix uint right shift in constant evaluation
* cmd/6g: peephole fixes/additions

When I'm checking up on what's coming from an upstream repository that I'm tracking, this is basically exactly what I want. I don't care who authored each change, its long description, or the rest of the full changeset information; I just want a quick overview of what I'll get with a pull.

(This assumes that the Mercurial repositories you're tracking use the decent and correct format for commit messages: a one-line summary, then a blank line, then the long description.)
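To make the dependence on commit message format concrete, here's a minimal Python sketch of what the alias's template does to each incoming changeset's description. It's an illustration, not Mercurial's own code; `firstline` and `summarize` are hypothetical helper names, and the sketch assumes descriptions arrive as plain strings.

```python
from typing import List


def firstline(text: str) -> str:
    """Return the first line of text, roughly like the {desc|firstline} filter."""
    lines = text.splitlines()
    return lines[0] if lines else ""


def summarize(descriptions: List[str]) -> str:
    """Format each description as a '* summary' line, as the alias template does."""
    return "".join("* " + firstline(desc) + "\n" for desc in descriptions)


if __name__ == "__main__":
    incoming = [
        "runtime: faster GC mark phase\n\nA longer explanation goes here.",
        "cmd/6g: peephole fixes/additions",
    ]
    print(summarize(incoming), end="")
```

A well-formed description collapses to its summary line. A description that starts with a blank line shows up as a bare '* ', and one that crams everything onto a single long line shows up in full.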

Possibly I should make a version of this for hg log too, but I haven't felt the urge yet.

(I have no strong reason for calling it 'pending' instead of anything else and there is probably a better term for it since I am bad with names.)

Bonus: sdiff, a sysadmin-friendly diff alias

Another Mercurial 'alias' (sort of) that we've become very accustomed to is something we call hg sdiff:

[extensions]
hgext.extdiff =

[extdiff]
cmd.sdiff = diff
opts.sdiff = -Nr

This gives us an old-style diff of changes, with no lines of context at all. It's not something you'd use as a programmer, but for sysadmins it turns out that the context in unified diffs is basically pointless for many changes to configuration files. In fact, zero-context diffs can be easier to read because there is less clutter obscuring the actual changes.
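For anyone who hasn't looked at old-style diff output in a while, here's a minimal Python sketch that produces the same 'normal' format (change commands like 2c2, followed by < and > lines) from two lists of lines. It only illustrates the format; hg sdiff itself just runs the system diff with -Nr (-N treats a file missing on one side as empty, -r recurses into directories). The sketch uses difflib.SequenceMatcher for the matching, which is not GNU diff's algorithm, so hunks may occasionally be split differently. `normal_diff` is a hypothetical name.

```python
import difflib
from typing import List


def _rng(start: int, end: int) -> str:
    """Format a 1-based inclusive line range the way normal diff does."""
    return str(start) if start == end else f"{start},{end}"


def normal_diff(old: List[str], new: List[str]) -> List[str]:
    """Return old-style (zero-context) diff lines that turn old into new."""
    out: List[str] = []
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # unchanged lines are never shown: no context at all
        if tag == "replace":
            out.append(f"{_rng(i1 + 1, i2)}c{_rng(j1 + 1, j2)}")
        elif tag == "delete":
            out.append(f"{_rng(i1 + 1, i2)}d{j1}")
        else:  # "insert"
            out.append(f"{i1}a{_rng(j1 + 1, j2)}")
        out.extend(f"< {line}" for line in old[i1:i2])
        if tag == "replace":
            out.append("---")
        out.extend(f"> {line}" for line in new[j1:j2])
    return out


if __name__ == "__main__":
    before = ["Port 22", "PermitRootLogin yes", "X11Forwarding no"]
    after = ["Port 22", "PermitRootLogin no", "X11Forwarding no"]
    print("\n".join(normal_diff(before, after)))
```

For a one-line configuration change the whole output is the 2c2 header plus the old and new lines; the unchanged lines around it never appear.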

(The one drawback of hg sdiff is that when only one file has changed, the output doesn't say which file it is, since extdiff then runs diff on the two versions of that file directly. 'hg status' is your friend here.)

This matters especially to us because we often record configuration file changes (in some diff form) in worklog entries, which also helps other sysadmins stay on top of the changes. When we initially switched to Mercurial from RCS, the extra verbosity of hg diff's default unified diff output became vaguely annoying, and creating hg sdiff helped significantly.

(Having written this, I guess we could experiment with forcing zero lines of context for unified diffs. However, we're well accustomed to old-style diff output as it is, so I suspect we wouldn't really find it much of an improvement.)
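For a quick look at what zero-context unified output would be like, Python's difflib can generate it with n=0. This is only a comparison sketch with made-up file names; in Mercurial itself, hg diff's --unified option sets the number of context lines.

```python
import difflib

before = ["Port 22", "PermitRootLogin yes", "X11Forwarding no"]
after = ["Port 22", "PermitRootLogin no", "X11Forwarding no"]

# n=0 asks for zero lines of context around each change; lineterm=""
# because these lines carry no trailing newlines.
zero_context = list(difflib.unified_diff(
    before, after,
    fromfile="sshd_config.orig", tofile="sshd_config",
    n=0, lineterm=""))

print("\n".join(zero_context))
```

This prints the --- and +++ file lines, an '@@ -2 +2 @@' hunk header, and then '-PermitRootLogin yes' and '+PermitRootLogin no'. The changed content is the same as in old-style output; what differs is the header and the -/+ markers.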

programming/HgPendingAlias written at 17:11:24

How we do milter-based spam rejection with Exim

Suppose that you use Exim as your mailer, and you want to do SMTP-time rejection of incoming spam using some outside program that speaks the sendmail milter protocol. Exim has no native support for the milter protocol, but it is possible to hijack some existing Exim interfaces to more or less achieve this, provided that you don't want to change the message in flight at this point (you can only either accept it as is or reject it).

Exim has a content-scanning interface; one of the things it can do is run an external program as an anti-virus scanner (an av_scanner of type cmdline). If you enable malware scanning in the acl_smtp_data ACL, the program you run can signal Exim to reject the message and supply a relatively arbitrary string that Exim can put in the SMTP rejection. Since Exim documents this as an interface for detecting viruses, all of the examples talk about things like malware names, but you can use it for anything you want.

A simplified version of our setting for this looks like:

av_scanner = cmdline:/milter/eximdriver.py %s:^REJECT :^REJECT (.+)

Then our DATA ACL contains a stanza with:

deny
  malware = *
  message = Rejected: $malware_name

If eximdriver.py outputs a string that looks like 'REJECT some-reason', Exim will declare that the message contains malware and set $malware_name to the some-reason portion, which we use directly in the SMTP rejection message.
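Here's a rough Python model of the contract between Exim and the command, using the two regular expressions from the av_scanner setting above: if some output line matches the trigger expression, the text captured by the second expression becomes $malware_name. This is an approximation for illustration, not Exim's code (Exim uses PCRE and its own output handling), and `exim_malware_name` is a hypothetical name.

```python
import re
from typing import Optional

# The trigger and name expressions from the av_scanner setting.
TRIGGER_RE = re.compile(r"^REJECT ")
NAME_RE = re.compile(r"^REJECT (.+)")


def exim_malware_name(scanner_output: str) -> Optional[str]:
    """Approximate the $malware_name Exim would derive from the output.

    Returns None when the trigger never matches, i.e. when 'malware = *'
    is false and the deny stanza does not fire.
    """
    lines = scanner_output.splitlines()
    if not any(TRIGGER_RE.search(line) for line in lines):
        return None
    for line in lines:
        match = NAME_RE.search(line)
        if match:
            return match.group(1)
    return None  # trigger matched but nothing to capture; not modeled further
```

So output of 'REJECT spam score too high' would end up as 'Rejected: spam score too high' in the SMTP response, while any output that lacks the prefix leaves the message alone.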

Our eximdriver.py has three important pieces. The first is a client-side library for the milter protocol, so that it can actually talk to the milter server and get results back. The second is code to load the message from the .eml spool file that Exim writes for the AV scanner program; this is basically a standard 'mailbox' format mail message augmented with some special Exim headers. The complication is that you need to recover the SMTP envelope information from various message headers, including the first Received line.

(You might think that the envelope information could be passed on the command line. Unfortunately it can't be, at least not securely. Also, note that the %s in the command line is not the .eml file itself but the directory it's in. Presumably Exim sets things up this way so that real AV scanners have a per-message directory where they can write whatever temporary files they need.)
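As a sketch of the second piece, here's how one might load the spooled message with Python's email package and pull client details out of the first Received header. The assumptions, also noted in the code, are that the directory passed as %s holds exactly one *.eml file and that the first Received header uses Exim's usual '([ip] helo=name)' layout. It recovers only the client IP address and HELO name; the other envelope details come from Exim's extra headers, which aren't modeled here. `load_spooled_message` and `client_info` are hypothetical names.

```python
import email
import glob
import os
import re
from email.message import Message

# Assumption: Exim's default Received format puts the client IP in square
# brackets and the HELO name after "helo=".
IP_RE = re.compile(r"\[([0-9A-Fa-f.:]+)\]")
HELO_RE = re.compile(r"helo=([^\s)]+)")


def load_spooled_message(scan_dir: str) -> Message:
    """Parse the single .eml file in scan_dir (the %s argument)."""
    paths = glob.glob(os.path.join(scan_dir, "*.eml"))  # assumption: exactly one
    if len(paths) != 1:
        raise RuntimeError(f"expected one .eml in {scan_dir}, found {len(paths)}")
    with open(paths[0], "rb") as f:
        # The parser treats a leading mbox "From " line as the unix-from.
        return email.message_from_binary_file(f)


def client_info(msg: Message) -> dict:
    """Pull the client IP and HELO name out of the topmost Received header."""
    received = msg.get_all("Received") or []
    if not received:
        return {}
    first = str(received[0])  # topmost header: the one our own Exim added
    info = {}
    ip = IP_RE.search(first)
    if ip:
        info["ip"] = ip.group(1)
    helo = HELO_RE.search(first)
    if helo:
        info["helo"] = helo.group(1)
    return info
```

Raising on a missing or duplicate .eml file is deliberate: an exception means no REJECT output, so the mail is accepted.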

The third chunk of eximdriver.py is site-dependent; it interprets the result of the remote milter in order to figure out whether Exim should be told to reject the message and if so, with what reasons. For reasons beyond the scope of this entry, our milter server doesn't give us direct answers on this; instead, it tells us about changes that should be made to the message. Our eximdriver.py reverse engineers these changes back to whether or not the message has a virus and how high a spam score it got and outputs appropriate messages.
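The third piece is inherently site-specific, but its general shape can be sketched. Here's a minimal Python version that assumes the milter client hands back the list of headers the milter asked to add. The header names X-Hypothetical-Virus and X-Hypothetical-Spam-Score are made up as stand-ins for whatever a real milter adds, and `Verdict` and `interpret` are hypothetical names.

```python
from typing import List, NamedTuple, Tuple

# Hypothetical header names: substitute whatever your milter actually adds.
VIRUS_HEADER = "x-hypothetical-virus"
SCORE_HEADER = "x-hypothetical-spam-score"


class Verdict(NamedTuple):
    is_virus: bool
    spam_score: float


def interpret(added_headers: List[Tuple[str, str]]) -> Verdict:
    """Reverse-engineer a verdict from the headers the milter wants added."""
    is_virus = False
    spam_score = 0.0
    for name, value in added_headers:
        name = name.lower()  # header names are case-insensitive
        if name == VIRUS_HEADER:
            is_virus = True
        elif name == SCORE_HEADER:
            try:
                spam_score = float(value.strip())
            except ValueError:
                # An unparseable score counts as 0.0, erring toward
                # accepting the message rather than rejecting it.
                spam_score = 0.0
    return Verdict(is_virus, spam_score)
```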

As crazy as this may sound, all of this works, and works fine. It is nowhere near as efficient as direct milter support in Exim would be (we are at least potentially starting a Python program for every incoming email message), but our external mail gateway is relatively overprovisioned for our volume and this has never been a problem for us.

(There is another mitigating factor, which I'll get to in the sidebar below.)

Our eximdriver.py also does some associated things, like logging detailed information about rejected messages (and some information about accepted ones) to syslog.

As a side note, it is very deliberate that eximdriver.py must produce output with a specific prefix in order to have email rejected. This is much safer than having any output at all cause rejection, because if something goes wrong in eximdriver.py (or just with running it, perhaps because the machine is overloaded) we fail safe: we default to accepting the email instead of bouncing it.
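The fail-safe property depends on how the output is produced as much as on the regular expression. Here's a minimal Python sketch of a top-level wrapper in that spirit; `run_fail_open` and the `check` callable are hypothetical names, with `check` standing in for the real milter conversation and returning either a rejection reason or None.

```python
import sys
import traceback
from typing import Callable, Optional


def run_fail_open(check: Callable[[], Optional[str]]) -> None:
    """Print 'REJECT <reason>' only if check() cleanly returns a reason."""
    try:
        reason = check()
    except Exception:
        # Report the problem on stderr. No traceback line begins with
        # 'REJECT ', so even if Exim also reads stderr, the trigger can't match.
        traceback.print_exc(file=sys.stderr)
        return
    if reason:
        print(f"REJECT {reason}")
```

If Python can't even start (say, on an overloaded machine), there is no output at all, which fails in the same safe direction.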

(I was going to say that Exim has no easy way for the 'AV scanner' to signal that the mail should be temporarily deferred with an SMTP 4xx response, but that's actually wrong. Exim is explicitly documented to defer messages by default if there's a visible AV scanner problem, and you can use a regular-expression based 'malware = ...' condition in a defer ACL stanza to defer based on the milter results as signaled back to Exim in the 'malware name'.)

Sidebar: the gory details

Now I have to confess that this is a simplified version of our configuration, because we have a problem: not all of our users have opted in to SMTP-time rejection of spam, and some have opted in to different levels of it. This runs into the SMTP DATA problem: because what we say at DATA time applies to all accepted RCPT TO addresses, we have to use the least aggressive level of SMTP-time rejection that every recipient has agreed to (possibly right down to 'no SMTP-time rejection'). In turn, our Exim configuration has to keep track of this level on the fly; eximdriver.py is then invoked with the level as one of its arguments and only emits REJECT notices if the message qualifies at that level.
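Here's a minimal Python sketch of the eximdriver.py side of this: parse the level given with -l and produce a reason only if the message qualifies at that level. The mapping from levels to what gets rejected is entirely hypothetical, as are the names `SPAM_THRESHOLDS`, `rejection_reason` and `parse_args`; only the -l argument and the 0 to 3 range come from the configuration described in this entry.

```python
import argparse
from typing import Dict, List, Optional

# Hypothetical policy: level 0 means no SMTP-time rejection; level 1
# rejects only viruses (None); higher levels reject at lower spam scores.
SPAM_THRESHOLDS: Dict[int, Optional[float]] = {1: None, 2: 10.0, 3: 5.0}


def rejection_reason(level: int, is_virus: bool, spam_score: float) -> Optional[str]:
    """Return a rejection reason if the message qualifies at this level."""
    if level <= 0:
        return None
    if is_virus:
        return "message contains a virus"
    threshold = SPAM_THRESHOLDS.get(min(level, 3))
    if threshold is not None and spam_score >= threshold:
        return f"spam score {spam_score:.1f} is at least {threshold:.1f}"
    return None


def parse_args(argv: List[str]) -> argparse.Namespace:
    parser = argparse.ArgumentParser(prog="eximdriver.py")
    # Defaulting to 0 means a missing -l never causes a rejection.
    parser.add_argument("-l", dest="level", type=int, default=0,
                        help="lowest scanning level all recipients agreed to")
    parser.add_argument("scan_dir", help="directory holding the .eml (Exim's %%s)")
    return parser.parse_args(argv)
```

Defaulting the level to 0 keeps the same fail-safe direction as the REJECT prefix: a configuration mistake results in accepted mail, not bounced mail.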

Because the current scanning level is a dynamic expansion that must be re-done every time av_scanner is evaluated, our actual av_scanner setting has to look like this:

av_scanner = ${if bool{true} {cmdline:/milter/eximdriver.py -l SCANLVL %s: ^REJECT :^REJECT (.+)}}

The otherwise pointless ${if bool{true} ...} portion causes Exim to re-expand this every time it is used, so that the current message's scanning level is substituted in.

The simplest way to track the scanning level turned out to be keeping a list of each address's scan level in an ACL variable, $acl_m0_milter. The variable is initialized to the maximum scan level (currently 3), and as the RCPT TO ACL handles each address, it appends that address's scanning level with an expression like 'set acl_m0_milter = $acl_m0_milter:2' in an otherwise do-nothing warn stanza. The SCANLVL macro then reduces this list to the lowest level seen:

SCANLVL = ${reduce {$acl_m0_milter} {3} {${if <{$item}{$value} {$item}{$value}}}}
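For anyone who finds Exim's ${reduce ...} syntax opaque, here's the same computation in Python: a fold over the colon-separated list that starts at 3 and keeps the smaller value at each step. It illustrates the expansion's logic and is not something Exim runs; `scan_level` is a hypothetical name, and it ignores Exim list subtleties such as doubled colons.

```python
from functools import reduce

MAX_SCAN_LEVEL = 3  # the reduce's starting value and $acl_m0_milter's initial value


def scan_level(acl_m0_milter: str) -> int:
    """Mirror SCANLVL: the lowest level in a colon-separated list like '3:2:3'."""
    levels = (int(item) for item in acl_m0_milter.split(":") if item.strip())
    # Like ${if <{$item}{$value} {$item}{$value}}: keep whichever is smaller.
    return reduce(lambda value, item: item if item < value else value,
                  levels, MAX_SCAN_LEVEL)


if __name__ == "__main__":
    acl = str(MAX_SCAN_LEVEL)          # initial value of the ACL variable
    for recipient_level in (3, 2, 3):  # one append per RCPT TO address
        acl += f":{recipient_level}"
    print(acl, scan_level(acl))
```

The deny stanza's scanning condition is then the equivalent of scan_level(...) > 0, so a single recipient at level 0 turns scanning off for the whole message.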

As an efficiency measure, we don't bother scanning at all if any of the RCPT TO addresses hasn't opted in to SMTP-time rejection. This is actually our common case; for various reasons, relatively few people have opted in to any of our server-side anti-spam options. This is done with a condition on the 'malware' deny stanza:

condition = ${if >{SCANLVL}{0}{true}{false}}

Believe it or not, all of this actually works well.

(Our approach does require that we can put people's anti-spam choices into a linear ordering of some sort. Extensions to schemes involving bitmaps of what people have opted in to are left as an exercise to the dedicated and perverse Exim configuration file writer.)

sysadmin/EximMilterHookup written at 02:30:44

