2012-05-24
Today's Mercurial command alias: a short form hg incoming
Like other modern VCSes, Mercurial allows you to define command aliases.
This is a feature that I don't use as much as I could, but every so
often I work out one that's really handy. Today's alias is something
I've called 'hg pending', a short form summary version of 'hg incoming'.
From my .hgrc:
[alias]
pending = incoming --template '* {desc|firstline}\n'
This shows something that looks like:
* exp/html: detect "integration points" in SVG and MathML content
* runtime: faster GC mark phase
* cmd/cc: fix uint right shift in constant evaluation
* cmd/6g: peephole fixes/additions
When I'm checking up on what's coming from an upstream repository that I'm tracking, this is basically just what I want; I don't care about who authored the change or its long description or the other full information, I'd just like to get a quick overview of what I'll get with a pull.
(This assumes that you are tracking Mercurial repositories that use the decent and correct format for commit messages, where the first line is a short summary of the change.)
Possibly I should make a version of this for hg log too, but I
haven't felt the urge yet.
(I have no strong reason for calling it 'pending' instead of anything else and there is probably a better term for it since I am bad with names.)
Bonus: sdiff, a sysadmin-friendly diff alias
Another Mercurial 'alias' (sort of) that we've become very accustomed
to is something we call hg sdiff:
[extensions]
hgext.extdiff =
[extdiff]
cmd.sdiff = diff
opts.sdiff = -Nr
This gives us an old-style diff of changes. This is not something you would use as a programmer, but for sysadmins it turns out that the context from unified diffs is basically pointless for many changes to configuration files. In fact, zero-context diffs can be easier to read because there is less clutter obscuring the actual changes.
(The one drawback of this is that Mercurial doesn't report the file
that has changed if there's only one. 'hg status' is your friend here.)
This especially matters to us because we often record configuration
file changes (in some diff form) in worklog entries (which also helps
other sysadmins to stay on top of the changes). When we initially
switched to Mercurial from RCS, the increased verbosity of the default
hg diff unified diff output became vaguely annoying. Creating hg sdiff
helped significantly.
(Having written this, I guess we could experiment with forcing zero lines of context for unified diffs. However we're well accustomed to old-style diff output as it is, so I suspect we wouldn't really find it much of an improvement.)
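To make the zero-context idea concrete, here is a small sketch using Python's standard difflib rather than Mercurial itself. The file name and the configuration lines are made up for illustration; the only thing it is meant to show is how much of a unified diff is context when a single configuration line changes.

```python
import difflib

# Made-up configuration lines, purely for illustration.
old = ["PermitRootLogin yes", "Port 22", "UseDNS no"]
new = ["PermitRootLogin no", "Port 22", "UseDNS no"]


def unified(old_lines, new_lines, context):
    """Return unified diff lines with the given number of context lines."""
    return list(difflib.unified_diff(
        old_lines, new_lines, "a/sshd_config", "b/sshd_config",
        n=context, lineterm=""))


if __name__ == "__main__":
    print("\n".join(unified(old, new, 3)))  # default amount of context
    print()
    print("\n".join(unified(old, new, 0)))  # zero lines of context
```

With three lines of context the unchanged lines are repeated in the output; with zero, only the file headers, the hunk header and the changed lines remain.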
How we do milter-based spam rejection with Exim
Suppose that you use Exim as your mailer, and you want to do SMTP-time rejection of incoming spam using some outside program that speaks the sendmail milter protocol. Exim has no native support for the milter protocol, but it is possible to hijack some existing Exim interfaces to more or less achieve this, provided that you don't want to change the message in flight at this point (you can only accept it as is or reject it).
Exim has a content-scanning interface; one of the things it can do is
run an external program as an anti-virus scanner (the av_scanner scanner
type of cmdline). If you enable things in the acl_smtp_data ACL, the
program you run here can signal Exim to reject the message and provide a
relatively arbitrary message that Exim can put in the SMTP rejection.
Since Exim documents this as an interface for detecting viruses, all of
the examples talk about things like malware names, but you can use it
for anything you want.
A simplified version of our setting for this looks like:
av_scanner = cmdline:/milter/eximdriver.py %s:^REJECT :^REJECT (.+)
Then our DATA ACL contains a stanza with:
deny malware = *
     message = Rejected: $malware_name
If eximdriver.py outputs a string that looks like 'REJECT some-reason',
Exim will declare that the message contains malware and set
$malware_name to the some-reason portion, which we use directly in
the SMTP rejection message.
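As a rough model of that contract, the following sketch applies the two regular expressions from the av_scanner setting to a scanner's output. Exactly how Exim applies them internally (here assumed to be line by line) is an assumption, and the function name is hypothetical; the point is only that output without the 'REJECT ' prefix never produces a malware name.

```python
import re

# The two regexes from the av_scanner setting: the first triggers a
# "malware found" verdict, the second captures what becomes $malware_name.
TRIGGER = re.compile(r"^REJECT ")
NAME = re.compile(r"^REJECT (.+)")


def malware_name(scanner_output):
    """Return the captured reason, or None if nothing triggers.

    Assumption: the regexes are tried against each output line in turn.
    """
    for line in scanner_output.splitlines():
        if TRIGGER.search(line):
            match = NAME.search(line)
            return match.group(1) if match else None
    return None


if __name__ == "__main__":
    print(malware_name("REJECT spam score too high\n"))
    print(malware_name("some unrelated error text\n"))
```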
Eximdriver.py has three important pieces. The first is a client-side
library for the milter protocol, so that it can actually talk to the
milter server and get results back. The second is code to load a message
from the .eml spool format that Exim writes messages in for the AV
scanner program; this is basically a standard 'mailbox' format mail
message augmented with some special Exim headers. The complication is
that you need to recover the SMTP envelope information from various
message headers, including the first Received line.
(You might think that the envelope information could be passed on the
command line. Unfortunately this can't be done securely. Also, note that
the %s in the argument here is not the .eml file itself but the
directory it's in. Presumably Exim sets things up this way so that real
AV scanners have a per-message directory where they can write whatever
temporary files they need.)
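A minimal sketch of the loading step might look like the following. It is not our actual eximdriver.py code; the function names are hypothetical, and the X-Envelope-From and X-Envelope-To header names are an assumption about what Exim adds to the .eml file that you should check against your own spool files.

```python
import email
import glob
import os


def load_scan_message(scan_dir):
    """Load the single .eml file assumed to be in the scan directory."""
    paths = glob.glob(os.path.join(scan_dir, "*.eml"))
    if len(paths) != 1:
        raise ValueError("expected exactly one .eml file in %s" % scan_dir)
    with open(paths[0], "rb") as fp:
        return email.message_from_binary_file(fp)


def envelope_info(msg):
    """Recover envelope hints from headers.

    The X-Envelope-* header names are assumptions, not confirmed here.
    """
    received = msg.get_all("Received") or []
    return {
        "sender": msg.get("X-Envelope-From"),
        "recipients": msg.get_all("X-Envelope-To") or [],
        # The first Received header is the one added by the local Exim.
        "first_received": received[0] if received else None,
    }
```

The script would be handed the directory as its %s argument, so something like envelope_info(load_scan_message(sys.argv[1])) is the intended shape of use.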
The third chunk of eximdriver.py is site-dependent; it interprets the result of the remote milter in order to figure out whether Exim should be told to reject the message and if so, with what reasons. For reasons beyond the scope of this entry, our milter server doesn't give us direct answers on this; instead, it tells us about changes that should be made to the message. Our eximdriver.py reverse engineers these changes back to whether or not the message has a virus and how high a spam score it got and outputs appropriate messages.
As crazy as this may sound, all of this actually works and works fine. It is nowhere near as efficient as direct milter support in Exim would be (we are at least potentially running a Python program for every incoming email message), but our external mail gateway is actually relatively overconfigured for our volume and it's never been a problem for us.
(There is another mitigating factor, which is about to be discussed.)
Our eximdriver.py also does some associated things, such as logging detailed information about rejected messages, and some information about accepted ones, to syslog.
As a side note, it is very deliberate that eximdriver.py must produce output with a specific prefix in order to have email rejected. This is much safer than having any output at all cause message rejection, because it means that if something goes wrong in eximdriver.py (or just with running it, perhaps because of machine overload) we fail safely; we default to accepting the email instead of bouncing it.
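The fail-safe behaviour can be sketched like this. The function and the check_message callable are hypothetical stand-ins for the real milter conversation; what matters is that the 'REJECT ' prefix is only ever written on the one path where a check completed and asked for rejection.

```python
import sys


def emit_verdict(check_message, out=sys.stdout):
    """Write 'REJECT <reason>' only when check_message returns a reason.

    check_message is a hypothetical callable that returns a reason string
    to reject the message, or None to accept it.
    """
    try:
        reason = check_message()
    except Exception:
        # Anything going wrong means no REJECT line is written, so Exim
        # does not see a 'malware' match. A real script would log this.
        return False
    if reason:
        print("REJECT %s" % reason, file=out)
        return True
    return False
```

An exception, a timeout turned into an exception, or simply no verdict all end up on the same path: nothing with the prefix is printed.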
(I was going to say that Exim has no easy way for the 'AV scanner' to
signal that the mail should be temporarily deferred with a SMTP 4xx,
but actually that's wrong. Exim is explicitly documented to default to
deferring messages if there's a visible AV scanner problem, and you can
use a regular-expression based 'malware = ...' condition in a defer
ACL stanza in order to defer based on the milter results as signaled
back to Exim in the 'malware name'.)
Sidebar: the gory details
Now I have to confess that this is the simplified version of our
configuration, because we have a problem: not all of our users have
opted in to SMTP-time rejection of spam messages. In fact, some of our
users have opted in to different levels of SMTP-time rejections. This
leads to the SMTP DATA problem; because what we say at DATA time applies
to all accepted RCPT TO addresses, we have to use the least aggressive
level of SMTP-time rejection that everyone has agreed to (possibly right
down to 'no SMTP-time rejection'). In turn this means that our Exim
configuration has to keep track of this on the fly and eximdriver.py
then gets invoked with this level as one of its arguments and only emits
REJECT notices if the message qualifies.
Because the current scanning level is a dynamic expansion that
must be re-done every time av_scanner is evaluated, our actual
av_scanner setting has to look like this:
av_scanner = ${if bool{true} {cmdline:/milter/eximdriver.py -l SCANLVL %s: ^REJECT :^REJECT (.+)}}
The pointless ${if bool{true} ...} portion causes Exim to re-expand
this every time it is used so that the current message's scanning level
is substituted in.
The simplest way to track the scanning level turned out to be
keeping a list of each address's scan level in an ACL variable,
$acl_m0_milter. As we handle each address in the RCPT TO ACL, we
append the address's scanning level to the variable (which is
initialized to the maximum scan level, currently 3) with an expression
like 'set acl_m0_milter = $acl_m0_milter:2' in an otherwise
do-nothing warn ACL stanza. The SCANLVL definition reduces this
down to the lowest level seen:
SCANLVL = ${reduce {$acl_m0_milter} {3} {${if <{$item}{$value} {$item}{$value}}}}
As an efficiency measure we do not bother doing scanning at all if one
or more of the RCPT TO addresses has not opted in to any SMTP-time
rejection. This is actually our common case; for various reasons,
relatively few people have opted in to any of our server side anti-spam
options. This is done with a condition on the 'malware' deny stanza:
condition = ${if >{SCANLVL}{0}{true}{false}}
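For readers who don't think in Exim string expansions, here is a small Python model of what SCANLVL and the condition compute. The function names are hypothetical and this is only a sketch of the logic, not something Exim runs.

```python
def lowest_scan_level(acl_m0_milter, maximum=3):
    """Model of SCANLVL: the smallest level in a colon-separated list.

    The list starts out as the maximum level and gains one entry per
    accepted RCPT TO address, for example "3:2:3".
    """
    levels = [int(item) for item in acl_m0_milter.split(":") if item.strip()]
    return min(levels + [maximum])


def should_scan(acl_m0_milter):
    """Model of the 'condition =' line: skip scanning at level 0."""
    return lowest_scan_level(acl_m0_milter) > 0


if __name__ == "__main__":
    print(lowest_scan_level("3:2:3"), should_scan("3:2:3"))
    print(lowest_scan_level("3:2:0"), should_scan("3:2:0"))
```

A single recipient at level 0 drags the whole message down to 'no scanning', which is the least aggressive level that everyone has agreed to.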
The result of all of this actually works well, believe it or not.
(Our approach does require that we can put people's anti-spam choices into a linear ordering of some sort. Extensions to schemes involving bitmaps of what people have opted in to are left as an exercise to the dedicated and perverse Exim configuration file writer.)