2006-04-07
A bash irritation
Presented in illustrated form:
; cat foo
#!/bin/sh
cat "$@"
; ./foo /etc/termcap | sed 1q >/dev/null
./foo: line 2: 3632 Broken pipe cat "$@"
This behavior is new in Bash 2, as far as I know. I find it extremely irritating, because programs having broken pipes is perfectly normal in Unix. Many filters and so on don't consume all of their input, and no other implementation of the Bourne shell reports this (as far as I know).
(I would not mind so much if this was only reported for interactive sessions; it is that bash spews out this message in shell scripts that irritates me so much.)
As the BASH FAQ covers in its section E2, this can be disabled when people compile bash. Debian apparently usually compiles bash this way (good for them); Red Hat does not (sigh). Of course this solution only helps people who can recompile and reinstall bash on all of the systems they want their shell scripts to run nicely on.
You can muzzle much of the verbosity of this by adding a do-nothing trap for SIGPIPE. Unfortunately you can't get it all; the best you can do is:
; cat foo2
#!/bin/sh
trap ':' 13; cat "$@"; exit 0
; ./foo2 /etc/termcap | sed 1q >/dev/null
Broken pipe
The message comes from bash itself. If you trap signal 13 to nothing at all, bash will set SIGPIPE to 'ignored' (SIG_IGN) for all of the processes in the pipe, which will cause them to see write errors instead of dying when the pipe breaks, which gets you:
; cat foo3
#!/bin/sh
trap '' 13; cat "$@"; exit 0
; ./foo3 /etc/termcap | sed 1q >/dev/null
cat: write error: Broken pipe
Which error message you prefer is a matter of taste. I tend to go for the foo3 case, because at least then I can remember why I am getting these strange messages (and grind my teeth about it yet again).
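The difference between the two cases (dying from SIGPIPE versus seeing a write error) can be reproduced outside of bash. What follows is only a minimal Python sketch, assuming a Unix system; the `CHILD` script and the `run_writer` helper are made up for this illustration and have nothing to do with bash's internals. The child writes to a pipe forever, the parent reads one byte and closes its end (much as `sed 1q` does), and then looks at how the child ended.

```python
import signal
import subprocess
import sys

# Hypothetical child program for this illustration: it sets SIGPIPE to
# either the default action or 'ignored', then writes until the pipe breaks.
CHILD = r"""
import os, signal, sys
mode = sys.argv[1]
action = signal.SIG_DFL if mode == "default" else signal.SIG_IGN
signal.signal(signal.SIGPIPE, action)
chunk = b"x" * 65536
try:
    while True:
        os.write(1, chunk)
except BrokenPipeError:
    # Only reachable when SIGPIPE is ignored: write() fails with EPIPE.
    os.write(2, b"write error: Broken pipe\n")
    sys.exit(1)
"""


def run_writer(mode):
    """Run the child with SIGPIPE in the given mode ("default" or "ignore").

    Returns (returncode, stderr_text). A negative return code means the
    child was killed by that signal number.
    """
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD, mode],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    proc.stdout.read(1)   # consume a little input, like 'sed 1q'
    proc.stdout.close()   # then go away, breaking the pipe
    err = proc.stderr.read().decode()
    proc.stderr.close()
    return proc.wait(), err


if __name__ == "__main__":
    for mode in ("default", "ignore"):
        rc, err = run_writer(mode)
        print(mode, rc, repr(err))
```

With the default action the child is killed by the signal and says nothing itself, so any message has to come from whoever waited for it (which is the position bash is in). With SIGPIPE ignored the child survives long enough to complain on its own, which is the foo3 behaviour.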
A pleasing Python regularity with __future__
The other night I was writing a Python program that wanted to divide two integers and get a floating point result. Normally integer division in Python produces integers, following a C style model of distinct numeric types; however, Python is slowly migrating towards a model where the types of numbers are more of an implementation detail.
The general way to get early access to an incompatible change like this is a magic statement at the start of your module: from __future__ import whatever. I knew that the change in number behavior could be gotten this way, but I couldn't remember what the magic whatever for it was.
After a moment's thought, I decided to try something:
>>> import __future__
>>> dir(__future__)
Despite the magic involved with __future__, this worked; I got a list of all of the magic stuff I could enable, and easily picked 'division' out as what I wanted.
It turns out that in addition to the magic in the CPython interpreter, there is a real __future__.py module. When you import it normally you get the regular module instead of the special magic interpreter handling, and get to introspect it and so on as usual.
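Since it is an ordinary module, the introspection can go a bit further than dir(). Here is a small sketch that uses the module's `all_feature_names` list and the `getOptionalRelease()` and `getMandatoryRelease()` methods on each feature object; the `describe_features` helper is just a name invented for this example.

```python
import __future__


def describe_features():
    """Map each future feature name to (optional_release, mandatory_release).

    Each release is a version tuple as reported by the __future__ module
    (the mandatory release may be None for some features).
    """
    info = {}
    for name in __future__.all_feature_names:
        feature = getattr(__future__, name)
        info[name] = (
            feature.getOptionalRelease(),
            feature.getMandatoryRelease(),
        )
    return info


if __name__ == "__main__":
    for name, (optional, mandatory) in sorted(describe_features().items()):
        print(name, optional, mandatory)
```

The optional release is the first version where the from __future__ import works; the mandatory release is where the behavior becomes (or is expected to become) the default.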
And talking of special magic:
>>> from __future__ import braces
File "<stdin>", line 1
SyntaxError: not a chance
(Other nonexistent future features get a different error message. And you specifically can't do 'from __future__ import *'.)
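The different error messages can be checked without typing things at the interactive prompt, by handing the statement to compile() and catching the SyntaxError. This is a sketch against CPython; the `future_import_error` helper and the `no_such_feature` name are invented for the example, and the exact message text is a CPython detail that other implementations need not share.

```python
def future_import_error(feature):
    """Return the SyntaxError message for importing `feature` from
    __future__, or None if the statement compiles cleanly."""
    source = "from __future__ import " + feature
    try:
        compile(source, "<demo>", "exec")
    except SyntaxError as exc:
        return exc.msg
    return None


if __name__ == "__main__":
    for feature in ("division", "braces", "no_such_feature"):
        print(feature, "->", future_import_error(feature))
```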
Some quick SMTP connection statistics
Recently I've been wondering about the usage pattern of zombie machines. Do spammers typically make only a few connections from each zombie and move on, or do they use the same machines over and over?
Through my weekly spam stats I know that some machines that we reject at connection time try again and again. But what's the distribution like? For example, do most IP addresses get refused once or twice and then go away? So I grabbed our logs and started looking.
All of these figures are for the past 28 (full) days, and for IP addresses that have connected to us at least twice at least five seconds apart (so we're already dealing with machines with some retrying or reuse).
What | Different IPs | 1 try | 2 tries | 3 tries | 4 tries | 5-10 tries | more |
all refused | 46,583 | 60% | 17% | 7.4% | 4.1% | 8% | 3.7% |
'dynamic' | 25,430 | 59% | 17% | 7.7% | 4.2% | 8.2% | 3.3% |
bad reverse DNS | 15,582 | 63% | 17% | 6.6% | 3.4% | 6.3% | 3.3% |
CBL | 4,237 | 49% | 19% | 9% | 6.2% | 12% | 4.3% |
'CBL' is the people we rejected for being CBL listed. Unfortunately for my nice neat stats, we only check DNS blocklists after doing 30 minutes of greylisting (or more, for people with bad DNS information). So these are the cream of the crop of CBL listed IP addresses, which explains the relatively high persistence. It also makes the 49% 'only rejected once' interesting; I theorize that spammers are now using at least some zombie handling programs that don't give up after 4xx series SMTP replies, but do after 5xx ones.
At the moment, 7,511 of the 'bad reverse DNS' IP addresses and 11,518 of the 'dynamic' IP addresses are in the CBL (since the CBL ages things out, it's possible that more of them were originally there). Broken apart into 'in the CBL' and 'not currently in the CBL' sets, we get:
What | Different IPs | 1 try | 2 tries | 3 tries | 4 tries | 5-10 tries | more |
'CBL' | 19,022 | 56% | 18% | 8.3% | 4.7% | 9.3% | 4.2% |
non-CBL | 21,969 | 65% | 17% | 6.4% | 3.2% | 5.9% | 2.5% |
I don't have any really clever theories about the difference in persistence. It does make me want to move the CBL to early on in our processing so I can generate better numbers. (Prior experience suggests that most of our rejections will be in the CBL.)
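For anyone who wants to produce tables of this shape from their own logs, here is a rough Python sketch of the bucketing. It is not the script behind the numbers above, and it rests on several assumptions: the log has already been parsed into (ip, timestamp in seconds, was_rejected) tuples, a 'try' is one rejected connection, and an IP address qualifies if any two of its connections are at least five seconds apart. The names `bucket_for` and `rejection_distribution` are made up for the example.

```python
from collections import Counter, defaultdict

BUCKETS = ["1 try", "2 tries", "3 tries", "4 tries", "5-10 tries", "more"]


def bucket_for(count):
    """Map a rejection count (1 or more) to its table column."""
    if count <= 4:
        return BUCKETS[count - 1]
    return "5-10 tries" if count <= 10 else "more"


def rejection_distribution(records, min_gap=5):
    """Return (number_of_ips, {bucket: percentage}) for rejected IPs.

    records: iterable of (ip, timestamp_seconds, was_rejected) tuples.
    Only IPs whose first and last connections are at least min_gap
    seconds apart, and that were rejected at least once, are counted.
    """
    times = defaultdict(list)
    rejected = Counter()
    for ip, ts, was_rejected in records:
        times[ip].append(ts)
        if was_rejected:
            rejected[ip] += 1

    dist = Counter()
    for ip, stamps in times.items():
        if max(stamps) - min(stamps) < min_gap:
            continue  # no two connections far enough apart
        if rejected[ip] == 0:
            continue  # never refused, so not part of this table
        dist[bucket_for(rejected[ip])] += 1

    total = sum(dist.values())
    percentages = {
        b: (100.0 * dist[b] / total if total else 0.0) for b in BUCKETS
    }
    return total, percentages


if __name__ == "__main__":
    sample = [
        ("192.0.2.1", 0, False), ("192.0.2.1", 10, True),
        ("192.0.2.2", 0, True), ("192.0.2.2", 2, True),
        ("192.0.2.3", 0, True), ("192.0.2.3", 100, True),
        ("192.0.2.3", 200, True),
    ]
    total, pct = rejection_distribution(sample)
    print(total, pct)
```

Splitting the input records by rejection reason (or by current CBL status) before calling the function would give one table row per category.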