Wandering Thoughts archives

2006-04-07

A bash irritation

Presented in illustrated form:

; cat foo
#!/bin/sh
cat "$@"
; ./foo /etc/termcap | sed 1q >/dev/null
./foo: line 2: 3632 Broken pipe  cat "$@"

This behavior is new in Bash 2, as far as I know. I find it extremely irritating, because it is perfectly normal in Unix for programs to have their pipes broken; many filters and so on don't consume all of their input. As far as I know, no other implementation of the Bourne shell reports this.

(I would not mind so much if this were only reported for interactive sessions; what irritates me so much is that bash spews out this message in shell scripts.)

As the BASH FAQ covers in its section E2, this can be disabled when people compile bash. Debian apparently usually compiles bash this way (good for them); Red Hat does not (sigh). Of course this solution only helps people who can recompile and reinstall bash on all of the systems they want their shell scripts to run nicely on.

You can muzzle much of the verbosity of this by adding a do-nothing trap for SIGPIPE. Unfortunately you can't get it all; the best you can do is:

; cat foo2
#!/bin/sh
trap ':' 13; cat "$@"; exit 0
; ./foo2 /etc/termcap | sed 1q >/dev/null
Broken pipe

The message comes from bash itself. If you instead trap signal 13 to nothing at all, bash will set SIGPIPE to 'ignored' (SIG_IGN) for all of the processes in the pipe. They will then see write errors instead of dying when the pipe breaks, which gets you:

; cat foo3
#!/bin/sh
trap '' 13; cat "$@"; exit 0
; ./foo3 /etc/termcap | sed 1q >/dev/null
cat: write error: Broken pipe

Which error message you prefer is a matter of taste. I tend to go for the foo3 case, because at least then I can remember why I am getting these strange messages (and grind my teeth about it yet again).
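The difference between the two cases can be reproduced outside of bash. What follows is only a rough Python sketch, not anything bash itself does: the helper name run_cat_with_early_reader is made up for illustration, and it assumes a Unix system with cat and /dev/zero available. It starts cat with SIGPIPE either at its default or ignored, reads a little output, closes the pipe, and reports how cat ended.

```python
import signal
import subprocess


def run_cat_with_early_reader(ignore_sigpipe):
    """Run cat on endless input, stop reading early, and report how it ended.

    Hypothetical helper for illustration. Returns (returncode, stderr_text).
    """
    def child_setup():
        # Runs in the child just before cat is exec'd. An ignored signal
        # stays ignored across exec, which is how the setting reaches cat.
        disposition = signal.SIG_IGN if ignore_sigpipe else signal.SIG_DFL
        signal.signal(signal.SIGPIPE, disposition)

    proc = subprocess.Popen(
        ["cat", "/dev/zero"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        preexec_fn=child_setup,
    )
    proc.stdout.read(1)   # like 'sed 1q': consume a little input, then quit
    proc.stdout.close()   # the pipe is now broken from cat's point of view
    stderr_text = proc.stderr.read().decode()
    proc.stderr.close()
    return proc.wait(), stderr_text


for ignore in (False, True):
    returncode, stderr_text = run_cat_with_early_reader(ignore)
    print("SIGPIPE ignored:", ignore)
    print("  return code:", returncode)
    print("  stderr:", stderr_text.strip())
```

With the default disposition, the return code should be negative (cat was killed by the signal) and stderr empty. With SIGPIPE ignored, cat should exit with a non-zero status after complaining about the failed write, much like the foo3 case.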

linux/BashPipes written at 20:31:42

A pleasing Python regularity with __future__

The other night I was writing a Python program that wanted to divide two integers and get a floating point result. Normally integer division in Python produces integers, following a C-style model of distinct numeric types; however, Python is slowly migrating towards a model where the types of numbers are more of an implementation detail.

The general way to get early access to an incompatible change like this is a magic statement at the start of your module: from __future__ import whatever. I knew that the change in number behavior could be gotten this way, but I couldn't remember what the magic whatever for it was.

After a moment's thought, I decided to try something:

>>> import __future__
>>> dir(__future__)

Despite the magic involved with __future__, this worked; I got a list of all of the magic stuff I could enable, and easily picked 'division' out as what I wanted.

It turns out that in addition to the magic in the CPython interpreter, there is a real __future__.py module. When you import it normally you get the regular module instead of the special magic interpreter handling, and get to introspect it and so on as usual.

And talking of special magic:

>>> from __future__ import braces
  File "<stdin>", line 1
SyntaxError: not a chance

(Other nonexistent future features get a different error message. And you specifically can't do 'from __future__ import *'.)
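One way to compare the messages without typing at the interpreter is to compile the statements and catch the error. This is only a sketch; future_import_error and the feature name no_such_feature are made up for illustration.

```python
def future_import_error(feature):
    """Return the SyntaxError message for importing a feature, or None.

    Hypothetical helper: compiles (but never runs) a
    'from __future__ import <feature>' statement.
    """
    try:
        compile("from __future__ import " + feature, "<demo>", "exec")
    except SyntaxError as exc:
        return exc.msg
    return None


braces_msg = future_import_error("braces")
unknown_msg = future_import_error("no_such_feature")  # made-up feature name
known_msg = future_import_error("division")           # a real feature

print("braces:", braces_msg)
print("no_such_feature:", unknown_msg)
print("division:", known_msg)
```

A real feature such as division compiles cleanly, so the helper returns None for it.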

python/RegularFuture written at 16:06:02

Some quick SMTP connection statistics

Recently I've been wondering about the usage pattern of zombie machines. Do spammers typically make only a few connections from each zombie and move on, or do they use the same machines over and over?

Through my weekly spam stats I know that some machines that we reject at connection time try again and again. But what's the distribution like? For example, do most IP addresses get refused once or twice and then go away? So I grabbed our logs and started looking.

All of these figures are for the past 28 (full) days, and for IP addresses that have connected to us at least twice at least five seconds apart (so we're already dealing with machines with some retrying or reuse).

What             Different IPs  1 try  2 tries  3 tries  4 tries  5-10 tries  more
all refused             46,583    60%      17%     7.4%     4.1%          8%  3.7%
'dynamic'               25,430    59%      17%     7.7%     4.2%        8.2%  3.3%
bad reverse DNS         15,582    63%      17%     6.6%     3.4%        6.3%  3.3%
CBL                      4,237    49%      19%       9%     6.2%         12%  4.3%

'CBL' is the IP addresses we rejected for being CBL listed. Unfortunately for my nice neat stats, we only check DNS blocklists after doing 30 minutes of greylisting (or more, for people with bad DNS information). So these are the cream of the crop of CBL listed IP addresses, which explains the relatively high persistence. It also makes the 49% 'only rejected once' interesting; I theorize that spammers are now using at least some zombie handling programs that don't give up after 4xx series SMTP replies, but do after 5xx ones.
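For anyone wanting to produce a similar breakdown from their own logs, the bucketing in the table above can be sketched roughly as follows. This is not the script that generated these numbers; the function names and the sample data (which uses documentation-only IP addresses) are invented for illustration, and it assumes you have already reduced your logs to a count of tries per IP address.

```python
from collections import Counter

# Bucket labels matching the columns of the table.
BUCKETS = ["1 try", "2 tries", "3 tries", "4 tries", "5-10 tries", "more"]


def bucket_for(tries):
    """Pick the table column for a per-IP count (assumed to be at least 1)."""
    if tries <= 4:
        return BUCKETS[tries - 1]
    return "5-10 tries" if tries <= 10 else "more"


def distribution(tries_per_ip):
    """Map {ip: tries} to {bucket: percentage of distinct IPs}."""
    counts = Counter(bucket_for(n) for n in tries_per_ip.values())
    total = len(tries_per_ip)
    return {bucket: 100.0 * counts[bucket] / total for bucket in BUCKETS}


# Made-up sample data, not taken from any real logs.
sample = {
    "192.0.2.1": 1,
    "192.0.2.2": 1,
    "192.0.2.3": 2,
    "192.0.2.4": 7,
    "192.0.2.5": 23,
}
result = distribution(sample)
for bucket in BUCKETS:
    print("%-10s %5.1f%%" % (bucket, result[bucket]))
```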

At the moment, 7,511 of the 'bad reverse DNS' IP addresses and 11,518 of the 'dynamic' IP addresses are in the CBL (since the CBL ages things out, it's possible that more of them were originally there). Broken apart into 'in the CBL' and 'not currently in the CBL' sets, we get:

What             Different IPs  1 try  2 tries  3 tries  4 tries  5-10 tries  more
'CBL'                   19,022    56%      18%     8.3%     4.7%        9.3%  4.2%
non-CBL                 21,969    65%      17%     6.4%     3.2%        5.9%  2.5%

I don't have any really clever theories about the difference in persistence. It does make me want to move the CBL to early on in our processing so I can generate better numbers. (Prior experience suggests that most of our rejections will be in the CBL.)

spam/QuickConnectionStats written at 03:57:08

