2007-03-04
Handling lines with something-separated fields for Python
As a system administrator, I spend a bunch of my time dealing with
files made up of lines that are composed of fields separated by some
character. A classical example is /etc/passwd, with colon-separated
fields. These file formats are ordered lists with named fields, which
should sound familiar. However, they don't show up as Python lists;
they show up as lines of text and they want to be output as text.
That we use lists to represent them is just an implementation detail.
This only takes a little bit of extra work to implement on top of
our previous SetMixin class:
    class FieldLine(SetMixin, list):
        separator = ":"
        def __init__(self, line):
            n = line.split(self.separator)
            super(FieldLine, self).__init__(n)
        def __str__(self):
            return self.separator.join(self)

    class PasswdLine(FieldLine):
        fields = gen_fields('name', 'passwd',
                            'uid', 'gid', 'gecos',
                            'dir', 'shell')
(Where gen_fields is basically the dict()-ized version of
enum_args from here.)
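Since SetMixin and gen_fields are not shown in this entry, here is a minimal self-contained sketch of how the pieces might fit together. Both the SetMixin and the gen_fields below are assumed stand-ins written for illustration, not the versions from the earlier entries; in particular this SetMixin only does attribute access to named fields and ignores slicing. The sample passwd line is made up.

```python
def gen_fields(*names):
    # Assumed stand-in: map each field name to its list position,
    # eg {'name': 0, 'passwd': 1, ...}.
    return dict((name, pos) for pos, name in enumerate(names))


class SetMixin(object):
    # Assumed stand-in for the real SetMixin: read/write access to
    # list elements through the names in the 'fields' mapping.
    fields = {}

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        try:
            return self[self.fields[name]]
        except KeyError:
            raise AttributeError(name)

    def __setattr__(self, name, value):
        if name in self.fields:
            self[self.fields[name]] = value
        else:
            super(SetMixin, self).__setattr__(name, value)


class FieldLine(SetMixin, list):
    separator = ":"

    def __init__(self, line):
        n = line.split(self.separator)
        super(FieldLine, self).__init__(n)

    def __str__(self):
        return self.separator.join(self)


class PasswdLine(FieldLine):
    fields = gen_fields('name', 'passwd',
                        'uid', 'gid', 'gecos',
                        'dir', 'shell')


# Made-up example line: parse it, change one named field, and turn
# it back into text.
pw = PasswdLine("root:x:0:0:root:/root:/bin/sh")
pw.shell = "/bin/bash"
print(str(pw))
```

Note that all fields stay strings (pw.uid is "0", not 0); converting them would be up to the code using the class.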
Now that I've written these entries, I have a confession: this is
actually what I started out doing. I didn't first build a general
ordered list with named fields class and then realize it could be used
to deal with /etc/passwd lines; I started out needing to deal with
/etc/passwd lines, decided that I wanted read/write access to named
fields, and then built downwards. I just wrote it up backwards because
it looks neater that way.
(In fact this is the cleaned up and idealized version of this class. The
real one in my program does not subclass list; instead it is a normal
class with a private 'field_store' list and everything just directly
manipulates that. It also doesn't handle the slicing cases, because I
didn't need to. I wrote the new version for this entry for various reasons,
including that it was a good excuse to play around with subclassing
built in types.)
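For comparison, the non-subclassing approach might look roughly like the following. This is a sketch under assumptions, not the real code from my program: only the 'field_store' name comes from the description above, while the class name and the rest of the details are invented for illustration. As described, it makes no attempt to support indexing or slicing.

```python
class PlainPasswdLine(object):
    # Hypothetical name; an ordinary class that keeps the fields in
    # a private list instead of subclassing list.
    separator = ":"
    fields = dict((name, pos) for pos, name in enumerate(
        ('name', 'passwd', 'uid', 'gid', 'gecos', 'dir', 'shell')))

    def __init__(self, line):
        # Go around our own __setattr__ so that field_store is stored
        # as a normal instance attribute.
        object.__setattr__(self, 'field_store',
                           line.split(self.separator))

    def __getattr__(self, name):
        # Only called for names that normal lookup did not find.
        try:
            return self.field_store[self.fields[name]]
        except KeyError:
            raise AttributeError(name)

    def __setattr__(self, name, value):
        if name in self.fields:
            self.field_store[self.fields[name]] = value
        else:
            object.__setattr__(self, name, value)

    def __str__(self):
        return self.separator.join(self.field_store)


# Made-up example line.
entry = PlainPasswdLine("user:x:1000:1000:Example User:/home/user:/bin/sh")
entry.gecos = "Renamed User"
print(str(entry))
```

The tradeoff is that everything manipulates field_store directly, so you give up list behavior such as len() and indexing unless you write it yourself.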
Weekly spam summary on March 3rd, 2007
This machine had a planned twelve hour power outage today, so many of these statistics are really only for six days. Having said that, this week we:
- got 16,376 messages from 272 different IP addresses.
- handled 20,396 sessions from 1,270 different IP addresses.
- received at least 212,857 connections from at least 63,943 different IP addresses.
- hit a highwater of 5 connections being checked at once.
This is down from last week, but not hugely so; we might have been in the same ballpark if not for the downtime.
| Day | Connections | different IPs |
| Sunday | 35,230 | +12,485 |
| Monday | 32,638 | +10,273 |
| Tuesday | 38,623 | +11,238 |
| Wednesday | 34,186 | +10,274 |
| Thursday | 36,476 | +10,272 |
| Friday | 31,556 | +9,401 |
This is reasonably similar to last week's, although smoother.
Kernel level packet filtering top ten (up to 02:26 am on March 3rd):
| Host/Mask | Packets | Bytes |
| 205.152.59.0/24 | 10914 | 495K |
| 206.223.168.238 | 9216 | 505K |
| 213.4.149.12 | 6660 | 346K |
| 69.25.186.66 | 5673 | 272K |
| 81.115.40.8 | 5360 | 286K |
| 213.29.7.0/24 | 4317 | 259K |
| 68.22.111.226 | 4051 | 189K |
| 65.14.221.82 | 3569 | 171K |
| 204.202.15.102 | 3019 | 149K |
| 211.94.0.0/15 | 2919 | 175K |
This is down significantly from last week, and it seems unlikely that one more day would have made a major difference.
- 205.152.59.0/24 is Bellsouth, still hammering on us with advance fee fraud spammers through their webmail system. (Well, probably. Since we're not accepting their packets I can't be sure.)
- 206.223.168.238 and 204.202.15.102 return from last week.
- 213.4.149.12 returns from recently and is resuming its usual presence in the listing.
- 69.25.186.66 is mail.mydiscountoffer.com, and was blocked for being in AccelerateBiz network space; after too many spammers, we no longer accept connections from their IP ranges.
- 81.115.40.8 is a telecomitalia.it IP address, last seen in January.
- 68.22.111.226 and 65.14.221.82 kept trying with bad
HELOs.
Connection time rejection stats:
66671 total
40920 dynamic IP
16846 bad or no reverse DNS
5790 class bl-cbl
1512 class bl-sbl
462 acceleratebiz.com
225 class bl-pbl
109 class bl-njabl
104 class bl-dsbl
79 cuttingedgemedia.com
64 class bl-sdul
This is pretty close to last week, and might even have been over it if not for the 12 hour downtime. I'd do a breakdown of the SBL rejections, but there's no real point; 1440 of them come from SBL50892, which is a colocentral.com spammer hosting escalation listing from February 6th, and the next highest one is 12 rejections. (The colocentral.com rejections were spread over 248 different IP addresses, with none of them having more than 9 rejections. The hostnames suggest that we didn't miss anything.)
Three of the top 30 most rejected IP addresses were rejected 100
times or more this week: 69.25.186.66 (181 times), 67.102.251.238
(176 times, a Covad something or other), and 210.176.52.139 (149
times, no reverse DNS). Twelve of the top 30 are currently in the
CBL, ten are currently in bl.spamcop.net, eight are in the PBL, and a grand total of 15 of the 30 are in
zen.spamhaus.org.
This week Hotmail did:
- 4 messages accepted, at least two of them legitimate and one almost certainly spam.
- no messages rejected because they came from non-Hotmail email addresses.
- 41 messages sent to our spamtraps.
- 30 messages refused because their sender addresses had already hit our spamtraps.
- 8 messages refused due to their origin IP address (4 in the CBL, 3 from the Cote d'Ivoire, and one in SBL45516).
And the final numbers:
| what | # this week | (distinct IPs) | # last week | (distinct IPs) |
| Bad HELOs | 953 | 95 | 877 | 101 |
| Bad bounces | 17 | 16 | 16 | 12 |
There was no particularly flagrant source of bad HELOs this week,
just the usual crowd with middle double digit rejections before we
dumped them in the kernel filters. Bad bounces once again came from
all over, although possibly with more North American sources than
anywhere else.
Bad bounces were sent to 15 different usernames this week, once again mostly to real ex-users and plausible usernames (and one valid ex-user with some numbers glued on the front). The most popular target, with three bounces, was an ex-user.