2005-08-01
Multilevel list comprehensions in Python
Python has recently (at least for some values of recently) grown 'list comprehensions', which let you easily iterate over a list to transform or select entries (or both). List comprehensions can be thought of as syntactic sugar for map and filter operations, but they're actually more powerful.
One reason is that you can write a multilevel list comprehension, which effectively iterates over multiple levels of lists. Take the case where you have a list within a list and want to return all of the low-level elements as a list:
l = []
for rr in qa:
    for s in rr.strings:
        l.append(s)
This can be rewritten as a two-level list comprehension:
l = [s for rr in qa for s in rr.strings]
This can't easily be done via map. (We would probably have to roll in a reduce to flatten the list of lists that map would give us into a single-level list.)
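For comparison, here is a rough sketch of what the map-plus-reduce version might look like next to the two-level comprehension. The Record class and the sample data are made-up stand-ins for whatever qa really holds; only the strings attribute is taken from the code above. (In current Python, reduce has to be imported from functools.)

```python
from functools import reduce
from operator import add


# Hypothetical stand-in for the records in `qa`; only the `strings`
# attribute matters for this illustration.
class Record:
    def __init__(self, strings):
        self.strings = strings


qa = [Record(["a", "b"]), Record([]), Record(["c"])]

# The two-level list comprehension from above.
flat_lc = [s for rr in qa for s in rr.strings]

# map gives back one list per record, so a reduce is needed to
# concatenate those lists into a single flat list.
nested = map(lambda rr: rr.strings, qa)
flat_mr = reduce(add, nested, [])

print(flat_lc)
print(flat_mr)
```

Both approaches produce the same flat list, but the comprehension says so in one step and without the intermediate list of lists.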
Multilevel list comprehensions work left to right; the leftmost 'for X in Y' is the outermost one, and then we step inwards as we move right. You can also use if conditions, so the correct version of the list comprehension I wrote, in context and with error checking, would be:
from dns.rdatatype import TXT
l = [s for rr in qa if rr.rdtype == TXT \
       for s in rr.strings]
What impresses me about Python is that this works just the way I thought it would: both of these examples worked the first time, exactly as I wrote them, with no debugging needed. (The first version actually got used in a scratch program.)
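To make the left-to-right rule concrete, the sketch below writes the filtered comprehension out as nested statements in the same order as its clauses and checks that the two agree. It doesn't need dnspython: the Record tuple and the sample records are invented for illustration, and TXT and A are just the numeric DNS type codes (16 and 1).

```python
from collections import namedtuple

# Hypothetical stand-in records; 16 and 1 are the DNS type codes
# for TXT and A records respectively.
TXT = 16
A = 1
Record = namedtuple("Record", ["rdtype", "strings"])

qa = [
    Record(TXT, ["v=spf1", "-all"]),
    Record(A, None),   # nothing to iterate over; the 'if' must skip it
    Record(TXT, ["hello"]),
]

# Comprehension form: clauses read left to right, outermost first.
l = [s for rr in qa if rr.rdtype == TXT
       for s in rr.strings]

# The same thing as nested statements, in the same order as the clauses.
expected = []
for rr in qa:
    if rr.rdtype == TXT:
        for s in rr.strings:
            expected.append(s)

assert l == expected
print(l)
```

Because the if sits between the two for clauses, it is evaluated once per record and before the inner loop ever touches rr.strings, which is what makes it usable as error checking here.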
Spam breakdown by SBL listing, July 31st 2005
This is, roughly speaking, a table showing the top N SBL listings that have been spamming us over the past 28 and change days. I generated it by grabbing all rejected IP addresses, looking them up in the SBL, and counting how many hits each SBL listing accumulated.
Refused connections | SBL listing |
232 | SBL26860 |
208 | SBL23039 (Rokso: Randy Forman) |
190 | SBL21425 (listed since 28 Nov 2004) |
171 | SBL26524 (Rokso: Eric Reinertsen) |
119 | SBL24986 (also Eric Reinertsen) |
97 | SBL24651 |
74 | SBL27934 (a promiscuous webmail machine) |
69 | SBL28992 (more webmail machines) |
66 | SBL23012 (sms.ac, spamming 'invites' madly) |
52 | SBL20280 (a Korean /17 listed since 24 Dec 2004) |
47 | SBL23445 (Rokso: Traffix, listed since Feb 1st) |
44 | SBL29615 (look, more webmail!) |
43 | SBL19307 (a Chinese /16 listed since 9 Nov 2004) |
41 | SBL28644 |
41 | SBL28297 |
40 | SBL15575 (a /18 listing for wanadoo.es) |
35 | SBL28889 (a Chinese /16) |
35 | SBL20719 (a Taiwanese /16) |
33 | SBL24218 (Rokso: Jeffrey Peters) |
33 | SBL23427 (Rokso: Jumpstart Technology LLC, listed since Feb 3rd) |
There were 2522 rejected IP addresses that are now/still in the SBL in total, out of about 35000 that we rejected overall over the time period, so about 7% of the IP addresses we rejected are in the SBL. (Perhaps I will next do these numbers for the CBL.)
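The tallying behind a table like this can be sketched in a few lines of Python. This is not the actual script: tally_listings and its lookup argument are hypothetical names, the lookup here is a dictionary stub instead of a real SBL query, and the sample addresses and listing names are invented (the addresses come from the documentation range). The only real figures are the 2522 and 35000 from the paragraph above.

```python
from collections import Counter


def tally_listings(rejected_ips, lookup):
    """Count hits per SBL listing.

    lookup(ip) is a hypothetical callable that returns the listing
    ids covering ip (an empty list if the address is not listed).
    """
    hits = Counter()
    listed = 0
    for ip in rejected_ips:
        listings = lookup(ip)
        if listings:
            listed += 1
        hits.update(listings)
    return hits, listed


# Made-up sample data, purely for illustration.
fake_sbl = {
    "192.0.2.1": ["SBL-A"],
    "192.0.2.2": ["SBL-A"],
    "192.0.2.3": ["SBL-B"],
}
rejected = ["192.0.2.1", "192.0.2.2", "192.0.2.3", "192.0.2.4"]

hits, listed = tally_listings(rejected, lambda ip: fake_sbl.get(ip, []))
for listing, count in hits.most_common():
    print(count, "|", listing)

# The overall share, using the totals quoted above.
share = 2522 / 35000 * 100
print("%.1f%% of rejected IPs are SBL-listed" % share)
```

Whether repeated connections from the same address count once or many times depends on what goes into rejected_ips; the sketch simply counts whatever it is given.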
This isn't a perfect picture of what the SBL would have done to each of these IP addresses. There are several sources of inaccuracies:
- SBL listings get removed, so some IPs we rejected as SBL-listed when they tried are not SBL-listed now and so are not getting counted.
- not all of these IP addresses got rejected for being SBL-listed, since we check DNS blocklists after other criteria.
- some IP addresses we rejected back then for other reasons may now be SBL-listed.
(Also, by the time you read this blog entry some or many of these SBL listings may have been removed. That's one reason why I date these things.)
Interestingly (and depressingly) the leading SBL listings are located in the US, Canada, and Britain, not in the howling spam-infested wilds of China, Russia, and the like. You have to go all the way down to SBL27934, the first webmail machine, before you find something in another country. China and Korea themselves are surprisingly far down the list (perhaps because they are mostly used for website hosting and less for outright spam-sending).
Most of the listings are quite recent, from April/May and later of 2005. (I believe I have annotated all of the ones that are older than that.)
Over the same time period, 9 IP addresses that were in the SBL when we rejected them got unlisted. Since the SBL doesn't keep old listings, there's no way to tell what they were listed for, or why they got delisted; since they did get delisted, I will avoid naming them here.