The shifting SBL, as experienced here
I still sort of run a mail server which gets a low enough connection volume that I can monitor the logs directly. This MTA rejects connections from SBL listed IPs, at a sufficiently low volume that I almost always look into the actual SBL listing (partly because I may want to apply my own blocks, including IP-level ones).
In the beginning, the volume of SBL hits was low but most of the actual SBL listings were for network ranges (not just single IPs) owned by what I privately characterized as 'the worst of the worst'. These were the people and organizations who spammed so many people so often that they finally convinced the SBL that they were very definitely dirty. Hits were rare partly because there never were really large numbers of these people, partly because I and other DNS blocklists blocked such people before the SBL, and perhaps partly because these people just didn't target me very often.
(I and a fair number of other people felt that the SBL was far too conservative and gave spammers way too many chances, but the SBL had its standards and that was it.)
I'm not sure when things started shifting, but this is not the pattern that I see today. The modern SBL experience is that most SBL hits are from single IPs that are listed as probably compromised or, to a lesser extent, from IPs that are on the SBL CSS. Hits from genuine SBL listed dirty blocks seem to be rare.
Out of curiosity I pulled eight days of records from the department's main mail gateway and looked through them for SBL rejections. Of the 80 IPs that (still) had SBL listings, the SBL CSS accounts for 35, 18.104.22.168/24's SBL136747 listing for four, and a random sampling of everything else shows single (compromised) IPs.
(Yesterday is a bit different. There are 27 IPs that are still SBL listed, with 21 of them on the SBL CSS. But two of the remaining were for bad netblocks and one IP was listed for spammer hosting. The other three were the usual single compromised machine pattern.)
I don't know what this means, if anything; I just find it interesting.
(I can come up with all sorts of potential theories but I will spare you all; they're generally obvious anyways. Just in case there's any doubt, I should note that I'm all for the SBL listing all sorts of spam sources and so I have no objection to the apparent new inclusion of compromised machines that are spewing advance fee fraud and phish spam and so on.)
Why you should support specifying comparison keys (and functions)
I've recently been taking another pass at optimizing DWiki's code a bit. In the process of profiling things, I noticed some code that was doing linear searches through an ordered list (for reasons that made sense at the time). What my code was doing just cried out for a better bisection-based approach, and it looked like I was in luck; not only does Python have a bisect module in the standard library but CPython actually has a C-level implementation for the best efficiency you could ask for.
Unfortunately I can't use it. You see, the people who created this
module hard-coded everything to use direct comparisons so the code is
full of expressions like '
if x < a[mid]: ...', and what I want to
bisect through is not comparable this way. Like many people, I have a
list where each element is itself a tuple and I want to search through
the list based on one field of those tuples.
The bisect module has two failures here. The direct failure is that it
missed a fundamental rule of general Python code: any time you write
a searching or sorting routine, you should support specifying the key's
field. Ideally you'd support a general comparison function as well.
.sort() method on lists is a good example (although Python 3
drops the comparison function).
The indirect failure is that this code has misunderstood what the fundamental data structures of Python are. The bisect module would sort of make sense if the fundamental data structure was an object; then your bisectable lists would be made up of objects and your objects would have comparisons defined on them and everything would be fine. But this is not the case. The fundamental data structures in Python are the primitive types; lists, tuples, and dictionaries. Almost no one bothers to make objects where one of these three will do. Any generic code that deals with ordered comparisons (searching, sorting, inserting in order in some data structure like a B-tree, and so on) should expect to be routinely handed these primitive types and for people want to order things based on a particular field in them; failing to handle this very common situation is a failure in your code.
(Well, this is half the reason to support specifying the key's field. The other half is that it's perfectly reasonable to want to order the same objects in several different ways depending on what you're doing with them at the moment.)
Sidebar: getting around the bisect module with a club
Suppose that you are very irritated with the bisect module's limitation here and are willing to commit evil hacks. Then here is basic code for something that I will call a virtual Schwartzian transformation:
class VirtualList(object): def __init__(self, lst, key): self.lst = lst self.key = key def __len__(self): return len(self.lst) def __getitem__(self, idx): return self.lst[idx][self.key]
Take your list of whatever-it-is you have, make a virtual list with
vlst = VirtualList(lst, KEY)', and then bisect
vlst; the index
position you get back is valid for
lst. KEY does not have to be a
number; it works perfectly fine to have a list of dictionaries (or more
complex objects). This only really makes sense for bisection, binary
searching, and so on, where you can have a large list but should only
ever look at a few elements from it.
You can extend this approach to support custom comparison functions by using a dynamic class and materializing instances on demand, but I'll leave that as an exercise for the sufficiently interested and perverse reader.