2014-05-31
When trying to unsubscribe from spam can be not completely crazy
For reasons beyond the scope of this entry, I've started running a
little 'sinkhole' SMTP server to collect email for what is essentially
a disused set of old addresses of mine. The server takes in everything,
logs the messages to disk, and then tells the senders that they
weren't delivered; later, I can go through the logged messages to
see if anything interesting has shown up. In the process of doing
this I've noticed that a surprising amount of the spam comes with
List-Unsubscribe headers with URLs. So I've been using some of
them to see what happens (especially when I recognize the sender as
a long term repeat offender).
Normally it's an article of faith that you should never, ever use a spammer's unsubscribe procedure. Doing so only confirms that your address is live and perhaps helps the spammer, reducing the load on their systems and so on. I'm not sure that what I'm doing here is exactly sensible (although it makes for an interesting experiment), but I don't think it's completely crazy.
The big difference between my situation and the normal situation is that the addresses I am 'unsubscribing' are already dead addresses as far as I'm concerned. They basically no longer get valid email (and some of them don't even exist) so any increase in the amount of spam coming to them is immaterial; it's just more grist for the sinkhole to reject. Also, these addresses have been aggressively defended for what is now years and spammers have been trying to bang away at them for years. It's clear that the repeat spammers I can recognize simply don't notice or don't care about any extra load on their mail sending systems that I've been creating with prior tactics like blackholing SMTP traffic from their IPs in the firewall. If they did they would have removed such unresponsive addresses by now, often years ago.
At one level I don't care about the load on my sinkhole, but at another level I do. I'm running the sinkhole to collect interesting new things, and yet more spam from some usual suspect is completely uninteresting. If I can make it go away by unsubscribing rather than making the sinkhole's environment more complicated, so much the better for my real goals.
Or at least that's my thinking so far. I may change my mind later and stop doing this.
(But I'd also find it very interesting if unsubscribing actually seemed to reduce the spam from the usual suspects, or even increased it. Seeing if this will happen is one reason I'm bothering with the experiment at all.)
One of my test based development imperfections: use driven testing
Recently I've been coding what I call a 'sinkhole SMTP server' in Go (a SMTP server that doesn't do anything with messages except perhaps save them to disk). In the process of this I've once again gotten to watch one of my (bad) habits with test driven development in action, something that I will call 'use driven testing'.
A SMTP server can conceptually be divided into at least two layers. One layer is a general layer that handles the SMTP protocol; it reads SMTP commands from the network, makes sure they are actually commands, enforces command ordering requirements, handles a bunch of fiddly stuff, and so on. A second layer sits atop this and handles higher level policy issues like what email addresses are acceptable (as source or destination addresses) and what to do with received email messages. The bottom layer is generic (you could in theory reuse it in any number of different SMTP servers) while the top layer is generally specific to your needs.
I started out writing the bottom layer as a Go package. Go has reasonably good support for testing, so I wrote tests for this layer's functionality of parsing SMTP command lines and handling the basic flow of the protocol. In other words, I did relatively conventional test focused development; I wrote code and then wrote tests to make sure it worked, and sometimes I mutated the code some to make it easier to test. But at a certain point the general base SMTP layer passed the point where I could start writing the top layer. At this point I switched to writing the top layer and mutating the base layer as necessary to make a better API or to make things work. I didn't write any new base layer tests for the base layer's added functionality and I didn't write tests for the top layer; testing for the top layer consisted of actually using it. This is the switch to what I'm calling 'use driven testing', where actually using my code is how I test it.
This is flawed and imperfect but it's hard for me to see how to get out of it. Writing top layer code, changing the bottom layer to match, and then going on to write bottom layer tests seems like make work or wasted work. I have to have the top layer code anyway, and the bottom layer tests basically duplicate that work without telling me much extra. Of course this is wrong; writing tests will tell me not just if something works now but whether it keeps on working. But it's hard to feel motivated to do the extra work and it's also hard to shape an API for both convenient testing and the convenience of higher layer stuff.
(There's also the related question of how much stuff in the higher
layer I want to test and what the best way to test it is. I think
that Go will let me write tests for code in the main package that
makes up your program, but I haven't actually verified that.)
Okay, let me admit something else about this: writing live code is a lot more fun than writing tests. When I write top layer code, my program does something real for me. When I write more tests, yay, more tests (which may break and have to be redone if I restructure what my actual productive code does). It's very hard to avoid the fun and do drudgery, especially when I'm doing this entirely for fun. At the same time I wind up feeling guilty for having minimal tests and chunks of code that are only tested through use by the higher level.
Complicating this is that some of the functionality I wound up putting in the lower layer is not straightforward to test. For example, how do I test that TLS negotiation actually works or that network IO is (optionally) being written at an artificially slow rate of one character every tenth of a second? There are probably clever ways but they're not obvious to me, and it's hard to feel hugely motivated when I can test these using the live program by inspection or by using swaks.
(I have considered the merits of automatically hooking the Go SMTP client up to my server and verifying that it, for example, sees the expected SMTP extensions. Maybe this is actually the right answer.)
I don't have any answers here, just stuff that I'm thinking about aloud. Although perhaps my use driven testing is not completely crazy and at some point I should just accept that high level tests of functionality are fine (even if some of them are manual).
PS: part of the pain here is that testing the output of a SMTP server is kind of a pain in the rear. It's easy enough to test the literal output in response to a series of commands but that's both verbose and it blows up any time you change the messages the server sends (which discourages changing those messages, which to me is bad). Doing better requires building some moderately complex testing infrastructure to extract, say, the sequence of SMTP response codes that you expect so you don't care about the literal text.