How comment spammers behave
Watching your logs while you try out various comment spam precautions is
good for, among other things, seeing how comment spammers seem to behave,
or at least how the
comment spammers that drop by WanderingThoughts behave. (Your mileage
may vary, since there are a lot of comment spammers out there and they
can't all be using the same tools.)
As before, I'm only really interested in defeating the automated comment
spammers; a dedicated person is always going to be able to leave comments
here. (And I'm not interested in making it so that people writing comments
can't include links.)
So, my observations on comment spammers to date:
- they will hit any POST form with a submit button that they can
see. They don't seem to spam the search box (which is a GET form
without an explicit submit button), but they do regularly try to submit
comment spam through DWiki's login form.
(The most amusing login form spammer is the one that believes in being
honest; they start all of their spam attempts with 'sorry, but i need
money...'.)
- however, they almost never go past the first form submission. The
single greatest reduction in successful comment spam that I ever
managed was changing my comment form so that you had to preview before
actually posting your comment; almost every spammer previewed and then
just went away.
- some but not all of them fill in any form field that they spot; my
comment form's honeypot field gets a regular stream of programs that
trip over it, but there are about as many spammers who don't.
- the basic User-Agent checking I do is surprisingly effective. It is
also a very cheap check to make,
since you can even do it in Apache itself.
- a fair number of them harvest your comment form from one IP and then
submit from another (or a pool of others). This is really easy to see
in the full web logs, and so my 'must submit from the same /24'
precaution trips up a reasonable number of would-be comment spammers.
However, the really interesting thing is that a number of comment
spammers modify this hidden field. All of the spammers that modify
it seem smart enough to try putting in IP addresses, but they make
them up randomly instead of using the IP address they're POSTing the
form from, and they don't notice that the field is not formatted as
a straight IP address. (And sometimes they stick some newlines on the
end.)
They may be doing this partly because I called the field 'previp'.
(My current format for it is the IP address less the last octet,
so the real version looks like 'A.B.C.', with no newline at the end.)
Looking at some numbers, it appears that most comment spammers that
don't trip up on the honeypot field make up random IP addresses to
put in this field instead of leaving it alone.
- comment spammers almost always use all four of the popular syntaxes
for making links at once. These seem to be:
- a bare URL:
http://...
- a full HTML link:
<a href="http://...">..</a>
(I have seen one spammer that turned the initial < into &lt;.)
- a [url] tag around a URL:
[url]http://...[/url]
- a [url=...] tag with link text:
[url=http://...]...[/url]
(I'm not sure what uses the last two forms, but they turn up a lot.)
The links don't necessarily all go to the same website, but the presence
of all four forms in the same comment is a pretty good danger sign.
(As the result of a recent aggressive (and temporarily successful)
spam run, WanderingThoughts currently rejects comments that contain any
of the last three forms of links, since they don't work here anyways.)
- while typical comment spam includes more links than normal comments
do, it's not a lot more; for example, the recent aggressive comment spam
only had four links per comment (one in each link format).
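The per-comment checks in the list above (the honeypot field, the 'previp' same-/24 field, and spotting the four link syntaxes) can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not DWiki's actual code: every function name, the regular expressions, and the 'spam'/'reject'/'ok' outcomes are hypothetical, and it assumes IPv4 clients only.

```python
import ipaddress
import re

# Hypothetical sketch of the checks described above; names and regexes
# are illustrative assumptions, not DWiki's real implementation.

def make_previp(ip: str) -> str:
    """Return the IPv4 address less its last octet: '192.0.2.7' -> '192.0.2.'."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    return str(addr).rsplit(".", 1)[0] + "."

def previp_ok(field: str, remote_ip: str) -> bool:
    """True if the hidden 'previp' field is untouched and the POST comes
    from the same /24 that fetched the form."""
    # Exact string comparison: a made-up full IP address or a value with
    # trailing newlines never equals the real 'A.B.C.' form.
    try:
        return field == make_previp(remote_ip)
    except ValueError:
        return False

def honeypot_tripped(value: str) -> bool:
    """Real users never fill in the honeypot field, so any value is suspect."""
    return value != ""

# One pattern per link syntax (assumed regexes).
LINK_FORMS = {
    # a bare URL: at the start of the text or after whitespace, so URLs
    # inside href="...", [url]... or [url=... are not counted here
    "bare": re.compile(r"(?:^|\s)https?://"),
    # a full HTML link, including the variant with the '<' escaped as '&lt;'
    "html": re.compile(r"(?:<|&lt;)a\s+href=", re.IGNORECASE),
    "url_tag": re.compile(r"\[url\]", re.IGNORECASE),
    "url_eq_tag": re.compile(r"\[url=", re.IGNORECASE),
}

def link_forms(text: str) -> set[str]:
    """Return the names of the link syntaxes that appear in text."""
    return {name for name, pat in LINK_FORMS.items() if pat.search(text)}

def classify_comment(text: str) -> str:
    forms = link_forms(text)
    if forms == set(LINK_FORMS):
        return "spam"    # all four syntaxes at once: a strong danger sign
    if forms - {"bare"}:
        return "reject"  # only bare URLs work here, so refuse the other forms
    return "ok"
```

For example, a comment containing only `see http://example.com/` is classified as 'ok', while one that uses a `[url=...]` tag is rejected. Comparing the 'previp' field as an exact string (rather than parsing it) is a deliberate choice in this sketch, since it is what catches spammers who substitute a random full IP address or append newlines.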
I also have some negative results. First, it's not worth checking for
correct Referer values; almost every comment spammer that made it past
my basic User-Agent checks sent the right value.
Also, very soon after I changed my comment form to only have a preview
option at the start, I saw a significant jump in comment spam attempts.
From this I formed the hypothesis that comment spammers are unduly
attracted to forms with only one submit button; however, various
experiments I've tried since then suggest that this isn't the case.
(I changed things so the first 'add comment' page had two form
submission buttons and the backend DWiki code just made them do the same
thing. But I didn't see any reduction in comment spam attempts, even
across various variants of how the buttons were named and so on.)