Wandering Thoughts archives

2013-05-19

Today's comment spammer trick: regurgitated comments

I log the contents of some attempted spam comments here on Wandering Thoughts (the concise summary of when is when the spammer seems to be trying hard). Usually this doesn't get anything, but today my trawl through the logs turned up a succession of bizarre and odd comment attempts. The text had misspellings and typos but it generally made sense and most of the comment attempts were even about technical things that are vaguely on topic for here. But they were invariably attempts to comment on very inapplicable entries.

When I looked at the logs in detail, one of the most striking was a series of comment attempts that looked very much like a conversation between two or more people about using git on home directories. This was very odd since none of the comments were being posted, yet the people were pretty clearly replying to each other; I began to develop all sorts of theories about disturbingly intelligent content auto-generation. Finally I noticed something in one of the comment texts and the penny dropped:

[...] Possibly related posts: (automatically generated)Heroku, the Rails app.

There is a really simple way to get this text into a spam comment: you can be scraping content from existing blog posts and/or blog comments. So my new theory is that the would-be comment spammer is is scraping comment text from other blogs, mangling them somewhat, and then spam-posting them on other blogs (including mine).

The mangled text doesn't seem to have any links or other spam-relevant text so I'm not sure why the spammers are doing this. Maybe they're fishing to see what blogs will allow their comments through moderation and will follow up with more active content on blogs where this works.

Sidebar: source details and other things

So far 30 different IP addresses have tried this here today; most IP addresses have made only one attempt each. The IP addresses cover a large range of source networks. A few of them are CBL listed but that's pretty much it as far as DNBLs are concerned. Four of the IP addresses actually belong to Microsoft (168.63.43.185, 168.63.62.182, 168.63.76.184, and 168.63.84.217; all four are currently listed on the CBL). I'm assuming that these are compromised machines, VPS servers, or both.

Many of the IP addresses also made a burst of GET requests for various other URLs here. Maybe they're scraping text from Wandering Thoughts for use in their corpus for their next spam run somewhere else.

spam/RegurgitatedCommentSpam written at 22:45:29; Add Comment

The technical effects of being an out of tree Linux kernel module

Suppose that you have a kernel module that is not in the mainstream kernel source for one reason or another. Perhaps it is license compatible but just not integrated for various reasons (as is the case with IET) or perhaps it is license incompatible (as is the case with ZFS on Linux). This non-inclusion has a number of cultural effects, but it also has real technical effects. Although I've mentioned them before, today I want to talk about them in some detail.

The first thing to know is that the Linux kernel does not have a stable kernel API for modules; how a module interacts with the rest of the kernel can and will change without notice. When your module is part of the kernel source, changing it to cope with the API change is generally the responsibility of the kernel developer who wants to make the API change. When your module is not in the kernel tree, not only is changing its code your job but so is even knowing about the API change. And API changes are not always obvious because sometimes they're things like changes in locking requirements or how you are supposed to use existing functions.

(Sometimes they are semi-obvious, like changing just what arguments a function takes. You do pay attention to all warning messages that show up when building your kernel module, right?)

Any number of people would like this to change but it isn't going to. The Linux kernel development process is optimized for in-tree code and not for out of tree code. If your out of tree code cannot be included in the kernel for various reasons, that's tough luck but the kernel developers really don't care that much (as a general rule). Locking themselves down to any stable module API would reduce their ability to improve and evolve the kernel code.

The next effect is pragmatic: if your code is not in the kernel tree, almost no one will look at it (and this includes automated scans over the kernel source code that look for various things) or do things to it. This is great if you're possessive about your code but it means that you're missing out on the quality checking that this creates, all of the little janitorial cleanups that people do, and if there is a bug then your module's developers are the only people who are looking at it.

(In some quarters it's fashionable to think that the Linux kernel developers are all clowns and cannot possibly contribute anything worthwhile to your code. This is a major mistake. Among other things they're basically certain to know the overall Linux kernel environment better than you do.)

A related issue is that the kernel developers try not to create bugs and regressions in in-tree code, especially if it's considered important (which, say, a commonly used filesystem will be); if one is created anyways a bunch of people will go looking to try to fix it. It's almost certain that no official kernel release would go out that broke a significant filesystem; the change that created the breakage would be identified and then reverted, with the change's developer told to try again. If your module is not in the tree, well, you're on your own. Performance regressions or actual breakages are your problem to diagnose and then either fix or try to argue the kernel developers into changing their side of the problem.

(And they may not, especially if your code is license-incompatible with the kernel and most especially if their change actually improves in-tree code and performance and so on.)

All of this means an out of tree kernel module requires more ongoing development work than an in-tree kernel module. In-tree kernel modules generally get somewhat of a ride from general kernel developers; out of tree modules do not and have to make up for it with time from their own developers. One predictable result is that many out of tree modules don't necessarily support all kernel versions, including kernel versions that sysadmins may want to use. A worst case situation with out of tree modules is that the developers simply stop updating the module for new kernels; any users of the module are then orphaned on old kernels.

linux/TechnicalNonGPLKernelModules written at 01:19:52; Add Comment


Page tools: See As Normal.
Search:
Login: Password:
Atom Syndication: Recent Pages, Recent Comments.

This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.