Turning over a rock on some weird HTTP requests to our web server

February 29, 2016

I recently made the mistake of looking at our Apache access.log, and in fact watching it live with 'tail -f'. Me being me, I can't just let what I saw sit quietly, so now I'm here to tell you about the big weirdness I saw. Put simply, it was a whole rapid burst of requests that looked like:

IP - - [28/Feb/2016:17:18:38 -0500] "GET /mmievslc.txt HTTP/1.1" 404 [...]
IP - - [28/Feb/2016:17:18:39 -0500] "GET /mmievslc.txt HTTP/1.1" 404 [...]
IP - - [28/Feb/2016:17:18:39 -0500] "GET /mmievslc.txt HTTP/1.1" 404 [...]

When I started digging, I saw multiple IPs making requests like this for multiple different 8-character .txt URLs in the root of our web server (none of which have ever existed). On random spot checks, they almost all happen in bursts (although there can be pauses), and there are a lot of them.

How many? Yesterday, we saw 34,500 such requests (about 10% of the total HTTP requests), from 116 different IPs and for 122 different names. The top three IPs all made over 1000 requests each; the median made 233 requests. Every such request had the same user-agent:

"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"

(Only 50 other requests from 13 different IPs used this user-agent.)
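For anyone who wants to pull similar tallies out of their own logs, here is a minimal sketch of one way to do it. It assumes Apache's "combined" log format and assumes the names are always eight lowercase letters plus .txt in the root, which is only inferred from the sample lines above. The summarize() function and its result keys are made up for illustration; this is not the exact method used to get the numbers above.

```python
import re
from collections import Counter
from statistics import median

# Assumption: Apache "combined" log format. The path pattern (eight lowercase
# letters plus ".txt" in the server root) is inferred from the sample lines.
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] '
    r'"GET /(?P<name>[a-z]{8})\.txt HTTP/[\d.]+" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def summarize(lines):
    """Tally matching requests by source IP, requested name, and user-agent."""
    per_ip = Counter()
    names = Counter()
    agents = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue  # not one of the odd .txt requests
        per_ip[m["ip"]] += 1
        names[m["name"]] += 1
        agents[m["agent"]] += 1
    return {
        "requests": sum(per_ip.values()),
        "ips": len(per_ip),
        "names": len(names),
        "median_per_ip": median(per_ip.values()) if per_ip else 0,
        "top_ips": per_ip.most_common(3),
        "agents": agents,
    }
```

Feeding it an open log file, as in summarize(open("access.log")), would give the request, IP, and name counts in one pass; the path to the log is whatever your own setup uses.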

On a random spot check of IP addresses doing this, I can't find any that aren't in China. Some but not many of the IP addresses are listed in things like the SBL; others come up entirely clean in my blocklist checks.

I spot-checked the IPs doing this yesterday against the IPs doing this today and about two thirds of them are different; checking yesterday against the day before yielded the same result. So there seems to be a different set of sources doing this over time.
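The day-to-day comparison boils down to a set difference. As a rough sketch, assuming you already have each day's source IPs as a list or set (the fraction_new() name is hypothetical, and measuring against today's set rather than yesterday's is an arbitrary choice):

```python
def fraction_new(yesterday_ips, today_ips):
    """Return the fraction of today's source IPs that were not seen yesterday."""
    today = set(today_ips)
    if not today:
        return 0.0  # nothing to compare against
    return len(today - set(yesterday_ips)) / len(today)
```

A result around 0.67 would correspond to "about two thirds are different".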

We have multiple virtual hosts on this web server, and only two of them are affected: the main departmental web server name and another one (which saw far fewer of these requests). There's nothing obvious that's different between unaffected hosts and affected ones.

And what makes this really mysterious is I have no idea what these requests are supposed to accomplish. Are they an attack of some sort? Are they an accidental side effect of other software? Are they being done deliberately in order to create some sort of useful side effect? Are they traffic cloaking or obfuscation of some sort? Who knows. I may have turned over this rock, but I have no idea how to understand what's scuttling around underneath it.


Comments on this page:

By Ewen McNeill at 2016-03-01 00:23:34:

That sounds suspiciously like the "fast flux" DNS names algorithms, but applied to the filename part. In which case it would potentially be related to botnet control in some fashion, with lots of "maybe these might be valid URLs to fetch at some point" possibilities -- most of which won't, in fact, be valid. (Thanks to the externalities that are the Internet, all the "not useful" requests only cost someone else resources, so the botnet controllers do not care about the wasted effort. But do value the ability to register/upload something in the pattern at will and have it picked up.)

Possibly the "requests from same IP" actually represent NAT? And so there's only a few from each actual (internal) origin? (There's a lot of NAT in China AFAIK; large population and being late to the "can we have some IPv4" game will do that.)

Ewen

What they have accomplished is to make international network speeds in China terribly slow :-(

Written on 29 February 2016.


Last modified: Mon Feb 29 23:02:44 2016