Turning over a rock on some weird HTTP requests to our web server
I recently made the mistake of looking at our Apache access.log, and in
fact watching it live with 'tail -f'. Me being me, I can't just
let what I saw sit quietly, so now I'm here to tell you about the
big weirdness I saw. Put simply, it was a whole rapid burst of
requests that looked like:
IP - - [28/Feb/2016:17:18:38 -0500] "GET /mmievslc.txt HTTP/1.1" 404 [...]
IP - - [28/Feb/2016:17:18:39 -0500] "GET /mmievslc.txt HTTP/1.1" 404 [...]
IP - - [28/Feb/2016:17:18:39 -0500] "GET /mmievslc.txt HTTP/1.1" 404 [...]
When I started digging, I saw multiple IPs making requests like this for
multiple different 8-character .txt URLs in the root of our web server
(none of which have ever existed). On random spot checks, they almost all
happen in bursts (although there can be pauses), and there are a lot of them.
How many? Yesterday, we saw 34,500 such requests (about 10% of the total HTTP requests), from 116 different IPs and for 122 different names. The top three IPs all made over 1000 requests each; the median made 233 requests. Every such request had the same user-agent:
"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
(Only 50 other requests from 13 different IPs used this user-agent.)
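For anyone who wants to pull similar numbers out of their own logs, here is a minimal sketch in Python. It is not the method used to get the figures above. It assumes the standard Apache common or combined log format, and it assumes the probe names are always eight lowercase letters plus '.txt' (which is what /mmievslc.txt looks like, but is a guess about the rest). The function names summarize() and report() are made up for illustration.

```python
import re
import statistics
from collections import Counter

# Assumption: Apache common/combined log format, and probe names made of
# exactly eight lowercase letters plus ".txt" in the server root.
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] '
    r'"GET /(?P<name>[a-z]{8}\.txt) HTTP/[\d.]+" (?P<status>\d{3}) '
)


def summarize(lines):
    """Tally matching 404 requests per client IP and per requested name."""
    per_ip = Counter()
    names = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        # Skip anything that isn't a failed request for an 8-letter .txt name.
        if m is None or m.group("status") != "404":
            continue
        per_ip[m.group("ip")] += 1
        names[m.group("name")] += 1
    return per_ip, names


def report(per_ip, names):
    """Boil the tallies down to the headline numbers."""
    return {
        "requests": sum(per_ip.values()),
        "ips": len(per_ip),
        "names": len(names),
        "median_per_ip": statistics.median(per_ip.values()) if per_ip else 0,
        "top3": per_ip.most_common(3),
    }
```

Feeding it an open log file, as in report(*summarize(open("access.log", errors="replace"))), gives the request count, the number of distinct IPs and names, the median requests per IP, and the three busiest IPs. The regular expression is deliberately narrow; loosening the name pattern will also start matching legitimate missing files.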
On a random spot check of IP addresses doing this, I can't find any that aren't in China. Some but not many of the IP addresses are listed in things like the SBL; others claim to be entirely clean in my blocklist checks.
I spot-checked the IPs doing this yesterday against the IPs doing this today and about two thirds of them are different; checking yesterday against the day before yielded the same result. So there seems to be a different set of sources doing this over time.
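The day-to-day comparison is just set arithmetic on the source IPs. A small Python sketch, assuming you already have each day's IPs as a list or set; turnover() is a hypothetical helper name, and measuring against the later day's IPs is one choice of denominator among several reasonable ones:

```python
def turnover(earlier_ips, later_ips):
    """Return the fraction of the later day's IPs not seen on the earlier day."""
    later = set(later_ips)
    if not later:
        return 0.0
    new = later - set(earlier_ips)
    return len(new) / len(later)
```

A result around 0.67 would correspond to 'about two thirds of them are different' under this definition.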
We have multiple virtual hosts on this web server, and only two of them are affected: the main departmental web server name and another one (which saw far fewer of these requests). There's nothing obvious that's different between unaffected hosts and affected ones.
And what makes this really mysterious is I have no idea what these requests are supposed to accomplish. Are they an attack of some sort? Are they an accidental side effect of other software? Are they being done deliberately in order to create some sort of useful side effect? Are they traffic cloaking or obfuscation of some sort? Who knows. I may have turned over this rock, but I have no idea how to understand what's scuttling around underneath it.
Sometimes, doing a bunch of programming can be the right answer
I like doing programming, and on top of that I can be a bit obsessive about it; for instance, if there are obvious features for a program to have, I want to add them even if they may not be strictly necessary. If left to myself, I would write plenty of programs for plenty of things and enjoy it a fair bit. The problem with this is that locally written programs are often an overhead and a long term burden, as xkcd has famously pointed out. Sure, it's nice to write code to solve our problems, but often that's not the right answer. I'm very conscious of this every time I'm tempted to write a program, and as a result I wind up sitting on my hands a lot.
We have a long standing local program to sort of deal with the pain of the Ubuntu package update process. It was a relatively minimal program, but it worked, and so for a long time I suppressed my urge to make it shinier and let it be. A couple of weeks ago I reached the limits of my tolerance after one too many extended end-of-day update runs. Never mind being sensible, I was going to change things because I couldn't take the current situation any more, and it didn't matter if this was objectively a bad use of my time.
I spent about a week working over most of the code, substantially growing the program in the process. The result is faster and more convenient, but it is also a lot more than that. The old update process had a lot of limitations; for example, it didn't really notice if updating one machine had problems, and if updating one machine hung there was no way to go see what was going on and maybe rescue the situation. The new program fixes these issues. This makes it substantially more complicated, but also much more useful (and less dangerous). There are a whole host of things we can do now because I got annoyed enough at the state of affairs to do something that wasn't strictly sensible (and then carry on further).
There are two lessons I draw from this. The first is that sometimes writing the code is the right answer after all. To put it one way, not everything that feels good is a bad idea. The second is that I actually should have done this years ago. This problem and its parade of irritations and workarounds is not new; we've been annoyed at the hassles of Ubuntu updates probably for as long as we've been running Ubuntu machines, and there's nothing in my code that couldn't have been done years ago. Had I done this coding much earlier, well, we could have been enjoying its improvements for quite some time by now.
(The meta-lesson here is that the earlier you make a change or an improvement with a likely long lifetime, the higher your payoff is. From the start we were pretty certain we'd be running Ubuntu machines for a long time to come, so clearly we could have forecast that a good update handling program had a big potential long-term payoff.)