Wandering Thoughts archives


The mystery of POSTs with a zero Content-Length

One of the joys of running web software that is rather paranoid is getting to see all sorts of weird things that float around the web, generally run by spammers and other people who are up to no good. Today's oddity could be called 'the case of the zero-length POST' and is just what it sounds like: POST requests that have a Content-Length of 0 bytes.

(Or at least they have a Content-Length of '0' after Apache gets through passing the request to DWiki. It's possible that Apache is silently sanitizing some bizarre C-L header value to 0.)

I don't know the full headers for these POST requests but because of the code flow inside DWiki, I know that they claim to be form submissions (I check Content-Type before looking at Content-Length). It's possible that the requests have some other header that is supposed to preempt Content-Length. It's also possible that this software is submitting empty POST form requests to see what happens or because this evades security precautions in some applications.

Based purely on the claimed User-Agent values I can say that this software is up to no good, since some of the time it claims to be Googlebot (or at least some of the requests claim to be from Googlebot, since I suppose I shouldn't assume that there's just one piece of software that's doing this). All of the requests I've pulled out of the logs seem to be HTTP/1.1 requests and generally are for regular URLs. The software involved also seems to almost always lower-case the URLs it's using, which very much doesn't work very well here.

(Looking at User-Agent suggests there may be two different programs involved, one of which claims to be Googlebot and one of which doesn't sent a User-Agent at all. Only the Googlebot-faker seems to lowercase its URLs; the other program mostly POSTs to Wandering Thought's main page but occasionally POSTs to other, correctly-cased URLs. The second program seems to be the more active one.)

I don't have any answers to this particular mystery and in fact now that I've looked into it it's more mysterious than before. Sometimes that's how it goes on the web these days.

Sidebar: volume and source details

These requests aren't happening in high volume but are generally happening several times a day from various different IPs. In the last ten days there have been at least 150 instances from 50 different IPs; the most prolific five IPs made 22, 13, 11, 8, and 6 requests each respectively. I haven't tried to run down the origin of all of the IPs, but China shows up a lot in the top-N list. One IP is currently in the SBL, in SBL181621 (a /24 listing due to blackhat SEO spammer hosting).

web/ZeroLengthPOSTs written at 22:47:37; Add Comment

Understanding the MongoDB code that people are laughing at

Recently my corner of the Twitter-verse has been moderately aflame with a number of people pointing and laughing at a chunk of MongoDB source code; this example is typical. At first I automatically accepted that the code was laughable (and you have to admit that seeing a call to random() in database client code is odd) but when the link kept cropping up I got curious enough to start reading the surrounding code. Unfortunately the more I read the more sane the code looked. Since the details don't exactly fit in a tweet, it's time to write a more substantial explanation of the code and why it's sane.

Let's start with the code that people are directly laughing about (or so I assume since this is the line that people have been linking to):

if (!((_ok) ? true : (Math.random() > 0.1))) {
   return res;

In the interests of cleverness, someone has made this if condition much too hard to read. Rewritten to be clear, it is '!_ok && Math.random() <= 0.1'. If _ok is false, this if and its immediate return will fire about 10% of the time. If _ok is true, it will never fire.

Now let's look at the surrounding code, or at least bits of it. The context goes like this:

abstract class ConnectionStatus {
  static abstract class UpdatableNode {
    public CommandResult update() {
      CommandResult res = null;
      try {
        ... try to talk to a server ...

        if (!_ok) {
          getLogger().log(Level.INFO, "Server seen up: " + _addr);
        _ok = true
      } catch(Exception e) {
        if (!((_ok) ? true : (Math.random() > 0.1))) {
          return res;

        ... log a message about the
            the server being down ...
        _ok = false;

      return res;

Let me translate this: the effect of the if with the Math.random() is that if there's an error and the server has already been marked as down, about 10% of the time the code doesn't bother to log a message that it's still down (and doesn't record the actual exception caught as part of that log message). If the server is marked up and there's an error, a message is always logged. In all cases res is returned, possibly as null, and I believe no exception is propagated; I assume that this is an intended part of the API.

(There may actually be a bug here; it may be intended that the message is skipped 90% of the time when the server is already marked as down, not 10%.)

This code does not strike me as laughable, crazy, or worthy of mockery. It may have some defects and you may disagree with what it's doing, but it sure seems to be sane code in general. You probably have worse things in some of your projects (I know that I do in mine). My personal view is that what the code is doing is perfectly sensible to avoid flooding logs, but since I don't know MongoDB I don't know how much log volume this would produce (or avoid) and how important that log information would be during problem diagnosis.

(Note that the code will always log 'server seen down' and 'server seen up' when it first notices the server down and when it first notices the server back up. The only question is how many 'server still down' messages it logs or should log between those two, in a range somewhere between 'none' and 'one for every attempt to talk to the server'.)

If there is a fault in this code it is not the use of Math.random() in database client code, it is that the if condition has been written in such an overly clever way. And if people want to point and laugh at overly clever code, well, there's a lot of fish in a lot of barrels (our own generally included).

Update: C J Silverio pointed me to the initial commit for this code. The initial commit is much simpler and makes it clear that the current 'log 90%' state is a bug.

programming/UnderstandingMongoDBRandom written at 00:02:32; Add Comment

Page tools: See As Normal.
Login: Password:
Atom Syndication: Recent Pages, Recent Comments.

This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.