Wandering Thoughts archives

2020-04-12

The appeal of doing exact string comparisons with Apache's RewriteCond

I use Apache's RewriteCond a fair bit under various circumstances, especially here on Wandering Thoughts where I use it in .htaccess to block undesirable things (cf). The default RewriteCond action is to perform a regular expression matches, and generally this is what I want; for instance, many web spiders have user agents that include their version number, and that number changes over time. However, recently I was reminded of the power and utility of doing exact string matches for some circumstances.

Suppose, not hypothetically, that you have some bad web spiders that crawl your site with a constant bogus HTTP Referer of:

http://www.google.co.uk/url?sa=t&source=web&cd=1

Or another web spider might crawl with an unusual and fixed user-agent of:

Mozilla/5.0 (X11; Fedora; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.117 Safari/537.36

I could use regular expressions to match and block these, but that's at least annoying because both of these strings have various special regular expression characters that I'd have to carefully escape. So instead we can use RewriteCond's '=' option to do an exact string comparison. The one slightly tricky bit is that you want to enclose the entire thing in "'s, that is:

RewriteCond %{HTTP_REFERER} "=http://www.google.co.uk/url?sa=t&source=web&cd=1" [NC]

(The '[NC]' is perhaps overkill, especially as the spider probably never varies the case. But it's a reflex.)

As you can see, instances of '=' in the string don't have to be escaped. If the string I wanted to match (exactly) on had quotes in it, I'd have to look up how to escape them in Apache.

Now that I've looked up this RewriteCond option and gotten it working for me, I'm probably going to make more use of it. Various bad web spiders (and other software) has pretty consistent and unique signatures in various headers, which generally beats playing whack-a-mole with their IP address ranges.

(This probably isn't very useful outside of blocking bad people, although I suppose it could be used to rewrite only certain exact URLs while allowing others to fall through, or the reverse.)

ApacheRewriteCondExactMatch written at 22:40:50; Add Comment


Page tools: See As Normal.
Search:
Login: Password:
Atom Syndication: Recent Pages, Recent Comments.

This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.