Some things I mean when I talk about 'forged HTTP referers'

March 7, 2018

One of the most reliable and often the fastest ways to get me to block people from Wandering Thoughts is to do something that causes my logs to become noisy or useless. One of those things is persistently making requests with inaccurate Referer headers, because I look at my Referer logs on a regular basis. When I talk about this, I'll often use the term 'forged' here, as in 'forged referers' or 'referer-forging web spider'.

(I've been grumpy about this for a long time.)

I have casually used the term 'inaccurate' up there, as well as the strong term 'forged'. But given that the Referer header is informational, explicitly comes with no guarantees, and is fully under the control of the client, what does that really mean? As I to use it, I tend have one of three different meanings in mind.

First, let's say what an accurate referer header is: it's when the referer header value is an honest and accurate representation of what happened. Namely, a human being was on the URL in the Referer header and clicked on a link that sent them to my page, or on the site if you only put the site in the Referer. A blank Referer header is always acceptable, as are at least some Referer headers that aren't URLs if they honestly represent what a human did to wind up on my page.

An inaccurate Referer in the broad sense is any Referer that isn't accurate. There are at least two ways for it to be inaccurate (even if it is a human action). The lesser inaccuracy is if the source URL contains a link to my page, but it doesn't actually represent how the human wound up on my page, it's just a (random) plausible value. Such referers are inaccurate now but could be accurate in another circumstances. The greater inaccuracy is if the source URL doesn't even link to my page, so it would never be possible for the Referer to be accurate. Completely bogus referers are usually more irritating than semi-bogus referers, although this is partly a taste issue (both are irritating, honestly, but one shows you're at least trying).

(I'd like better terms for these two sorts of referers; 'bogus' and 'plausible' are the best I've come up with so far.)

As noted, I will generally call both of these cases 'forged', not just 'inaccurate'. Due to my view that Referer is a human only header, I use 'forged' for basically all referers that are provided by web spiders and the like. I can imagine circumstances when I'd call Referer headers sent by a robot as merely 'inaccurate', but they'd be pretty far out and I don't think I've ever run into them.

The third case and the strongest sense of 'forged' for me is when the Referer header has clearly been selected because the web spider is up to no good. One form of this is Referer spamming (which seems to have died out these days, thankfully). Another form is when whatever is behind the requests looks like it's deliberately picking Referer values to try to evade any security precautions that might be there. A third form is when your software uses the Referer field to advertise yourself in some way, instead of leaving this to the User-Agent field (which has happened, although I don't think I've seen it recently).

(Checking for appropriate Referer values is a weak security precaution that's easy to bypass and not necessarily a good idea, but like most weak security precautions it does have the virtue of making it pretty clear when people are deliberately trying to get around it.)

PS: Similar things apply when I talk about 'forged' other fields, especially User-Agent. Roughly speaking, I'll definitely call your U-A forged if you aren't human and it misleads about what you are. If you're a real human operating a real browser, I consider it your right to use whatever U-A you want to, including completely misleading ones. Since I'm human and inconsistent, I may still call it 'forged' in casual conversation for convenience.

Written on 07 March 2018.
« The lie in Ubuntu source packages (and probably Debian ones as well)
Some questions I have about DDR4 RAM speed and latency in underclocked memory »

Page tools: View Source, Add Comment.
Login: Password:
Atom Syndication: Recent Comments.

Last modified: Wed Mar 7 23:30:55 2018
This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.