Some things I mean when I talk about 'forged HTTP referers'
One of the most reliable and often the fastest ways to get me to block people from Wandering Thoughts is to do something that causes my logs to become noisy or useless. One of those things is persistently making requests with inaccurate Referer headers, because I look at my Referer logs on a regular basis. When I talk about this, I'll often use the term 'forged', as in 'forged referers' or 'referer-forging web spider'.
(I've been grumpy about this for a long time.)
I have casually used the term 'inaccurate' up there, as well as the strong term 'forged'. But given that the Referer header is informational, explicitly comes with no guarantees, and is fully under the control of the client, what does that really mean? As I tend to use it, I have one of three different meanings in mind.
First, let's say what an accurate Referer header is: it's when the Referer header value is an honest and accurate representation of what happened. Namely, a human being was on the URL in the Referer header and clicked on a link that sent them to my page, or on the site if you only put the site in the Referer. A blank Referer header is always acceptable, as are at least some Referer headers that aren't URLs if they honestly represent what a human did to wind up on my page.
An inaccurate Referer in the broad sense is any Referer that isn't accurate. There are at least two ways for it to be inaccurate (even if it is a human action). The lesser inaccuracy is if the source URL contains a link to my page, but it doesn't actually represent how the human wound up on my page; it's just a (random) plausible value. Such referers are inaccurate now but could be accurate in other circumstances. The greater inaccuracy is if the source URL doesn't even link to my page, so it would never be possible for the Referer to be accurate. Completely bogus referers are usually more irritating than semi-bogus referers, although this is partly a taste issue (both are irritating, honestly, but one shows you're at least trying).
(I'd like better terms for these two sorts of referers; 'bogus' and 'plausible' are the best I've come up with so far.)
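To make the 'bogus' versus 'plausible' distinction concrete, here is a minimal Python sketch. It assumes you have already fetched the HTML of the page named in the Referer by some other means; the names LinkCollector and classify_referer are hypothetical, invented for this illustration. Note what it cannot do: from the server side there is no way to tell a genuinely accurate Referer from a merely plausible one, so 'plausible' is the best verdict available. It also doesn't handle a Referer that only names the site rather than a specific page.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkCollector(HTMLParser):
    """Collect absolute link targets from <a href="..."> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the referring page and
                # drop any #fragment so comparisons are on the page URL.
                absolute, _fragment = urldefrag(urljoin(self.base_url, value))
                self.links.add(absolute)


def classify_referer(referer, referer_html, my_url):
    """Return 'blank', 'plausible' or 'bogus' for a Referer value.

    referer_html is the (already fetched) HTML of the page the Referer
    names; fetching it is deliberately left out of this sketch.
    """
    if not referer:
        return "blank"
    collector = LinkCollector(referer)
    collector.feed(referer_html)
    target, _fragment = urldefrag(my_url)
    # The source page links here: the Referer could be accurate.
    # The source page has no such link: it can never be accurate.
    return "plausible" if target in collector.links else "bogus"
```

For example, classify_referer("https://example.org/index", page_html, "https://example.org/blog/entry") comes back as 'plausible' only if page_html contains a link that resolves to that entry's URL.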
As noted, I will generally call both of these cases 'forged', not just 'inaccurate'. Due to my view that Referer is a human-only header, I use 'forged' for basically all referers that are provided by web spiders and the like. I can imagine circumstances where I'd call Referer headers sent by a robot merely 'inaccurate', but they'd be pretty far out and I don't think I've ever run into them.
The third case and the strongest sense of 'forged' for me is when the Referer header has clearly been selected because the web spider is up to no good. One form of this is Referer spamming (which seems to have died out these days, thankfully). Another form is when whatever is behind the requests looks like it's deliberately picking Referer values to try to evade any security precautions that might be there. A third form is when your software uses the Referer field to advertise yourself in some way, instead of leaving this to the User-Agent field (which has happened, although I don't think I've seen it recently).
(Checking for appropriate Referer values is a weak security precaution that's easy to bypass and not necessarily a good idea, but like most weak security precautions it does have the virtue of making it pretty clear when people are deliberately trying to get around it.)
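As a rough illustration of why a Referer check is weak, here is a small Python sketch of such a check along with how little a client needs to do to satisfy it. ALLOWED_HOSTS, referer_allowed and build_request are all hypothetical names made up for this example, not a description of any check that Wandering Thoughts actually performs; nothing here sends a request over the network.

```python
from urllib.parse import urlsplit
from urllib.request import Request

# Hypothetical set of hosts we expect legitimate links to come from.
ALLOWED_HOSTS = {"example.org"}


def referer_allowed(referer):
    """A weak check: accept a blank Referer or one from an allowed host."""
    if not referer:
        # A blank Referer is always acceptable.
        return True
    try:
        host = urlsplit(referer).hostname
    except ValueError:
        # Malformed values (for example a broken IPv6 literal) fail the check.
        return False
    return host in ALLOWED_HOSTS


def build_request(url, claimed_referer):
    """What a client does to pass the check: simply claim a Referer.

    This only constructs the request object; it is never sent.
    """
    return Request(url, headers={"Referer": claimed_referer})
```

Since the header is entirely under the client's control, anything that passes referer_allowed() can be produced on demand. What the check does give you is a signal: a robot that always shows up with exactly the 'right' Referer is pretty clearly picking it on purpose.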
PS: Similar things apply when I talk about other 'forged' fields, especially User-Agent. Roughly speaking, I'll definitely call your U-A forged if you aren't human and it misleads about what you are. If you're a real human operating a real browser, I consider it your right to use whatever U-A you want to, including completely misleading ones. Since I'm human and inconsistent, I may still call it 'forged' in casual conversation for convenience.