2018-03-07
Some things I mean when I talk about 'forged HTTP referers'
One of the most reliable and often the fastest ways to get me to block people from Wandering Thoughts is to do something that causes my logs to become noisy or useless. One of those things is persistently making requests with inaccurate Referer headers, because I look at my Referer logs on a regular basis. When I talk about this, I'll often use the term 'forged' here, as in 'forged referers' or 'referer-forging web spider'.
(I've been grumpy about this for a long time.)
I have casually used the term 'inaccurate' up there, as well as the strong term 'forged'. But given that the Referer header is informational, explicitly comes with no guarantees, and is fully under the control of the client, what does that really mean? As I use it, I tend to have one of three different meanings in mind.
First, let's say what an accurate referer header is: it's when the referer header value is an honest and accurate representation of what happened. Namely, a human being was on the URL in the Referer header and clicked on a link that sent them to my page, or on the site if you only put the site in the Referer. A blank Referer header is always acceptable, as are at least some Referer headers that aren't URLs if they honestly represent what a human did to wind up on my page.
An inaccurate Referer in the broad sense is any Referer that isn't accurate. There are at least two ways for it to be inaccurate (even if it is a human action). The lesser inaccuracy is if the source URL contains a link to my page, but it doesn't actually represent how the human wound up on my page; it's just a (random) plausible value. Such referers are inaccurate now but could be accurate in other circumstances. The greater inaccuracy is if the source URL doesn't even link to my page, so it would never be possible for the Referer to be accurate. Completely bogus referers are usually more irritating than semi-bogus referers, although this is partly a taste issue (both are irritating, honestly, but one shows you're at least trying).
(I'd like better terms for these two sorts of referers; 'bogus' and 'plausible' are the best I've come up with so far.)
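To make the distinction between accurate, 'plausible', and 'bogus' referers concrete, here is a minimal Python sketch. Everything in it is an illustrative assumption rather than anything Wandering Thoughts actually runs: the function names are hypothetical, the caller is assumed to have already fetched the HTML of the claimed source page, and `arrived_via_referer` stands in for ground truth that a server cannot really observe. It also ignores the site-only Referer case for simplicity.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class _LinkCollector(HTMLParser):
    """Collect the absolute, fragment-free targets of <a href> links."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                absolute = urljoin(self.base_url, value)
                self.links.add(urldefrag(absolute)[0])


def source_links_to(source_url, source_html, my_page_url):
    """Return True if the claimed source page contains a link to my page."""
    collector = _LinkCollector(source_url)
    collector.feed(source_html)
    return urldefrag(my_page_url)[0] in collector.links


def classify_referer(referer, source_html, my_page_url, arrived_via_referer):
    """Label a Referer as 'accurate', 'plausible', or 'bogus'.

    arrived_via_referer is hypothetical ground truth: did a human really
    follow a link on the Referer page to get here?
    """
    if not referer:
        return "accurate"  # a blank Referer is always acceptable
    if not source_links_to(referer, source_html, my_page_url):
        return "bogus"  # the source never links here, so it can't be accurate
    return "accurate" if arrived_via_referer else "plausible"
```

In practice only the 'bogus' case can be detected mechanically, by fetching the claimed source and finding no link; telling 'plausible' from accurate needs other evidence about the client.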
As noted, I will generally call both of these cases 'forged', not just 'inaccurate'. Due to my view that Referer is a human-only header, I use 'forged' for basically all referers that are provided by web spiders and the like. I can imagine circumstances when I'd call Referer headers sent by a robot merely 'inaccurate', but they'd be pretty far out and I don't think I've ever run into them.
The third case and the strongest sense of 'forged' for me is when the Referer header has clearly been selected because the web spider is up to no good. One form of this is Referer spamming (which seems to have died out these days, thankfully). Another form is when whatever is behind the requests looks like it's deliberately picking Referer values to try to evade any security precautions that might be there. A third form is when your software uses the Referer field to advertise yourself in some way, instead of leaving this to the User-Agent field (which has happened, although I don't think I've seen it recently).
(Checking for appropriate Referer values is a weak security precaution that's easy to bypass and not necessarily a good idea, but like most weak security precautions it does have the virtue of making it pretty clear when people are deliberately trying to get around it.)
PS: Similar things apply when I talk about other 'forged' fields, especially User-Agent. Roughly speaking, I'll definitely call your U-A forged if you aren't human and it misleads about what you are. If you're a real human operating a real browser, I consider it your right to use whatever U-A you want to, including completely misleading ones. Since I'm human and inconsistent, I may still call it 'forged' in casual conversation for convenience.
The lie in Ubuntu source packages (and probably Debian ones as well)
One of the things that pisses me off about the Debian and Ubuntu source package format is that people clearly do not actually use it to build packages; they use other tools. You can tell because of how things are broken.
(I may have been hasty in tarring Debian with this particular brush but it definitely applies to Ubuntu.)
Several years ago I wrote about one problem with how Debian builds from source packages, which is that it doesn't have a distinction between the package's source tree and the tree that the package is built in and as a result building the package can contaminate the source tree. This is not just a theoretical concern; it's happened to us. In fact it's now happened with both the Ubuntu 14.04 version of the package and then the Ubuntu 16.04 version, which was contaminated in a different way this time.
This problem is not difficult to find or notice. All you have to do is run debuild twice in the package's source tree and the second one will error out. People who are developing and testing package changes should be doing this all the time, as they build and test scratch versions of their package to make sure that it actually has what they want, passes package lint checks, and so on.
Ubuntu didn't find this issue, or if they found it they didn't care enough to fix it. The conclusion is inescapable; the source package and all of the documentation that tells you to use debuild on it are a lie. The nominal source package may contain the source code that went into the binary package (although I'm not sure you can be sure of that), but it's not necessarily an honest representation of how the package is actually built by the people who work on it, and as a result building the package with debuild may or may not reproduce the binary package you got from Ubuntu. Certainly you can't reliably use the source package to develop new versions of the binary package; one way or another, you will have to use some sort of hack workaround.
(RPM based distributions should not feel too smug here, because they have their own package building issues and documentation problems.)
I don't build many Ubuntu packages. That I've stumbled over two packages out of the few that I've tried to rebuild and they're broken in two different ways strongly suggests to me that this is pretty common. I could be unlucky (or lucky), but I think it's more likely that I'm getting a reasonably representative random sample.
PS: If Ubuntu and/or Debian care about this, the solution is obvious, although it will slow things down somewhat. As always, if you really care about something you must test it and if you don't bother to test it when it's demonstrably a problem, you probably don't actually care about it. This is not a difficult test to automate.
(Also, if debuild is not what people should be using to build or rebuild packages these days, various people have at least a documentation problem.)
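As a rough illustration of how the 'build it twice' check could be automated, here is a minimal Python sketch. It is not anything Ubuntu or Debian actually run. The function names are hypothetical, and it assumes that `debuild -us -uc` (build without signing) is a suitable invocation and that the working directory is an unpacked source package tree.

```python
import subprocess

# Assumption: build without signing the source package or the .changes file.
DEFAULT_BUILD_CMD = ("debuild", "-us", "-uc")


def build_twice(source_tree, build_cmd=DEFAULT_BUILD_CMD):
    """Run the build command up to twice in source_tree.

    Returns the list of exit codes, stopping after the first failure.
    """
    exit_codes = []
    for _ in range(2):
        result = subprocess.run(list(build_cmd), cwd=source_tree)
        exit_codes.append(result.returncode)
        if result.returncode != 0:
            break
    return exit_codes


def rebuild_is_clean(source_tree, build_cmd=DEFAULT_BUILD_CMD):
    """True only if both the first and the second build succeed."""
    return build_twice(source_tree, build_cmd) == [0, 0]
```

A result of `[0, 1]` from `build_twice()` is the signature of the problem described here: the first build succeeds but leaves the source tree in a state that the second build can't cope with.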