Wandering Thoughts archives


Some things I mean when I talk about 'forged HTTP referers'

One of the most reliable and often the fastest ways to get me to block people from Wandering Thoughts is to do something that causes my logs to become noisy or useless. One of those things is persistently making requests with inaccurate Referer headers, because I look at my Referer logs on a regular basis. When I talk about this, I'll often use the term 'forged', as in 'forged referers' or 'referer-forging web spider'.

(I've been grumpy about this for a long time.)

I have casually used the term 'inaccurate' up there, as well as the strong term 'forged'. But given that the Referer header is informational, explicitly comes with no guarantees, and is fully under the control of the client, what does that really mean? As I use it, I tend to have one of three different meanings in mind.

First, let's say what an accurate referer header is: it's when the referer header value is an honest and accurate representation of what happened. Namely, a human being was on the URL in the Referer header and clicked on a link that sent them to my page, or on the site if you only put the site in the Referer. A blank Referer header is always acceptable, as are at least some Referer headers that aren't URLs if they honestly represent what a human did to wind up on my page.

An inaccurate Referer in the broad sense is any Referer that isn't accurate. There are at least two ways for it to be inaccurate (even if it is a human action). The lesser inaccuracy is if the source URL contains a link to my page but doesn't actually represent how the human wound up on my page; it's just a (random) plausible value. Such referers are inaccurate now but could be accurate in other circumstances. The greater inaccuracy is if the source URL doesn't even link to my page, so it would never be possible for the Referer to be accurate. Completely bogus referers are usually more irritating than semi-bogus referers, although this is partly a taste issue (both are irritating, honestly, but one shows you're at least trying).

(I'd like better terms for these two sorts of referers; 'bogus' and 'plausible' are the best I've come up with so far.)
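To make the split between 'plausible' and 'bogus' concrete, here is a minimal sketch in Python. It is an illustration only, not how I actually process my logs. The function name and the `links_from` callable (which is assumed to return the links found on the page at a given URL) are hypothetical, and the sketch ignores site-only Referers and Referer values that aren't URLs.

```python
from typing import Callable, Iterable, Optional


def classify_referer(
    referer: Optional[str],
    my_url: str,
    links_from: Callable[[str], Iterable[str]],
) -> str:
    """Classify a Referer value as 'blank', 'plausible', or 'bogus'.

    links_from is a hypothetical helper supplied by the caller; it is
    assumed to return the URLs that the page at `referer` links to.
    """
    if not referer:
        # A blank (or missing) Referer is always acceptable.
        return "blank"
    if my_url in set(links_from(referer)):
        # The source page links here, so the Referer could be accurate.
        return "plausible"
    # The source page doesn't link here, so it could never be accurate.
    return "bogus"
```

Note that nothing on the server side can separate 'plausible' from genuinely accurate. Only the client knows whether a human actually followed the link, which is why the best result this can report is 'plausible'.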

As noted, I will generally call both of these cases 'forged', not just 'inaccurate'. Due to my view that Referer is a human-only header, I use 'forged' for basically all referers that are provided by web spiders and the like. I can imagine circumstances where I'd call Referer headers sent by a robot merely 'inaccurate', but they'd be pretty far out and I don't think I've ever run into them.

The third case and the strongest sense of 'forged' for me is when the Referer header has clearly been selected because the web spider is up to no good. One form of this is Referer spamming (which seems to have died out these days, thankfully). Another form is when whatever is behind the requests looks like it's deliberately picking Referer values to try to evade any security precautions that might be there. A third form is when your software uses the Referer field to advertise itself in some way, instead of leaving this to the User-Agent field (which has happened, although I don't think I've seen it recently).

(Checking for appropriate Referer values is a weak security precaution that's easy to bypass and not necessarily a good idea, but like most weak security precautions it does have the virtue of making it pretty clear when people are deliberately trying to get around it.)

PS: Similar things apply when I talk about other 'forged' fields, especially User-Agent. Roughly speaking, I'll definitely call your U-A forged if you aren't human and it misleads about what you are. If you're a real human operating a real browser, I consider it your right to use whatever U-A you want to, including completely misleading ones. Since I'm human and inconsistent, I may still call it 'forged' in casual conversation for convenience.

web/ForgedRefererMyMeanings written at 23:30:55

The lie in Ubuntu source packages (and probably Debian ones as well)

I tweeted:

One of the things that pisses me off about the Debian and Ubuntu source package format is that people clearly do not actually use it to build packages; they use other tools. You can tell because of how things are broken.

(I may have been hasty in tarring Debian with this particular brush but it definitely applies to Ubuntu.)

Several years ago I wrote about one problem with how Debian builds from source packages, which is that it doesn't have a distinction between the package's source tree and the tree that the package is built in. As a result, building the package can contaminate the source tree. This is not just a theoretical concern; it's happened to us. In fact it's now happened with both the Ubuntu 14.04 version of the package and then the Ubuntu 16.04 version, which was contaminated in a different way this time.

This problem is not difficult to find or notice. All you have to do is run debuild twice in the package's source tree and the second one will error out. People who are developing and testing package changes should be doing this all the time, as they build and test scratch versions of their package to make sure that it actually has what they want, passes package lint checks, and so on.
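As an illustration of how small this check is, here is a minimal sketch in Python. The function name is hypothetical, and the default of `debuild -us -uc` (an unsigned build) is my assumption about a reasonable invocation; the build command is a parameter so that you can substitute whatever your package actually needs.

```python
import subprocess
from pathlib import Path
from typing import Sequence


def builds_twice(
    source_tree: Path,
    build_cmd: Sequence[str] = ("debuild", "-us", "-uc"),
) -> bool:
    """Run build_cmd twice in source_tree; True only if both runs succeed.

    The default command is an assumption, not a recommendation; pass
    your own build_cmd if your package is built some other way.
    """
    for attempt in (1, 2):
        result = subprocess.run(list(build_cmd), cwd=source_tree)
        if result.returncode != 0:
            print(f"build attempt {attempt} failed "
                  f"with exit status {result.returncode}")
            return False
    return True
```

If the first build contaminates the source tree in a way that breaks the second, this returns False and reports that attempt 2 failed. It only catches contamination that makes the second build fail outright, not contamination that quietly changes what gets built.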

Ubuntu didn't find this issue, or if they found it they didn't care enough to fix it. The conclusion is inescapable: the source package, along with all of the documentation that tells you to use debuild on it, is a lie. The nominal source package may contain the source code that went into the binary package (although I'm not sure you can be sure of that), but it's not necessarily an honest representation of how the package is actually built by the people who work on it. As a result, building the package with debuild may or may not reproduce the binary package you got from Ubuntu. Certainly you can't reliably use the source package to develop new versions of the binary package; one way or another, you will have to use some sort of hack workaround.

(RPM based distributions should not feel too smug here, because they have their own package building issues and documentation problems.)

I don't build many Ubuntu packages. That two of the few packages I've tried to rebuild turned out to be broken, and broken in two different ways, strongly suggests to me that this is pretty common. I could be unlucky (or lucky), but I think it's more likely that I'm getting a reasonably representative random sample.

PS: If Ubuntu and/or Debian care about this, the solution is obvious, although it will slow things down somewhat. As always, if you really care about something you must test it and if you don't bother to test it when it's demonstrably a problem, you probably don't actually care about it. This is not a difficult test to automate.

(Also, if debuild is not what people should be using to build or rebuild packages these days, various people have at least a documentation problem.)

linux/UbuntuPackageBuildingLie written at 01:43:26
