== Why web robots sending _Referer_ headers is wrong

I've [[written SpiderTechnicalRequirements]] [[before NoRefererForRobots]] on my view that web robots of all sorts should never send a _Referer_ header. In those entries I mostly said 'don't do that' without giving a solid philosophical argument about why, so today I feel like changing that.

(Not that a philosophical argument actually matters. Proper behavior on the web is defined by social convention, ie by what lots of other people do and expect, not by arguing with people over what makes sense. Whether or not you agree with a social convention, you break it at your peril, and today robots not sending _Referer_ headers is a well established social convention that [[I will ban you for violating DiffbotBadReferer]]. And anyways the people who should read this never will.)

There are two philosophical reasons why it's wrong for robots to send _Referer_ headers. The first is inherent in what the _Referer_ header means, namely 'I just followed a link from page X'. This is a description of human behavior but not really of robot behavior; almost no web robot actually traverses the web that way, finding links and immediately following them. If you crawl web pages, accumulate links, and then some time later crawl those links, you are not 'following a link' in any conventional sense (there's a sketch of this at the end of the entry). Worse, what happens if you discover the same link through multiple source documents? Which document gets 'credit' and appears in _Referer_?

(Yes, yes, this is [[not quite the spec definition http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.36]], which kind of permits the 'I found it here' meaning that robots sometimes use. It is instead the practical definition of the header, as defined by how most everything behaves.)

So, you say, you don't care; you want to use _Referer_ as a kind of 'this is what links to you' field for servers. I can summarize a bunch of problems here by saying that the _Referer_ field is a terrible way to communicate this information to web operators, fundamentally because you are trying to use a side effect of HTTP requests to pass on what may be a huge amount of information. If you actually want to be useful, you should make this information available on your own web site where people can see and fetch it in bulk.

Finally, the brutal truth is that 'who links to me' is far less interesting than 'who is sending *human* traffic to me (right now)'. By far the most valuable part of _Referer_ is information on where real (human) visitors are coming from, [[to the extent that it's possible to find this out SocialWebHidesDiscussions]]. Being read by people is the ultimate purpose of most web pages, which makes the places that are the source of traffic and active links something of decided interest to us. And this sort of human behavior has very little to do with either robot behavior or what potential links exist out there in the world. Mingling either your robot's actions or a 'helpful' attempt to tell us about the latter into this information is not doing us any favours; rather the contrary, in fact (this is one large reason that I react angrily to robots sending _Referer_).

(There is also the inconvenient fact that once you're operating a decent sized site, you're not likely to really care about who links to you because there will be far too many links out there, most of them in increasingly obscure and unimportant places. The links you do care about are exactly the links that send you significant traffic.)
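As a concrete illustration of the difference between a person following links and a robot crawling them, here is a minimal sketch of typical crawler behavior (in Python; the starting URL, page limit, and User-Agent string are made-up examples, not anyone's real robot). Links get harvested into a queue now and fetched some time later, so at fetch time there is no single 'page I just followed a link from' to report, and the sketch simply sends no _Referer_ header at all; it identifies itself through its User-Agent instead.

  # Sketch of typical crawler behaviour: harvest links from pages now,
  # fetch them from a queue later. There is no 'page I just came from'
  # at fetch time, so no Referer header is sent.
  import urllib.request
  from collections import deque
  from html.parser import HTMLParser
  from urllib.parse import urljoin

  class LinkCollector(HTMLParser):
      # Collects absolute URLs from <a href=...> tags on one page.
      def __init__(self, base):
          super().__init__()
          self.base = base
          self.links = []

      def handle_starttag(self, tag, attrs):
          if tag == "a":
              for name, value in attrs:
                  if name == "href" and value:
                      self.links.append(urljoin(self.base, value))

  queue = deque(["https://example.com/"])   # made-up starting point
  seen = set()
  while queue and len(seen) < 10:           # small limit for the sketch
      url = queue.popleft()
      if url in seen:
          continue
      seen.add(url)
      # Identify the robot via User-Agent; deliberately no Referer header.
      req = urllib.request.Request(url, headers={
          "User-Agent": "ExampleBot/1.0 (+https://example.org/bot-info)",
      })
      with urllib.request.urlopen(req) as resp:
          body = resp.read().decode("utf-8", errors="replace")
      parser = LinkCollector(url)
      parser.feed(body)
      # Queue newly found links; they may be fetched much later, and the
      # same link may well have been seen on several different pages.
      queue.extend(parser.links)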