Wandering Thoughts archives

2006-07-11

Why nofollow is useful and important

No less a person than Google's Matt Cutts recently spoke up about herding Googlebot and more or less recommended using the noindex meta tag on pages instead of nofollow on links to them (on the grounds that it's more of a sure thing to mark pages noindex than to make sure that all links are marked nofollow).

I must respectfully disagree with this, because in one important respect meta noindex isn't good enough. The big thing that nofollow does that meta noindex can't do is that it makes good web spiders not fetch the target page at all; a spider can only see a meta noindex after it has already fetched the page. With nofollow you don't have to send the page, and for dynamic pages you don't have to generate it.

(This is especially important for heavily dynamic websites that have a lot of automatically generated index pages of various sorts.)
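To make the difference concrete, here is a minimal sketch of the link-extraction step of a well-behaved spider, written with Python's standard html.parser module. The class name, function name, and sample URLs are all hypothetical, and real spiders are certainly more involved; the point is only that a link marked nofollow never makes it into the list of URLs to fetch, so the server never gets asked for the page.

```python
from html.parser import HTMLParser


class FollowableLinks(HTMLParser):
    """Sort the <a href> targets of a page into 'follow' and 'skipped'."""

    def __init__(self):
        super().__init__()
        self.follow = []   # links a nofollow-respecting spider would fetch
        self.skipped = []  # links it would leave alone

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        href = attrs.get("href")
        if href is None:
            return
        # rel is a space-separated list of tokens; compare case-insensitively.
        rel_tokens = (attrs.get("rel") or "").lower().split()
        if "nofollow" in rel_tokens:
            self.skipped.append(href)
        else:
            self.follow.append(href)


def links_to_fetch(html):
    """Return (follow, skipped) lists of href values found in html."""
    parser = FollowableLinks()
    parser.feed(html)
    parser.close()
    return parser.follow, parser.skipped


# Hypothetical page fragment: one ordinary link, one nofollow link.
page = (
    '<a href="/blog/SomeEntry">entry</a> '
    '<a rel="nofollow" href="/blog/SomeEntry/history">history</a>'
)
follow, skipped = links_to_fetch(page)
print("would fetch:", follow)
print("would skip:", skipped)
```

A meta noindex check, by contrast, could only happen after the spider had requested the page and the server had generated and sent it.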

I really don't want to be burning my CPU cycles to generate pages that web spiders will just throw away again; frankly, it's annoying as well as wasteful. This is a good part of why I am so twitchy about spiders respecting nofollow.

(In fact I care more about this than about helping Google reduce redundancy in their indexes, which is one reason why WanderingThoughts has lots of nofollow but no meta noindex. Plus, getting good indexing for a blog-oid thing is much harder than just sprinkling some noindex magic over bits.)

Sidebar: why not robots.txt?

In theory, robots.txt is supposed to be the way to tell web spiders to avoid URLs entirely. However, there are two problems with it in practice. First, the format itself is inadequate for anything except blocking entire directory hierarchies. Second, it's the wrong place; the only thing that really knows whether a page should be spidered is the thing generating the page.
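The first problem can be illustrated with the robots.txt parser in Python's standard library, which matches Disallow lines as plain path prefixes. This is only a sketch: the rules, the spider name, and the URL paths are made up for the example, and other parsers may behave differently.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents: block one directory hierarchy.
ROBOTS_TXT = """\
User-agent: *
Disallow: /blog/archive/
"""


def make_parser(text):
    """Build a parser from robots.txt text without any network access."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def spider_may_fetch(parser, path):
    """Ask whether a (hypothetical) spider is allowed to fetch path."""
    return parser.can_fetch("ExampleSpider", path)


rules = make_parser(ROBOTS_TXT)

for path in (
    "/blog/archive/2006/07/",    # under the blocked prefix
    "/blog/SomeEntry",           # an ordinary page
    "/blog/SomeEntry/history",   # a generated view outside the blocked prefix
):
    print(path, spider_may_fetch(rules, path))
```

Everything under /blog/archive/ is blocked, but a generated view that doesn't live under its own directory prefix can't be singled out this way. The code that emits the link to it, on the other hand, knows exactly what it is linking to and can mark that one link nofollow.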

web/UsefulNofollow written at 02:13:01