Why nofollow is useful and important

No less a person than Google's Matt Cutts recently spoke up about herding Googlebot and more or less recommended using the noindex meta tag on pages instead of nofollow on links to them (on the grounds that it's more of a sure thing to mark pages noindex than to make sure that all links are marked nofollow).

I must respectfully disagree with this, because in one important respect meta noindex isn't good enough. The big thing that nofollow does that meta noindex can't do is that it makes good web spiders not fetch the target page at all. This means that you don't have to send the page and, for dynamic pages, that you don't have to generate it in the first place.

(This is especially important for heavily dynamic websites that have a lot of automatically generated index pages of various sorts.)
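To make the difference concrete, here is a toy sketch in Python of a well-behaved spider that treats rel="nofollow" as "don't fetch the target". Everything in it is made up for illustration: the three-page site, the paths, and the render() and crawl() helpers are all hypothetical, and real spiders are far more complicated. The point is only to count how many pages the server has to generate in each case.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect hrefs from <a> tags, skipping links marked rel="nofollow"."""

    def __init__(self):
        super().__init__()
        self.followable = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        href = attrs.get("href")
        rel = (attrs.get("rel") or "").lower().split()
        if href and "nofollow" not in rel:
            self.followable.append(href)


def render(path, use_nofollow):
    """Hypothetical dynamic site: a front page, an entry, and a generated index."""
    rel = ' rel="nofollow"' if use_nofollow else ""
    if path == "/":
        return ('<a href="/entry">entry</a> '
                f'<a href="/index-by-day"{rel}>by day</a>')
    if path == "/index-by-day":
        # The noindex tag lives inside the page, so a spider only sees it
        # after the server has already generated and sent the page.
        return ('<meta name="robots" content="noindex"> '
                '<a href="/entry">entry</a>')
    return "<p>An entry.</p>"


def crawl(use_nofollow):
    """Return the list of pages the server had to generate for the spider."""
    generated, queue, seen = [], ["/"], set()
    while queue:
        path = queue.pop(0)
        if path in seen:
            continue
        seen.add(path)
        generated.append(path)  # the server burns CPU here
        collector = LinkCollector()
        collector.feed(render(path, use_nofollow))
        queue.extend(collector.followable)
    return generated


print(crawl(use_nofollow=False))  # the noindex-only case
print(crawl(use_nofollow=True))   # the nofollow case
```

With only meta noindex, the spider in this sketch still requests /index-by-day and the server still generates it; the tag just tells the spider to throw the result away. With nofollow on the link, the request never happens.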

I really don't want to be burning my CPU cycles to generate pages that web spiders will just throw away again; frankly, it's annoying as well as wasteful. This is a good part of why I am so twitchy about spiders respecting nofollow.

(In fact I care more about this than about helping Google reduce redundancy in their indexes, which is one reason why WanderingThoughts has lots of nofollow but no meta noindex. Plus, getting good indexing for a blog-oid thing is much harder than just sprinkling some noindex magic over bits.)

Sidebar: why not robots.txt?

In theory, robots.txt is supposed to be the way to tell web spiders to avoid URLs entirely. However, there are two problems with it in practice. First, the format itself is inadequate for anything except blocking entire directory hierarchies. Second, it's the wrong place; the only thing that really knows whether a page should be spidered is the thing generating the page.
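As a rough illustration of the first problem, here is a small sketch using Python's standard urllib.robotparser module, which matches Disallow rules as plain path prefixes. The robots.txt contents, the example.org host, the ExampleBot name, and the '?atom' page variant are all assumptions invented for the example, not anything from a real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; the path is illustrative only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /blog/archive/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(path):
    """Ask whether a hypothetical spider may fetch this path."""
    return parser.can_fetch("ExampleBot", "http://example.org" + path)


# A whole hierarchy is easy to block with one prefix rule.
print(allowed("/blog/archive/2006/07/"))  # False

# Ordinary pages outside that prefix stay fetchable, as intended.
print(allowed("/blog/SomeEntry"))         # True

# A per-page variant (here a made-up '?atom' view) is not covered by the
# rule; with prefix matching it would need its own rule for every page.
print(allowed("/blog/SomeEntry?atom"))    # True
```

One rule covers a directory tree nicely, but there is no prefix that picks out "this particular view of every page", which is exactly the sort of thing a heavily dynamic site wants to keep spiders away from.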

This is part of CSpace, and is written by ChrisSiebenmann.

Written on 11 July 2006.

Last modified: Tue Jul 11 02:13:01 2006