2006-07-11
Why nofollow is useful and important
No less a person than Google's Matt Cutts recently spoke up about herding
Googlebot and more or less recommended using the noindex meta tag on pages
instead of nofollow on links to them (on the grounds that it's more of a
sure thing to mark pages noindex than to make sure that all links to them
are marked nofollow).
I must respectfully disagree with this, because in one important respect
meta noindex isn't good enough. The big thing that nofollow does that meta
noindex can't is make good web spiders not fetch the target page at all.
Which means that you don't have to send it, and for dynamic pages that you
don't have to generate it in the first place.
(This is especially important for heavily dynamic websites that have a lot of automatically generated index pages of various sorts.)
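To make the difference concrete, here is roughly what the two markings
look like (a minimal sketch; the link URL is made up):

    <!-- nofollow goes on the link, so a well-behaved spider never
         requests the target page at all -->
    <a href="/dyn/calendar?month=next" rel="nofollow">next month</a>

    <!-- noindex goes in the <head> of the target page itself, so a
         spider has to fetch (and you have to generate) the page before
         it can discover that the page should be discarded -->
    <meta name="robots" content="noindex">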
I really don't want to be burning my CPU cycles to generate pages that web
spiders will just throw away again; frankly, it's annoying as well as
wasteful. This is a good part of why I am so twitchy about spiders
respecting nofollow.
(In fact I care more about this than about helping Google reduce
redundancy in their indexes, which is one reason why WanderingThoughts has
lots of nofollow but no meta noindex. Plus, getting good indexing for a
blog-oid thing is much harder than just sprinkling some noindex magic over
bits.)
Sidebar: why not robots.txt?
In theory, robots.txt is supposed to be the way to tell web spiders to
avoid URLs entirely. However, there are two problems with it in practice.
First, the format itself is inadequate for anything except blocking entire
directory hierarchies. Second, it's the wrong place; the only thing that
really knows whether a page should be spidered is the thing generating the
page.
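To illustrate the first problem: the basic robots.txt format only matches
URLs by prefix, so about all you can say is things like this (the path
here is hypothetical):

    # block an entire directory hierarchy, for all spiders
    User-agent: *
    Disallow: /dyn/

There's no standard way to say 'skip the page that this particular link
points to', which is exactly what a per-link nofollow expresses.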