What Google Sitemaps isn't

December 24, 2006

The Google Sitemaps XML format has a somewhat underdocumented <priority> field, which is described as:

The priority of this URL relative to other URLs on your site. [...]

The Google documentation is imprecise here: 'priority' has at least two possible meanings, and it doesn't say which one is meant. The first sort of priority is 'which pages do I want crawled first'; the second sort is 'which pages (within my site) do I want ranked first in search results'.

To cut to the chase: the Google Sitemaps <priority> is not the relative priority in search results (the second sort of priority). It seems to influence only how Google crawls your site (the first sort of priority), which probably doesn't matter unless you have a very large site.
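To make the crawl-priority reading concrete, here is a minimal sketch of generating a sitemap that carries <priority> values, written in Python. The URLs, the priority values, and the build_sitemap helper are all hypothetical, and the sitemaps.org 0.9 namespace is an assumption; this is not how any particular blog builds its sitemap. The format allows priorities from 0.0 to 1.0, with 0.5 as the default when the field is omitted.

```python
import xml.etree.ElementTree as ET

# Assumption: the sitemaps.org 0.9 namespace; older Google-specific
# sitemaps used a different namespace URI.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return sitemap XML text for an iterable of (url, priority) pairs.

    build_sitemap is a hypothetical helper written for this example.
    """
    # Serialize the sitemap namespace as the default (unprefixed) one.
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for loc, priority in pages:
        # The sitemap format only allows priorities between 0.0 and 1.0.
        if not 0.0 <= priority <= 1.0:
            raise ValueError(f"priority out of range for {loc}: {priority}")
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = loc
        ET.SubElement(url, f"{{{SITEMAP_NS}}}priority").text = f"{priority:.1f}"
    return ET.tostring(urlset, encoding="unicode")


# Made-up URLs: an individual entry and the index page that repeats it.
pages = [
    ("http://example.org/blog/2006/12/some-entry", 0.8),
    ("http://example.org/blog/", 0.3),
]
sitemap_xml = build_sitemap(pages)
print(sitemap_xml)
```

Under the reading above, the most the differing values would do is hint that the entry should be crawled ahead of the index page; they would not change which of the two URLs ranks higher in search results.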

This is disappointing, because when Google Sitemaps was first announced I was really hoping it would help me deal with a perpetual problem: I want my individual blog entries ranked higher than my index pages in search results.

The problem is that, unlike normal sites, blogs have a lot of duplicate content, since various sorts of index pages repeat individual entries wholesale. This means a Google search can turn up multiple URLs for the same entry, which Google has to rank somehow, and you would like the URL for the entry itself to rank highest: it's the most stable (there is no guarantee that the index page will still have the same entries as when Google crawled it) and it has the fewest distractions to obscure what the user is looking for (on an index page they have to find the right entry).

It would be nice if there were a way of telling Google about this, short of telling it not to index your index pages (which I am leery of). Maybe there is, but if so, it is not the Sitemaps <priority> field.

(Interestingly, Google seems to get this right fairly consistently for some places, such as LiveJournal. I can't help suspecting that they have special tuning for well-known blog sites and blogging packages.)

Of course this hardly matters right now, as Google has been unhappy with CSpace's sitemap for some time, for some mysterious reason. (Yes, I've validated it.)

Written on 24 December 2006.

Last modified: Sun Dec 24 16:01:36 2006