The advantage of having an (XML) sitemap

April 11, 2009

I've had a sitemap for a fairly long time (long enough that the format changed out from under me a few times). In theory I created it for Google, as a way of steering them around WanderingThoughts (and CSpace as a whole), but I've never been sure whether Google was getting anything out of it. As it turns out, that doesn't matter much, because having a sitemap has proven very useful for me.

Why it's so useful is neatly encapsulated in what it is: at least for me, a sitemap is an automatically generated, easily parsed list of all of the important URLs in my dynamically generated website. This is a great thing to feed to various sorts of testing systems. For example, if I change the code that converts wikitext into HTML, I test it by rendering all of the URLs in my sitemap with both the old and the new code and looking for differences.
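As a rough illustration of that sort of test (a minimal sketch, not my actual test setup), the following Python pulls the URLs out of a sitemap and reports which ones render differently. The `render_old` and `render_new` callables are hypothetical stand-ins for however you produce a page's HTML with each version of the code, and the sitemap is assumed to be a standard sitemaps.org `<urlset>` document.

```python
import difflib
import xml.etree.ElementTree as ET

# Namespace used by standard sitemaps.org <urlset> documents.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text):
    """Return every <loc> URL listed in a sitemap document."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


def changed_urls(urls, render_old, render_new):
    """Map each URL whose rendered output differs to a unified diff.

    render_old and render_new are hypothetical callables that take a
    URL and return the rendered HTML as a string.
    """
    changes = {}
    for url in urls:
        old, new = render_old(url), render_new(url)
        if old != new:
            diff = difflib.unified_diff(
                old.splitlines(), new.splitlines(),
                "old", "new", lineterm="",
            )
            changes[url] = "\n".join(diff)
    return changes
```

An empty result from `changed_urls()` means the new code renders every page in the sitemap identically; otherwise each entry shows exactly what changed on that page.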

(Using auto-generated sitemaps for testing is a great incentive to make sure that they include all of your important pages. For example, I had to do some tweaks to my initial sitemap generator to make sure that it included URLs that would show all comments. This is good for me and, if Google is paying attention, is good for Google too.)

The one thing I wish for with sitemaps is an autodiscovery protocol that did not involve robots.txt and instead was more like syndication feed autodiscovery (which uses <link> elements in the <head> section of ordinary HTML pages). Editing a single global file like robots.txt is simply not scalable if you have lots of sub-sites, each of which will generate its own sitemap, and people can create such sub-sites on their own.

(Translation: I do not want to be editing our robots.txt each time a user adds a sitemap to their home page or to some sub-area of their home page, or removes such a sub-area that they decided they didn't want any more, or changes software in a way that changes the sitemap URL, and so on.)
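To make the contrast concrete, here is a small Python sketch of both discovery styles. The first function reads the real `Sitemap:` lines from robots.txt using the standard library's `urllib.robotparser` (its `site_maps()` method needs Python 3.8 or later). The second looks for a `<link rel="sitemap">` element in a page, which is purely hypothetical: no such convention exists as far as I know, and the `rel` value is invented here to mirror how feed autodiscovery works.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


def sitemaps_from_robots(robots_txt):
    """Sitemap URLs declared with 'Sitemap:' lines in robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap: lines are present.
    return parser.site_maps() or []


class _SitemapLinkFinder(HTMLParser):
    """Collects hrefs of <link rel="sitemap"> tags.

    rel="sitemap" is a hypothetical convention, invented for this
    sketch by analogy with feed autodiscovery.
    """

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        rels = (attrs.get("rel") or "").lower().split()
        if tag == "link" and "sitemap" in rels and attrs.get("href"):
            self.found.append(attrs["href"])


def sitemaps_from_html(html_text):
    """Sitemap URLs advertised by a page under the hypothetical scheme."""
    finder = _SitemapLinkFinder()
    finder.feed(html_text)
    return finder.found
```

With the second style, each sub-site could advertise its own sitemap from its own pages, and nobody would have to touch the global robots.txt.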


Last modified: Sat Apr 11 01:39:46 2009