2013-12-20
A realization: on the modern web, everything gets visited
Once upon a time, a long time ago, you could have public web apps that exposed a few quite slow, heavyweight operations and expect to get away with it because users would only use those operations very occasionally. These might be things like specialized syndication feeds or looking up all resources with a particular label (tag, category, etc.). You wouldn't want to be serving those URLs very often, but once in a while was okay and it wasn't worth the complexity of making even the stuff in the corner go fast.
Then the web spiders arrived. These days I automatically assume that any visible, linked-to URL will get found and crawled by spiders. It doesn't matter if I mark every link to it nofollow and annotate it with a content-type that should be a red flag of 'hands off, nothing interesting to you here'; at least some spiders will show up anyways. The result of this is that even things in the corner need to be fast, because while humans may not use them very often, the spiders will. And there are a lot of spiders and a lot of spider traffic these days (I remember seeing a recent estimate that over half of web traffic was from spiders).
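As a concrete sketch of the sort of markup I mean, consider a hypothetical slow tag-index URL; the URL and the content-type here are invented examples, not anything from a real setup:

  <!-- a nofollow link to a hypothetical slow corner of the site -->
  <a href="/blog/__TagIndex/spam" rel="nofollow">entries tagged 'spam'</a>

  <!-- the target page itself might be served with, say: -->
  <!--   Content-Type: application/atom+xml -->

None of this reliably keeps crawlers away; rel="nofollow" is only a hint about link endorsement, not a prohibition on fetching the URL.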
(Spiders probably won't visit your really slow corners any more than the rest of your site. But unlike humans they won't necessarily visit them any less. URLs are URLs. And if your slow corners are useful indexes to your content, spiders may actually visit them more. I certainly wouldn't be surprised to find out that modern web crawlers keep track of which pages provide the most new links or links to changed content on an ongoing basis.)
One more or less direct corollary of this is that you (or at least I) probably want to plan for new URLs (ie, new features) to be efficient from the start. In the old days you had some degree of ramp-up time, where you could deploy an initial slow version, see it get used a bit, tweak it, and so on; these days, well, the spiders are going to be arriving pretty soon.
(I have very direct experience that it doesn't matter how obscure or limited your links are; if links exist in public pages, spiders will find them and begin crawling through them. And one single link to an island of content is enough to start an avalanche of crawling.)
PS: all of this only applies to public web apps and URLs, and so far only to GET URLs that are exposed through links in HTML or other content. Major spiders do not yet stuff random things into GET-based forms and submit them to see what happens.
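To make the distinction concrete, here is a sketch of the difference (both URLs are invented examples):

  <!-- spiders will find and crawl a plain GET link like this: -->
  <a href="/search?tag=solaris">entries tagged 'solaris'</a>

  <!-- but they do not (yet) invent values for a GET-based form like this: -->
  <form method="get" action="/search">
    <input type="text" name="tag">
    <input type="submit" value="Search">
  </form>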