In practice, there are multiple namespaces for URLs
In theory, the HTTP and URI/URL standards say that URLs are all in a
single namespace, as opposed to GET, POST, etc all using different
URL namespaces, where some URLs only exist for POST and some only
exist for GET.
In practice, I believe that web traversal software should behave as if
there were two URL namespaces on websites: one for GET and HEAD
requests, and a completely independent one for POST requests.
Crawling software should not issue 'cross-namespace' URL requests,
because you simply can't assume that a URL that is valid in one can
even be used in the other.
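One way a crawler could enforce this rule is to record which namespace each URL was first observed in and refuse requests in the other one. Here's a minimal sketch of that idea; the class and method names are my own invention, not from any real crawler:

```python
# Sketch: namespace-aware URL tracking for a crawler. A URL seen only
# as a POST form target is never fetched with GET/HEAD, and vice versa.

class URLNamespaces:
    def __init__(self):
        # Maps URL -> the namespace it was first observed in.
        # "get" covers both GET and HEAD; "post" covers POST targets.
        self.seen = {}

    def record(self, url, namespace):
        # Remember only the first namespace a URL was seen in.
        self.seen.setdefault(url, namespace)

    def may_request(self, url, method):
        # GET and HEAD share one namespace; POST gets its own.
        ns = "post" if method == "POST" else "get"
        # URLs we've never seen are allowed; cross-namespace
        # requests are not.
        return self.seen.get(url, ns) == ns
```

A crawler would call `record()` as it parses links and forms out of pages, and check `may_request()` before issuing any request.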
This isn't very hard for POST requests; not much software makes
them, and there are lots of things that make sending useful POST
requests off to URLs you've only seen in GET contexts difficult. (In
theory you could try converting GET requests with parameters into
POST form requests with the same parameters, but I suspect this will
strike people as at least dangerous and questionable.)
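For concreteness, the conversion that parenthetical describes would look roughly like this; it's a sketch of the mechanics only, since the whole point is that you probably shouldn't do it:

```python
# Sketch: turning a GET URL with query parameters into the equivalent
# POST form submission. Illustrative only; doing this to arbitrary
# URLs is the dangerous, questionable behavior the text describes.
from urllib.parse import urlsplit, urlunsplit

def get_to_post(url):
    parts = urlsplit(url)
    # The POST target is the URL minus its query string...
    target = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
    # ...and the query string becomes the urlencoded form body.
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    return target, parts.query, headers
```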
Unfortunately I've seen at least one piece of software that went the
other way, issuing GET requests for URLs that only appeared as the
target of POST form actions. Since it tried this inside CSpace the
requests went down in flames, because I'm cautious about anything
involving POST (and I get grumpy when things 'rattle the doorknobs').
(The crawler in question was called SBIder, from sitesell.com, and
this behavior is one reason it is now listed in our robots.txt.)