The mere 'presence' of a URL on a web server is not a good signal

July 5, 2023

There are a variety of situations where you (in the sense of programs and systems) want to know if a web server supports something or is under someone's control. One traditional way is to require the publication of specific URLs on the web server, often URLs with partially random names. The simplest way to implement this is to simply require the URL to exist and be accessible, which is to say that fetching it returns an HTTP 200 response. However, in light of web server implementations that will return HTTP 200 responses for any URL, or at least many of them, this simple check is clearly not sufficient in practice. The mere 'presence' of a URL on a web server proves very little.
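The weak check described above can be sketched in a few lines of standard-library Python. The URL here is whatever the protocol designates; the point is that a server with a catch-all handler will pass this test for every URL:

```python
# A minimal sketch of the naive 'URL presence' check, using only the
# standard library. The function name and timeout are illustrative.
import urllib.request
import urllib.error


def url_is_present(url):
    """Return True if fetching the URL yields an HTTP 200 response.

    This is the weak check: a server that answers 200 for any URL
    (for example, by serving a helpful synthetic page) always passes.
    """
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False
```

Nothing about the response body is examined, which is exactly the problem.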

If you need to implement this sort of protocol, you need to require the URL to contain some specific contents. Since web servers may echo some or all of the URL and any attribute of the HTTP request into the synthetic page they helpfully generate for you on such phantom URLs, the specific contents you require shouldn't appear in any of those. It's probably safe to deterministically derive them from some of what you send, although the complete independence of URL, HTTP request, and required contents is the best.
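One way to get that complete independence is to generate the required contents as a separate random token, unrelated to the URL or anything else in the request, and then to require the response body to be exactly that token. A hedged sketch, with illustrative names and path conventions:

```python
# A sketch of the stronger check: the verifier generates a random token
# that is independent of the URL and the HTTP request, and requires the
# response body to be exactly that token. The path prefix is made up.
import secrets


def make_challenge():
    """Generate an (url_path, expected_body) pair whose two random
    parts are independent, so nothing the server can echo back from
    the URL or the request will accidentally match."""
    path = "/.well-known/check-" + secrets.token_urlsafe(16)
    expected = secrets.token_urlsafe(32)
    return path, expected


def body_matches(body_bytes, expected):
    # Require the body to equal the expected token exactly (allowing a
    # trailing newline), not merely to contain it somewhere inside a
    # synthetic error page.
    body = body_bytes.decode("ascii", "replace").rstrip("\n")
    return body == expected
```

Comparing for exact equality, rather than searching for the token as a substring, is what defeats servers that echo request details into generated pages.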

I don't think any existing, widely used 'prove something on the web server' protocol uses mere URL presence, so this is an oversight that's more likely to be made in locally developed systems. For example, the ACME TLS certificate issuance protocol requires that some additional data be returned in the response, and I believe it implicitly requires that nothing else be returned (see section 8.1 and section 8.3).
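ACME's http-01 challenge is a good illustration of the principle: the required response body is a 'key authorization', the challenge token joined to a thumbprint of the account key (RFC 8555 section 8.1, with the thumbprint defined by RFC 7638). Since the thumbprint never appears in the URL or the request, a catch-all server can't produce it by echoing. A rough sketch, where the JWK passed in is assumed to already contain only the required members in the RFC 7638 sense:

```python
# A hedged sketch of ACME's key authorization construction. The JWK
# used in practice comes from the account key; the dict here is just
# whatever required members the caller supplies.
import base64
import hashlib
import json


def jwk_thumbprint(jwk):
    """RFC 7638-style thumbprint: SHA-256 over the canonical JSON of
    the JWK's required members (keys sorted, no whitespace), encoded
    as unpadded base64url."""
    canonical = json.dumps(jwk, separators=(",", ":"), sort_keys=True)
    digest = hashlib.sha256(canonical.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def key_authorization(token, jwk):
    # The expected body at /.well-known/acme-challenge/<token> is
    # token '.' thumbprint, and the validator compares against it.
    return token + "." + jwk_thumbprint(jwk)
```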

Production web servers that are intended to serve real data are probably not going to be vulnerable to this sort of issue. The danger is more likely to come from systems and devices that are running web servers as an incidental effect of allowing remote management or using HTTP as a communication protocol (as in the case of Prometheus's host agent). However, there are a variety of ways people might be able to exploit such devices in combination with protocols that rely on mere URL presence. Plus there are accidents, where some auto-checking program decides that some host has some capability just because it seems to have some URL active.

(This feels obvious now that I've written it out, but until I ran into the issue recently I might have made this mistake if I were designing some sort of simple HTTP probe check.)
