Wandering Thoughts archives

2022-08-11

My uncertainty over whether a URL format is actually legal

I was recently dealing with a program that runs in a configuration that sometimes misbehaves when you ask it to create and display a link to a relative URL like '/'. My vague memory suggested an alternative version of the URL that might make the program leave it alone, one with a scheme but no host, so I tried 'https:/' and it worked. Then I tried to find out if this is actually a proper legal URL format, as opposed to one that browsers just make work, and now I'm confused and uncertain.

The first relatively definite thing that I learned is that file URLs don't need all of those slashes; a URL of 'file:/tmp' is perfectly valid and is interpreted the way you'd expect. This is suggestive but not definite, since the "file" URL scheme is a pretty peculiar thing.

A URL with a host can leave out the scheme; '//mozilla.org/' is a valid (scheme-relative) URL that means 'the root of mozilla.org in whichever of HTTP and HTTPS you're currently using'. Wikipedia's section on the syntax of URLs claims that the authority section is optional. However, the WHATWG specification's section on URL writing requires anything starting with 'http:' or 'https:' to be written with the host (because scheme-relative special URL strings require a host). This also matches the MDN description. I think this means that my 'https:/path' trick is not technically legal, even if it works in many browsers.
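To make the difference between 'optional authority' and 'host required' concrete, here is a small Python sketch. It uses the standard library's urllib.parse.urlsplit, which follows the generic RFC-style syntax rather than the WHATWG rules, to show how each form breaks down into scheme, authority, and path. The has_required_host function and the SPECIAL_NON_FILE set are my own simplified stand-ins for the WHATWG writing rule as I described it above, not anything taken from the specification or from a browser.

```python
from urllib.parse import urlsplit

# The schemes the WHATWG URL standard treats as "special", minus "file".
# (This set is part of the sketch, not something urllib knows about.)
SPECIAL_NON_FILE = {"http", "https", "ws", "wss", "ftp"}


def describe(url):
    """Return (scheme, authority, path) as the generic syntax splits them."""
    parts = urlsplit(url)
    return parts.scheme, parts.netloc, parts.path


def has_required_host(url):
    """Simplified check: a special non-file scheme must come with a host.

    URLs with no scheme, or with other schemes such as 'file', are outside
    this rule and are simply passed through as True.
    """
    parts = urlsplit(url)
    if parts.scheme not in SPECIAL_NON_FILE:
        return True
    return bool(parts.netloc)


for candidate in ("file:/tmp", "//mozilla.org/", "https:/path", "https://mozilla.org/"):
    print(candidate, describe(candidate), has_required_host(candidate))
```

In this sketch 'https:/path' splits cleanly into a scheme and a path with an empty authority, which is why it looks plausible under the generic syntax, but it fails the host check.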

Pragmatically, Firefox, Chrome, Konqueror, and Lynx (all on Linux) support this, but Links doesn't (people are extremely unlikely to use Lynx or Links with this program, of course). Safari on iOS also supports this, which is the extent of my easy testing. Since Chrome on Linux works, I assume that Chrome on other platforms, including Android, will; similarly I assume desktop Safari on macOS will work, and Firefox on Windows and macOS.

(I turned to specifications because I'm not clever enough at Internet search terms to come up with a search that wasn't far, far too noisy.)

PS: When I thought that 'https:/path' might be legal, I wondered if ':/path' was also legal (with the meaning of 'the current scheme, on the current host, but definitely an absolute path'). But that's likely even less legal than 'https:/path' and probably less well supported; I haven't even tried testing it.

Sidebar: Why I care about such an odd URL

The obvious way to solve this problem would just be to put the host in the URL. However, this would get in the way of how I test new versions of the program in question, where I really do want a URL that means 'the root of the web server on whatever website this is running on'. Yes, I know, that should be '/', but see above about something mis-handling this sometimes in our configuration.

(I don't think it's Apache's ProxyPassReverse directive, because the URL is transformed in the HTML, and PPR doesn't touch that.)
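As an illustration of what 'the root of the web server on whatever website this is running on' means in terms of URL resolution, here is a sketch using Python's urllib.parse.urljoin. This is not a browser and says nothing definite about what browsers do; urljoin uses the lenient, backward-compatible reading that RFC 3986 permits, where a scheme that matches the base URL's scheme is ignored. The example.org host names are hypothetical stand-ins for a production and a test instance.

```python
from urllib.parse import urljoin

# Hypothetical page URLs: one "production" site, one test instance.
BASES = [
    "https://www.example.org/app/page",
    "https://test.example.org:8443/app/page",
]


def resolve(base, ref):
    """Resolve a link the way urljoin's lenient parser does."""
    return urljoin(base, ref)


for base in BASES:
    for ref in ("/", "https:/"):
        # Both references end up at the root of whatever host served the page.
        print(f"{ref!r:10} on {base} -> {resolve(base, ref)}")

# When the page itself is plain HTTP the schemes differ, so urljoin does
# not treat 'https:/' as relative and hands it back unchanged.
print(resolve("http://www.example.org/app/page", "https:/"))
```

In this sketch, the trick only behaves like '/' when the page was itself served over HTTPS, which is worth keeping in mind for any test setup that uses plain HTTP.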

web/URLFormatLegalUncertainty written at 23:50:07

