My uncertainty over whether a URL format is actually legal

August 11, 2022

I was recently dealing with a program that runs in a configuration that sometimes misbehaves when you ask it to create and display a link to a relative URL like '/'. My vague memory suggested an alternative version of the URL that might make the program leave it alone, one with a scheme but no host, so I tried 'https:/' and it worked. Then I tried to find out if this is actually a proper legal URL format, as opposed to one that browsers just make work, and now I'm confused and uncertain.

The first relatively definite thing that I learned is that file URLs don't need all of those slashes; a URL of 'file:/tmp' is perfectly valid and is interpreted the way you'd expect. This is suggestive but not definite, since the "file" URL scheme is a pretty peculiar thing.
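As a small illustration of the two spellings, here is a sketch using Python's standard urllib.parse. This is a generic URL splitter and not what any browser uses, so it only shows that the short and long forms break down into the same pieces under one common parser:

```python
from urllib.parse import urlsplit

# Split the short and the long spelling of a local file URL.
short = urlsplit("file:/tmp")
long_form = urlsplit("file:///tmp")

for label, parts in (("file:/tmp", short), ("file:///tmp", long_form)):
    # netloc is the authority (host) component; it is empty in both cases.
    print(f"{label}: scheme={parts.scheme!r} netloc={parts.netloc!r} path={parts.path!r}")
```

Both forms come out with the scheme 'file', an empty authority, and the path '/tmp'.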

An absolute URL can leave out the scheme; '//mozilla.org/' is a valid URL that means 'the root of mozilla.org in whichever of HTTP and HTTPS you're currently using'. Leaving out the host while keeping the scheme is a different matter. Wikipedia's section on the syntax of URLs claims that the authority section is optional, but the WHATWG specification's section on URL writing requires anything starting with 'http:' or 'https:' to be written with the host (because scheme-relative special URL strings require a host). This also matches the MDN description. I think this means that my 'https:/path' trick is not technically legal, even if it works in many browsers.
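To make the 'scheme but no host' shape concrete, here is a minimal sketch with Python's urllib.parse. It is not a WHATWG parser and says nothing definite about browsers; its urljoin() happens to follow the lenient, backward-compatible rule that RFC 3986's reference resolution permits, where a reference whose scheme matches the base's scheme is treated as if it had no scheme. The base URL is a made-up stand-in for wherever the program is running.

```python
from urllib.parse import urlsplit, urljoin

# Hypothetical page URL; not from the original setup.
BASE = "https://test.example.org/app/dashboard"

# 'https:/' has a scheme and a path, but no authority (host) at all.
parts = urlsplit("https:/")
print(parts.scheme, repr(parts.netloc), parts.path)

# Same scheme as the base: urljoin() borrows the host from the base,
# so the result matches what a plain '/' resolves to.
same_scheme = urljoin(BASE, "https:/")
plain_root = urljoin(BASE, "/")
print(same_scheme)
print(plain_root)

# Different scheme from the base: urljoin() treats the reference as a
# complete URL and returns it unchanged, host-less as it is.
other_scheme = urljoin("http://test.example.org/app/", "https:/")
print(other_scheme)
```

The last case hints at why the trick is fragile: it only has a sensible meaning when the page's own scheme matches the one written in the link.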

Pragmatically, Firefox, Chrome, Konqueror, and Lynx (all on Linux) support this, but Links doesn't (people are extremely unlikely to use Lynx or Links with this program, of course). Safari on iOS also supports this, which is the extent of my easy testing. Since Chrome on Linux works, I assume that Chrome on other platforms, including Android, will; similarly I assume desktop Safari on macOS will work, and Firefox on Windows and macOS.

(I turned to specifications because I'm not clever enough at Internet search terms to come up with a search that wasn't far, far too noisy.)

PS: When I thought that 'https:/path' might be legal, I wondered if ':/path' was also legal (with the meaning of 'the current scheme, on the current host, but definitely an absolute path'). But that's likely even less legal than 'https:/path' and probably less well supported; I haven't even tried testing it.

Sidebar: Why I care about such an odd URL

The obvious way to solve this problem would just be to put the host in the URL. However, this would get in the way of how I test new versions of the program in question, where I really do want a URL that means 'the root of the web server on whatever website this is running on'. Yes, I know, that should be '/', but see above about something mis-handling this sometimes in our configuration.

(I don't think it's Apache's ProxyPassReverse directive, because the URL is transformed in the HTML, and PPR doesn't touch that.)


Comments on this page:

From 193.219.181.242 at 2022-08-12 00:46:11:

If a browser accepts http:/foo, I'd assume it's auto-correcting that into http://foo as part of correcting common mistypings (like how browsers convert backslashes such as http:\\foo into forward-slashes).

Outside of that, "http" URLs with a scheme but without an authority don't really make much logical sense, as you talk HTTP to some server, unlike "file" URLs which are by default local anyway.

PS: When I thought that 'https:/path' might be legal, I wondered if ':/path' was also legal (with the meaning of 'the current scheme, on the current host, but definitely an absolute path'). But that's likely even less legal than 'https:/path' and probably less well supported; I haven't even tried testing it.

Scheme-relative URLs are legal, but the colon is part of the scheme specification, so if you omit the scheme you have to omit the colon as well.

For example, <a href="//example.com/foo"> will use the "current" scheme of the webpage it's found in – those were used very often in the "HTTPS transition" era (i.e. when many sites had HTTPS available but not yet made it mandatory).

An URL with the current scheme and the current host would simply be /foo – that is definitely an absolute path. Anything that treats it as a relative path is just odd.
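The two reference forms described above can be sketched with Python's urllib.parse.urljoin(), which is a generic resolver and not a browser. The page URLs here are hypothetical, chosen only to show an HTTP and an HTTPS page on the same site.

```python
from urllib.parse import urljoin

SCHEME_RELATIVE = "//example.com/foo"  # keeps the page's scheme, names a new host
ABSOLUTE_PATH = "/foo"                 # keeps the page's scheme and host

# Hypothetical pages that the links might appear on.
pages = ["http://site.example/a/b.html", "https://site.example/a/b.html"]

resolved = {
    page: (urljoin(page, SCHEME_RELATIVE), urljoin(page, ABSOLUTE_PATH))
    for page in pages
}

for page, (via_scheme_relative, via_absolute_path) in resolved.items():
    print(page)
    print("  //example.com/foo ->", via_scheme_relative)
    print("  /foo              ->", via_absolute_path)
```

The scheme-relative link follows the page's scheme to a different host, while '/foo' stays on the page's own host and replaces the whole path.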

If Grafana doesn't access the URL on the server, I think devproxy could be a solution? It lets me use a production URL to point at a dev copy of the code on a virtual machine. The browser asks the proxy to connect by name, and the proxy either connects to an IP/port configured for the domain, or passes it through to the real host.

Apache on my dev machine uses the ServerName of the production host, and a Let's Encrypt certificate, so that self-referential URLs from the server are the production URLs. The final piece is the Proxy Switcher and Manager extension for Firefox, which allows choosing between the proxy or the real Internet to switch between sites.

devproxy was the original version, which runs in GOPATH and uses a compiled-in set of rules to configure how domains should be accessed. devproxy2 uses Go modules and a TOML configuration file instead.

Ah, looks like devproxy is the same idea I came up with for $work. It’s very nice in that I never need to edit URLs logged in an issue in order to check them on my dev instance. Just by switching the proxy on and off I can toggle between my local dev instance and the production site in the same browsing session.

Note that it isn’t necessary to use the production SSL cert to make this work. You can locally create a CA cert your machine trusts and use that to sign a site cert for the dev instance. And there are tools like mkcert which automate the entire process for you.

Written on 11 August 2022.

