Thinking about the merits of 'universal' URL structures

February 11, 2019

I am reasonably fond of my URLs here on Wandering Thoughts (although I've made a mistake or two in their design), but I have potentially made life more difficult for a future me in how I've designed them. The two difficulties I've given to a future self are that my URLs are bare pages, without any extension on the end of their name, and that displaying some important pages requires a query parameter.

The former is actually quite common out there on the Internet, as many people consider the .html (or .htm) to be ugly and unaesthetic. You can find lots and lots of things that leave off the .html, at this point perhaps more than leave it on. But it does have one drawback, which is that it makes it potentially harder to move your content around. If you use URLs that look like '/a/b/page', you need a web server environment that can serve those as text/html, either by running a server-side app (as I do with DWiki) or by suitable server configuration so that such extension-less files are text/html. Meanwhile, pretty much anything is going to serve a hierarchy of .html files correctly. In that sense, a .html on the end is what I'll call a universal URL structure.
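As a rough illustration of the "suitable server configuration" case, here is a minimal sketch using Python's standard http.server module. It is not how DWiki works. The class name and the rule that any file name without a dot is an HTML page are assumptions made for the example.

```python
import functools
import http.server
import os


class ExtensionlessHTMLHandler(http.server.SimpleHTTPRequestHandler):
    """Static file handler that treats extension-less files as HTML."""

    def guess_type(self, path):
        # Assumption: any file whose name has no dot in it is an HTML page.
        # This also catches names like 'README', which may not be what you want.
        if "." not in os.path.basename(path):
            return "text/html"
        # Everything else gets the standard library's usual guess.
        return super().guess_type(path)


def serve(directory=".", port=8000):
    """Serve `directory` on 127.0.0.1:`port` until interrupted."""
    handler = functools.partial(ExtensionlessHTMLHandler, directory=directory)
    with http.server.ThreadingHTTPServer(("127.0.0.1", port), handler) as httpd:
        httpd.serve_forever()
```

Calling serve("/path/to/site") would then serve '/a/b/page' as text/html. The point is only that some override like this is needed at all, whereas a tree of .html files needs none.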

What makes a URL structure universal is that in a pinch, pretty much any web server will do to serve a static version of your files. You don't need the ability to run things on the server and you don't need any power over the server configuration (and thus even if you have the power, you don't have to use it). Did your main web server explode? Well, you can quickly dump a static version of important pages on a secondary server somewhere, bring it up with minimal configuration work, and serve the same URLs. Whatever happens, the odds are good that you can find somewhere to host your content with the same URLs.

I think that right now there are only two such universal URL structures: plain pages with .html on the end, and directories (ie, structuring everything as '/a/b/page/'). The specific mechanism for giving a directory an index page of some kind will vary, but almost every web server can do it.
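To make the directory structure concrete, here is a small sketch of dumping a tree of extension-less pages into the '/a/b/page/' form. The function name and the 'index.html' index name are assumptions for illustration. It also assumes the destination is not inside the source tree.

```python
import shutil
from pathlib import Path


def export_as_directories(src, dest, index_name="index.html"):
    """Copy the tree under `src` to `dest`, turning each extension-less
    file 'a/b/page' into 'a/b/page/<index_name>'.

    Files that already have an extension are copied unchanged.
    Returns the list of files written.
    """
    src, dest = Path(src), Path(dest)
    written = []
    for page in sorted(src.rglob("*")):
        if not page.is_file():
            continue
        rel = page.relative_to(src)
        if page.suffix:
            target = dest / rel               # e.g. style.css stays style.css
        else:
            target = dest / rel / index_name  # page -> page/index.html
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(page, target)
        written.append(target)
    return written
```

The resulting tree can be dropped on more or less any static host. Many servers will also redirect '/a/b/page' to '/a/b/page/' on their own, but that is worth checking on whatever host you end up with.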

On the other hand, at this point in the evolution of the web and the Internet in general it doesn't make sense to worry about this. Clever URLs without .html and so on are extremely common, so it seems very likely that you'll always be able to serve them without too much work. Maybe one convenient place to publish your pages won't support them, but you'll be able to find another, or easily find configuration recipes for the web server of your choice.

(For example, in doing some casual research for this entry I discovered that GitHub Pages lets you omit the .html on URLs for files that actually have it in the underlying repository. GitHub's server-side handling of this automatically makes it all work. See this Stack Overflow Q&A; you can also test it for yourself on your favorite GitHub Pages site. I looked at GitHub Pages because I was thinking of it as an example of almost no-effort hosting that one might reach for in a pinch, and here it is already supporting what you'd need.)
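The general idea of optional extensions can be sketched as a lookup that tries a few candidate files for each URL path. This is not GitHub's implementation, which I haven't seen. The function name and the order of candidates are assumptions.

```python
from pathlib import Path
from typing import Optional


def resolve_page(root, url_path: str) -> Optional[Path]:
    """Map a URL path to a file under `root`, or return None.

    For '/a/b/page' this tries 'a/b/page', then 'a/b/page.html',
    then 'a/b/page/index.html'.
    """
    root_dir = Path(root).resolve()
    base = (root_dir / url_path.lstrip("/")).resolve()
    # Refuse anything that escapes the root (e.g. via '..').
    if base != root_dir and root_dir not in base.parents:
        return None
    if url_path.endswith("/") or base == root_dir:
        # A directory-style URL can only be satisfied by an index page.
        candidates = [base / "index.html"]
    else:
        candidates = [
            base,
            base.with_name(base.name + ".html"),
            base / "index.html",
        ]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None
```

With a lookup like this, the files on disk keep their .html extension (and so stay servable anywhere) while the public URLs leave it off.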

PS: Having query parameters on your URLs will make your life harder here. You probably need either server-side access to something on the order of Apache's RewriteCond, or JavaScript added to all the relevant pages that looks for query parameters and does magic things with them, either providing the right page content or at least redirecting to a better URL.
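Whichever mechanism you use, the core of it is a mapping from a URL with query parameters to some static URL. Here is a sketch of that mapping alone. The parameter names and the '/page/view/' layout are hypothetical, invented for the example rather than taken from DWiki.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical parameter names that select an alternate view of a page.
KNOWN_VIEWS = ("showcomments", "atom")


def static_equivalent(url: str) -> str:
    """Return a static path for `url`, folding a known view parameter
    into the path; unknown parameters are dropped."""
    parts = urlsplit(url)
    # keep_blank_values=True so that a bare '?showcomments' is still seen.
    params = parse_qs(parts.query, keep_blank_values=True)
    for view in KNOWN_VIEWS:
        if view in params:
            return f"{parts.path.rstrip('/')}/{view}/"
    return parts.path
```

For instance, static_equivalent("/blog/page?showcomments") gives "/blog/page/showcomments/", which a rewrite rule or a bit of in-page JavaScript could then redirect to. The static version of each view still has to exist at that path.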

(DWiki has decent reasons for using query parameters, but I feel like perhaps I should have tried harder or been cleverer.)


Comments on this page:

By Simon Tatham at 2019-02-12 05:27:54:

The other advantage of keeping the .html suffix is that if someone downloads some of your pages to look at offline using file:// URLs, they don't have to choose between the browser correctly guessing the file type of the downloaded versions, and the links between them working.

By theamk at 2019-02-12 21:22:04:

I think nowadays, extension-less files are not too bad.

Major servers, Apache and nginx, serve them just fine, and I think they can do GitHub's trick with optional extensions as well.

Major hosting providers, at least AWS and GCP, allow you to set the content type on a per-file basis, independent of the actual filename. And a small provider will likely let you drop in your own .htaccess configuration file to achieve that.

Written on 11 February 2019.


Last modified: Mon Feb 11 23:00:50 2019