Wandering Thoughts archives

2025-03-13

Doing multi-tag matching through URLs on the modern web

So what happened is that Mike Hoye had a question about a perfectly reasonable idea:

Question: is there wiki software out there that handles tags (date, word) with a reasonably graceful URL approach?

As in, site/wiki/2020/01 would give me all the pages tagged as 2020 and 01, site/wiki/foo/bar would give me a list of articles tagged foo and bar.

I got nerd-sniped by a side question, but then, having been nerd-sniped, I started thinking about the whole thing, and the more I thought about it the more hair-raising it looked as something done in practice.

This isn't because the idea of stacking selections like this is bad; 'site/wiki/foo/bar' is a perfectly reasonable and good way to express 'a list of articles tagged foo and bar'. Instead, it's because of how everything on the modern web eventually gets visited, combined with how, in the natural state of this feature, 'site/wiki/bar/foo' is just as valid a URL for 'articles tagged both foo and bar'.

The combination, plus the increasing tendency of things on the modern web to rattle every available doorknob just to see what happens, means that even if you don't advertise 'bar/foo', sooner or later things are going to try it. And if you do make the combinations discoverable through HTML links, crawlers will find them very fast. At a minimum this means crawlers will see a lot of essentially duplicated content, and you'll have to do all of the work of running the searches and generating the page listings for every ordering they request.

If I were going to implement something like this, I would define a canonical tag order and then, as early in request processing as possible, generate an HTTP redirect from any non-canonical ordering to the canonical one. I wouldn't bother checking whether the tags actually existed or anything; I'd just determine that they were tags, put them in canonical order, and if the request's order wasn't canonical, redirect. That way at least all of your work (and all of the crawler attention) is directed at one canonical version. Smart crawlers will notice that this is a redirect to something they already have (and hopefully not re-request it), and you can more easily use caching.
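As a minimal sketch of the idea (in Python, since that's what this wiki runs on; the function names, the alphabetical canonical order, and run_tag_search() are my own invented placeholders, not anything from a real wiki engine):

    # Hypothetical sketch: canonicalize the tag segments of a URL path.
    # Assumes tags are the path segments after a fixed prefix and that
    # the canonical order is plain alphabetical order.
    from urllib.parse import quote

    def tag_path(prefix, tags):
        return prefix + "/".join(quote(t) for t in tags)

    def handle_tag_request(prefix, tags):
        canon = tag_path(prefix, sorted(tags))
        if tag_path(prefix, tags) != canon:
            # A permanent redirect tells crawlers there is only one
            # real URL for this set of tags.
            return ("301 Moved Permanently", {"Location": canon})
        return run_tag_search(tags)   # the actual (expensive) work

The point is that the redirect happens before any of the expensive search work does.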

(And if search engines still matter, the search engines will see only your canonical version.)

This probably holds just as true for doing this sort of tag search through query parameters on GET requests; if you expose the result in a URL, you want to canonicalize it. However, GET query parameters are probably somewhat safer if you force people to form them manually and don't expose links to them. So far, web crawlers seem less likely to monkey around with query parameters than with URL paths, based on my limited experience with the blog.
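The same canonicalization works there too; a sketch (again Python, and the 'tag' parameter name is my own invented convention):

    # Hypothetical sketch: canonicalize '?tag=foo&tag=bar' style queries
    # so that every ordering maps to the same query string.
    from urllib.parse import parse_qs, urlencode

    def canonical_query(query_string):
        tags = parse_qs(query_string).get("tag", [])
        return urlencode([("tag", t) for t in sorted(tags)])

Here canonical_query("tag=foo&tag=bar") and canonical_query("tag=bar&tag=foo") both come out as 'tag=bar&tag=foo', so there's one canonical query string to redirect to, cache, and let crawlers see.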

TagsViaURLsAndModernWeb written at 22:46:44;

2025-03-11

Some views on the common Apache modules for SAML or OIDC authentication

Suppose that you want to restrict access to parts of your Apache-based website, but you want something more sophisticated and modern than Apache's Basic HTTP authentication. The traditional reason for this was to support 'single sign on' across all your (internal) websites; the modern reason is that a central authentication server is the easiest place to add full multi-factor authentication. The two dominant protocols for this are SAML and OIDC, and there are commonly available Apache authentication modules for both, in the form of Mellon for SAML and OpenIDC for OIDC.

I've now used or at least tested the Ubuntu 24.04 versions of both modules against the same SAML/OIDC identity provider, primarily because when you're setting up a SAML/OIDC IdP you need to be able to test it with something. Both modules work fine, but after my experiences I'm more likely to use OpenIDC than Mellon in most situations.

Mellon has two drawbacks and two potential advantages. The first drawback is that setting up a Mellon client ('SP') is more involved. Most of the annoying stuff is automated for you by the mellon_create_metadata script (which you can get from the Mellon repository if it's not in your Mellon package), but you still have to give your IdP your XML blob and get their XML blob in return. The other drawback is that Mellon isn't integrated into the Apache 'Require' framework for authorization decisions; instead you have to make do with Mellon-specific directives.
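As a rough illustration of what those Mellon-specific directives look like (a sketch using mod_auth_mellon's documented directives; the paths and the group name are made up, and a real configuration needs more than this):

    <Location /protected>
        AuthType Mellon
        MellonEnable "auth"
        MellonSPPrivateKeyFile /etc/apache2/mellon/sp-key.pem
        MellonSPCertFile       /etc/apache2/mellon/sp-cert.pem
        MellonIdPMetadataFile  /etc/apache2/mellon/idp-metadata.xml
        # Authorization is Mellon's own directive, not Apache 'Require':
        MellonRequire "groups" "agroup"
    </Location>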

The first potential advantage is that Mellon has a straightforward story for protecting two different areas of your website with two different IdPs, if you need to do that for some reason; you can just configure them in separate <Location> or <Directory> blocks and everything works out. If anything, it's a bit non-obvious how to protect various disconnected bits of your URL space with the same IdP without having to configure multiple SPs, one for each protected section of URL space. The second potential advantage is that in general SAML has an easier story for your IdP giving you random information, and Mellon will happily export every SAML attribute it gets into the environment your CGI or web application gets.
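For example, mod_auth_mellon exports each SAML attribute as a 'MELLON_'-prefixed environment variable, so a CGI can just read them from its environment (which attributes show up depends entirely on what your IdP sends; 'groups' here is my assumption):

    # Sketch: reading a Mellon-exported SAML attribute in a Python CGI.
    import os

    groups = os.environ.get("MELLON_groups", "")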

The first advantage of OpenIDC is that it's straightforward to configure when you have a single IdP, with no XML and generally low complexity. The second is that it's straightforward to protect multiple disconnected URL areas with the same IdP but possibly different access restrictions. A third advantage is that OpenIDC is integrated into Apache's 'Require' system, although you have to use OpenIDC-specific syntax like 'Require claim groups:agroup' (see the OpenIDC wiki on authorization).
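A sketch of what a minimal single-IdP OpenIDC setup can look like (the directive names are mod_auth_openidc's, but the URLs, client credentials, and group are placeholders, and your IdP may require more settings):

    # These go at the virtual host level:
    OIDCProviderMetadataURL https://idp.example.org/.well-known/openid-configuration
    OIDCClientID            our-client-id
    OIDCClientSecret        our-client-secret
    OIDCRedirectURI         https://www.example.org/protected/redirect_uri
    OIDCCryptoPassphrase    some-long-random-string

    <Location /protected>
        AuthType openid-connect
        Require claim groups:agroup
    </Location>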

In exchange for this, it seems to be quite involved to use OpenIDC if you need multiple OIDC identity providers to protect different bits of your website. It's apparently possible to do this in the same virtual host, but it seems quite complex and requires a lot of parts, so if I were confronted with this problem I would try very hard to confine each web thing that needed a different IdP to its own virtual host. And OpenIDC has the general OIDC problem that it's harder to expose random information.

(All of the important OpenIDC Apache directives that pick an IdP can only be put in a virtual host as a whole, not in <Location> or <Directory> blocks. If you care about this, see the wiki on Multiple Providers and also on access to different URL paths on a per-provider basis.)

We're very likely to only ever be working with a single IdP, so for us OpenIDC is likely to be easier, although not hugely so.

Sidebar: The easy approach to group-based access control with either

Both Mellon and OpenIDC work fine with the traditional Apache AuthGroupFile directive, provided (of course) that you have or can build an Apache-format group file that uses whatever you've told Mellon or OpenIDC to use as the 'user' for Apache authentication. If your IdP is using the same user (and group) information as your regular system, then you may well already have this information around.
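For example (with made-up paths, users, and a group name), the Apache side might look like:

    # /etc/apache2/webgroups, in the classic Apache group file format:
    #   agroup: alice bob cks

    <Location /protected>
        # ... Mellon or OpenIDC authentication directives here ...
        AuthGroupFile /etc/apache2/webgroups
        Require group agroup
    </Location>

This works because 'Require group' only cares about the authenticated user name, however it was established.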

(This is especially likely if you're migrating from Apache Basic HTTP authentication, where you already needed to build this sort of stuff.)

Building your own Apache group file has the additional benefit that you can augment and manipulate group information in ways that might not fit well into your IdP. Your IdP has the drawback that it has to be general; your generated Apache group file can be narrowly specific for the needs of a particular web area.

ApacheMellonAndOpenIDCViews written at 23:01:54;

2025-03-10

The web browser as an enabler of minority platforms

Recently, I got involved in a discussion on the Fediverse over what I will simplify to the desirability (or lack of it) of cross-platform toolkits, including the browser, and how they erase platform personality and opinions. This caused me to have a realization about what web browser based applications are doing for me, which is that being browser based is what lets me use them at all.

My environment is pretty far from being a significant platform; I think the Unix desktop's share is in the low single digits of percent under the best of circumstances. If people had to develop platform-specific versions of things like Grafana (which is a great application), they'd probably exist for Windows, maybe macOS, and at the outside, tablets (some applications would definitely exist on phones, but Grafana is a bit of a stretch there). They probably wouldn't exist on Linux, especially not for free.

That the web browser is a cross-platform environment means that I get these applications (including the Fediverse itself) essentially 'for free' (which is to say, it's because of the efforts of web browsers to support my platform and then give me their work for free). Developers of web applications don't have to do anything to make them work for me, not even go so far as making it possible to build their software on Linux; it just happens for them without them even having to think about it.

Although I don't work in the browser as much as some people do, looking back, the existence of implicitly cross-platform web applications has been a reasonably important thing in letting me stick with Linux.

This applies to any minority platform, not just Linux. All you need is a sufficiently capable browser and you have access to a huge range of (web) applications.

(Getting that sufficiently capable browser can be a challenge on a sufficiently minority platform, especially if you're not on a major architecture. I'm lucky in that x86 Linux is a majority minority platform; people on FreeBSD or people on architectures other than x86 and 64-bit ARM may be less happy with the situation.)

PS: I don't know if what we have used the web for really counts as 'applications', since they're mostly HTML form based things once you peel a few covers off. But if they do count, the web has been critical in letting us provide them to people. We definitely couldn't have built local application versions of them for all of the platforms that people here use.

(I'm sure this isn't a novel thought, but the realization struck (or re-struck) me recently so I'm writing it down.)

BrowserEnablesMinorityPlatforms written at 23:35:34;

