Wandering Thoughts

2022-05-18

Missing TLS intermediate certificates can create mysterious browser problems

Today I wound up helping someone with a weird and mysterious browser problem, where they couldn't connect to a particular HTTPS site with either Chrome or Microsoft Edge on their up-to-date Windows machine; when they attempted to do so, Chrome (and Edge) reported that the TLS certificate was issued by an unknown Certificate Authority. Firefox on their machine could connect, and I could connect with various browsers (including Chrome and Microsoft Edge on some Windows 10 machines I have access to). Given the title of this entry, you already know the ultimate cause: the website's TLS certificate was signed by an intermediate certificate that the website wasn't serving.

With a missing intermediate certificate, the website's TLS certificate looked like it was signed by an unknown CA, that unknown 'CA' being the intermediate certificate's name (although neither Chrome nor Microsoft Edge reported the name of the unknown CA, probably for good reasons). The website worked in other browsers because browsers silently cache intermediate TLS certificates they've seen before, in addition to being willing to download them under the right circumstances. The person who had this problem almost never uses Chrome or Microsoft Edge (they mostly use Firefox), so those browsers had never had the chance to see this particular intermediate certificate before.

(To make things worse, I believe that this particular intermediate certificate is deprecated, so you won't find many sites still using it and providing it to you.)

This is a hard problem for a lot of people to even see, much less identify. The browser's intermediate certificate cache is basically invisible and I think most browsers provide no way to clear the cache or to explicitly check your site with an empty one (and Firefox actively pre-populates its cache, which includes this particular intermediate certificate). If you're a website developer or a system administrator and you check such a site, it's very likely that your regular browsers will have the necessary intermediate certificate cached. In Firefox, I think you have to test your website with a new scratch profile (and do it immediately after creating the profile before Firefox starts downloading intermediates on you).

Figuring out the actual problem took me rather a long time. First I spent a while trying some browsers in various environments, and then I thought that Windows on the machine might have an incomplete or damaged root certificate store. Only when I started peering very closely at Chrome's view of certificates on another Windows machine did I notice that the particular certificate was an intermediate certificate, not a CA certificate, and the penny dropped. I'm not sure a non-specialist could ever have diagnosed the problem, and even diagnosing the problem didn't make it easy to fix.

(The website's TLS certificate included an 'issuing certificate URL' (from RFC 5280 section 4.2.2.1), so a browser could have distinguished this from a completely unknown certificate authority if it wanted to.)
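(As a sketch of how a non-browser client could act on that URL, here's roughly what pulling it out of a site's leaf certificate looks like in Python. This assumes the third-party 'cryptography' package and uses a hypothetical hostname:)

    import ssl

    from cryptography import x509   # third-party 'cryptography' package
    from cryptography.x509.oid import AuthorityInformationAccessOID, ExtensionOID

    # Fetch the site's leaf certificate without verifying it (the chain is
    # broken, after all) and pull the CA Issuers URL out of the Authority
    # Information Access extension, if the certificate has one.
    pem = ssl.get_server_certificate(("www.example.org", 443))   # hypothetical host
    cert = x509.load_pem_x509_certificate(pem.encode("ascii"))
    aia = cert.extensions.get_extension_for_oid(
        ExtensionOID.AUTHORITY_INFORMATION_ACCESS).value
    for desc in aia:
        if desc.access_method == AuthorityInformationAccessOID.CA_ISSUERS:
            print("missing intermediate can be fetched from:", desc.access_location.value)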

In retrospect, there was an important clue that I ignored and some things that I should have done. The important clue was that certigo, my usual tool for dumping certificate information, reported that it couldn't verify the certificate chain; at the time I wrote this off as 'maybe Fedora's root certificate list is odd', but I shouldn't have. The obvious thing I should have done was to immediately run the site through the SSL Server Test, which would have pointed out the problem right away. I could also have tried the site in a new scratch Firefox profile on my machine, which would probably have pointed out the issue as well.

(I was blinded mostly because I was thinking of this as a client problem instead of a server one. Partly this is my Windows biases showing.)

In many environments, command line tools are useful for diagnosing this sort of thing because they (mostly) don't download or cache intermediate certificates. Instead, most of them verify the server's TLS certificate purely from the certificates provided in the TLS connection and the local CA trust store (which may include intermediate certificates but usually doesn't). If your browser works but something like certigo complains, generally either there's a missing intermediate or your system CA trust store is incomplete. I need to remember that for the next time around.
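(A sketch of the same sort of check in Python, with a hypothetical hostname; like the command line tools, the standard ssl module verifies using only what the server sends plus the local trust store, with no cached or downloaded intermediates:)

    import socket
    import ssl

    def check_chain(host, port=443):
        ctx = ssl.create_default_context()
        try:
            with socket.create_connection((host, port), timeout=10) as sock:
                with ctx.wrap_socket(sock, server_hostname=host) as tls:
                    print(host, "verifies OK over", tls.version())
        except ssl.SSLCertVerificationError as exc:
            # A missing intermediate typically shows up here as
            # "unable to get local issuer certificate".
            print(host, "failed verification:", exc.verify_message)

    check_chain("www.example.org")   # hypothetical host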

PS: At one point Firefox had an about:config preference that could be used to disable its cache of intermediate certificates, among other things, but that preference seems to have vanished in code rewrites since then.

TLSIntermediateCertHell written at 21:00:55

2022-05-14

The web is, in a sense, designed for serving static files

One thought I've been turning over in my mind lately is the idea that one of the reasons that it's historically been so easy to serve static files on the web is that in a sense, the web was designed for it. This was not so much through calculation as through necessity, because most or all of the early web servers served static files, especially the very first web server. This likely created a strong pressure to make HTTP friendly to static files, since anything else would require a more complicated web server for no practical gain.

(Or they almost entirely served static files. There is a provision in the very first version of HTTP for searching an index, with a presumably dynamic result generated by a program.)

The obvious area where this shows is that URL paths map directly onto Unix file paths. When I write it this way using web terms it sounds natural, but in fact the 'URL path' is really a per-server identifier for the particular page. There are a lot of ways to index and address content other than hierarchical paths, but the web picked paths instead of some other identifier, despite paths not always being a good way to identify content in practice (just ask the people who've had to decide what the proper URL structure is for a REST application).

(There are even other plausible textual representations, although paths are in some sense the most concise one. I prefer not to think about a version of the web that used something like X.500 Distinguished Names to identify pages.)
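(To make the directness of that mapping concrete, here's a sketch of roughly what a static file server has to do with a URL path, using a hypothetical document root; real servers also have to worry about index files, encodings, symlinks, and so on.)

    import os.path

    DOCROOT = "/var/www/html"   # hypothetical document root

    def url_to_file(url_path):
        # Join the URL path onto the document root and make sure the
        # result can't escape it via '..' components.
        candidate = os.path.normpath(os.path.join(DOCROOT, url_path.lstrip("/")))
        if candidate != DOCROOT and not candidate.startswith(DOCROOT + os.sep):
            raise ValueError("path escapes the document root")
        return candidate

    # url_to_file("/docs/index.html") -> "/var/www/html/docs/index.html"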

The very first HTTP protocol is startlingly limited, although Basic HTTP followed very soon afterward with niceties like being able to return content types other than HTML. But it does have the idea of dynamic content in the form of searching on a named index, which feels explicitly designed for the index to be a program that performs the search and returns (HTML) results.

Early HTTP and HTML are so minimal that I'm not sure I could point to anything else that's biased in favor of static files. Arguably the biases are in what's left out; for example, there's no indicator to client programs that what they've fetched should be refreshed every so often. Early HTTP left that up to the users (and then Netscape added it around 1995, so people did eventually want this).

It's entirely possible that my thought here is well off the mark. Certainly if you're on a Unix workstation and designing a protocol that serves multiple pieces of content that are distinguished somehow, it's natural to use paths to specify content. As support for this naturalness, consider Gopher, which also used "a file-like hierarchical arrangement that would be familiar to users" despite having significantly different navigation.

WebDesignedForStaticServing written at 22:43:06

2022-05-11

Traditionally there are fewer steps in setting up a static website

It's common to say that it's "easier" to set up a static website than a dynamic one, although people usually don't try too hard to define what is easier about it. I've come to think that one reason for this perception is that historically, setting up a static website has effectively been a subset of setting up a dynamic website. This is partly due to the same fundamental dynamic that has made serving static files efficient: historically, most dynamic websites had some static components.

Since dynamic websites had static components, general web servers supported serving static files essentially by default even if they were partially oriented toward dynamic sites. If you were setting up a dynamic site, you set up your web server and then configured both the static side (for your static assets) and the dynamic side (for your dynamic application). If you were setting up a static site, you set up the general web server, set up the static side, and stopped. Often the setup for the static side was quite simple (as opposed to the general web server setup).

(Traditional general purpose web servers also usually made it easier to set up the static side than the dynamic one, so it was less work to set up a purely static website with them than a purely dynamic one. Often the static side essentially worked by default, while you had to do a lot of fiddling for the dynamic side.)
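(As a small modern-day illustration of that 'set up the static side and stop' flavor, here's a sketch using Python's bundled http.server; it's not what the traditional general purpose servers were, but it shows how little there is to the static case compared to wiring up an actual application for the dynamic one.)

    # A minimal sketch: serve the current directory tree as static files
    # with no further configuration.  A dynamic site would need an actual
    # application, plus the wiring to reach it, on top of this.
    from http.server import HTTPServer, SimpleHTTPRequestHandler

    HTTPServer(("", 8000), SimpleHTTPRequestHandler).serve_forever()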

My impression is that this is somewhat changing today, with an increasing number of (dynamic) application servers that can be put directly on the Internet (or behind a proxying CDN like Cloudflare). These application servers don't need any additional configuration to do dynamic things; they're dynamic out of the box (and often support some static file serving too, although it's not necessarily their focus). The extreme version of this is various 'serverless' environments that are entirely dynamic.

(Serverless environments are often coupled to a separate way of serving static assets, but it's separate; if you want to serve your static assets through the same interface as your dynamic side, I think you generally have to wrap them up in a dynamic container.)

PS: To take a thought from The Demise of the Mildly Dynamic Website (via), one part of the power of the old fashioned Apache and PHP (and sometimes Server Side Includes) environment was how seamlessly you could move from static to dynamic content and how easy it was to set that up. In simple setups you didn't really have to do anything globally (beyond enabling PHP); instead, you could decide what was static and what was dynamic on a page by page basis.

StaticWebsiteFewerSteps written at 21:42:43

2022-04-30

Some thoughts about your (our) site needing Javascript

Yesterday, I mentioned that our support site uses a little bit of Javascript to inject site-wide navigation on all the pages. In a comment, msi noted:

But then, that also means that people who block JavaScript don't get to see the navigation bar, which kind of defeats the point of the entire site, I guess.

On the one hand, I feel this. I block essentially all Javascript myself, so I had to add a specific exemption for our support site, and I'm always happy to read about the various catalogs of reasons to avoid Javascript beyond that people have it blocked (for example, via). But on the other hand, when one of my co-workers decided to use Javascript to insert this navigation, I didn't even really consider raising objections, for two reasons that combine together.

First, our support website is what I call an inward facing website, one aimed only at our limited number of users instead of a broad worldwide audience. One of the effects of this is that generally speaking, we can assume that people are coming to it from a good network environment and with reasonably capable clients, where they probably aren't suffering from any number of issues that can affect delivering and running our Javascript. This means that if you aren't running our Javascript, it's most likely a deliberate choice on your part, as it is for me in my normal browser environment.

Second, the reality today is that voluntarily choosing not to run Javascript leaves you with all sorts of problems in general; our site not having general navigation working is the least of your issues. Anyone who voluntarily disables Javascript rapidly becomes an expert in dealing with Javascript related problems. Such people are expert enough to look at our support site and sort out the navigation problem. They're also very uncommon for exactly this reason; it's hard work to live in a web world that lacks Javascript. My co-workers are unlikely to be sympathetic to an argument that we should go significantly out of our way (for example to a scheme where we have to pre-process all the HTML pages) in order to accommodate such people.

The (unfortunate) reality of life is that Javascript working is the default, by a huge margin, and when it doesn't work many people can fix the problem with a page refresh. For a small scale, low effort website this makes it hard to argue against small amounts of simple Javascript that do basic things. I feel that you pretty much have to make an argument based on the principle of it, rather than the practicality, and this is only a winning argument with people who are already sympathetic, which most people aren't.

If you have a site that serves a wide range of user needs around the world (whether it's large or small), then all of the standard arguments against relying on Javascript apply and it would be great if you avoided it unless you absolutely have to have it (I'll certainly be happy). If your audience is unusual, you may also have good reason to avoid Javascript. But otherwise, it's a hard argument, especially for what are essentially extras.

(Although I haven't checked recently, I think you can mostly navigate through our support site without needing our navigation bar. Most or all of the pages are linked, directly or indirectly, from the main page's normal content, which doesn't require any Javascript to see. There would be a much better argument to be had if people wanted to rely on Javascript even for main content, because that would be much more dangerous.)

OnNeedingJavascript written at 21:46:41

2022-04-29

Our positive experience with having our support site be basic HTML

About a decade ago, we wound up caught in a wiki trap, where our support site was stuck using what had become an unsupported piece of wiki software with an unsupported wikitext dialect and no automated migration path. Our way out was not different wiki software but instead to scrape the HTML of the existing site and then redo the entire site as plain HTML. You might wonder how that has worked out for us over the time since and whether we regret our decision. The short answer is that it seems to have worked well, but there are probably some specific circumstances involved.

Since the initial brute force conversion, the site has gone through one reskinning and simplification of its design. On the server, the pages are all static files; the redesigned site uses a tiny bit of Javascript (and jQuery) to inject a site-wide navigation bar on every page, because that way the navigation lives in only a single file for the entire site. When we need to update something we use our editor of choice, usually vi(m). When we make new pages, I think people generally copy an existing page and then gut the contents in order to leave only the standard structure. We don't try for semantic HTML or anything like that.

(I wrote a bit more about this in my entry on the appeal of using plain HTML pages.)

This approach has been quite successful for us, but I feel there are specific local reasons for that. Two of them are that we don't have much content and we don't change it much. Our overall systems and how people use them don't change radically very often, so our support site needs very few updates. Some of the lack of change is because we know we can't do a good job of keeping very specific instructions up to date, so we mostly don't write them in the first place. People with more capacity to keep documentation up to date might have beautiful specific instructions on how to do all sorts of things on every client operating system they support, with lots of screenshots; we don't even try.

(We can sort of get away with this on the grounds that we're a computer science department so most of our users ought to be able to figure things out on their own. If they can't, we have some people who are there to help them.)

We also have a very small number of people editing the HTML, and we're in a single group. If there were different people editing different sections of our support site, it would be easy for the HTML and style of different pages to drift apart, and we might have to try to write some sort of style guide, and that way lies at least some problems. (And we're all familiar with HTML already, so there's no learning curve.)

If we were in the same wiki(text) trap today, one obvious question is if we would or should opt for a static site generator instead of raw HTML. If the (wiki) content was already in Markdown, perhaps the answer would be yes, but otherwise I don't think we'd have been very attracted by the idea of scraping the HTML, turning it into Markdown somehow, and setting up a new wiki-like thing when we'd already been burned by one. Writing in Markdown with a static site generator is obvious but I'm not sure it would get us any better results than we've wound up with. And it definitely would make changing the site more complicated.

(As for the appeal of Markdown versus HTML, I stand by my old entry on the varying appeal of wikitext and other simple markup. As a pragmatic matter, HTML is right for us until the point where we're all commonly using Markdown for other things.)

SupportSiteHTMLExperience written at 22:41:04

2022-04-14

Building Firefox from source and Rust versions

Over on Twitter, I said something related to Ubuntu 22.04 switching how they distribute Firefox that may sound surprising:

On the other hand: Firefox gives Ubuntu LTS heartburn, because building the current Firefox from source requires a current Rust (Firefox releases are tied tightly to Rust releases, don't ask). This mostly forces Ubuntu LTS to keep ratcheting Rust versions up despite 'stable'.

As of today, this is true of the past few years of Firefox releases; a particular Firefox release is fairly tightly coupled to the version of Rust that was current at the time it was released. It most likely can't be built with a significantly older version of Rust, such as you might have on a 'long term support' Unix distribution where people didn't want versions of things changing, and similarly it may not build with a newer version.

A while back I wrote a grumpy entry about how Rust 1.x seemed to not be backward compatible in practice because I kept having problems compiling old Firefox versions on new Rust versions. In comments, people educated me on what was really going on (mostly), which is that the Firefox build process deliberately turns on unstable internal Rust toolchain features (using the RUSTC_BOOTSTRAP environment variable) in order to get access to things that are not stable Rust yet and are normally only used when building Rust itself. These unstable features can change from Rust release to Rust release, creating an implicit requirement for building Firefox with the right Rust version (generally the Rust version that was current when any given Firefox release was made).

(This isn't the only thing that can go wrong when trying to build an older Firefox with a newer Rust toolchain, but it's probably the big one.)

In addition, sometimes these or other unstable Rust features are promoted to stable features. When this happens, Firefox generally increases its official minimum Rust version to the relevant Rust release, sometimes a very recent one (for example). Right now, Firefox Nightly (which I think will become version 100 or 101) officially requires Rust 1.59, which was released in late February, less than two months ago.

The reason that Ubuntu and other Unixes care about the latest version of Firefox at all is that Firefox doesn't support old releases for long enough (Ubuntu LTS releases have a five year lifetime) and Canonical is too sane to try to take on the burden of finding and fixing Firefox security problems on their own. Even Firefox ESR releases are only supported for a year or so, and regular Firefox releases stop being supported more or less the moment the next one comes out. If you package Firefox at all, you're on a treadmill of keeping up with Firefox releases. If you want to build from source instead of just putting a wrapper around the Mozilla binaries, you need to keep up on Rust too (at least for the version of Rust you use to build Firefox, which could always be a special thing).

FirefoxAndRustVersions written at 23:25:26

2022-03-05

Dynamic web pages can be viewed as a form of compression (sometimes)

If you're in a situation where static files are an alternative to dynamically generating web pages, one of the ways to view dynamic web pages is through the lens of compression, where actually generating any particular dynamic web page is 'decompressing' it from the raw data (and code) involved. This can provide a useful perspective for thinking about when dynamic web pages are attractive and when they're not.

(Not all uses of dynamic web pages are possible as static files; there may not be a finite number of possible static files to be created, for example.)

In some cases a single web page can be smaller as a dynamic web page's data and code than as a static file. For an extreme example, consider a plain text web page that is a million lines of 'yes' (which is to say, a million repeats of 'yes\n'). The code and the trivial data needed to dynamically generate this content on demand is clearly smaller than the static file version, and it might even use fewer CPU cycles and other resources to generate it than to read the file (once you count all of the CPU cycles and resources the kernel will use, and so on).
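(A sketch of that 'compressed' form, assuming a framework where a response body can be any iterable of byte chunks:)

    # The whole page is 4,000,000 bytes as a static file; as dynamic
    # content it is a count plus these few lines of code.
    def yes_page(lines=1_000_000, per_chunk=10_000):
        chunk = b"yes\n" * per_chunk
        for _ in range(lines // per_chunk):
            yield chunk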

(In general, when considering compression and decompression efficiency you need to include the code as well as the 'compressed' data. You can do the same thing for static files by including all of the code involved to read them, but in many cases this code is free because it's already needed for other things or you couldn't remove it from your web serving environment even if you tried.)

More often, a single dynamic web page may not be a space saving, especially once you count the code involved as well, but there is an overall space saving across the entire site due to the reuse of templates and page elements, and perhaps the true dynamic generation of some information. Any individual web page, or the overall site as a whole, might be more CPU and RAM efficient if served as static files instead (even including the kernel overhead), but this is less important than the compression achieved by dynamic rendering.

(At this point we may also want to consider the resources required to generate all of the static file version as compared to the resources required to serve only the pages that people ask for. But on the modern web everything gets requested sooner or later. Still, there may well be a saving in the resource of human time.)

One thing this implies is that the more incompressible a URL or a website area is, the less useful dynamic generation may be. If you're dynamically serving essentially incompressible blobs, like images, the only thing that you can really change is how the blobs are stored. On the other hand, images can be 'compressible' in some sense in that, for example, you can store only a high resolution version and then generate smaller sized ones on demand. This will cost you CPU and other resources during the generation but may save you a lot of static data space.
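(A sketch of that on-demand downscaling, assuming the third-party Pillow imaging library and hypothetical file names:)

    from PIL import Image   # third-party Pillow package

    def scaled_version(path, max_width, max_height):
        # Open the single stored high resolution image and shrink it in
        # place so that it fits inside the requested bounding box.
        img = Image.open(path)
        img.thumbnail((max_width, max_height))
        return img

    scaled_version("photo-original.jpg", 640, 480).save("photo-640.jpg")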

Actually doing this 'decompression' efficiently in general has some issues in the web context. For example, static files trivially support efficiently starting from any byte offset, for resumed requests and partial requests. This can be efficiently supported in some dynamic generation code with extra work (the 'yes' example could support this), but is much harder in others (leading to the irony of conditional GET for dynamic websites, where the only resource it saves is network bandwidth). This suggests another situation where static files may be better in practice than dynamic generation even if they take more space.
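(Going back to the 'yes' example, a sketch of how its dynamic generation could support starting at an arbitrary byte offset; because the content is completely regular, you can compute where you are instead of generating and discarding everything before the offset:)

    def yes_page_from(offset, total=4_000_000, per_chunk=40_000):
        pattern = b"yes\n"
        pos = offset
        while pos < total:
            want = min(per_chunk, total - pos)
            phase = pos % len(pattern)
            # Emit 'want' bytes starting at the right phase of the pattern,
            # which is exactly what a Range request from 'offset' asked for.
            yield (pattern * (want // len(pattern) + 2))[phase:phase + want]
            pos += want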

(This collection of thoughts was sparked by writing yesterday's entry on support for serving static files being driven by efficiency.)

DynamicPagesAreCompression written at 22:20:54

2022-03-04

A pragmatic driver of support for serving static files on the web is efficiency

Here is a question that you could ask in a certain sort of mood: why is serving static files from a web server so well supported? It's not universal (there are some web server environments with no support for it), partly because it's not as simple as it looks and you also need something approximating a filesystem, but it's very common and has been for a long time.

I think that one of the reasons is that lots of people wind up with a certain number of basically fixed files (well, URLs or blobs of data) that they would like to serve efficiently (with as little server resource usage as possible); these might be images, or CSS, or essentially fixed Javascript, even if the HTML is generated dynamically. One answer to this is a CDN, but not everyone wants to do that or can. If you have non-varying data available in some way, it's possible to serve it very efficiently and with low resource usage through a number of increasingly specialized approaches. Often the most convenient and efficient way to make this data available to your web server is to put it in a filesystem.

Since plenty of people wind up wanting to serve some static files even in an otherwise dynamic application, support for this is widely available and robust in general purpose web servers. Once this support exists for the special case of high efficiency serving of high demand static data (in files), it can usually be reused by people who want to serve static files in general and who don't care anywhere near as much about the efficiency. Everyone gets to benefit.

(A web server usually has to do a little bit of extra work to be a good general purpose static file server, but not too much. This little bit of extra work is one reason why some servers have limited static serving support; for example, they might not handle directories, since you don't need that to serve images, CSS, and so on. As a side note, Apache is unusually capable at serving static directory trees, partly for historical reasons.)

There are a variety of technical reasons why serving static blobs (often in files) has generally been more efficient than dynamically producing the same data on the fly (to the extent that you can do that; some data, like images, is essentially fixed and incompressible, with no meaningful dynamic generation of it possible). Part of it is that many operating systems are highly tuned for efficiently reading data from the filesystem, while runtime environments have mostly not been as finely tuned to make data generation code efficient, low overhead, and scalable (especially if you're serving the same thing over and over again).
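(One concrete example of that filesystem tuning is zero-copy file transmission, where the kernel moves the file's bytes straight to the network socket. A sketch in Python, with a hypothetical connection and path:)

    import socket

    def send_static_file(conn: socket.socket, path: str) -> None:
        # socket.sendfile() uses os.sendfile() where the platform supports
        # it, so the file's contents never pass through the application.
        with open(path, "rb") as f:
            conn.sendfile(f)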

StaticFilesAndEfficiency written at 23:56:59

2022-02-28

Firefox (Nightly) and the case of the fading scrollbars on Unix

For reasons beyond the scope of this entry, I run a self-compiled Firefox that's built from the latest Firefox development sources (as of when I build or rebuild it). Recently, I've noticed that my scrollbars have been flickering. At first I didn't clearly see what was going on; later, I realized that the 'flickering' was that the scroll bar would disappear after a while if I didn't move the mouse, then reappear the moment I nudged it even slightly or used the scroll wheel.

I find this sort of behavior quite irritating. I don't like things flickering in and out of visibility any more than I like blinking cursors, because my eye is reflexively distracted by change. It's slightly more tolerable now that I know what's actually causing it and it no longer feels random, but I still want it to go away. Today, having identified it, I managed to track down what's going on.

In recent versions of Firefox Nightly, there's a new exposed Preferences setting in the "General" section for "Always show scrollbars". This corresponds to the new about:config setting 'widget.gtk.overlay-scrollbars.enabled'. Based on the name, this is specific to the GTK widget toolkit, which I believe Firefox only uses on Unix (although I think both for X and Wayland). This preference defaults to 'true' in Nightly, which means to not always show scrollbars. In addition, when scrollbars are shown, they're narrower than they are in the current release version of Firefox.

Turning this off mostly works, but not entirely. On at least the "Preferences" pages, scrollbars will still fade out once and then immediately reappear. If you move the mouse or use the scrollwheel, this resets things and the next time you pause, the scrollbars will once again fade out then pop back in. Since this behavior makes me a bit nervous, I'm not going to disable these fading scrollbars right away, although maybe I should since it seems to create problems interacting with web pages in some circumstances (cf bug #1756831).

(This change appears to have landed recently, and see also bug #1147847.)

In the Firefox way, I wouldn't be surprised if these fading "overlay" scrollbars eventually became mandatory instead of something you could still turn off. Some UI choices linger in Firefox for a long time and still work the old way, but in modern Firefox, a lot of the time an optional but 'defaults to on' new UI element eventually becomes your only choice.

(Sometimes this is because the old way is no longer supported; sometimes this is because the old way gets broken and no one gets around to fixing it.)

FirefoxGtkFadingScrollbars written at 22:20:53

2022-02-25

The varying sizes of images on the web today, and remembering that

Once upon a time on the web, people used relatively small image sizes because it was rude to do otherwise, especially when the images were being used for things like icons. If a site had visually small icons, they almost always were small in actual image dimensions, and all the same size, because the website made you do it that way. Over time that has shifted. People started using larger and larger images, even for things that were destined to be little icons, and websites started accepting these images by clamping the image size in their HTML. This mostly works fine (although people on cellular data may be a bit unhappy with you), but it does open the door to accidental mistakes that produce awkward outcomes. Specifically, if you ever reuse the image in a context where its size isn't clamped by some HTML (or CSS), people will see a surprise giant image. Sometimes it can be hard to work out why this is happening.

All of that sounds abstract, so let me give you a concrete example. I read Planet Fedora in my feed reader of choice (Liferea, although an old version), and for a long time articles have had mysterious giant images at or near the top. Recently I worked out what was going on.

Like many planet blog aggregators, Planet Fedora has little icons beside each aggregated entry as an indicator of who wrote it. The <img> URLs for the icon are their original sources and don't have sizes specified, but on the website, the CSS clamps the maximum size of the image to 150 px. When the decision was made to include '<img>' elements for these icons in the syndication feed's HTML for each entry, it was overlooked that these icon images could be any size, including large sizes, and so the <img> elements in the feed have no size limits. To a feed reader, these <img> icons look like regular pictures (which normally should be allowed to use the full size of the entry display area, whatever that is), and so you get periodic surprise giant pictures at the start of entries.

I tend to feel that there's no particular person who's at fault for this. On today's web, it's perfectly reasonable to use large images for your icon even if you know it's going to be displayed in small size. Once you have people using large images for icons that are somewhat invisibly constrained by CSS, you've created a perfect situation for someone reusing the icon images in another context to overlook the size issue.

ImageSizesRemembering written at 22:39:21
