Firefox and my views on the tradeoffs of using DNS over HTTPS
For those who have not heard, Mozilla is (still) planning to have Firefox support and likely default to resolving DNS names through DNS over HTTPS using Cloudflare's DoH server (see eg this news article). The alternate, more scary way of putting this is that Mozilla is planning to send all of your DNS lookups (well, for web browsing) to Cloudflare, instead of your own ISP or your own DNS server. People have mixed feelings about Cloudflare, and beyond that issue and the issue of privacy from Cloudflare itself, there is the fact that Cloudflare is a US company, subject to demands by the US government, and the Cloudflare DoH server you wind up using may not be located in your country and thus not covered by laws and regulations that your ISP's DNS service is possibly subject to (such as Europe's GDPR).
Combining this with the fact that today, your large ISP is one of your threats creates a bunch of unhappy tradeoffs for Mozilla for deploying DNS over HTTPS in Firefox. On the one hand, some or many people are being intruded on today with ISP surveillance and even ISP tampering with DNS results, and these people will have their lives improved by switching to DoH from a trustworthy provider. On the other hand, some people will be exposed to additional risks they did not already have by a switch to DoH with Cloudflare, and even for people who were already being intruded on by their ISP, the risks are different.
Pragmatically, it seems likely that turning on DoH by default in Firefox will improve the situation with DNS snooping for many people. Mozilla has a contract with Cloudflare about DNS privacy, which is more than you have with your ISP (for typical people), and Cloudflare's POPs are widely distributed around the world and so are probably in most people's countries (making them at least partially subject to your laws and regulations). I suspect that Mozilla will be making this argument both internally and externally as the rollout approaches, along with 'you can opt out if you want to'.
However, some number of people are not having their DNS queries snooped today, and even when people are having them intruded on, that intrusion is spread widely across the ISP industry world wide instead of concentrated in one single place (Cloudflare). The currently un-snooped definitely have their situation made worse by having their DNS queries sent to Cloudflare, even if the risk of something bad happening is probably low. As for the distributed definite snooping versus centralized possible snooping argument, I don't have any answer. They're both bad, and we don't and can't know whether or not the latter will happen.
I don't pretend to know what Mozilla should do here. I'm not even sure there is a right answer. None of the choices make me happy, nor does the thought that DoH to Cloudflare by default is probably the pragmatically least generally harmful option, the choice that does the most good for the most people even though it harms some people.
To put it another way, I don't think there's any choice that Mozilla can make here that doesn't harm some people through either action or inaction.
(This sort of elaborates on some tweets of mine.)
Feed readers and their interpretation of the Atom 'title' element
My entry yesterday had the title of The HTML <pre> element doesn't do very much, which as you'll notice has a HTML element named in plain text in the title. In the wake of posting the entry, I had a couple of people tell me that their feed reader didn't render the title of my entry correctly, generally silently omitting the '<pre>' (there was a comment on the entry and a report on Twitter). Ironically, this is also what happened in Liferea, my usual feed reader, although that is a known Liferea issue. However, other feed readers display it correctly, such as The Old Reader (on their website) and Newsblur (in the iOS client).
(I read my feed in a surprising variety of syndication feed readers, for various reasons.)
As far as I can tell, my Atom feed is correct. The raw text of my Atom feed for the Atom <title> element is:
<title type="html">The HTML &amp;lt;pre&amp;gt; element doesn't do very much</title>
RFC 4287, the Atom specification, says of this:
If the value of "type" is "html", the content of the Text construct MUST NOT contain child elements and SHOULD be suitable for handling as HTML. Any markup within MUST be escaped; for example, "<br>" as "&lt;br>".
The plain text '<pre>' in my title is encoded as '&amp;lt;pre&amp;gt;'. Decoded from Atom-encoded text to HTML, this gives us '&lt;pre&gt;', which is not HTML markup but an escaped plain-text '<pre>' (which is also how it appears, repeatedly, in the raw HTML of this entry and yesterday's).
(My Atom syndication feed generation also encodes '>' as '&gt;' in an excess of caution; as we see from the RFC, this is not strictly required.)
Despite that, many syndication feed readers appear to be doing something wrong. I was going to say that I could imagine several options, but after thinking about it more, I can't really. I know that Liferea's issue apparently at least starts with decoding the 'type="html"' title twice instead of once, but I'm not sure whether it then tries to strip markup from the result (which would strip out the '<pre>' that the excess decoding has materialized) or passes the result to something that renders HTML and so silently swallows the un-closed <pre>. I can imagine a syndication feed reader that correctly decodes the <title> once, but then passes it to a display widget that expects encoded HTML instead of straight HTML. Another possibility is that the display widget only accepts plain text and the feed reader made a mistake in turning HTML into plain text, decoding entities before removing HTML tags instead of the other way around.
(Decoding things more times than you should can be a hard mistake to spot. Often the extra decoding has no effect on most text.)
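To make the decoding-order mistake concrete, here is a minimal Python sketch of a feed reader that wants a plain-text title. The raw title string, the regex tag stripper, and the function names are all hypothetical and not taken from Liferea or any other real feed reader; the only point is the order of operations.

```python
import html
import re
import xml.etree.ElementTree as ET

# Hypothetical raw Atom title for the plain text "The HTML <pre> element ...".
# With type="html" the text is HTML, and that HTML is escaped again for XML,
# so a literal '<' ends up as '&amp;lt;' in the feed itself.
RAW_TITLE = ('<title type="html">The HTML &amp;lt;pre&amp;gt; element '
             "doesn't do very much</title>")

# Crude tag stripper, good enough for this illustration only.
TAG_RE = re.compile(r"<[^>]*>")


def html_from_atom(raw: str) -> str:
    """Do the one XML-level decoding; the result is the title's HTML source."""
    return ET.fromstring(raw).text or ""


def plain_text_correct(raw: str) -> str:
    """Strip HTML tags first, then decode HTML entities."""
    return html.unescape(TAG_RE.sub("", html_from_atom(raw)))


def plain_text_buggy(raw: str) -> str:
    """Decode entities first, then strip tags: the escaped '<pre>' now
    looks like real markup and silently disappears."""
    return TAG_RE.sub("", html.unescape(html_from_atom(raw)))


if __name__ == "__main__":
    print(plain_text_correct(RAW_TITLE))  # The HTML <pre> element doesn't do very much
    print(plain_text_buggy(RAW_TITLE))    # The HTML  element doesn't do very much
```

Both versions produce identical output for titles with no '<' or '&' in them, which is exactly why this kind of mistake is easy to miss.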
Since some syndication feed readers get it right and some get it wrong, I'm not sure there's anything I can do to fix this in my feed. I've used an awkward workaround in the title of this entry so that it will be clear even in feed readers, but otherwise I'm probably going to keep on using HTML element names and other awkward things in my titles every so often.
(My titles even contain markup from time to time, which is valid in Atom feeds but which gives various syndication feed readers some degree of heartburn. Usually the markup is setting things in monospace, eg here, although every once in a while it includes links.)
The HTML <pre> element doesn't do very much
These days I don't do too much with HTML, so every so often I wind up in a situation where I have to reach back and reconstruct things that once were entirely well known to me. Today, I wound up talking with someone about the <pre> element and what you could and couldn't safely put in it, and it took some time to remember most of the details.
The simple version is that <pre> doesn't escape markup, it only changes formatting, although many simple examples you'll see only use it on plain text so it's not immediately clear. Although it would be nice if <pre> was a general container that you could pour almost arbitrary text into and have it escaped, it's not. If you're writing HTML by hand and you have something to put into a <pre>, you need to escape any markup and HTML entities (much like a <textarea>, although even more so). Alternately, you can actually use this to write <pre> blocks that contain markup, for example links or text emphasis (you might deliberately use bold inside a <pre> to denote generic placeholders that the reader fills in with their specifics).
As with <textarea>, it's easy to overlook this for straightforward cases and to get away without doing any text escaping, especially in modern browsers. A lot of the command lines or code or whatever that we often put into <pre> don't contain things that can be mistaken for HTML markup or HTML entities, and modern browsers will often silently re-interpret things as plain text for you if they aren't validly formatted entities or markup. I myself have written and altered any number of <pre> blocks over the past few years without ever thinking about it, and I'm sure that some of them included '<' or '>' and perhaps '&' (as part of Unix command lines).
(The MDN page on <pre> includes an example with unescaped < and >. If you play around with similar cases, you'll probably find that what is rendered intact and what is treated as an unrecognized HTML element and silently swallowed is quite sensitive to details of formatting and to what is included within the '< ... >' run of raw text. Browsers clearly have a lot of heuristics here, some of which have been captured in HTML5's description of tag open state. In HTML5, if the character after a '<' is anything other than an ASCII letter or one of '/', '!', and '?' (which start end tags, comments, and the like), the '<' is just plain text (in any context, not just in a <pre>).)
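The first branch point of the HTML5 tokenizer's tag open state can be sketched in a few lines of Python. This is a simplification that only models the single character immediately after the '<'; in the real tokenizer, '</' and '<!' go on to further states with their own rules (for example, '</' followed by a digit becomes a bogus comment). The function name and return labels are made up for illustration.

```python
def tag_open_decision(next_char: str) -> str:
    """First decision of the HTML5 tokenizer's tag open state.

    next_char is the character right after a '<' in ordinary text;
    the empty string stands for end of input.
    """
    if next_char.isascii() and next_char.isalpha():
        return "start tag"           # e.g. '<pre', '<b'
    if next_char == "!":
        return "markup declaration"  # comments, DOCTYPE, CDATA
    if next_char == "/":
        return "end tag"             # the spec applies further checks here
    if next_char == "?":
        return "bogus comment"       # parse error, treated as a comment
    return "text"                    # the '<' is emitted as a literal character


if __name__ == "__main__":
    for example in ["<pre>", "< pre>", "<3", "<- x", "</pre>"]:
        print(repr(example), "->", tag_open_decision(example[1:2]))
```

Note that only ASCII letters count, so something like '<é' is also left as text.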
I don't know how browser interpretation of various oddities in <pre> content is affected by the declared or assumed HTML DOCTYPE or HTML version the browser assumes, but I wouldn't count on all of them behaving the same outside, perhaps, of HTML5 mode (which at least has specific rules for this). Of course if you're producing HTML with tools instead of writing it by hand, the tools should take care of this for you. That's the only reason that Wandering Thoughts has whatever HTML correctness it does; my DWikiText to HTML rendering code takes care of it all for me, <pre> blocks included.
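A tool that generates <pre> blocks only has to escape the characters that HTML treats specially in element content. Here is a minimal sketch using Python's standard html.escape; pre_block is a hypothetical helper name, not anything from DWiki.

```python
import html


def pre_block(text: str) -> str:
    """Wrap arbitrary plain text in <pre> so it displays literally.

    <pre> only changes formatting (whitespace and line breaks are kept),
    so '&', '<' and '>' still have to be escaped. quote=False leaves quote
    characters alone, since they are harmless in element content.
    """
    return "<pre>" + html.escape(text, quote=False) + "</pre>"


if __name__ == "__main__":
    print(pre_block("sort <in >out 2>&1 &"))
    # <pre>sort &lt;in &gt;out 2&gt;&amp;1 &amp;</pre>
```

If you deliberately want markup inside the <pre>, such as bold placeholders, you would escape each literal piece of text this way and add the tags around it yourself.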
I'll start with my toot, slightly shorn of context:
Every so often I wind up viewing a version of the web that isn't filtered by uBlock Origin and my 'allow basically no JS' settings (in my default browser) and oh ow ow ow.
(But 'allow no JS' is basically the crazy person setting and it's only tolerable because I keep a second browser just for JS-required sites. Which throws away all my cookies & stuff every time it shuts down, because my trust is very low once JS is in the picture)
Having two browsers is reasonably easy (provided that you're willing to use both Chrome and Firefox; these days I instead have two instances of Firefox). Arranging to be able to move URLs and links easily back and forth is probably not for most people in most desktop environments. I'm the kind of person who writes scripts and runs a custom window manager environment, so I can blithely describe this as 'not too much work (for me)'.
(You can always select a link in one browser and do 'Copy link location', then start the other browser and paste it into the URL bar. But this is not a fast and fluid approach.)
Firefox versus Chrome (my 2019 view)
On Twitter, I said:
I continue to believe that Firefox is your best browser option, despite the addons screwup. Mozilla at least tries to be good (and usually is), while Chrome is straight up one tentacle of the giant, privacy invading, advertising company giant vampire squid of Google.
I'm sure there are plenty of good, passionate, well-intended people who work on Chrome, and they care a lot about privacy, user choice, and so on. But existing within the giant vampire squid of Google drastically constrains and distorts what outcomes they can possibly obtain.
Mozilla is absolutely not perfect; they have committed technical screwups, made decisions in the aftermath of that that I feel are wrong, and especially they've made trust-betraying policy decisions, which are the worst problem because they infect everything. But fundamentally, Mozilla is trying to be good and I do believe that it still has a general organizational culture that supports that.
Chrome and the people behind it absolutely can do good, especially when they take advantage of their position as a very popular browser to drive beneficial changes. That Chrome is strongly committed to Certificate Transparency is one big reason that it's moving forward, for example, and I have hopes that their recently announced future cookie changes will be a net positive. But Chrome is a mask that Google wears, and regardless of what Google says, it's not interested in either privacy or user choice that threatens its business models. Every so often, this shows through Chrome development in an obvious way, but I have to assume that for everything we see, there are hundreds of less visible decisions and influences that we don't. And then there's Google's corporate tactics (alternate).
Much as in my choice of phones and tablets, I know which side of this I come down on when the dust settles. And I'm sticking with that side, even if there are some drawbacks and some screwups every so often, and some things that make me unhappy.
(At one point I thought that the potential for greater scrutiny of Google's activities with Chrome might restrain Google sufficiently in practice. I can no longer believe this, partly because of what got me to walk away from Chrome. Unless the PR and legal environment gets much harsher for Google, I don't think this is going to be any real restraint; Google will just assume that it can get away with whatever it wants to do, and mostly it will be right.)
Some weird and dubious syndication feed fetching from SBL-listed IPs
For reasons beyond the scope of this entry (partly 'because I could'), I've recently been checking to see if any of the IPs that visit Wandering Thoughts are on the Spamhaus SBL. As a preemptive note, using the SBL to block web access is not necessarily a good idea, as I've found out in the past; it's specifically focused on email, not any other sorts of abuse. However, perhaps you don't want to accept web traffic from networks that Spamhaus has identified as belonging to spammers, and Spamhaus also has the Don't Route Or Peer list (which is included in the SBL), of outright extremely bad networks.
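Checking an IP against the SBL is an ordinary DNSBL lookup: reverse the IPv4 address's octets, append the list's DNS zone, and see whether the resulting name resolves. A minimal Python sketch follows, assuming direct queries to the public sbl.spamhaus.org zone. The function names are hypothetical, and the interpretation of answers (127.0.0.x for a listing, 127.255.255.x for a Spamhaus error such as a refused query) should be checked against Spamhaus's current documentation and usage terms, which restrict free use and may refuse queries that come through large public resolvers.

```python
import ipaddress
import socket

# Assumption: querying Spamhaus's public SBL zone directly.
SBL_ZONE = "sbl.spamhaus.org"


def dnsbl_query_name(ip: str, zone: str = SBL_ZONE) -> str:
    """Build the DNSBL name for an IPv4 address: octets reversed, zone appended."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")  # raises ValueError if invalid
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip: str, zone: str = SBL_ZONE) -> bool:
    """Return True if the zone answers with a listing for this IP."""
    try:
        answer = socket.gethostbyname(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        # NXDOMAIN (not listed) or a lookup failure; treat both as unlisted.
        return False
    # Assumed: listings answer 127.0.0.x, while 127.255.255.x signals an error.
    return answer.startswith("127.0.0.")


if __name__ == "__main__":
    print(dnsbl_query_name("192.0.2.10"))  # 10.2.0.192.sbl.spamhaus.org
```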
When I started looking, I wasn't particularly surprised to find a fair number of IPs on Spamhaus CSS; in practice, the CSS seems to include a fair number of compromised IPs and doesn't necessarily expire them rapidly. However, I also found a surprising number of IPs listed in other Spamhaus records, almost always for network blocks; from today (so far), I had IPs from SBL443160 (a /22), SBL287739 (a /20 for a ROKSO-listed spammer), and especially SBL201196, which is a /19 on an extended version of Spamhaus's DROP list. These are all pretty much dedicated spam operations, not things that have been compromised or neglected, and as such I feel that they're worth blacklisting entirely.
Then I looked at what the particular IPs from these SBL listings were doing here on Wandering Thoughts, and something really peculiar started emerging. Almost all of the IPs were just fetching my syndication feed, using something that claims to be "rss2email/3.9 (https://github.com/wking/rss2email)" in its User-Agent. Most of them are making a single fetch request a day (often only one in several days), and on top of that I noticed that they often got a HTTP 304 'Not Modified' reply. Further investigation has shown that this is a real and proper 'Not Modified', based on these requests having an If-None-Match header with the syndication feed's current ETag value (since this is a cryptographic hash, they definitely fetched the feed before). Given that these IPs are each only requesting my feed once every several days (at most), their having the correct ETag value means that the people behind this are fetching my feed from multiple IPs across multiple networks and merging the results.
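Why a correct If-None-Match implies an earlier fetch is easiest to see from the server side of a conditional GET. This is a sketch under assumptions: the ETag is taken to be a quoted SHA-256 digest of the feed body (the text above only says it's a cryptographic hash), and the function names are hypothetical rather than DWiki's actual code. A client only gets a 304 by presenting the current tag, and the only practical way to learn that tag is from an earlier 200 response, whether it made that request itself or got the tag from whoever did.

```python
import hashlib
from typing import Optional


def feed_etag(body: bytes) -> str:
    """Hypothetical strong ETag: a quoted SHA-256 hex digest of the feed body."""
    return '"' + hashlib.sha256(body).hexdigest() + '"'


def _opaque(tag: str) -> str:
    # If-None-Match uses weak comparison, so a 'W/' prefix is ignored.
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag


def conditional_get(body: bytes, if_none_match: Optional[str]):
    """Return (status, headers, payload) for a GET of the feed."""
    etag = feed_etag(body)
    if if_none_match is not None:
        # Naive comma split; fine for hex-digest tags, which contain no commas.
        tags = [_opaque(t) for t in if_none_match.split(",")]
        if "*" in tags or etag in tags:
            # The client already has this exact version: send no body.
            return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag}, body
```

A client that has never seen the current feed can't guess a SHA-256 digest, so in this model a steady run of 304s from IPs that rarely fetch anything means the tag is being shared between them.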
(I haven't looked deeply at the activity of the much more numerous SBL CSS listed IPs, but in spot checks some IPs appear to be entirely legitimate real browsers from real people, people who just have the misfortune to have or have inherited a CSS-listed IP.)
Before I started looking, I would have expected the activity from these bad network blocks to be comment spam attempts (which is part of what has attracted my attention to SBL-listed networks in the past). Instead I can't see any real traces of that; in fact, in the past ten days only one SBL listed IP has come close to trying to leave a comment here, and that was a CSS listing. Instead they seem to be harvesting my syndication feed, for an unknown purpose, and this harvesting appears to be done by some group that is active across multiple and otherwise unrelated bad network blocks.
(Since SBL listings are about email spammers, the obvious speculation here is that these people are scanning syndication feeds to find email addresses for spam purposes. This is definitely a thing in general, so it's possible.)
As a side note, this rss2email User-Agent is actually pretty common here (and right now it's the latest release of the actual project). Only a small fraction of the IPs using it are on the SBL; most of them are real, legitimate feed fetchers. Although I do have a surprisingly large number of IPs using rss2email that only fetched my syndication feed once today and still got a 304 Not Modified (which, in some cases, definitely means that they fetched it earlier from some other IP). Some of those one time fetchers turn out to have been doing this sporadically for some time. It's possible that these SBL-hosted fetchers are actually using rss2email, and now that I think about it I can see a reason why. If you already have an infrastructure for harvesting email addresses from email messages and want to extend it to syndication feeds, turning syndication feeds into email is one obvious and simple approach.
(I think the real moral here is to not turn over rocks because, as usual, disturbing things can be found there.)
The appeal of using plain HTML pages
Once upon a time our local support site was a wiki, for all of the reasons that people make support sites and other things into wikis. Then using a wiki blew up in our faces. You might reasonably expect that we replaced it with a more modern CMS, or perhaps a static site generator of some sort (using either HTML or Markdown for content and some suitable theme for uniform styling). After all, it's a number of interlinked pages that need a consistent style and consistent navigation, which is theoretically a natural fit for any of those.
In practice, we did none of those; instead, our current support site is that most basic thing, a bunch of static .html files sitting in a filesystem (and a static …). When we need to, we edit the files with vi, and there's no deployment or rebuild process.
(If we don't want to edit the live version, we make a copy of the .html file to a scratch name and edit the copy, then move it back into place when done.)
This isn't a solution that works for everyone. But for us at our modest scale, it's been really very simple to work with. We all already know how to edit files and how to write basic HTML, so there's been nothing to learn or to remember about managing or updating the support site (well, you have to remember where its files are, but that's pretty straightforward). Static HTML files require no maintenance to keep a wiki or a CMS or a generator program going; they just sit there until you change them again. And everything can handle them.
I'm normally someone who's attracted to ideas like writing in a markup language instead of raw HTML and having some kind of templated, dynamic system (whether it's a wiki, a CMS, or a themed static site generator), as you can tell from Wandering Thoughts and DWiki itself. I still think that they make sense at large scale. But at small scale, if I was doing a handful of HTML pages today, it would be quite tempting to skip all of the complexity and just write plain HTML files. (I'd use a standard HTML layout and structure for all the files, with CSS to match.)
(This thought is sort of sparked by a question by Pete Zaitcev over on the Fediverse, and then reflecting on our experiences maintaining our support site since we converted it to HTML. In practice I'm probably more likely to update the site now than I was when it was a wiki.)
Private browsing mode versus a browser set to keep nothing on exit
These days, apparently a steadily increasing variety of websites are refusing to let you visit their site if you're in private browsing or incognito mode. These websites are advertising that their business model is invading your privacy (not that that's news), but what I find interesting is that these sites don't react when I visit them in a Firefox that has a custom history setting of 'clear history when Firefox closes'. As far as I can tell this still purges cookies and other website traces as effectively as private browsing mode does, and it has the side benefit for me that Firefox is willing to remember website logins.
(I discovered this difference between the two modes in the aftermath of moving away from Chrome.)
So, this is where I say that everyone should do this instead of using private browsing mode? No, not at all. To be bluntly honest, my solution is barely usable for me, never mind someone who isn't completely familiar with Firefox profiles and capable of wiring up a complex environment that makes it relatively easy to open a URL in a particular profile. Unfortunately Firefox profiles are not particularly usable, so much so that Firefox had to invent an entire additional concept (container tabs) in order to get a reasonably approachable version.
(Plus, of course, Private Browsing/Incognito is effectively a special purpose profile. It's so successful in large part because browsers have worked hard to make it extremely accessible.)
Firefox stores and tracks cookies (and presumably local storage) on a per-container basis, for obvious reasons, but apparently doesn't have per-container settings for how long they last or when they get purged. Your browsing history is global; history entries are not tagged with what container they're from. Mozilla's Firefox Multi-Account Containers addon looks like it makes containers more flexible and usable, but I don't think it changes how cookies work here, unfortunately; if you keep cookies in general, you keep them for all containers.
I don't think you can see what container a given cookie comes from through Firefox's normal Preferences stuff, but you can with addons like Cookie Quick Manager. Interestingly, it turns out that Cookie AutoDelete can be set to be container aware, with different rules for different containers. Although I haven't tried to do this, I suspect that you could set CAD so that your 'default' container (ie your normal Firefox session) kept cookies but you had another container that always threw them away, and then set Multi-Account Containers so that selected annoying websites always opened in that special 'CAD throws away all cookies' container.
(As covered in the Cookie AutoDelete wiki, CAD can't selectively remove Firefox localstorage for a site in only some containers; it's all or nothing. If you've set up a pseudo-private mode container for some websites, you probably don't care about this. It may even be a feature that any localstorage they snuck onto you in another container gets thrown away.)
A sign of people's fading belief in RSS syndication
Every so often these days, someone asks me if my blog supports RSS (or if I can add RSS support to it). These perfectly well meaning and innocent requests tell me two things, one of them obvious and one of them somewhat less so.
(To be completely clear about this: these people are pointing out a shortfall of my site design and are not to blame in any way. It is my fault that although Wandering Thoughts has a syndication feed, they can't spot it.)
The obvious thing is that Wandering Thoughts' current tiny little label and link at the bottom of some pages, the one that says 'Atom Syndication: Recent Pages', is no longer anywhere near enough to tell people that there is RSS here (much less draw their clear attention to it). Not only is it in a quite small font but it has all sorts of wording problems. Today, probably not very many people know that Atom is a syndication feed format, and even if they do, labelling it 'recent pages' is not very meaningful to someone who is looking for my blog's syndication feed.
(The 'recent pages' label is due to DWiki's existence as a general wiki engine that can layer a blog style chronological view on top of a portion of the URL hierarchy. From DWiki's perspective, all of my entries are wiki pages; they just get presented with some trimmings. I'm going to have to think about how best to fix this, which means that changes may take a while.)
The less obvious thing is that people often no longer believe that even obvious places have RSS feeds, especially well set up ones. You see, DWiki has syndication feed autodiscovery, where if you tell your feed reader the URL of Wandering Thoughts, it will automatically find the actual feed from there. In the days when RSS was pervasive and routine, you didn't look around for an RSS feed link or ask people; you just threw the place's main URL into your feed reader and it all worked, because of course everyone had an RSS feed and feed autodiscovery. One way or another, people evidently don't believe that any more, and I can't blame them; even among places with syndication feeds, an increasing number of them don't have working feed autodiscovery (cf, for one example I recently encountered).
(People could also just not know about feed autodiscovery, but if feed autodiscovery worked reliably, I'm pretty sure that people would know about it as 'that's just how you add a place to your feed reader'.)
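Feed autodiscovery works by putting <link rel="alternate"> elements with a feed MIME type into a page's HTML, which feed readers look for when you give them a page URL instead of a feed URL. Here is a minimal sketch of the reader's side using only Python's standard html.parser; the class and function names and the example URLs are hypothetical, and real feed readers apply more heuristics than this (for instance, this ignores <base href>).

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# MIME types that mark a <link rel="alternate"> as a syndication feed.
FEED_TYPES = {"application/atom+xml", "application/rss+xml"}


class FeedLinkFinder(HTMLParser):
    """Collect feed URLs advertised by <link rel="alternate" type=...> tags."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        # Attributes without a value come through as None.
        values = {name: (value or "") for name, value in attrs}
        rels = values.get("rel", "").lower().split()
        mime = values.get("type", "").strip().lower()
        href = values.get("href", "").strip()
        if "alternate" in rels and mime in FEED_TYPES and href:
            # Relative hrefs are resolved against the page's own URL.
            self.feeds.append(urljoin(self.base_url, href))


def discover_feeds(page_html: str, base_url: str) -> list:
    """Return every advertised feed URL, in document order."""
    finder = FeedLinkFinder(base_url)
    finder.feed(page_html)
    finder.close()
    return finder.feeds
```

For a page at https://example.org/blog/ whose <head> contains `<link rel="alternate" type="application/atom+xml" href="/blog/?atom">`, this returns ["https://example.org/blog/?atom"].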
In other words, we've reached a point where people's belief in RSS has faded sufficiently that it makes perfect sense to them that a technical blog might not even have an RSS feed. They know what RSS is and they want it, but they don't believe it's automatically going to be there and they sort of assume it's not going to be. Syndication feeds have changed from a routine thing everyone had to a special flavour that you hope for but aren't too surprised when it's not present.
(The existence of syndication feed discovery in general is part of why the in-page labels for DWiki's syndication feeds are so subdued. When I put them together many years ago, I'm pretty sure that I expected feed autodiscovery would be the primary means of using DWiki's feeds and the in-page labels would only be a fallback.)
Staying away from Google Chrome after six months or so
Just short of six months ago, I wrote Walking away from Google Chrome, about how I had decided to stop using Chrome and only use Firefox. Although I didn't mention it in the entry, I implicitly included Chromium in this, which was really easy because I don't even have it installed on my Linux machines.
(A version of Chromium is available in Fedora, but it seems to be slightly outdated and I was always using Chrome in large part because of Google's bundled Flash, which is not in the open source Chromium build.)
Overall, I remain convinced that this is something that's worth doing, however small the impact of it may be. Subsequent developments in the Chrome world have reinforced both the alarming nature of Chrome's dominance and that Chrome's developers are either shockingly naive or deliberately working to cripple popular adblocking and content filtering extensions (see here, here, and here). Using Firefox is a little gesture against the former, however tiny, and provides me with some insulation from the latter, which it seems rather likely that Google will ram through sooner or later.
(It is not complete insulation, since many of the crucial extensions I use are developed for both Firefox and Chrome. One way or another, their development and use on Firefox would probably be affected by any Chrome changes here, if only because their authors might wind up with fewer users and less motivation to work on their addons.)
On a practical level I've mostly not had any problems sticking to this. My habits and reflexes proved more amenable to change than I was afraid of, and I haven't really had any problems with websites that made me want to just hit them with my incognito Chrome hammer. I've deliberately run Chrome a few times to test how some things behaved in it as compared to Firefox, but that's about it for my Chrome usage over the past six months (although I did have to do some initial work to hunt down various scripts that were using Chrome as their browser for various reasons).
My only significant use of Chrome was as my 'accept everything, make things work' browser. As I mentioned in my initial entry, in several ways Firefox works clearly better for this, and I've come to be more and more appreciative of them over the past six months. Cut and paste just works, Firefox requires no song and dance to remember my passwords, and so on. At this point I would find it reasonably annoying to switch much of my use back to Chrome.