Clearing cached HTTP redirections or HSTS status in Firefox
As far as I know, all browsers cache (or memorize) HTTP redirections, especially (allegedly) permanent ones, including ones that push you from one site to another. Browsers also all remember the HSTS status for websites; in fact that's the entire point of HSTS. This is great in theory, but sometimes it goes wrong in practice (as I've noted before). For example, someone believes that they have a properly set up general HTTPS configuration for a bunch of sites, so they wire up an automatic permanent redirection for all of them, and then it turns out that their TLS certificates aren't set up right, so they turn the HTTPS redirection off and go back to serving the sites over HTTP. In the meantime, you've visited one site and your Firefox has a death grip on the HTTP to HTTPS redirection, which very definitely doesn't work.
Such a cached but broken HTTP to HTTPS redirection recently happened to me in my main Firefox instance, so I set out on an expedition to find out how to fix it. The usual Internet advice on this unfortunately has the side effect of completely clearing your history of visited URLs for the site, which isn't something that I'm willing to do; my browser history is forever. Fortunately there's a different way to do it, which I found in this superuser.com answer. The steps I'm going to use in the future are:
- get yourself a new, blank tab (although any source of a link to the site will work, such as my home page).
- call up the developer tools Network tab, for example with Ctrl-Shift-E or Tools → Web Developer → Network.
- tick the 'Disable Cache' tickbox.
- enter the URL for the site into the URL bar (or otherwise go to the URL). This should give you an unredirected result, or at least force Firefox to actually go out to the web server and get another redirection, and as a side effect it appears to clear Firefox's memory of the old redirection.
- turn the cache back on by unticking the 'Disable Cache' tickbox.
When I did this, it seemed necessary to refresh or force-refresh the page a few times with the cache disabled before it really took and flushed out the cached HTTP redirect.
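One way to confirm that it's your browser and not the server that's still redirecting is to ask the server directly, bypassing the browser entirely. Here's a minimal Python sketch of such a check (the NoRedirect handler and function name are my own constructions, nothing Firefox-specific):

```python
import urllib.request
import urllib.error

class NoRedirect(urllib.request.HTTPRedirectHandler):
    # Returning None makes urllib treat a 3xx response as an error
    # instead of silently following it, so we can inspect it ourselves.
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None

def check_redirect(url):
    """Return (status, Location header) for the server's actual
    response to url, without following any redirection."""
    opener = urllib.request.build_opener(NoRedirect)
    try:
        resp = opener.open(url)
        return resp.status, None
    except urllib.error.HTTPError as e:
        return e.code, e.headers.get("Location")
```

If this reports a plain 200 while your Firefox still bounces you to HTTPS, the redirect is cached in the browser and the steps above apply.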
(Apparently you can also do this by clearing only the cache through the History menu, see for example this answer. I didn't use this for various reasons, but it does appear to work. This presumably has the side effect of clearing all of your cache, for everything, but this may be tolerable.)
While I was trying to solve this issue I also ran across some pages on how to delete a memorized Firefox HSTS entry (without deleting your entire history for the site). The easiest way to do this is to shut down Firefox, find your profile directory, and then edit the SiteSecurityServiceState.txt that's in it. This is a text file with a straightforward one site per line format; find the problem site in question and just delete the entry.
(People with more understanding of the format of each line might be able to de-HSTS a site's entry, but I'm lazy.)
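The 'delete the entry' step is mechanical enough to script if you want to. This is a hedged sketch that assumes only what's described above (one site per line, with the site name leading the line); back up the file first, and Firefox must be shut down when you do this:

```python
def remove_hsts_entry(path, host):
    """Remove 'host' from Firefox's SiteSecurityServiceState.txt.
    Assumes the one-site-per-line format described above, with the
    site name at the start of each line (possibly followed by a
    ':suffix' and then tab-separated fields)."""
    with open(path) as f:
        lines = f.readlines()

    def site_of(line):
        fields = line.split()
        # The leading field may carry a suffix like ':HSTS'; strip it.
        return fields[0].split(':')[0] if fields else ''

    kept = [ln for ln in lines if site_of(ln) != host]
    with open(path, 'w') as f:
        f.writelines(kept)
    return len(lines) - len(kept)   # how many entries were dropped
```

This deletes the site's whole line, which is the lazy approach mentioned above; it doesn't try to understand the rest of the line's format.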
PS: As more and more sites use HSTS, I suspect that Firefox is going to wind up changing how it stores HSTS information away from the current text file approach. Hopefully they'll provide some way for an advanced user to force-forget HSTS entries for a host.
PPS: Sadly, I don't expect Firefox to ever provide the APIs that an addon would need to do this, especially for HSTS. Browsers probably really don't want to give addons any way of overriding a site's HSTS settings, and it certainly seems like a dangerous idea to me. The days when we could extend unreserved trust to browser addons are long over; the approach today is to cautiously give them only a very limited amount of power.
A website's design shows its actual priorities
I'll start with my tweet:
Reddit's new design makes it clear that Reddit is now fundamentally a discussion forum, not the aggregator of interesting links and things to read that it started out as and used to be. So it goes.
To explain this, I need to show a screenshot of the new and the old design. Let's start with the new design:
Ignoring all of the white space for now, look at the nice big title. Do you expect that to link to the actual article? Surprise, it doesn't; it takes you to the Reddit discussion of the article. The actual link to the article itself is the blue text in a much smaller font, and you can see here that it's truncated when displayed (despite how much extra space there is to show it in full). This design mostly wants you to click on the big prominent title, not the small, hard to hit blue thing.
(It turns out that you can make this more compact, but that doesn't make the links any bigger or more obvious.)
About all you can say about the prominence of the actual links in this design is that they're in blue, but modern web design is such that I'm not sure people these days assume that blue is a link instead of, well, just blue for some reason.
Compare this to the old design:
Here the prominent titles, which are the things that are both the most obvious and the easiest to click on, actually link to the article. They're also the standard blue colour, which might actually be read as links in this less ornate design. This design also has a sidebar of useful links that go to places outside of Reddit, a sidebar that's not present in the new design no matter how wide you make your browser window.
(The old design also shows visited links in a different colour, as Reddit always has, unlike the new design.)
It's pretty clear to me that the old design intended people to click on the links to articles, taking you away from Reddit; you might then return back to read the Reddit comments. The new design intends for you to click on the links to the Reddit discussions; even on the individual discussion page for a link, the link itself is no more prominent than here. As it is, posts to r/golang and elsewhere are often simply on-Reddit questions or notes; with the new design, I expect that to happen more and more.
(I'm not terribly surprised by the new Reddit design, for the record, because it's very much like the mobile version of their website, which has long made the discussion page prominent and easy to hit and the actual links small and hard to hit. On mobile this is especially frustrating because you don't have a mouse and so hitting small targets is much harder. Perhaps the experience is slightly better in their app, but I won't be installing Reddit's app.)
I don't know if Reddit has been clear about their priorities for their new design; perhaps they have been. But it hardly matters since regardless of what they may have said, their actual goals and priorities show quite clearly in the end result. Design is very revealing that way.
Of course sometimes what it reveals is that you have no idea what your priorities are, so you're just randomly throwing things out and perhaps choosing based on what looks good. But competent design starts with goals and with the designer asking what's important and what's not. Even without that, decisions about things like relative font size almost always involve thinking about what's more and less important, because that's part of both how and why you decide.
(So yes, font size by itself sends a message. Other elements do too, even if they may be chosen through unconsidered superstition. But Reddit is a sufficiently big site and a total redesign of it is a sufficiently big thing that I doubt anything was done without being carefully considered.)
(This is probably obvious to everyone who's deep in this stuff already, but it only occurred to me recently.)
What is the long term future for Extended Validation TLS certificates?
One of the things I wonder about with Extended Validation TLS certificates is what things will look like for them in the long term, say five to ten years. I don't think things will stay the way they are today, because as far as I can see EV certificates are in an unstable situation: in practice they're invisible and so don't provide any real benefits. Commercial Certificate Authorities certainly very much want EV certificates to catch on and become more important, but so far it hasn't happened and it's quite possible that things could go the other way.
So here are some futures that I see for EV certificates, covering a range of possibilities:
- EV certificates become essentially a superstition that lingers on as 'best practices' among large corporations for whom both the cost and the bureaucracy are not particularly a factor in their choices. These organizations are unlikely to go with CAs like Let's Encrypt anyway, so while they're paying for some TLS certificates they might as well pay a bit more, submit some more paperwork, and get something that makes a minor difference in how their sites show up in browsers.
- EV certificates will become quietly irrelevant and die off. CAs won't be able to do enough EV certificate business to make it worth sustaining the business units involved, so they'll quietly exit the unprofitable business.
- Browsers will become convinced that EV certificates provide no extra value (and if anything they just confuse users in practice) and will remove the current UI, making EV certificates effectively valueless and killing almost all of the business. Browsers hold all the cards here and at least Mozilla has openly refused to commit to any particular UI for EV certificates. See, for example, Ryan Hurst's "Positive Trust Indicators and SSL", which also dumps some rain on EV certificates.
One thing that could tip the browser balance here is scandals in CAs issuing (or not issuing) EV certificates improperly. If EV certificates don't seem routinely worthy of extra trust, it becomes more likely that browsers will stop giving them any extra trust indicators.
- CAs will persuade browser vendors to make some new browser features conditional on having an EV certificate, on the grounds that such sites are 'extra trustworthy'. I don't think this is likely to happen, but I'm sure CAs would like it to, since it would add clear extra value to EV certificates and browsers are already making some APIs conditional on HTTPS.
(A 'must be HTTPS' API restriction has a good reason for existing, one that doesn't apply to EV certificates specifically, but that's another entry.)
- CAs will persuade some other organization to make some security standard require or strongly incentivize EV certificates; the obvious candidate is PCI DSS, which already has some TLS requirements. This would probably be easier than getting browsers to require EV certificates for things, and it would also be a much stronger driver of EV certificate sales. I'm sure the CAs would love this, and I suspect that at least some companies affected by PCI DSS wouldn't care too much either way.
(On the other hand, some large ones would probably care a lot because they already have robust TLS certificate handling that would have to be completely upended to deal with the requirements of EV certificates. For instance, Amazon is not using an EV certificate today.)
On the balance the first outcome seems most likely to me at the moment, but I'm sure that CAs are working to try to create something more like the latter two since EV certificates are probably their best hope for making much money in the future.
(I also wonder what the Certificate Authority landscape will look like in five to ten years, but I have fewer useful thoughts on that apart from a hope that Let's Encrypt is not the only general-use CA left. I like Let's Encrypt, but I think that a TLS CA monoculture would be pretty dangerous.)
Extended Validation TLS certificates are basically invisible
Extended Validation TLS certificates are in theory special TLS certificates that are supposed to give users higher assurances about the website that they're visiting; Certificate Authorities certainly charge more for them (and generally do more verification). There are some fundamental problems with this idea, but there's also a very concrete practical problem, namely that EV certificates are effectively invisible.
Today, the only thing the presence or absence of an EV certificate does is that it changes the UI of the browser URL bar a little bit. Quick, how often do you pay any attention to your browser URL bar when you visit a site or follow a link? I pay so little attention to it that I didn't even notice that my setups of Firefox seem to have stopped showing the EV certificate UI entirely (and not because I turned much of it off in my main Firefox).
(It turns out that the magic thing that does this in Firefox is turning off OCSP revocation checks. I generally have OCSP turned off because it's caused problems for me. It's possible that websites using OCSP stapling will still show the EV UI in Firefox, but I don't have any to check. By the way, if you experiment with this you may need a browser restart to get the OCSP preference setting to really apply.)
This matters because if EV certificates are effectively invisible, it's not at all clear why you should bother going through the hassle of getting them and, more importantly for CAs, why you should pay (extra) for them. If almost no one can even notice if your website uses a fancy EV certificate, having a fancy EV certificate is doing you almost no good.
(This is an especially important question for commercial CAs, since Let's Encrypt is busy eating their business in regular 'Domain Validated' TLS certificates. It certainly appears that the future price of almost any basic DV certificate is going to be $0, which doesn't leave much room for the 'commercial' part of running a commercial CA.)
The current invisibility of EV certificates is not exactly a new issue or news, but I feel like doing my part to make it better known. There's a great deal of superstition that runs around the TLS ecosystem, partly because most people rightfully don't pay much attention to the details, and EV certificates being clearly better is part of that.
(EV certificates involve more validation and more work by the CA, at least right now. You can say that this intrinsically makes them better or you can take a pragmatic view that an improvement that's invisible is in practice nonexistent. I have no strong opinion either way, and I'll admit that if you offered me EV certificates with no extra hassle or cost, sure, I'd take them. Would I willingly pay extra for them or give up our current automation? No.)
Most modern web spiders are parasites
Once upon a time, it was possible to believe that most web spiders hitting your site were broadly beneficial to the (open) web and to people in general. Oh, sure, there were always bad ones (including spammers scraping the web for addresses to spam), but you could at least believe that bad or selfish spiders were the exception. It's my view that these days are over and that on the modern web, most spiders crawling your site are parasites.
My criteria for whether something is or isn't a parasite is a bit of a hand wave; to steal some famous words, ultimately I know it when I see it. Broadly and generally, web spiders are parasites when they don't gather information to serve the general public, they don't make the web better by their presence, and they don't even do something that we'd consider generally useful even for a somewhat restricted group of people (such as the people on a chat channel). There are all sorts of parasites, of course; some are actively evil and are trying to do things that will do you harm (such as harvest email addresses to spam), while others are simply selfish.
What's a selfish, parasitic web spider? As an example, there are multiple companies that crawl the web looking for mentions of brands and then sell information about this to the brands and various other interested people. There are 'sentiment analysis' and 'media monitoring' firms that try to crawl your pages and analyze what you say about products; several of them came up recently. There are companies that perhaps maybe might tell you something about the network of links and connections between sites, but you have to register first and perhaps that means you have to pay them money to get anything useful. At one point there were companies trying to gather up web pages so they could sell a plagiarism analysis service to universities and other people. And so on and so forth, at nearly endless length if you actually look at your web server logs and then start investigating.
(The individual parasitic web spiders don't necessarily crawl at high volume, although some of them certainly will try if you let them, but there are a lot of different ones overall. It's somewhat depressing how many of them seem to be involved in the general Internet ad business, if you construe it somewhat broadly.)
(I wrote about some of my own attitudes on this long ago, in The limits of web spider tolerance. Things have not gotten better since then.)
Notice to web spiders: an email address in your user-agent isn't good enough
Every so often I turn over a rock here at Wandering Thoughts by looking at what IP addresses are making a lot of requests. Most of the time that's Bing's bot, but every so often something else floats to the top of the list, and generally it's not something that leaves a favorable impression. Today's case was clearly a web spider, from IP address 188.8.131.52 (which currently resolves to 'getzonefile.commedia.io') and with "Mozilla/5.0 (compatible; Go-http-client/1.1; +email@example.com)" as its User-Agent.
This has caused me to create a new rule for web spiders: just having an email address in your User-Agent is not good enough, and in fact will almost certainly cause me to block that spider on contact.
What the User-Agent of a web spider is supposed to include is a website URL where I can read about what your web spider is and what benefit I get from allowing it to crawl Wandering Thoughts. Including an email address does not provide me with this information, and it doesn't even provide me with a meaningful way of reporting problems or complaining about your web spider, because in today's spam-laden Internet environment the odds that I'm going to send email to some random address are zero (especially to complain about something that it is nominally doing).
Of course, it turns out that this is not the only such User-Agent that I've seen (and blocked). Other ones that have shown up in recent logs are:
"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36; +firstname.lastname@example.org"
"Mozilla/5.0 (compatible; um-LN/1.0; mailto: email@example.com)"
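This rule is simple enough to mechanize. A sketch of the classification in Python (the function name and regexps are mine; 'unknown' covers User-Agents with neither an email address nor a URL):

```python
import re

def spider_ua_verdict(user_agent):
    """Classify a self-identified spider's User-Agent: 'ok' if it
    carries an http(s) URL where I can read about the spider,
    'block' if it only offers an email address."""
    has_url = re.search(r'https?://', user_agent) is not None
    has_email = re.search(r'[\w.+-]+@[\w.-]+\.\w+', user_agent) is not None
    if has_url:
        return 'ok'
    if has_email:
        return 'block'
    return 'unknown'
```

All of the User-Agents quoted above come out as 'block' under this rule, while a spider that advertises a real information URL passes.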
The MauiBot crawler is apparently reasonably well complained-about. I haven't found any particular mentions of the 'infegy.com' one from casual searches, but it's probably real (in one sense) given infegy.com's website.
(I also found one feed fetcher that appears to be pulling my feed with a User-Agent that lists an email address and a program name of 'arss/testing', but I've opted not to block it for now or mention the email address. If its author is reading this, you need a URL in there too.)
I'm not sure what web spider authors are thinking when they set their User-Agents up this way, and frankly I don't care (just as I don't care whether these email addresses are genuine and functional, or simply made up and bogus). On the one hand they are admitting that this is a web spider at work, but on the other hand they're fumbling at informing web server operators about their spiders.
PS: I'm aware that blocking web spiders this way is a quixotic and never-ending quest. There are a ton of nasty things out there, even among the ones that more or less advertise themselves. But sometimes I do these things anyway, because once I've turned over a rock I'm not good at looking away.
Firefox turns out to need some degree of 'autoplay' support
When I wrote some notes about Firefox's current media autoplay settings, I said, about the central function in Firefox that decides whether or not media can autoplay:
Since I don't want videos to ever autoplay, [...] I may someday try making the entire function just immediately return false.
I did this experiment and I can now report that the result is a failure. Completely disabling autoplay, ie making the check always return false, results in a Firefox that won't play video at all. It won't play YouTube videos, which doesn't entirely surprise me, but it also won't even play directly loaded .mp4 files. The videos load, but clicking on the appropriate 'play' button or control does nothing and the video never starts going.
The situation with bare .mp4 files surprises me a little bit, since Firefox is presumably putting up the player controls itself, so it can know for sure whether or not you clicked on the 'play' button.
Based on some quick spelunking in the Firefox source code, it appears that clicking the 'play' control calls the media's .play() method. This is the same fundamental path that web page JavaScript uses to start playback, so it's not that surprising that it goes through the same autoplay checks; they appear to be implemented directly in the .play() handling code and apply to anything that calls it, regardless of where the call comes from.
This leaves me less surprised about the name and behavior of the media.autoplay.enabled preference and the other stuff involved here. Given that Firefox needs there to be some 'autoplay', clearly it could never be the case that setting media.autoplay.enabled to false disabled everything here, because then video (and probably audio) would never play. That's clearly not what people want.
Microsoft's Bingbot crawler is on a relative rampage here
For some time, people in various places have been reporting that Microsoft Bing's web crawler is hammering them; for example, Discourse has throttled Bingbot (via). It turns out that Wandering Thoughts is no exception, so I thought I'd generate some numbers on what I'm seeing.
Over the past 11 days (including today), Bingbot has made 40998 requests, amounting to 18% of all requests. In that time it's asked for only 14958 different URLs. Obviously many pages have been requested multiple times, including pages with no changes; the most popular unchanging page was requested almost 600 times. Quite a lot of unchanging pages have been requested several times over this interval (which isn't surprising, since most pages here change only very rarely).
Over this time, Bingbot is the single largest source by user-agent (and the second place source is claimed by a bot that is completely banned; after that come some syndication feed fetchers). For scale, Googlebot has only made 2,800 requests over the past 11 days.
Traffic fluctuates from day to day but there is clearly a steady volume. Traffic for the last 11 days is, going backward from today, 5154 requests, then 2394, 2664, 3855, 1540, 2021, 3265, 7575, 2516, 3592, and finally 6432 requests.
As far as bytes transferred go, Bingbot came in at 119.8 Mbytes over those 11 days. Per day volume is 14.9 Mbytes, then 6.9, 7.3, 11.5, 4.6, 5.8, 8.8, 22.9, 6.7, 10.8, and finally 19.4 Mbytes. On the one hand, the total Bingbot volume by bytes is only 1.5% of my total traffic. On the other hand, syndication feed fetches are about 94% of my volume and if you ignore them and look only at the volume from regular web pages, Bingbot jumps up to 26.9% of the total bytes.
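Numbers like these fall out of tallying the access log by User-Agent. Here's a rough Python sketch of that kind of tally for Apache 'combined' format logs (the parsing is deliberately simplified and the regexp is my own):

```python
import re
from collections import Counter

# Very simplified parser for Apache 'combined' log lines: grab the
# response size and the quoted User-Agent (the last quoted field).
LOG_RE = re.compile(r'\s(\d+|-)\s+"[^"]*"\s+"([^"]*)"\s*$')

def per_agent_stats(lines):
    """Return {user-agent: (request count, total bytes)} for an
    iterable of combined-format log lines."""
    reqs, volume = Counter(), Counter()
    for line in lines:
        m = LOG_RE.search(line)
        if not m:
            continue
        size, agent = m.groups()
        reqs[agent] += 1
        volume[agent] += 0 if size == '-' else int(size)
    return {ua: (reqs[ua], volume[ua]) for ua in reqs}
```

From there it's a matter of sorting by count or bytes and eyeballing what floats to the top.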
I think that all of this crawling is excessive. It's one thing to want current information; it's another thing to be hammering unchanging pages over and over again. Google has worked out how to get current information with far fewer repeat visits to fewer pages (in part by pulling my syndication feed, presumably using it to drive further crawling). The difference between Google and Bing is especially striking considering that far more people seem to come to Wandering Thoughts from Google searches than come from Bing ones.
(Of course, people coming from Bing could be hiding their Referers far more than people coming from Google do, but I'm not sure I consider that very likely.)
I'm not going to ban Bing(bot), but I certainly do wish I had a useful way to answer their requests very, very slowly in order to discourage them from visiting so much and to be smarter about what they do visit.
Some notes on Firefox's current media autoplay settings
I am quite violently against videos ever auto-playing in my regular browser, under basically any circumstances ever (including ones like the videos that Twitter uses for those GIFs that people put in their tweets). I hate it with audio, I hate it without audio, I hate it on the web page I'm currently reading, I hate it on the web page in another tab. I just hate it.
I've traditionally used some number of extensions to control this behavior on prominent offenders like YouTube (in addition to setting various things to 'ask before activating'). When I wrote about my switch to Firefox Quantum, I said that I was experimenting with just turning off Firefox's media.autoplay.enabled preference and it seemed to work. I had to later downgrade that assessment and tinker with additional preferences, and I have finally dug into the code to look at things in more depth. So here are some notes.
The actual Firefox code that implements the autoplay policy checks is pretty short and sort of clear; it's the obvious function in AutoplayPolicy.cpp. As far as I can follow the various bits, it goes like this:
- If media.autoplay.enabled is true (the default case), autoplay is immediately allowed. If it's false, we don't reject it immediately; instead we continue on to make further checks and may still allow autoplay. As a result, the preference is misnamed (likely for historical reasons); it really controls whether autoplay is always allowed, not whether it's ever allowed.
(There is currently no Firefox preference that totally and unconditionally disables autoplay under all circumstances.)
- Pages with WebRTC camera or microphone permissions are allowed to autoplay, presumably so that your video conferencing site works properly.
- If media.autoplay.enabled.user-gestures-needed is false (the default), whether autoplay is allowed or forbidden is then based on whether the video element is 'blessed' or, I think, whether the web page is the currently focused web page that's handling user input (ie, it's not hidden off in a tab or something). As far as blessing goes, the code comments for this say:
True if user has called load(), seek() or element has started playing before. It's only use for checking autoplay policy[.]
- If media.autoplay.enabled.user-gestures-needed is true, Firefox checks to see if the video will be playing without sound. If it will be silent, the video is allowed to autoplay, even if it is not in the current tab and you haven't activated it in any way. If the video has audio, it's allowed to autoplay if and only if the web page has been activated by what comments call 'specific user gestures', which I think means you clicking something on the web page or typing at it.
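Put together, my understanding of this decision logic sketches out like the following Python pseudo-code. All of the names here are mine and everything is a simplification; the real code is C++ in AutoplayPolicy.cpp and tracks this state quite differently:

```python
def autoplay_allowed(pref_enabled, gestures_needed, has_webrtc_perms,
                     blessed, page_handling_user_input, is_muted,
                     activated_by_user_gesture):
    """Sketch of the autoplay checks as described above. Each
    argument is a boolean standing in for a piece of browser state."""
    if pref_enabled:                    # media.autoplay.enabled is true
        return True
    if has_webrtc_perms:                # camera/microphone granted
        return True
    if not gestures_needed:             # ...user-gestures-needed is false
        return blessed or page_handling_user_input
    if is_muted:                        # silent video is always allowed here
        return True
    return activated_by_user_gesture    # the 'specific user gestures' case
```

Note how the is_muted branch comes before the user-gesture check; that ordering is exactly the paradoxical silent-video behavior described below.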
This means the behavior of silent videos is different based on whether or not you have m.a.e.user-gestures-needed set. If it's the default false, a silent video in another tab will not autoplay. If you've set it to true to get more control in general, you paradoxically get less control of silent videos; they'll always autoplay, even when they've been opened in another tab that you haven't switched over to yet.
(My current fix for this is to comment out the audio checking portion of that code in my own personal Firefox build, so that silent videos get no special allowances. A slightly better one might be to immediately deny autoplay if EventStateManager::IsHandlingUserInput() is false, then check audio volume; if I'm understanding the code right, this would allow silent video to autoplay only on the current page. Since I don't want videos to ever autoplay, I'm fine with my fix and I may someday try making the entire function just immediately return false.)
Turning off media.autoplay.enabled does cause a certain amount of glitches for me on YouTube, but so far nothing insurmountable; people have reported more problems with other sites (here is one report). Mozilla people are apparently actively working on this area, per a bug that has quite a number of useful and informative comments.
(As far as other video sites go, generally I don't have uMatrix set up to allow them to work in the first place so I just turn to my alternate browser. I only have YouTube set up in my main Firefox because I wind up on it so often and it's relatively easy.)