In praise of uBlock Origin's new 'element zapper' feature
The purpose of the element zapper is to quickly deal with the removal of nuisance elements on a page without having to create one or more filters.
uBlock Origin has always allowed you to permanently block page elements, and a while back I started using it aggressively to deal with the annoyances of modern websites. This is fine and works nicely, but it takes work. I have to carefully pick out what I want to target, maybe edit the CSS selector uBlock Origin has found, preview what I'm actually going to be blocking, and then I have a new permanent rule cluttering up my filters (and probably slightly growing Firefox's memory usage). This work is worth it for things that I'm going to visit regularly, but some combination of the amount of work required and the fact that I'd be picking up a new permanent rule made me not do it for pages I was basically just visiting once. And usually things weren't all that annoying.
Enter Medium and their obnoxious floating sharing bar at the
bottom of pages.
These things can be blocked on Medium's website itself with a
straightforward rule, but the problem is that tons of people use
Medium with custom domains. For example, this article
that I linked to in a recent entry. These days it seems like
every fourth article I read is on some Medium-based site (I exaggerate,
but), and each of them has the Medium sharing bar, and each of
them needs a new site-specific blocking rule unless I want to
globally block all <div>s with the class js-stickyFooter (at least until
Medium changes the name).
(Globally blocking such a <div> is getting really tempting, though. Medium feels like a plague at this point.)
The element zapper feature deals with this with no fuss or muss. If I wind up reading something on yet another site that's using Medium and has their floating bar, I can zap it away in seconds. The same is true of any number of floating annoyances. And if I made a mistake and my zapping isn't doing what I want, it's easy to fix; since these are one-shot rules, I can just reload the page to start over from scratch. This has already started encouraging me to do away with even more things than before, and just like when I started blocking elements, I feel much happier when I'm reading the resulting pages.
(Going all the way to using Firefox's Reader mode is usually too much of a blunt hammer for most sites, and often I don't care quite that much.)
PS: Now that I think about it, I probably should switch all of my
per-site blocks for Medium's floating bar over to a single
'##div.js-stickyFooter' block. It's unlikely to cause any collateral
damage and I suspect it would actually be more memory and CPU
efficient than a collection of separate site-specific rules.
(And I should probably check over my personal block rules in general, although I don't have too many of them.)
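For reference, the global rule in question is just a generic cosmetic filter in uBlock Origin's 'My filters' pane; a site-scoped version only differs by the hostname prefix (example.com here is a placeholder):

```
! Hide Medium's floating sharing bar everywhere:
##div.js-stickyFooter

! The same filter scoped to one site:
example.com##div.js-stickyFooter
```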
My situation with Twitter and my Firefox setup (in which I blame pseudo-XHTML)
Although it is now a little bit awkward to do this, let's start with my tweet:
Twitter does this with a <noscript> meta-refresh, for example:
<noscript><meta http-equiv="refresh" content="0; URL=https://mobile.twitter.com/i/nojs_router?path=%2Fthatcks%2Fstatus%2F877738130656313344"></noscript>
Since I browse with JavaScript disabled by default (via NoScript),
Twitter included, my Firefox acts on this
<noscript> block. What is
supposed to happen here is that you wind up on the mobile version
of the tweet, and
then just sit there with things behaving normally. In my development
tree Firefox, the version of
this page that I get also contains another <noscript> meta-refresh:
<noscript><meta content="0; URL=https://mobile.twitter.com/i/nojs_router?path=%2Fthatcks%2Fstatus%2F877738130656313344" http-equiv="refresh" /></noscript>
This is the same URL as the initial meta-refresh, and so Firefox sits there going through this cycle over and over and over again, and in the meantime I see no content at all, not even the mobile version of the tweet.
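For what it's worth, a client can detect this sort of loop mechanically: pull the meta-refresh target out of the <noscript> block and compare it against the URL it just fetched. A rough sketch (the helper names are mine, and the regex only handles the meta-refresh format shown above):

```python
import re

def noscript_refresh_url(html):
    # Find a <meta http-equiv="refresh"> inside a <noscript> block.
    # Attribute order varies (as in the two Twitter examples above),
    # so just require both attributes somewhere in the tag.
    m = re.search(r'<noscript>\s*<meta\b[^>]*http-equiv="refresh"[^>]*>',
                  html, re.IGNORECASE)
    if m is None:
        return None
    target = re.search(r'URL=([^"]+)', m.group(0), re.IGNORECASE)
    return target.group(1) if target else None

def is_refresh_loop(fetched_url, html):
    # The broken cycle happens when a page's meta-refresh points
    # right back at the URL we just fetched.
    return noscript_refresh_url(html) == fetched_url
```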
In other environments, such as Fedora 25's system version of Firefox 54, Lynx, and wget, the mobile version of the tweet is a page without the circular meta-refresh. At first this difference mystified me, but then I paid close attention to the initial HTML I was seeing in the page source. Here is the start of the broken version:
<!DOCTYPE html> <html dir="ltr" lang="en"> <meta charset="utf-8" /> <meta name="viewport" content="width=device-width,initial-scale=1,maximum-scale=1,user-scalable=0" /> <noscript>[...]
(I suspect that this is HTML5.)
And here is the start of the working version:
<?xml version="1.0" encoding="utf-8"?> <!DOCTYPE html PUBLIC "-//WAPFORUM//DTD XHTML Mobile 1.1//EN" "http://www.openmobilealliance.org/tech/DTD/xhtml-mobile11.dtd"> <html xmlns="http://www.w3.org/1999/xhtml"> <head> [... much more verbiage ...]
Although this claims to be some form of XHTML in its declarations,
Twitter is serving this with a Content-Type of text/html, which
makes it plain old HTML soup as far as Firefox is concerned (which
is a famous XHTML issue).
What I don't understand is why Twitter serves HTML5 to me in one
browser and pseudo-XHTML to me in another. As far as I can tell,
the only significant thing that differs here between the system
version of Firefox and my custom-compiled one is the User-Agent
(and in particular both are willing to accept XHTML). I can get
Twitter to serve me HTML5 using
wget, and it happens with
either User-Agent string:
wget -q -O - --user-agent 'Mozilla/5.0 (X11; Linux x86_64; rv:56.0) Gecko/20100101 Firefox/56.0' https://mobile.twitter.com/thatcks/status/877738130656313344 | less
Sidebar: How I worked around this
Initially I went on a long quest to try to find an extension that would turn this off or some magic trick that would make Firefox ignore it (and I failed). It turns out that what I need is already built into NoScript; the Advanced settings have an option for 'Forbid META redirections inside <NOSCRIPT> elements', which turns off exactly the source of my problems. This applies to all websites, which is a bit broader of a brush than would be ideal, but I'll live with it for now.
(I may find out that this setting breaks other websites that I use, although I hope not.)
The (current) state of Firefox Nightly and old extensions
Back in January in my entry on how ready my Firefox extensions are for Firefox Electrolysis, I said that Firefox's release calendar suggested that Firefox's development version (aka 'Nightly') would stop supporting old non-Electrolysis extensions some time around June or July. It's now mid June and some things have happened, but I'm not sure where Mozilla's timeline is on this. So here is what I know.
At the start of May, Firefox Nightly landed bug 1352204, which is
about disabling a lot of older extensions on Nightly. Mozilla has
an information page about this in their wiki, and various news
outlets noticed and reported on this change shortly after it went
live, which means I'm late to the party here. As the Mozilla page covers, you can fix
this by setting the preference extensions.allow-non-mpc-extensions to
true. I've done this
ever since I found the option and everything appears to still work
fine in the current Nightly.
(I had some weird things happen with Youtube that caused me to not update my Firefox build for a month or so because I didn't want to deal with tracking the issue down, but when I started to test more extensively they went away. Problems that vanish on their own can be the best problems.)
This change itself doesn't seem to be how Mozilla intends to turn off old extensions, theoretically in Firefox 57. That seems to be bug 1336576, expanded in a Mozilla wiki entry. Based on the Mozilla wiki entry, it appears that Firefox's development code base (and thus Nightly) will continue to allow you to load old extensions even after Firefox 57 is released provided that you flip a magic preference. Firefox 57 itself will not allow you to do so; the preference will apparently do nothing.
As long as Mozilla has legacy extensions that they care about, I believe that the actual code to load and operate such extensions will be present and working in the Firefox code base; this is the 'signed by Mozilla internally' case in their compatibility table. This implies that even if Mozilla disables the preference in the development version, you can force-override this with a code change if you build your own Firefox (which is what I do). You may not be able to turn Electrolysis on if you have such old legacy extensions, but presumably your addons are more important than Electrolysis (this is certainly the case for me).
All of this makes me much happier about the state of my personal Firefox than I used to be, because it looks like the point where many of my current extensions will fall over is much further away than I thought it was. Far from being this summer, it may be next summer, or even further away than that, and perhaps by then the release of Firefox 57+ will have caused more of the addons that I care about to be updated.
(However, not all of the omens for updated addons are good. For example, Self-Destructing Cookies now explicitly marks itself as incompatible with Electrolysis because apparently addons can't monitor sites' LocalStorage usage in e10s. This suggests that there are important gaps in what addons can now do, gaps that Mozilla may or may not close over time. At least this particular case is a known issue, though; see bugs 1333050, 1329745, and 1340511 (via the addons page for Cookie Autodelete, which I was recently pointed at by a helpful reader of Wandering Thoughts).)
Another case of someone being too clever in their User-Agent
Every so often, something prompts me to look at the server logs for
Wandering Thoughts in some detail to see what things are
lurking under the rocks. One area I wind up looking at is what
User-Agents are fetching my syndication feeds; often interesting
things pop out (by which I mean things that make me block people). In a recent case, I happened to
spot the following User-Agent:
Mozilla/5.0 (compatible) AppleWebKit Chrome Safari
That's clearly bogus, in a way that smells of programming by
superstition. Whoever created this
has heard that mentioning other user-agents in your User-Agent
string is a good idea, but they don't quite understand the reason
why or the format that people use. So instead of something that
looks valid, they've sprayed in a random assortment of browser
and library names.
As with the first too-clever
User-Agent, my initial reaction was to block
this user agent entirely. It didn't help that it was coming from
random IPs and making no attempt to use conditional
GET. After running this way for a few days and
seeing the fetch attempts continue, I got curious enough to do an
Internet search for this exact string to see if I could turn up
someone who'd identified what particular spider this was.
I didn't find that. Instead, I found the source code for this,
which comes from Flym, an Android feed reader (or maybe this fork of it). So, contrary to how this
User-Agent makes it look, this is actually a legitimate feed
reader (or as legitimate as a feed reader can be if it doesn't use
conditional GET, which is another debate entirely). Once
I found this out, I removed my block of it, so however many people
who are using Flym and spaRSS can now read my feed again.
(Flym is apparently based on Sparse-RSS, but the current version of
that sends a User-Agent of just
"Mozilla/5.0",
which looks a lot less shady because it's a lot more generic. Claiming
to be just 'Mozilla/5.0' is the 'I'm not even trying' of User-Agent
strings.
Interestingly, I do appear to have a number of people pulling Wandering
Thoughts feeds with this
User-Agent, but it's so generic that I have
no idea if they're using Sparse-RSS or something else.)
In the past I've filed bugs against open
source projects over this sort of issue, but sadly Flym doesn't appear to accept bug
reports through Github, and at the moment I don't feel energetic
enough to pursue anything more involved. I admit that
part of it is the lack of conditional
GET; if you don't
put that into your feed reader, I have to assume that you don't
care too much about HTTP issues in general.
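Supporting conditional GET on the client side is genuinely not much code, which is part of why its absence grates. A minimal sketch in Python's standard library (the reader name and URL in the User-Agent are made up; a real client would persist the validators between runs):

```python
import urllib.error
import urllib.request

def conditional_headers(etag=None, last_modified=None):
    # The headers a polite feed fetcher sends once it has seen the feed;
    # they let the server answer '304 Not Modified' with no body at all.
    headers = {"User-Agent": "ExampleReader/1.0 (+https://example.com/reader)"}
    if etag is not None:
        headers["If-None-Match"] = etag
    if last_modified is not None:
        headers["If-Modified-Since"] = last_modified
    return headers

def fetch_feed(url, etag=None, last_modified=None):
    req = urllib.request.Request(url,
                                 headers=conditional_headers(etag, last_modified))
    try:
        resp = urllib.request.urlopen(req)
    except urllib.error.HTTPError as e:
        if e.code == 304:
            return None  # feed unchanged; reuse the cached copy
        raise
    # Save these for next time; they are what make conditional GET work.
    return resp.read(), resp.headers.get("ETag"), resp.headers.get("Last-Modified")
```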
(See my views on what your
User-Agent header should include and
why. Flym, spaRSS, and Sparse-RSS all fall
into the 'user agent' case, since they're used by individual users.)
PS: Mobile clients should really, really support conditional GET,
because mobile users often pay for bandwidth (either explicitly or
through monthly bandwidth limits) and conditional
GET on feeds
holds out the potential of significantly reducing it. Especially
for places with big feeds, like Wandering Thoughts. But this
is not my problem.
URLs are terrible permanent identifiers for things
I was recently reading the JSON Feed version 1 specification (via Trivium, among other places). I have a number of opinions on it as a syndication feed format, but that's not the subject of today's entry, because in the middle of the specification I ran into the following bit (which is specifically talking about the elements of feed entries, ie posts):
id(required, string) is unique for that item for that feed over time. [...] Ideally, the
idis the full URL of the resource described by the item, since URLs make great unique identifiers.
When I read this bit, I had an immediate pained reaction. As someone who has been running a blog for more than ten years and has made this exact mistake, let me assure you that URLs make terrible permanent unique identifiers for things. Yes, yes, cool URLs don't change, as the famous writeup says. Unfortunately in the real world, URLs change all of the time. One reason for this that is especially relevant right now is that URLs include the protocol, and right now the web is in the process of a major shift from HTTP to HTTPS. That shift just changed all your URLs.
(I think that over the next ten years the web will wind up being almost entirely HTTPS, even though much of it is not HTTPS today, so quite a lot of people will be going through this URL transition in the future.)
This is not the only case that may force your hand. And beyond more or less forced changes,
you may someday move your blog from one domain to another or change
the real URLs of all of your entries because you changed the blog
system that you use (both of which have happened). In theory you can
create a system to generate syndication feeds that deals with all
of that, by having a 'URL for id' field of some sort (perhaps
automatically derived from your configuration of URL redirections),
but if you're going to wind up detaching what you put in the
field from the actual canonical URL of the entry, why not make it
arbitrary in the first place? It will save you a bunch of pain to
do this from the start.
(Please trust me on this one, seeing as this general issue has caused me pain. As my example illustrates, using any part of the URL as part of your 'permanent identifier' is going to cause you heartburn sooner or later.)
There are excellent reasons why the Atom syndication format both explicitly allows for and more importantly encourages various forms of permanent identifiers for feed entries that are not URLs. For example, you can use UUIDs (as 'urn:uuid:<uuid>') or your own arbitrary but unique identifier in your own namespace (as tag: URNs). The Atom format does this because the people who created it had already run into various problems with the widespread use of URLs as theoretically permanent entry identifiers in RSS feeds.
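To make those two options concrete, here is one way to mint such identifiers in Python. A tag: URI (RFC 4151) bakes in only a domain and a date that you controlled at publication time, and a urn:uuid id carries no URL-derived parts at all (the domain and slug below are placeholders):

```python
import uuid

def tag_uri(domain, year, slug):
    # A tag: URI stays valid no matter how the entry's URL changes later;
    # you only need to have owned the domain on the date named in it.
    return f"tag:{domain},{year}:/{slug}"

def uuid_urn():
    # Generate once when the entry is first published, store it with the
    # entry, and never derive it from the URL again.
    return uuid.uuid4().urn
```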
We use jQuery and I've stopped feeling ashamed about it
I'll start with my tweets:
A confession: I have a web site that uses jQuery. An old version of jQuery, at that. It probably always will, because it works this way.
I could spend a bunch of time (as a non-JS-expert) to make my site use current native browser APIs instead. But there's no payoff for us.
I've completely given up feeling guilty about still using jQuery and not updating our jQuery versions. Our site is a tool, not a sculpture.
(Typical writing about this is eg here, here, here, or this blog entry by Laurie Voss that mentions it in passing (via). Running across Laurie Voss's blog entry is what pushed me into my tweets.)
Beyond the sin of still using jQuery at all in our web app, we're also still using an old version of jQuery (specifically jQuery 1.9.0, released in early 2013 and so now more than four years old). This whole issue has been nagging at me for a while and today I reached the point where I blurted out my tweets, which perhaps isn't the conclusion you might expect.
(Perhaps I should drop in the latest 1.x jQuery and test lightly to see if everything works, just in case there's some bug fix that matters to us that we're missing. But even that is hard to sell, partly because it's still a hassle.)
I'm not sure what I feel about this web spider's User-Agent value
Every so often I do the unwise thing of turning over rocks in the web logs for this blog. Today, one of the things that I found under there was a web spider with the claimed User-Agent of:
BuckyOHare/1.4 (Googlebot/2.1; +https://hypefactors.com/webcrawler)
The requests all came from AWS IP address space, so I have no idea if this actually belongs to the people that it claims to. As is typical for these spiders, it got my attention primarily by attempting to access URLs that no crawler should.
The bit that raised my eyebrows a lot is the mention of Googlebot. On the one hand, there is a long tradition of browsers including the name of other browsers in their User-Agents in order to persuade web sites to do the right thing and serve them the right content. On the other hand, the biggest reason that I can think of to claim to be Googlebot is so that web sites that give Googlebot special allowances for crawling things will extend those allowances to you, and that's a rather different kind of fakery.
(Ironically this backfired for these people because I already had Googlebot blocked off from almost all of the URLs that they tried to access. It does raise my eyebrows again that almost all of the pages they tried to access were Atom feeds or 'write a comment' pages. For now I've decided that I don't trust these people enough to allow them any access to Wandering Thoughts, so they're now totally blocked.)
I wouldn't be surprised if other web spider operators have also experimented with this clever idea already. If not, I rather suspect that more people will in the future. Given that there are websites that are willing (or reluctantly forced) to allow Google(bot) access but would rather like to block everyone else, more than a few of them are probably using User-Agent matching instead of anything more sophisticated.
(Partly this is because more sophisticated methods are some combination of more work to maintain and more time to check in the web server itself.)
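The more sophisticated method here is the one Google itself documents for verifying Googlebot: a reverse DNS lookup on the requesting IP, a check that the name falls under Google's crawler domains, and then a forward lookup to confirm the name resolves back to that IP. A sketch (the DNS half obviously needs network access at check time, which is exactly the extra cost in the web server mentioned above):

```python
import socket

def is_google_crawler_host(hostname):
    # Google documents that genuine Googlebot reverse-resolves into
    # these domains.
    return hostname.endswith(".googlebot.com") or hostname.endswith(".google.com")

def verify_googlebot(ip):
    # Reverse lookup, check the domain, then forward-confirm that the
    # name resolves back to the original IP (this defeats faked PTR
    # records, which anyone can set up for their own address space).
    try:
        hostname = socket.gethostbyaddr(ip)[0]
    except OSError:
        return False
    if not is_google_crawler_host(hostname):
        return False
    forward_ips = {info[4][0] for info in socket.getaddrinfo(hostname, None)}
    return ip in forward_ips
```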
A shift in the proper sizes of images on web pages
In the old days, one of the ways that you could irritate people was to build your web pages using full-sized, full-resolution images and then tell the user's browser to resize them for you. This was the lazy person's way of building your pages (because it didn't require you to find a good image resizing program), and while it generally tested okay on your local network, it made a whole lot of people annoyed as their browsers slowly downloaded your big images only to, in effect, throw most of them away. Good web design was to resize your images to their final size on the server and only serve that size (in a suitable streamlined image format, too, often JPEG instead of a PNG original).
Then quietly things changed. Along came devices with very different screen sizes and resolutions, which created responsive design, and as part of responsive design you suddenly wanted images to scale down to fit into the available screen space. And generally once you're scaling down you also want to be able to go large(r), so that people browsing on relatively large screens don't get postage-stamp sized images that don't look too appealing.
(As I found out, you probably don't want to give the client a small image and have it resize it larger. Resizing images larger rarely works very well.)
Of course this is a privileged position. Not everyone has fast and inexpensive networking; a significant portion of the world has relatively slow and often expensive connectivity and is often using small devices on top of that. When you (or I) chose to serve up big images all the time, we are harming these less network-privileged people in various ways. If we care (and maybe we should), we should try to do better. For example, there are probably ways to use CSS media queries to select appropriate starting point images.
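CSS media queries are one route; on the HTML side, the srcset and sizes attributes do much the same job, letting the browser pick the smallest adequate image for its viewport instead of always pulling the big one (the filenames and widths here are purely illustrative):

```
<img src="photo-800.jpg"
     srcset="photo-400.jpg 400w, photo-800.jpg 800w, photo-1600.jpg 1600w"
     sizes="(max-width: 600px) 100vw, 800px"
     alt="A photo">
```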
(The 'big image' situation of the past was also a privileged position; it's just that fewer people were inside the circle of network privilege and the people outside it were better placed to be vocal about things, since they were in the West and were people that designers and companies wanted to appeal to. The fundamental difference between then and now is how many people in our audience we assume have fast networking and good devices.)
PS: If you routinely use 'view page in no style', you rapidly get an appreciation for how many people serve quite large images. It's a lot, even (or especially) for blogs and other less commercial websites. My own shift makes me rather late to this particular party.
(Years ago I sort of had the same experience on Flickr, but I believe that more or less went away later for a while. It returned a couple of years ago, and I quietly switched from using Firefox on Flickr to mostly using Chrome.)
This result sort of surprises and depresses me (partly because using Chrome has its pains). My understanding is that in theory Firefox and Chrome are usually relatively neck and neck as far as performance goes, with Firefox at least competitive, and that especially on common sites Firefox should not be laggy. There are a number of things that could be causing this for me and not for other people, especially general users. For a start I'm on Linux and using Fedora's build of Firefox instead of the Mozilla build, while I think most performance comparisons are made on Windows or MacOS and use the official Mozilla builds.
(I'm also using a relatively odd Linux environment with relatively modest OpenGL and compositing support, which might hurt Firefox more than Chrome.)
(And I'm using some extensions in Chrome's incognito mode that I would expect to be sort of heavyweight, like uBlock Origin and a mouse gestures extension.)
PS: I care about this partly because I dislike some things Google does with Chrome and partly because I care about Firefox being competitive and good in general. The overall web ecology benefits when we have a real choice in browsers, and part of having a real choice is good performance.
(I also think that Mozilla fundamentally cares more about Linux for Firefox than Google does for Chrome. As a non-Windows, non-Mac user, I remember the days when I was a second class citizen on the web and I would rather like to not go too far back to them.)
On today's web, a local Certificate Authority is fairly dangerous
In a comment on my entry on generating self-signed TLS certificates today, James suggested:
My go-to tool is OpenVPN's EasyRSA. Admittedly that creates a CA which you can then sign certificates with, but for your internal hosts it would mean you could install said CA into your browser and then trust them all.
Superficially, this is certainly an appealing idea. If you have a fleet of IPMIs or other internal websites that need TLS certificates and that have names where you can't get public certificates, you can avoid everyone having to trust them one by one. Just set up a local CA, sign all the internal website certificates with them, add the local CA certificate to your browser, and you're done.
Unfortunately if you do this, you have just loaded a fairly large security-defeating gun and pointed it straight at your face. It's not just that your local CA can be attacked to sign certificates for any host, not just your internal ones; more importantly, certificates signed by a manually added CA specifically bypass all of the modern TLS protections built into browsers. This isn't just things like HTTP Public Key Pinning headers that your browser may have memorized, it's also even critically important pinned keys hard-coded into browsers themselves. A certificate signed by a manually added CA bypasses all of those checks.
(For all of this we may blame HTTPS interception middleware. Browser vendors have extremely reluctantly bowed to the demands of businesses that want to deploy them and have them intercept absolutely everything, partly because businesses basically hold the cards here if they're willing to go far enough.)
As far as I know there's no way in either Firefox or Chrome to constrain a manually added CA to only have its certificates accepted for certain (sub)domains. This means that no matter what you want, your local CA intended for intranet websites has just as much TLS interception ability as the TLS CA for a mandatory HTTPS middleware box. If an attacker can compromise it, they gain complete HTTPS interception capabilities for web browsing, both internal and external. None of the usual precautions and warnings will protect you in the least.
This means that a local CA that you have people's browsers trust is a very big deal, even (or especially) if only the sysadmins are trusting it. If you're going to have one at all, I think that it should involve some sort of hardware security module, even a simple and cheap one. If you are not willing to strongly protect a local CA, at least to the level of buying basic HSM hardware for it, then you should not even think of having one; it's simply far too dangerous in the event of a serious attacker. Even if you buy HSM hardware for it, I think that the balance of risks versus gains are often not going to be in favour of a local CA.
(To be clear, all of this is specific to local CAs that you will have your browsers trust. There are perfectly sensible and not particularly dangerous uses for a local CA outside of this. The general way to know if you're safe is that every operation that is supposed to use the local CA should have to explicitly trust the local CA's root certificate, whether that's through a command-line option or a specific configuration file setting. You should never add a local CA to your general trust roots, whether those are the browser trust roots or the system's generic trust roots.)
(Years ago I sort of wrote about this here, but I didn't take it anywhere near far enough and in particular I didn't think of what an attacker could do with access to your local or organizational CA. Not that overzealous security people aren't a serious risk in and of themselves, and it's not as if middleware HTTPS interception has a good reputation. Rather the contrary.)