Wandering Thoughts archives

2018-09-09

Why I don't think browsers will ever standardize how 'Reader Mode' works

I recently read Daniel Aleksandersen's four-part series on 'reading mode' in (most) browsers (parts 1, 2, 3, and 4, discovered via my referer logs). In the final summary part, "Web Reading Mode: A bad reading experience", Aleksandersen suggests that browsers should standardize how they parse pages to determine the 'main page' content that they will show. I'm not a fan of the current state of affairs (I've written about the limitations of Firefox's Reader mode), but I think that browsers will never standardize how this works, and may never fully document it. This isn't because browser people are evil; it's because locking down how reader mode works through a standard would very likely result in reader mode being subverted by website publishers.

The ultimate purpose of reader mode is to remove things from the website, and it is most needed on exactly those websites that put the most extraneous things in. However, these websites are not putting those extraneous things into the page randomly or because they are idiots; instead, those things serve the interests of the website in various ways (this is most obvious with inserted advertising). Since websites want these elements to be present in the pages that people see, they have a decent motivation to trick and subvert browser reader modes so that this content is still included (in as original a form as possible), especially if it is easy.

In short, if you provide websites with a mechanism to say 'include this in reader mode', they will use it on things that should not be included in reader mode. Standardizing how reader mode determines what is and isn't main content is one way to provide websites with such a mechanism.

Now, this mechanism already sort of exists, in that you can reverse engineer how the various reader modes determine this and what they include, but at least two things slow down websites here: there's more than one implementation to target, and implementations are free to change their approach and invalidate your work to date. As a result, right now it's generally not worth people's while to do all of this work given the low likely payoff. Standardization would likely reduce that work substantially, so I'd expect to see quite a few websites throw in the necessary tags.
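To make this concrete, here is a deliberately simplified sketch, in TypeScript, of the general sort of content-scoring heuristic that reader modes use to pick the 'main content' element. It is not Firefox's actual Readability code (the real heuristics are more elaborate and keep changing, which is part of what makes them hard to game today); it just illustrates why a frozen, documented version would be easy to exploit.

    // A deliberately simplified content-scoring heuristic, roughly in the
    // spirit of what reader modes do: favour elements with lots of prose and
    // few links, boosted or penalized by class/id hints. Not any browser's
    // real algorithm.
    const POSITIVE_HINTS = /article|content|entry|main|post|text/i;
    const NEGATIVE_HINTS = /ad|banner|comment|footer|promo|sidebar|sponsor/i;

    function scoreElement(el: HTMLElement): number {
      const text = el.textContent ?? "";
      const linkLength = Array.from(el.querySelectorAll("a"))
        .reduce((n, a) => n + (a.textContent ?? "").length, 0);
      const linkDensity = text.length > 0 ? linkLength / text.length : 1;

      let score = text.length * (1 - linkDensity);
      const hints = `${el.className} ${el.id}`;
      if (POSITIVE_HINTS.test(hints)) score *= 1.5;
      if (NEGATIVE_HINTS.test(hints)) score *= 0.2;
      return score;
    }

    // Pick the highest-scoring candidate as the reader view content. If this
    // exact heuristic were standardized, wrapping an ad in
    // <div class="article-content"> would be enough to get it included.
    function pickMainContent(doc: Document): HTMLElement | null {
      const candidates = Array.from(
        doc.querySelectorAll<HTMLElement>("article, div, section"),
      );
      return candidates.sort((a, b) => scoreElement(b) - scoreElement(a))[0] ?? null;
    }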

Browsers standardizing reader mode is somewhat like mail systems standardizing what is considered non-spam, and I think it's about as unlikely for this reason alone (never mind any other ones, such as whether browsers consider this either a priority or a competitive advantage). This is a pity, but unfortunately the modern web is a hostile environment (at least in the large).

web/ReaderModeNoStandards written at 21:59:22

Cookie management models in Firefox Quantum in practice

I was recently reading The WebExtocalypse (via Planet Debian) and ran across the following bit about Firefox Quantum replacements for old non-WebExt extensions:

    Some packages are no longer useful upstream but alternatives are available:
    [...]

My immediate reflexive reaction was 'these two things are not alike'. I like and use both Cookie AutoDelete and uMatrix, but they have different ways of handling cookies that give you somewhat different results, and neither of them is perfect.

At a hand-waving level, we can break down what happens with cookies into three separate things: whether a website is allowed to set cookies that Firefox will store, whether existing cookies are provided to the website in requests, and whether cookies that are set by the website are later deleted (whether the website likes it or not), and when. The two extensions choose different options here, with different effects and complications.

In Cookie AutoDelete, things are simple; it does nothing to change Firefox away from accepting cookies from websites or returning them to websites. All it does is delete website cookies some time after you close the website's tab (unless you've told it otherwise for specific websites). In effect it makes all cookies into rapidly expiring session cookies, but while they exist the website can track you (during your session on it, for example).

(Based on some testing I just did, it appears that Cookie AutoDelete expires third-party cookies even if you still have a tab open on the first-party website that led to their creation. This is sensible but possibly something to watch out for.)
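As a rough illustration of this model, here is a minimal sketch of a WebExtension background script that deletes a site's cookies a little while after its tab closes. It is not Cookie AutoDelete's actual code (which handles whitelists, containers, multiple tabs on the same site, and so on); it assumes a Firefox extension with 'tabs', 'cookies', and host permissions, and the grace period is made up.

    // Sketch of the Cookie AutoDelete model: remember which host each tab is
    // on, and some time after a tab closes, delete that host's cookies.
    // Assumes "tabs", "cookies" and host permissions; a real extension would
    // likely use browser.alarms instead of setTimeout.
    declare const browser: any; // WebExtensions API (types via e.g. webextension-polyfill)

    const tabHosts = new Map<number, string>();
    const GRACE_PERIOD_MS = 15_000; // arbitrary grace period

    browser.tabs.onUpdated.addListener((tabId: number, changeInfo: { url?: string }) => {
      if (changeInfo.url) {
        try {
          tabHosts.set(tabId, new URL(changeInfo.url).hostname);
        } catch {
          // about:, data: and similar URLs have no useful hostname
        }
      }
    });

    browser.tabs.onRemoved.addListener((tabId: number) => {
      const host = tabHosts.get(tabId);
      tabHosts.delete(tabId);
      if (!host) return;

      setTimeout(async () => {
        // Delete every cookie whose domain matches or is a subdomain of the host.
        const cookies = await browser.cookies.getAll({ domain: host });
        for (const c of cookies) {
          const proto = c.secure ? "https:" : "http:";
          await browser.cookies.remove({
            url: `${proto}//${c.domain.replace(/^\./, "")}${c.path}`,
            name: c.name,
          });
        }
      }, GRACE_PERIOD_MS);
    });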

In uMatrix, cookies are always accepted from websites but not provided back to them unless you permit this. uMatrix normally leaves accepted cookies in the browser, but you can turn on a non-default setting ('delete blocked cookies') that deletes some or all cookies from blocked sites. uMatrix's documentation isn't clear about what cookies are deleted here; it could be only cookies that the site set in this request, or it could be all cookies from the site. Thus, uMatrix hides cookies from websites but allows your browser to accumulate them, and these cookies may potentially be returned to the website later under some circumstances (I believe one way is through Javascript).

(I'm also not sure how uMatrix's optional deletion interacts with Firefox's first-party isolation, if you've found that and turned it on. Cookie AutoDelete is currently explicitly incompatible with FPI.)
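For the first half of this model (cookies are accepted but not sent back to blocked sites), the general mechanism that Firefox's blocking webRequest API offers looks roughly like the sketch below. uMatrix's real implementation is per-scope and considerably more involved; the blocked-hosts list here is just a placeholder.

    // Sketch of "accept cookies but don't send them back": Set-Cookie
    // responses pass through untouched, but the Cookie header is stripped
    // from requests to blocked hosts. Needs "webRequest", "webRequestBlocking"
    // and host permissions.
    declare const browser: any; // WebExtensions API

    const BLOCKED_HOSTS = new Set(["tracker.example"]); // placeholder list

    browser.webRequest.onBeforeSendHeaders.addListener(
      (details: { url: string; requestHeaders?: { name: string; value?: string }[] }) => {
        const host = new URL(details.url).hostname;
        if (!BLOCKED_HOSTS.has(host) || !details.requestHeaders) return {};
        // Drop the Cookie header; the cookies stay in the browser's cookie jar.
        const requestHeaders = details.requestHeaders.filter(
          (h) => h.name.toLowerCase() !== "cookie",
        );
        return { requestHeaders };
      },
      { urls: ["<all_urls>"] },
      ["blocking", "requestHeaders"],
    );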

It's my belief that deleting blocked cookies in uMatrix interacts badly with fine-grained choices of first-party versus third-party cookie permissions. To use a concrete example, I want to carry a Google cookie to control my Google search settings, but I don't want to allow Google to see cookies when it's a third-party site embedded in people's pages and so on (so it has a harder time tracking me around the web). If I tell uMatrix to delete blocked cookies, I suspect that I would lose my Google search cookies any time I visited a page where Google was embedded as a third-party site.

(This is a version of how global Javascript permissions aren't a great fit with the modern web.)

Neither of these extensions actually prevents websites from setting cookies in the first place. I'm not sure that's something a web extension can even do in Firefox; the WebExtensions API may be too limited, either in theory or in practice. I think an ideal extension would offer uMatrix-like fine-grained control over which websites can even set cookies (as well as be given them back), while allowing existing cookies to stay by default; this would mitigate even mild exposures and keep things neater. Even then, websites would probably find some way to sneak cookies back in, so you'd want to clean them out periodically.

(I would be happy with a uMatrix option for 'do not accept blocked cookies' (provided that it had no effect on existing cookies); I'd turn it on, but other people might leave it off. I'd probably still keep using Cookie AutoDelete as well, though, just in case.)
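As far as I can tell, something like that 'do not accept blocked cookies' option is at least expressible with Firefox's blocking webRequest API, by filtering Set-Cookie out of response headers; the sketch below mirrors the previous one, just on the response side. It would not catch cookies set from page Javascript via document.cookie, which may be part of why it's not more common.

    // Sketch of a "don't accept cookies from blocked sites" option: filter
    // Set-Cookie out of response headers. Only covers cookies set over HTTP;
    // document.cookie from page Javascript bypasses this entirely.
    declare const browser: any; // WebExtensions API

    const COOKIE_BLOCKED_HOSTS = new Set(["tracker.example"]); // placeholder

    browser.webRequest.onHeadersReceived.addListener(
      (details: { url: string; responseHeaders?: { name: string; value?: string }[] }) => {
        const host = new URL(details.url).hostname;
        if (!COOKIE_BLOCKED_HOSTS.has(host) || !details.responseHeaders) return {};
        const responseHeaders = details.responseHeaders.filter(
          (h) => h.name.toLowerCase() !== "set-cookie",
        );
        return { responseHeaders };
      },
      { urls: ["<all_urls>"] },
      ["blocking", "responseHeaders"],
    );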

Sidebar: Medium demonstrates the problem with uMatrix's approach

Medium has (or had) Javascript that will nag you to sign up with them if you visit more than once or twice and they detect this, and the detection seems to be based on cookies. This is a clear case where uMatrix's 'allow them to be set but stop them from being sent to the website' approach breaks down, because the damage is done when Javascript reads out the cookie and uMatrix isn't blocking that (and probably can't).
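In other words, a page script can read and set cookies directly, and nothing that merely filters the Cookie header on requests will notice. The cookie name below is made up for illustration; I have no idea what Medium actually uses.

    // What page Javascript can do regardless of whether the Cookie header is
    // ever sent: read and set cookies via document.cookie. "visits" is a
    // made-up cookie name, not Medium's actual mechanism.
    const seenBefore = document.cookie
      .split("; ")
      .some((c) => c.startsWith("visits="));
    if (seenBefore) {
      console.log("returning visitor: time to show the signup nag");
    }
    document.cookie = `visits=1; max-age=${60 * 60 * 24 * 365}; path=/`;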

If you allow Medium to run Javascript (and sometimes you have to in order to get readable articles), the only solution is either not accepting Medium's cookies or purging them almost immediately.

web/FirefoxQuantumCookieModels written at 01:39:26

