Why I don't think browsers will ever standardize how 'Reader Mode' works

September 9, 2018

I recently read Daniel Aleksandersen's four-part series on 'reading mode' in (most) browsers (parts 1, 2, 3, and 4, discovered via my referer logs). In the final summary part, "Web Reading Mode: A bad reading experience", Aleksandersen suggests that browsers should standardize how they parse pages to determine the 'main' page content that they will show. I'm not a fan of the current state of affairs (I've written about the limitations of Firefox's Reader mode), but I think that browsers will never standardize how this works, and may never fully document it. This isn't because browser people are evil; it's because locking down how reader mode works through a standard would very likely result in reader mode being subverted by web site publishers.

The ultimate purpose of reader mode is to remove things from the website, and it is most needed on exactly those websites that put the most extraneous things in. However, these websites are not putting those extraneous things into the page randomly or because they are idiots; instead, those things serve the interests of the website in various ways (this is most obvious with inserted advertising). Since websites want these elements to be present in the pages that people see, they have a decent motivation to trick and subvert browser reader modes so that this content is still included (in as original a form as possible), especially if it is easy.

In short, if you provide websites with a mechanism to say 'include this in reader mode', they will use it on things that should not be included in reader mode. Standardizing how reader mode determines what is and isn't main content is one way to provide websites with such a mechanism.

Now, this mechanism already sort of exists, in that you can reverse engineer how the various reader modes determine this and what they include, but at least two things slow down websites here: there's more than one implementation to target, and implementations are free to change their approach and invalidate your work to date. As a result, right now it's generally not worth people's while to do all of this work given the low likely payoff. Standardization would likely reduce the amount of work you need to do substantially, so I'd expect to see quite a few websites throw in the necessary tags.
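To make the worry concrete, here is a toy sketch in Python of what a fully specified inclusion rule might look like and how little it takes to game it. Everything in it is an assumption made for illustration: the `data-reader-include` attribute, the `ReaderExtractor` class, and the `reader_view` helper are all hypothetical, and no real browser's reader mode is claimed to work this way.

```python
from html.parser import HTMLParser

# Hypothetical marker attribute; no browser or standard defines it.
READER_ATTR = "data-reader-include"

# Elements that never get a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class ReaderExtractor(HTMLParser):
    """Toy 'standardized' reader mode: keep text inside marked elements only."""

    def __init__(self):
        super().__init__()
        self.depth = 0   # nesting depth inside a marked element
        self.kept = []   # text fragments that would be shown to the reader

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        # Once inside a marked element, every nested element is included too.
        if self.depth > 0 or READER_ATTR in dict(attrs):
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if self.depth > 0 and text:
            self.kept.append(text)


def reader_view(html):
    """Return the text fragments the toy reader mode would display."""
    parser = ReaderExtractor()
    parser.feed(html)
    parser.close()
    return parser.kept


# A page that uses the marker as intended: the ad is left out.
honest_page = (
    '<div class="ad">Buy now!</div>'
    '<article data-reader-include><p>The actual story.</p></article>'
)

# The same page after the publisher adds one attribute to the ad.
gamed_page = (
    '<div class="ad" data-reader-include>Buy now!</div>'
    '<article data-reader-include><p>The actual story.</p></article>'
)

print(reader_view(honest_page))  # ['The actual story.']
print(reader_view(gamed_page))   # ['Buy now!', 'The actual story.']
```

The point of the sketch is only that once the rule is fixed and public, defeating it is a one-attribute change that works everywhere, instead of a reverse engineering project that has to be redone for each browser and each release.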

Browsers standardizing reader mode is somewhat like mail systems standardizing what is considered non-spam, and I think it's about as unlikely for this reason alone (never mind any other ones, such as whether browsers consider this either a priority or a competitive advantage). This is a pity, but unfortunately the modern web is a hostile environment (at least in the large).

Comments on this page:

By Zev Weiss at 2018-10-25 02:21:45:

RFC3514, web edition.

Written on 09 September 2018.


Last modified: Sun Sep 9 21:59:22 2018