Feed readers and their interpretation of the Atom 'title' element

My entry yesterday had the title "The HTML <pre> element doesn't do very much", which as you'll notice names an HTML element in plain text. In the wake of posting the entry, I had a couple of people tell me that their feed reader didn't render the title of my entry correctly, generally silently omitting the '<pre>' (there was a comment on the entry and a report on Twitter). Ironically, this is also what happened in Liferea, my usual feed reader, although that is a known Liferea issue. However, other feed readers display it correctly, such as The Old Reader (on their website) and Newsblur (in the iOS client).

(I read my feed in a surprising variety of syndication feed readers, for various reasons.)

As far as I can tell, my Atom feed is correct. The raw text of my Atom feed for the Atom <title> element is:

<title type="html">The HTML &amp;lt;pre&gt; element doesn&#39;t do very much</title>

The Atom RFC (RFC 4287) describes the "type" attribute and its various interpretations in section 3.1.1, which helpfully even has an explicit example of '<title type="html">' in it. For 'type="html"', it says:

If the value of "type" is "html", the content of the Text construct MUST NOT contain child elements and SHOULD be suitable for handling as HTML. Any markup within MUST be escaped; for example, "<br>" as "&lt;br>".

The plain text '<pre>' in my title is encoded as '&amp;lt;pre&gt;'. Decoded from Atom-encoded text to HTML, this gives us '&lt;pre>', which is not HTML markup but an encoded plain-text '<pre>' with the starting '<' escaped (as it is rendered repeatedly in the raw HTML of this entry and yesterday's).
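As a rough sketch of what I believe a feed reader is supposed to do here, the following uses only Python's standard library. The XML parser undoes the Atom-level escaping exactly once, and then a small text extractor treats the result as HTML. The TextExtractor class, html_to_text(), and the variable names are made up for illustration and aren't taken from any real feed reader.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: collect the text content of an HTML fragment."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        # Called with the text between tags, with entities already decoded.
        self.chunks.append(data)


def html_to_text(fragment):
    parser = TextExtractor()
    parser.feed(fragment)
    parser.close()  # flush any buffered text
    return "".join(parser.chunks)


# The raw <title> element as it appears in the feed.
raw = ('<title type="html">The HTML &amp;lt;pre&gt; element '
       'doesn&#39;t do very much</title>')

# The XML parser removes one level of escaping, leaving HTML source.
html_source = ET.fromstring(raw).text
print(html_source)  # The HTML &lt;pre> element doesn't do very much

# Interpreting that HTML gives the text a reader should display.
displayed = html_to_text(html_source)
print(displayed)    # The HTML <pre> element doesn't do very much
```

The point of the two separate steps is that there are two layers here, XML escaping on the outside and HTML on the inside, and each one gets undone exactly once.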

(My Atom syndication feed generation encodes '>' to '&gt;' in an excess of caution; as we see from the RFC, it is not strictly required.)
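To illustrate that escaping '>' is harmless but optional, here is a small sketch in Python. It is not my actual feed generation code; it uses the standard library's xml.sax.saxutils.escape() as a stand-in for the cautious version and a hand-written two-replacement version as the minimal one, then checks that an XML parser recovers the same HTML from both.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# The HTML form of (part of) the title, before it goes into the feed.
html_title = "The HTML &lt;pre> element"

# Cautious: saxutils.escape() escapes '&', '<' and '>'.
cautious = escape(html_title)
print(cautious)  # The HTML &amp;lt;pre&gt; element

# Minimal (illustrative): escape only '&' and '<', leaving '>' alone.
minimal = html_title.replace("&", "&amp;").replace("<", "&lt;")
print(minimal)   # The HTML &amp;lt;pre> element

# Either way, an XML parser hands back the same HTML source.
for body in (cautious, minimal):
    elem = ET.fromstring('<title type="html">' + body + '</title>')
    assert elem.text == html_title
```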

Despite that, many syndication feed readers appear to be doing something wrong. I was going to say that I could imagine several options, but after thinking about it more, I can't really. I know that Liferea's issue apparently at least starts with decoding the content of the 'type="html"' <title> element twice instead of once, but I'm not sure if it then decides to try to strip markup from the result (which would strip out the '<pre>' that the excess decoding has materialized) or if it passes the result to something that renders HTML and so silently swallows the un-closed <pre>. I can imagine a syndication feed reader that correctly decodes the <title> once, but then passes it to a display widget that expects encoded HTML instead of straight HTML. An alternative is that the display widget only accepts plain text and the feed reader made a mistake when transforming HTML to plain text, decoding entities before removing HTML tags instead of the other way around.
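Here is a minimal sketch of how the ordering mistake would play out, assuming a feed reader that turns HTML into plain text with a tag stripper and an entity decoder. I don't know that any particular feed reader works this way; strip_tags() is a deliberately naive, made-up helper and the regular expression is not a real HTML parser. The 'wrong' pipeline also stands in for the 'decode one time too many, then strip markup' case, since the effect is the same.

```python
import re
from html import unescape


def strip_tags(text):
    """Hypothetical, naive tag stripper for illustration only."""
    return re.sub(r"<[^>]*>", "", text)


# HTML source of the title after one correct round of Atom/XML decoding.
html_source = "The HTML &lt;pre> element doesn't do very much"

# Right order: remove markup first, then decode entities.
right = unescape(strip_tags(html_source))

# Wrong order: decoding first turns the escaped text into something
# that looks like a real tag, which the stripper then removes.
wrong = strip_tags(unescape(html_source))

print(right)  # The HTML <pre> element doesn't do very much
print(wrong)  # The HTML  element doesn't do very much
```

The wrong result has the same silently missing '<pre>' that people reported, complete with a doubled space where it used to be.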

(Decoding things more times than you should can be a hard mistake to spot. Often the extra decoding has no effect on most text.)
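A quick way to see why is to compare one round of decoding against two for a few titles. This sketch uses Python's html.unescape() as a stand-in for whatever decoder a feed reader actually uses, and extra_decode_matters() is a made-up helper name.

```python
from html import unescape


def extra_decode_matters(raw):
    """Return True if decoding twice gives a different result than once."""
    once = unescape(raw)
    twice = unescape(once)
    return once != twice


titles = [
    "Feed readers and their interpretation of the Atom &#39;title&#39; element",
    "A title with no special characters at all",
    "The HTML &amp;lt;pre&gt; element doesn&#39;t do very much",
]

for raw in titles:
    print(extra_decode_matters(raw), raw)
```

Only the last title comes out differently, because it is the only one where the first round of decoding leaves an entity behind for the second round to find.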

Since some syndication feed readers get it right and some get it wrong, I'm not sure there's anything I can do to fix this in my feed. I've used an awkward workaround in the title of this entry so that it will be clear even in feed readers, but otherwise I'm probably going to keep on using HTML element names and other awkward things in my titles every so often.

(My titles even contain markup from time to time, which is valid in Atom feeds but which gives various syndication feed readers some degree of heartburn. Usually the markup sets things in 'monospace', although every once in a while it includes links.)

web/AtomTitlesAndFeedReaders written at 23:59:58
