How (some) syndication feed readers deal with HTTP to HTTPS redirections

June 15, 2016

It's now been a bit over a year since Wandering Thoughts switched from HTTP to HTTPS, of course with a pragmatic permanent redirect from the HTTP version to the HTTPS version. In theory syndication feed readers should notice permanent HTTP redirections and update their feed fetching information to just get the feed from its new location (although there are downsides to doing this too rapidly).
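As a rough illustration of what "updating feed fetching information" could mean, here is a minimal sketch in Python. The function name, the `hops` representation, and the example URLs are all hypothetical; it only shows the general idea that a permanent redirect (301 or 308) may move the stored feed URL while a temporary one (302, 303, 307) should not.

```python
from urllib.parse import urljoin

# Status codes that signal a permanent move, as opposed to a temporary one.
PERMANENT = {301, 308}


def updated_feed_url(stored_url, hops):
    """Return the URL a reader might store after following a redirect chain.

    hops is a list of (status, location) pairs in the order they were seen.
    Only an unbroken leading run of permanent redirects moves the stored
    URL; the first temporary redirect stops the update.
    """
    url = stored_url
    for status, location in hops:
        if status not in PERMANENT:
            break
        # Location may be relative, so resolve it against the current URL.
        url = urljoin(url, location)
    return url
```

A cautious reader would probably not rewrite its stored URL on the first 301 it sees, but only after seeing the same permanent redirect repeatedly over some period, which is one way of avoiding the downsides of updating too rapidly.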

In practice, apparently, not so much. Looking at yesterday's stats from this server, there are 6,700 HTTP requests for Atom feeds from 520 different IP addresses. Right away we can tell that a number of these IPs made a lot of requests, so they're clearly not updating their feed source information. Out of those IPs, 30 of them did not make HTTPS requests for my Atom feeds; in other words, they didn't even follow the redirection, much less update their feed source information. The good news is that these IPs are only responsible for 102 feed fetch attempts, and that a decent number of these come from Googlebot, Google Feedfetcher (yes, still), and another web spider of uncertain provenance and intentions. The bad news is that this appears to include some real syndication feed readers, based on their user agents, including Planet Sysadmin (which is using 'Planet/1.0'), Superfeedr, and Feedly.
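The kind of tally described above can be sketched as follows. This is not the shell pipeline actually used for these numbers; it is a small Python sketch that assumes the log lines have already been reduced to hypothetical `(ip, scheme)` pairs, one per feed request.

```python
from collections import Counter


def split_by_behaviour(requests):
    """Tally feed requests per IP and separate HTTP-only IPs from the rest.

    requests is an iterable of (ip, scheme) pairs, where scheme is either
    'http' or 'https'. Returns (http_counts, https_counts, http_only, both).
    """
    http = Counter()
    https = Counter()
    for ip, scheme in requests:
        if scheme == "https":
            https[ip] += 1
        else:
            http[ip] += 1
    # IPs that asked for the HTTP feed but never showed up over HTTPS.
    http_only = {ip for ip in http if ip not in https}
    # IPs that made at least one request of each kind.
    both = {ip for ip in http if ip in https}
    return http, https, http_only, both
```

Summing `http[ip]` over `http_only` gives the number of fetch attempts from IPs that never followed the redirection.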

The IPs that did at least seem to follow the HTTP redirection have a pretty wide variety of user agents. The good or bad news is that this includes a number of popular syndication feed readers. It's good that they're at least following the HTTP redirection, but it's bad that they're both popular and not updating feed source information after over a year of permanent HTTP redirections. Some of these feed readers include CommaFeed, NewsBlur, NetNewsWire, rss2email, SlackBot, newsbeuter, Feedbin, Digg's Feed Fetcher, Gwene, Akregator, and Tiny Tiny RSS (which has given me some heartburn before). Really, I think it's safer to assume that basically no feed readers ever update their feed source information on HTTP redirections.

As it turns out, the list of user agents here comes with a caveat. See the sidebar.

(Since it's been more than a year, I have no way to tell how many feed readers did update their feed source information. Some of the people directly fetching the HTTPS feeds may have updated, but certainly at least some of them are new subscribers I've picked up over the past year.)

At one level, this failure to update the feed source is harmless; the HTTP to HTTPS redirection here can and will continue basically forever without any problems. At another level it worries me, both for Wandering Thoughts and for blogs in general, because very few things on the web are forever and anything that makes it harder to move blogs around is worth concern. Blogs do move, and very few are going to be able to have a trail of HTTP redirections that lives forever.

(Of course the really brave way to move a blog is to just start a new one and announce it on the old one. That way it takes active interest for people to keep reading you; you'll lose the ones who aren't actually reading you any more (but haven't removed you from their feed reader) and the ones who decide they're not interested enough.)

Sidebar: Some imprecision in these results

Without more work than I'm willing to put in, I can't tell when a HTTPS request from a given IP is made due to following a redirection from a HTTP request. All I can say is that an IP address that made one or more HTTP requests also made some HTTPS requests. I did some spot checks (correlating the times of some requests from specific IPs) and they did look like HTTP redirections being followed, but this is far from complete.
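A time-correlation spot check along these lines could be sketched as below. It assumes each request has been reduced to a hypothetical `(ip, scheme, unix_time)` triple, and the five-second window is an arbitrary assumption rather than a measured value.

```python
from bisect import bisect_left
from collections import defaultdict


def followed_redirects(events, window=5.0):
    """Estimate how many HTTP requests were followed by an HTTPS request.

    events is an iterable of (ip, scheme, unix_time) triples. For each IP
    that made HTTP requests, returns {ip: (matched, total_http)}, where
    matched counts HTTP requests that have an HTTPS request from the same
    IP at the same time or up to `window` seconds later.
    """
    http_times = defaultdict(list)
    https_times = defaultdict(list)
    for ip, scheme, t in events:
        if scheme == "https":
            https_times[ip].append(t)
        else:
            http_times[ip].append(t)

    result = {}
    for ip, times in http_times.items():
        later = sorted(https_times.get(ip, []))
        matched = 0
        for t in times:
            # First HTTPS request at or after this HTTP request.
            i = bisect_left(later, t)
            if i < len(later) and later[i] - t <= window:
                matched += 1
        result[ip] = (matched, len(times))
    return result
```

This is still only a heuristic: one HTTPS request can be counted as the match for several closely spaced HTTP requests, and an unrelated HTTPS fetch that happens to land inside the window looks just like a followed redirection.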

The most likely place where I'd be missing a feed reader that doesn't follow redirections is shared feed reader services (ie, replacements for Google Reader). There it would be easy for one person to have imported the HTTP version of my feed and another person to have added the HTTPS version later, quite likely resulting in the same IP fetching both HTTPS and HTTP versions of my feed and leading me to conclude that it did follow the redirection.

I have some evidence that there is some amount of this sort of feed duplication, because as far as I can tell I see more HTTPS requests from these IPs than I do HTTP ones. Assuming my shell-command-based analysis is correct, I see a number of cases where per-IP request counts are different, in both directions (more HTTPS than HTTP, more HTTP than HTTPS).

(This is where it would be really useful to be able to pull all of these Apache logs into a SQL database in some structured form so I could do sophisticated ad-hoc queries, instead of trying to do it with hacky, awkward shell commands that aren't really built for this.)
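As a minimal sketch of what that might look like, the following Python loads log lines into an in-memory SQLite table and runs a per-IP comparison query. It assumes Apache's combined log format and assumes that HTTP and HTTPS requests are logged to separate files, so that the caller can supply the scheme; the table layout, function name, and query are illustrative, not anything this site actually uses.

```python
import re
import sqlite3
from datetime import datetime

# Matches Apache's combined log format; the byte count and referer are skipped.
LINE_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def load_log(conn, lines, scheme):
    """Insert parseable log lines into the requests table; return the count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS requests "
        "(ip TEXT, ts TEXT, scheme TEXT, path TEXT, status INTEGER, agent TEXT)"
    )
    rows = []
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue  # silently skip lines that do not parse
        ts = datetime.strptime(m["time"], "%d/%b/%Y:%H:%M:%S %z").isoformat()
        rows.append((m["ip"], ts, scheme, m["path"], int(m["status"]), m["agent"]))
    conn.executemany("INSERT INTO requests VALUES (?, ?, ?, ?, ?, ?)", rows)
    return len(rows)


# Per-IP HTTP and HTTPS request counts, side by side.
PER_IP_SQL = """
SELECT ip,
       SUM(scheme = 'http')  AS http_count,
       SUM(scheme = 'https') AS https_count
FROM requests
GROUP BY ip
ORDER BY ip
"""
```

With the data in this form, a question like "which IPs have different HTTP and HTTPS request counts" becomes a single query instead of a chain of shell commands.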

Comments on this page:

An interesting observation! I just checked my feed reader, Newsblur, and it's still got the http url for your feed in the settings. However, the last five http requests for the feed were all 200s, so either it's been checking the https url, or the redirects don't get reported in the UI.

I bet it's the latter, from a few minutes' inspection of the source code. I don't see anything that explicitly handles a 302, so I bet the http library it's using follows them automatically and swallows the intermediate responses.
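To illustrate the general behaviour (this is not NewsBlur's code, and it says nothing about which HTTP library NewsBlur uses), Python's standard `urllib.request` follows redirects automatically, and the caller normally only sees the final response. A minimal sketch of making the intermediate hops visible is to subclass `HTTPRedirectHandler`; the class and function names here are hypothetical.

```python
import urllib.request


class RecordingRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Follow redirects as usual, but remember each hop that was taken."""

    def __init__(self):
        self.seen = []

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Record (status, old URL, new URL) before the default handling.
        self.seen.append((code, req.full_url, newurl))
        return super().redirect_request(req, fp, code, msg, headers, newurl)


def fetch_with_redirect_log(url):
    """Fetch url and return (final_status, final_url, redirects_followed)."""
    handler = RecordingRedirectHandler()
    opener = urllib.request.build_opener(handler)
    with opener.open(url) as resp:
        return resp.status, resp.geturl(), handler.seen
```

Without something like this, code that only looks at the final status sees a 200 and never learns that a redirect happened, which would match a UI that reports nothing but 200s.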

I disagree with your conclusion. As you write, nothing is forever on the Internet. Even entire domain names change hands every so often. To me it makes sense to treat a "permanent" redirect the same way as a temporary redirect, or perhaps give it a very short life. For example "permanent during the current session".

If permanent redirects were indeed in practice more or less permanent, I think they would be treated with a lot more respect than they are today. In blog posts and other examples I see people mostly seem to use them to indicate that "with my current set of publishing tools and ideas for URL planning, this is where you should now look for the content".

In contrast, HSTS is a loaded gun and most site admins seem to acknowledge it as such. HSTS is also not "permanent" since you get to declare up front how long it lasts. To me this indicates that it is a more modern standard.

I think that it would be more interesting and useful to measure how many feed readers respect HSTS than how many respect HTTP 301's.

Lastly, I disagree that it would be a good strategy to rely on permanent redirects when you move things around on your site. The main problem with changing URL's isn't that people's browser histories are getting stale, it's that external links to your site aren't getting updated, so new users who have never been to your website before are now welcomed by a HTTP 404. Few people have long-lived browser histories these days. Just think of how many people use a mobile device as their primary web browser.

Probably almost none of them support HSTS.

By cks at 2016-06-16 11:59:45:

Anton Eliasson: If (and when) you have to change URLs, you have to do something; you aren't so much 'relying' on permanent HTTP redirects as doing the best you can, since having HTTP redirects beats not having them and serving visitors either 404s or generic 'things are not here any more' pages. And while you would obviously like the redirects to be there perpetually (to catch everyone who refers to old URLs, even years later), this is not always possible.

(Web search engines seem to update their indexes based on (permanent) HTTP redirects. Explicit URLs in other people's web pages, the social web, etc, almost certainly won't update.)

By Miksa at 2016-06-16 14:52:13:

It would be interesting to temporarily disable the redirect and HTTP feed and see if any of the readers take the hint and switch to HTTPS.

Written on 15 June 2016.

Last modified: Wed Jun 15 22:53:33 2016