Some things I've learned from transitioning a website to HTTPS

October 27, 2013

A while back I first added a HTTPS version of my personal site alongside the existing HTTP version and then decided that I was going to actively migrate it to HTTPS. The whole thing has been running for a few months now, so it seems about time to write up some things I've learned from it.

The first set of lessons I learned was about everything on my side, especially my own code. The first layer of problems was code and the like with explicit 'http:' bits in it; it was amazing and depressing how many places I was just automatically doing that (you could call this 'HTTP blindness' if you wanted a trendy term for it). The more subtle problem areas were things like caches, where a HTTP version of a page might be different from a HTTPS version yet I was storing both under the same cache key. I also ran into a situation where I wanted to generate output for a HTTP URL request but use the 'canonical' HTTPS URLs for links embedded in the result; this required adding a feature to DWiki.
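
As a rough illustration of the two code-level fixes above (this is a sketch, not DWiki's actual code), the cache key can include the request scheme so HTTP and HTTPS renderings never collide, and embedded links can be forced to the canonical scheme no matter how the request arrived. The names cache_key and canonical_url and the example.org host are made up for this example.

```python
from urllib.parse import urlsplit, urlunsplit


def cache_key(scheme, host, path):
    """Build a cache key that keeps HTTP and HTTPS renderings separate.

    Hypothetical helper: the scheme is part of the key, so a page
    rendered for an http: request is never served to an https: one.
    """
    return "{}://{}{}".format(scheme.lower(), host.lower(), path)


def canonical_url(url, canonical_scheme="https"):
    """Return url with its scheme replaced by the canonical one.

    Hypothetical helper for generating embedded links: everything
    except the scheme (host, path, query, fragment) is left alone.
    """
    parts = urlsplit(url)
    return urlunsplit((canonical_scheme,) + tuple(parts[1:]))


if __name__ == "__main__":
    print(cache_key("http", "example.org", "/blog/entry"))
    print(cache_key("https", "example.org", "/blog/entry"))
    print(canonical_url("http://example.org/blog/entry?page=2"))
```

With something like this, a page fetched over HTTP (say by a feed reader that can't do HTTPS) can still carry HTTPS links in its body.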

(I also found a certain amount of other software that didn't cope well. For example, the Fedora 19 version of mod_wsgi doesn't seem to cope with a single WSGI application group that's served over both HTTP and HTTPS; the HTTPS environment value latches to one value and never changes.)

Once I had my own code working I got to find out all sorts of depressing things about how other people's code deals with such a transition. In no particular order:

  • While search engines did eventually switch over to returning HTTPS results and to crawling only the HTTPS version of my site, it took a surprisingly long time (and the switch may not be complete even now; it's hard to tell).

  • Many syndication feed fetchers have not changed to the HTTPS version; they still request a HTTP URL and then get redirected. I will reluctantly concede that there are sensible reasons for this behavior. It does mean that the HTTP redirects will probably live on forever.

  • There are a certain number of syndication feed fetchers that still don't deal with HTTPS feeds or at least with redirections to them. Yes, really, in 2013. Unfortunately two of these are FeedBurner and the common Planet software, both of which I at least sort of care about. This led to the 'generate HTTP version but use the canonical HTTPS links' situation for my software.

  • Some web spiders don't follow redirects for robots.txt. I decided to exempt that one URL from the redirect rather than block the spiders outright in the server configuration, partly because the former was a bit easier than the latter.

    (I already totally ban the spiders in robots.txt, which is one reason I wanted them to see it.)
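
The redirect-everything-except-robots.txt arrangement from the last item can be sketched as a small piece of WSGI middleware. The post doesn't say where the redirect was actually implemented (it may well live in the web server configuration), so treat this as one possible shape of the idea. The names https_redirect and NO_REDIRECT_PATHS are made up, and the sketch assumes the application is mounted at the root of the site.

```python
from urllib.parse import quote

# Assumption for this sketch: paths served as-is over HTTP, exact match only.
NO_REDIRECT_PATHS = frozenset({"/robots.txt"})


def https_redirect(app, no_redirect=NO_REDIRECT_PATHS):
    """Wrap a WSGI app so plain-HTTP requests get a 301 to HTTPS.

    Requests that are already HTTPS, and HTTP requests for exempt
    paths such as /robots.txt, are passed through to the wrapped app.
    """
    def middleware(environ, start_response):
        scheme = environ.get("wsgi.url_scheme", "http")
        path = environ.get("PATH_INFO", "") or "/"
        if scheme == "https" or path in no_redirect:
            return app(environ, start_response)

        host = environ.get("HTTP_HOST") or environ.get("SERVER_NAME", "localhost")
        if host.endswith(":80"):
            # Drop an explicit HTTP port; it would be wrong for HTTPS.
            host = host[:-3]
        target = "https://" + host + quote(path)
        query = environ.get("QUERY_STRING", "")
        if query:
            target += "?" + query
        start_response("301 Moved Permanently",
                       [("Location", target), ("Content-Length", "0")])
        return [b""]
    return middleware
```

A permanent redirect is used here on the theory that well-behaved clients will eventually update their stored URLs, although as noted above many feed fetchers never do.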

Despite all of this the process has been relatively straightforward and mostly without problems. To the extent that there were problems, I'm more or less glad to know about them (and to fix my code; it was always broken, I just didn't realize it).


Comments on this page:

Just for the record, so people are aware of it, one way around "HTTP blindness" is to use protocol-relative URLs:

http://www.paulirish.com/2010/the-protocol-relative-url/
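
For illustration, here is a minimal sketch of what turning an absolute URL into a protocol-relative one looks like. The function name protocol_relative and the example.org URLs are made up for this example; the only thing that changes is that the scheme is dropped, so a browser reuses whatever scheme the containing page was loaded with.

```python
from urllib.parse import urlsplit, urlunsplit


def protocol_relative(url):
    """Return url without its scheme if it is an absolute http(s) URL.

    Anything else (relative paths, mailto: links and so on) is
    returned unchanged.
    """
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        return url
    # An empty scheme makes urlunsplit emit the '//host/path' form.
    return urlunsplit(("",) + tuple(parts[1:]))


if __name__ == "__main__":
    print(protocol_relative("http://example.org/static/site.css"))
    print(protocol_relative("/already/relative"))
```

This only helps for resources that are fetched by something that has a 'current' scheme, such as a browser rendering a page; it doesn't help with things like syndication feeds, where links generally need to be absolute.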

Written on 27 October 2013.
Last modified: Sun Oct 27 02:30:49 2013