Some things I've learned from transitioning a website to HTTPS
A while back I first added a HTTPS version of my personal site alongside the existing HTTP version, and then decided that I was going to actively migrate it to HTTPS. The whole thing has been running for a few months now, so it seems about time to write up some things I've learned from it.
The first set of lessons I learned were about everything on my side,
especially my own code. The first layer of problems was code and so on with
hardcoded 'http:' bits in it; it was amazing and depressing how many
places I was just automatically doing that (you could call this 'HTTP
blindness' if you wanted a trendy term for it). The more subtle problem
areas were things like caches, where a HTTP version of a page might be
different from a HTTPS version yet I was storing them under the same
cache key. I also ran into a situation where I wanted to generate output
for a HTTP URL request but use the 'canonical' HTTPS URLs for links
embedded in the result; this required adding a feature to DWiki.
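For illustration, both fixes boil down to something like the following sketch; the function names and the example.org hostname here are made up, not DWiki's actual internals.

    # Hypothetical sketch, not DWiki's real code.
    def cache_key(environ, path):
        # A HTTP rendering and a HTTPS rendering of the same page can
        # differ, so the URL scheme has to be part of the cache key.
        return "%s:%s" % (environ.get('wsgi.url_scheme', 'http'), path)

    def page_link(path, canonical_scheme='https'):
        # Links embedded in generated output always use the canonical
        # scheme, regardless of what scheme the request came in over.
        return "%s://example.org%s" % (canonical_scheme, path)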
(I also found a certain amount of other software that didn't cope well.
For example, the Fedora 19 version of mod_wsgi doesn't seem to cope
with a single WSGI application group that's served over both HTTP and
HTTPS; the HTTPS environment value latches to one value and never changes.)
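(If I'm describing the mod_wsgi problem correctly, a tiny test application like the one below is enough to see it; the scheme it reports should track each individual request, but instead it sticks at one value for the whole application group.)

    # Minimal WSGI app for checking what scheme each request reports.
    def application(environ, start_response):
        scheme = environ.get('wsgi.url_scheme', 'unknown')
        body = ("request scheme: %s\n" % scheme).encode('ascii')
        start_response('200 OK', [('Content-Type', 'text/plain'),
                                  ('Content-Length', str(len(body)))])
        return [body]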
Once I had my own code working I got to find out all sorts of depressing things about how other people's code deals with such a transition. In no particular order:
- While search engines did eventually switch over to returning HTTPS
  results and to crawling only the HTTPS version of my site, it
  took a surprisingly long time (and the switch may not be complete
  even now; it's hard to tell).
- Many syndication feed fetchers have not changed to the HTTPS version;
they still request a HTTP URL then get redirected. I will reluctantly
concede that there are sensible reasons for this behavior. It does mean that the HTTP redirects
will probably live on forever.
- There are a certain number of syndication feed fetchers that still
don't deal with HTTPS feeds or at least with redirections to them.
Yes, really, in 2013. Unfortunately two of these are FeedBurner
and the common Planet software, both of which I at least sort of
care about. This led to the 'generate HTTP version but use the
canonical HTTPS links' situation for my software.
- Some web spiders don't follow redirects for
robots.txt. I decided not to redirect that one URL, rather than blocking the spiders outright in the server configuration, partly because the former was a bit easier than the latter.
(I already totally ban the spiders in
robots.txt, which is one reason I wanted them to see it.)
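Purely as an illustration (this isn't my actual setup), the 'redirect everything to HTTPS except robots.txt' logic looks roughly like this if expressed as WSGI middleware; the example.org hostname is a stand-in.

    # Illustrative only; this is not how my site actually handles it.
    def https_redirect_except_robots(app, host='example.org'):
        def middleware(environ, start_response):
            path = environ.get('PATH_INFO', '') or '/'
            if environ.get('wsgi.url_scheme') == 'http' and path != '/robots.txt':
                # Everything except robots.txt is sent to the HTTPS version.
                start_response('301 Moved Permanently',
                               [('Location', 'https://%s%s' % (host, path)),
                                ('Content-Length', '0')])
                return [b'']
            # robots.txt (and anything already on HTTPS) is served as usual.
            return app(environ, start_response)
        return middleware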
Despite all of this, the process has been relatively straightforward and mostly without problems. To the extent that there were problems, I'm more or less glad to know about them (and to fix my code; it was always broken, I just didn't realize it).