Wandering Thoughts archives

2015-05-24

A mod_wsgi problem with serving both HTTP and HTTPS from the same WSGI app

This is kind of a warning story. It may not be true any more (I believe that I ran into this back in 2013, probably with a 3.x version of mod_wsgi), but it's probably representative of the kind of things that you can run into with Python web apps in an environment that mixes HTTP and HTTPS.

Once upon a time I tried converting my personal site from lighttpd plus a CGI based lashup for DWiki to Apache plus mod_wsgi serving DWiki as a WSGI application. At the time I had not yet made the decision to push all (or almost all) of my traffic from HTTP to HTTPS; instead I decided to serve both HTTP and HTTPS along side each other. The WSGI configuration I set up for this was what I felt was pretty straightforward. Outside of any particular virtual host stanza, I defined a single WSGI daamon process for my application and said to put everything in it:

WSGIDaemonProcess cspace2 user=... processes=15 threads=1 maximum-requests=500 ...
WSGIProcessGroup cspace2

Then in each of the HTTP and HTTPS versions of the site I defined appropriate Apache stuff to invoke my application in the already defined WSGI daemon process. This was exactly the same in both sites, because the URLs and everything were the same:

WSGIScriptAlias /space ..../cspace2.wsgi
<Directory ...>
   WSGIApplicationGroup cspace2
   ...

(Yes, this is what is by now old syntax and may have been old even back at the time; today you'd specify the process group and/or the application group in the WSGIScriptAlias directive.)

This all worked and I was happy. Well, I was happy for a while. Then I noticed that sometimes my HTTPS site was serving URLs that had HTTP URLs in links and vice versa. In fact, what was happening is that some of the time the application was being accessed over HTTPS but thought it was using HTTP, and sometimes it was the other way around. I didn't go deep into diagnosis because other factors intervened, but my operating hypothesis was that when a new process was forked off and handled its first request it then latched whichever of HTTP or HTTPS the request had been made through and used that for all of the remaining requests it handled.

(This may have been related to my mistake about how a WSGI app is supposed to find out about HTTP versus HTTPS.)

This taught me a valuable lesson about mixing WSGI daemon processes and so on across different contexts, which is that I probably don't want to do that. It's tempting, because it reduces the number of total WSGI related processes that are rattling around my systems, but even apart from Unix UID issues it's clear that mod_wsgi has a certain amount of mixture across theoretically separate contexts. Even if this is a now-fixed mod_wsgi issue, well, where there's one issue there can be more. As I've found out myself, keeping things carefully separate is hard work and is prone to accidental slipups.

(It's also possible that this is a mod_wsgi configuration mistake on my part, which I can believe; I'm not entirely sure I understand the distinction between 'process group' and 'application group', for example. The possibility of such configuration mistakes is another reason to keep things as separate as possible in the future.)

ModWsgiDualSchemaProblem written at 01:04:59; Add Comment

2015-05-23

The right way for your WSGI app to know if it's using HTTPS

Suppose that you have a WSGI application that's running under Apache, either directly as a CGI-BIN through some lashup or perhaps through an (old) version of mod_wsgi (such as Django on an Ubuntu 12.04 host, which has mod_wsgi version 3.3). Suppose that you want to know if you're being invoked via a HTTPS URL, either for security purposes or for your own internal reasons (for example, you might need separate page caches for HTTP versus HTTPS requests). What is the correct way to do this?

If you're me, for a long time you do the obvious thing; you look at the HTTPS environment variable that your WSGI application inherits from Apache (or the web server of your choice, if you're also running things under an alterate). If it has the value on or sometimes 1, you've got a HTTPS connection; if it doesn't exist or has some other value, you don't.

As I learned recently by reading some mod_wsgi release notes, this is in practice wrong (and probably wrong even in theory). What I should be doing is checking wsgi.url_scheme from the (WSGI) environment to see if it was "https" or "http". Newer versions of mod_wsgi explicitly strip the HTTPS environment variable and anyways, as the WSGI PEP makes clear, including a HTTPS environment variable was always a 'maybe' thing.

(You can argue that mod_wsgi is violating the spirit of the 'should' in the PEP here, but I'm sure it has its reasons for this particular change.)

Not using wsgi.url_scheme was always kind of conveniently lazy; I was pretending that WSGI was still basically a CGI-BIN environment when it's not really. I always should have been preferring wsgi. environment variables where they were available, and wsgi.url_scheme has always been there. But I change habits slowly when nothing smacks me over the nose about them.

(This may have been part of an mod_wsgi issue I ran into at one point, but that's another entry.)

WSGIandCheckingHTTPS written at 00:35:54; Add Comment

By day for May 2015: 23 24; before May; after May.

Page tools: See As Normal.
Search:
Login: Password:
Atom Syndication: Recent Pages, Recent Comments.

This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.