Wandering Thoughts archives


Some things on Django's CSRF protection, sessions, and REMOTE_USER

We have a Django application where we've had mysterious CSRF problems in the past, which I've theorized were partly because we use it behind Apache HTTP Basic Authentication. As part of recovering my understanding of Django and Apache HTTP Basic Authentication, I've been digging into how Django's CSRF protection works and how it interacts with all of this.

Our starting point is Django's documentation on Cross Site Request Forgery protection. How it works is that Django sets a CSRF cookie and then embeds a hidden form field; on form submission, the two pieces of information must be present and match (everyone does something like this). The CSRF cookie and the form field are both derived from a shared secret to protect from BREACH attacks. The important thing about this shared secret in some situations is, well, let me quote the documentation:

For security reasons, the value of the secret is changed each time a user logs in.
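To make the 'derived from a shared secret' part concrete, here is a minimal sketch of one way to mask a secret so that the cookie and the form field look different on every response yet still compare equal. This is a simplified illustration in the spirit of what Django does, not Django's actual code; the function names, the alphabet, and the 32-character length are assumptions made for the example.

```python
import secrets
import string

# Assumed alphabet and secret length, chosen for illustration only.
ALPHABET = string.ascii_letters + string.digits
SECRET_LENGTH = 32


def new_secret():
    """Generate a fresh random secret (what a 'rotation' would produce)."""
    return "".join(secrets.choice(ALPHABET) for _ in range(SECRET_LENGTH))


def mask_secret(secret):
    """Return mask + (secret shifted by mask); differs on every call."""
    mask = new_secret()
    cipher = "".join(
        ALPHABET[(ALPHABET.index(s) + ALPHABET.index(m)) % len(ALPHABET)]
        for s, m in zip(secret, mask)
    )
    return mask + cipher


def unmask_token(token):
    """Recover the underlying secret from a masked token."""
    mask, cipher = token[:SECRET_LENGTH], token[SECRET_LENGTH:]
    return "".join(
        ALPHABET[(ALPHABET.index(c) - ALPHABET.index(m)) % len(ALPHABET)]
        for c, m in zip(cipher, mask)
    )


def tokens_match(token_a, token_b):
    """Two tokens match if they unmask to the same secret."""
    return secrets.compare_digest(unmask_token(token_a), unmask_token(token_b))
```

The point for this entry is the last function: once the secret is rotated, any token that was masked from the old secret stops matching, no matter how recently the form was rendered.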

In a Django environment with normal authentication, it's clear when a user logs in; it's when they go through the Django login process, providing Django with a clear moment to establish an authenticated session, rotate secrets, and so on. In an environment where Django is instead relying on external authentication via REMOTE_USER, it's not so clear. The documentation says only that RemoteUserMiddleware will detect the username to authenticate and auto-login that user. The answer to this turns out to involve Django sessions.

When you have sessions enabled in Django, which you normally do, all requests have an associated session (visible in request.session). To simplify, sessions are identified and tracked by browser cookies, with a cookie created on the fly if necessary (along with a new session). A session may be anonymous or may be for an authenticated user. If the session object for the current request lacks an authenticated user but the request has a REMOTE_USER, RemoteUserMiddleware 'logs in' the indicated user, which will rotate the CSRF secret.

(I'm not sure how Django handles CSRF secrets for anonymous, unauthenticated people. Some versions appear to set the CSRF browser cookie without any session cookie.)

In the default Django configuration, this creates an important split between when you think you've logged in and when Django thinks you've logged in. You think you're logging in any time you have to enter your login and password for HTTP Basic Authentication (which is normally only once, until you quit the browser). However, Django only thinks you're logging in if your session is unauthenticated, and the session cookie Django sets in your browser normally lasts for two weeks (the default for SESSION_COOKIE_AGE). Before then you can quit your browser, start it up again, re-do HTTP Basic Authentication, and not log in from Django's perspective because your session is still fine. Equally, you can keep your browser running and authenticated for more than two weeks, at which point your session cookie will expire and Django will consider you to be logging back in again (with a CSRF secret rotation) even though you were never challenged for a password.

(If you set SESSION_EXPIRE_AT_BROWSER_CLOSE to tell Django to use a browser session cookie to identify the Django session, you at least more or less synchronize Django's view of you logging in with your view of it.)
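As a settings.py fragment, the two session settings involved here look something like the following. This is only a sketch; the second value is Django's default, written out for reference rather than because you need to set it.

```python
# settings.py fragment (sketch)

# Make the session cookie a browser-session cookie, so it goes away when
# the browser quits, much like the browser's HTTP Basic Authentication state.
SESSION_EXPIRE_AT_BROWSER_CLOSE = True

# Django's default session cookie lifetime in seconds (two weeks).
SESSION_COOKIE_AGE = 60 * 60 * 24 * 7 * 2
```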

The other wrinkle is that if RemoteUserMiddleware sees an authenticated session for a request without REMOTE_USER set, it logs the session out. This is half-documented by implication, but you have to remember (or know) that 'all authenticated requests' means 'all requests with a session that thinks it's authenticated' (and the documentation doesn't actually say that your session gets logged out). This matters if part of your application is generally accessible (for anyone to submit an account request) while part of it is protected by HTTP Basic Authentication (for authorized people to approve those requests for accounts). Suppose that you go to approve an account request, which involves a CSRF protected form, but then pause and in another window go look at the unprotected account request submission page. You're now invisibly logged out, and when you submit the form in your first window, you will be logged back in, which triggers CSRF secret rotation, which invalidates the CSRF secret that underlies both the cookie and the form you just submitted.
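The login and logout decisions described so far can be written down as a small model. This is a toy that only mirrors the behaviour described in this entry; it is not Django's middleware code, and the function name, the action strings, and the logout_when_header_missing parameter are all invented for the illustration.

```python
def remote_user_actions(session_user, remote_user, logout_when_header_missing=True):
    """Model what happens to a session on one request.

    session_user: username the session is authenticated as, or None.
    remote_user:  value of REMOTE_USER for this request, or None if unset.
    Returns a list of actions; "login" implies a CSRF secret rotation.
    """
    actions = []
    if remote_user is None:
        # Unprotected page: an authenticated session gets logged out.
        if logout_when_header_missing and session_user is not None:
            actions.append("logout")
        return actions
    if session_user == remote_user:
        # Session already belongs to this user: nothing happens.
        return actions
    if session_user is not None:
        # Session belongs to someone else: drop it first.
        actions.append("logout")
    actions.append("login")
    return actions
```

Walking the account approval scenario through this model: the approval page request is (alice, alice) and does nothing, the visit to the unprotected page is (alice, None) and logs the session out, and the form submission is then (None, alice), which is a fresh login and so a rotation. With logout_when_header_missing set to False, the middle step does nothing and the form submission stays quiet.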

To get around this, I think you want to use PersistentRemoteUserMiddleware instead. Or tell people not to do this.
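In settings.py terms, I believe the switch looks roughly like this. It's a sketch of the relevant entries only, assuming a Django version that uses the MIDDLEWARE setting (older ones use MIDDLEWARE_CLASSES); your real list will have more in it.

```python
# settings.py fragment (sketch; a real MIDDLEWARE list has more entries)
MIDDLEWARE = [
    "django.contrib.sessions.middleware.SessionMiddleware",
    "django.middleware.csrf.CsrfViewMiddleware",
    "django.contrib.auth.middleware.AuthenticationMiddleware",
    # Used in place of RemoteUserMiddleware; it has to come after
    # AuthenticationMiddleware.
    "django.contrib.auth.middleware.PersistentRemoteUserMiddleware",
]

AUTHENTICATION_BACKENDS = [
    "django.contrib.auth.backends.RemoteUserBackend",
]
```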

(Much or all of this goes back at least to Django 1.10 and I don't think it changed between 1.10 and 1.11, so all of this still doesn't really explain our CSRF issue in 1.11. But at least I can now probably make problems much less likely in any version of Django.)

PS: One thing that the sessions documentation tells you that I didn't previously know is that in the default configuration where sessions are saved in your database, you need to clear old expired ones out of it periodically with 'django-admin clearsessions'. We hadn't been doing that, and so had entries for ones going back to 2016. The saving grace is that I don't think sessions get written to the database until they really have something in them, like an authenticated user; otherwise we'd have a lot more of them in the database than we do.

DjangoCSRFAndSessions written at 23:41:22


Django and Apache HTTP Basic Authentication (and REMOTE_USER)

We have a Django application, and part of it exists behind Apache HTTP Basic Authentication. For reasons beyond the scope of this entry, I was recently rediscovering some things about how Django interacts with Apache HTTP Basic Authentication, and so I want to write them down for myself before I forget them again.

First, the starting point in the Django documentation for this is not to search for 'HTTP Basic Authentication' or anything like that, but for the howto on authenticating with REMOTE_USER, which is the environment variable that Apache injects when it's already authenticated something. I believe that if you search for 'Django' with 'Basic Authentication' on search engines, you tend to get information about making Django or Django-related things actually perform the server side of HTTP Basic authentication itself. This is fair enough but can be confusing.

Second, you only need to configure Django itself to authenticate with REMOTE_USER if you want to use Django's own authentication for something, such as access and authorization in its admin site. It's perfectly valid (although potentially annoying) to authenticate and limit access to your Django site (or parts of it) in your Apache configuration with Apache's HTTP Basic Authentication but have a separate Django login step to access the Django admin site or even parts of your application (which will then be tracked with cookies and so on). If you want to do this, you don't want to add Django's RemoteUserMiddleware and so on into your Django settings.

(You'll have to manage Apache users and Django users separately, passwords included, and they won't be the same thing. This might wind up being confusing.)

If you do have Django authenticating with REMOTE_USER, you need your Django database superuser to be something you can authenticate with through Apache. If you cleverly set your database superuser to 'admin' but you have no 'admin' in your Basic Auth database, you will be sad. It's possible to get yourself out of this in a couple of ways, but it's better to avoid it in the first place.

(When you do have Django authenticating this way, every person who uses your Django app through HTTP Basic Authentication will wind up with an entry in the Django 'User' table. Purging old logins that no longer exist is up to you, if you care. For people who you want to be able to use the Django admin site, you need to set them as at least 'Staff' in the Django User table. You can set them as database superusers too.)

It's not necessary to use Django's REMOTE_USER support in order to make use of the authentication information yourself, as long as Apache has HTTP Basic Authentication active. You can retrieve the login name from the $REMOTE_USER environment variable and look it up in your own 'User' table by hand, as we do. You may or may not want to automatically create new entries for new users, the way Django does by default. We don't because new people require some additional configuration on our side.
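A minimal sketch of the by-hand approach, assuming the login name shows up under the "REMOTE_USER" key of the request's environment (in a Django view that is request.META). The function name and the plain dict standing in for our own 'User' table are hypothetical; a real version would query your own model instead.

```python
def find_known_user(meta, known_users):
    """Look up the Apache-authenticated login in our own user records.

    meta:        mapping of request environment values (e.g. request.META).
    known_users: mapping of login name -> user record (stand-in for a table).
    Returns the record, or None if there is no login or it is unknown.
    Deliberately does not create entries for new logins.
    """
    login = meta.get("REMOTE_USER")
    if not login:
        return None
    return known_users.get(login)
```

Returning None for an unknown login instead of creating a record is the design choice that matters here, since new people need extra configuration before they can do anything.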

The corollary to this is that you can use and test your entire site under Apache HTTP Basic Authentication without having Django properly wired up to use REMOTE_USER, and without noticing. I believe that this can actually matter, because I believe that Django does some things with sessions differently when you have the RemoteUser* things enabled, and this interacts with Django's CSRF protections, which we've also had mysterious problems with.

DjangoApacheBasicAuth written at 00:40:55


Pipx and a problem with changing the system Python version

I use pipx on my work laptop, among other places, which I upgraded from Fedora 34 to Fedora 35 today. Afterward, my single pipx installed program didn't work, which was basically what I expected due to the familiar pip issue with Python versions; Fedora 34 has Python 3.9, while Fedora 35 has Python 3.10. Since virtual environments for one don't work with the other, the virtual environment for my installed programs couldn't find any Python packages that had been installed in them.

Since I've had success with 'pipx reinstall' before, I assumed that the way to fix this was to do a reinstall. Unfortunately this resulted in a spectacular failure, where pipx deleted my virtual environment and then failed to recreate it with an error about pip not being available. Since the initial deletion lost the pipx metadata for my installed program there was no easy recovery, and anyway a 'pipx install' also had the 'pip not available' problem. Ultimately, this appears to be because pipx has a more or less hidden virtual environment of its own in ~/.local/pipx/shared, where it puts shared things that crucially include pip itself. This virtual environment is also bound to a specific version of Python; if you change your Python, it too stops working, which means that any per-program virtual environments that point to it also stop working.

(You can find the signs of this in your venvs as a pipx_shared.pth file in each venv's lib/python3.X/site-packages/ directory, which has the absolute path to the relevant part of this shared venv. Note that this means that your pipx installed venvs will probably fail if you change your home directory or copy them to another system with a different home directory, because they have the absolute path to this shared tree.)
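If you want to check this without poking around by hand, something like the following sketch will report where each venv's pipx_shared.pth points and whether that directory still exists. It assumes that the per-program venvs live in a venvs/ directory under the pipx home and that the first non-empty line of the .pth file is the path; the function name is made up.

```python
from pathlib import Path


def check_shared_links(pipx_home):
    """Map venv name -> (path from pipx_shared.pth, whether it exists)."""
    home = Path(pipx_home)
    results = {}
    pattern = "venvs/*/lib/python*/site-packages/pipx_shared.pth"
    for pth in sorted(home.glob(pattern)):
        venv_name = pth.relative_to(home / "venvs").parts[0]
        lines = [ln.strip() for ln in pth.read_text().splitlines() if ln.strip()]
        if not lines:
            continue  # empty .pth file, nothing to report
        target = lines[0]
        results[venv_name] = (target, Path(target).is_dir())
    return results
```

You would call it as check_shared_links(Path.home() / ".local" / "pipx"). Note that this only catches a target that is gone (or a moved home directory); a shared venv that still exists but was built for the old Python version will look fine here.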

On my laptop, I fixed the problem by the brute force solution of removing ~/.local/pipx entirely, but I only had one program installed through pipx. I did experiment enough to determine that pipx will recreate ~/.local/pipx/shared if you delete it (or rename it), but I don't know if this will work through a complete installed Python version upgrade process. If it does, I think what you need to do is upgrade the Python you're using, delete ~/.local/pipx/shared, then do 'pipx reinstall-all'.
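Written out as a script, that untested idea would look something like this sketch. It renames the shared area instead of deleting it, it defaults to a dry run that only reports what it would do, and whether the sequence actually survives a Python version upgrade is exactly the part I haven't verified. The function name and the 'shared.old' name are made up.

```python
import subprocess
from pathlib import Path


def rebuild_pipx(pipx_home, dry_run=True):
    """Move the shared venv aside, then have pipx reinstall everything.

    Returns the list of steps; with dry_run=True nothing is changed.
    """
    shared = Path(pipx_home) / "shared"
    aside = shared.with_name("shared.old")
    steps = []
    if shared.exists():
        steps.append(f"rename {shared} -> {aside}")
        if not dry_run:
            # Fails if a non-empty shared.old is already there; clean up first.
            shared.rename(aside)
    steps.append("pipx reinstall-all")
    if not dry_run:
        subprocess.run(["pipx", "reinstall-all"], check=True)
    return steps
```

Run it after the system Python has been upgraded, first with the default dry_run=True to see the plan.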

This is clearly a pipx bug where it should automatically detect an out of date shared area and rebuild it, but we deal with the pipx we have now, not the pipx we would like to have.

(This elaborates on some tweets.)

PS: Although pipx doesn't expose this to you, you can get a Python shell in the virtual environment of any installed program by running ~/.local/pipx/venvs/<what>/bin/python. This may be useful if you want to do things like inspect that Python's sys.path setting.
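If you'd rather not do this interactively, a small sketch like this asks a given Python for its sys.path and hands it back as a list. The function name is invented; you give it the path to the venv's bin/python.

```python
import json
import subprocess


def venv_sys_path(python):
    """Run the given Python interpreter and return its sys.path as a list."""
    result = subprocess.run(
        [str(python), "-c", "import json, sys; print(json.dumps(sys.path))"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)
```

In a working pipx venv I'd expect one of the entries to be the shared area's site-packages directory, courtesy of the pipx_shared.pth file.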

PipxPythonVersionIssue written at 22:12:49


This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.