2025-01-21
A change in the handling of PYTHONPATH between Python 3.10 and 3.12
Our long time custom for installing Django for our Django based web application was to install it with 'python3 setup.py install --prefix /some/where', and then set a PYTHONPATH environment variable that pointed to /some/where/lib/python<ver>/site-packages. Up through at least Python 3.10 (in Ubuntu 22.04), you could start Python 3 and then successfully do 'import django' with this; in fact, it worked on different Python versions if you were pointing at the same directory tree (in our case, this directory tree lives on our NFS fileservers). In our Ubuntu 24.04 version of Python 3.12 (which also has the Ubuntu packaged setuptools installed), this no longer works, which is inconvenient to us.
(It also doesn't seem to work in Fedora 40's 3.12.8, so this probably isn't something that Ubuntu 24.04 broke by using an old version of Python 3.12, unlike last time.)
The installed site-packages directory contains a number of '<package>.egg' directories, a site.py file that I believe is generic, and an easy-install.pth that lists the .egg directories. In Python 3.10, strace says that Python 3 opens site.py and then easy-install.pth during startup, and then in a running interpreter, 'sys.path' contains the .egg directories. In Python 3.12, none of this happens, although CPython does appear to look at the overall 'site-packages' directory and 'sys.path' contains it, as you'd expect. Manually adding the .egg directories to a 3.12 sys.path appears to let 'import django' work, although I don't know if everything is working correctly.
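One hedged way to do the "manually add the .egg directories" step is the standard library's site.addsitedir(), which appends a directory to sys.path and processes any .pth files in it. The sketch below builds a throwaway directory that imitates the layout described above rather than touching a real install. The 'fakepkg_demo' package, the egg name, and the temporary directory are all made up for illustration, and this is not a claim about what Python 3.10 was doing during startup.

```python
import os
import site
import sys
import tempfile

# Build a throwaway stand-in for /some/where/lib/python<ver>/site-packages.
# Every name here (fakepkg_demo, the egg directory) is hypothetical.
sitedir = tempfile.mkdtemp(prefix="fake-site-packages-")
egg_dir = os.path.join(sitedir, "fakepkg_demo-1.0-py3.egg")
os.makedirs(os.path.join(egg_dir, "fakepkg_demo"))
with open(os.path.join(egg_dir, "fakepkg_demo", "__init__.py"), "w") as f:
    f.write("VALUE = 42\n")

# An easy-install.pth style file lists egg directories relative to sitedir.
with open(os.path.join(sitedir, "easy-install.pth"), "w") as f:
    f.write("./fakepkg_demo-1.0-py3.egg\n")

# addsitedir() appends sitedir to sys.path and processes its .pth files,
# adding each listed path that actually exists.
site.addsitedir(sitedir)

import fakepkg_demo  # importable only because the egg directory was added

print(fakepkg_demo.VALUE)
print(os.path.abspath(egg_dir) in sys.path)
```

In a real setup you would call site.addsitedir() on the existing site-packages directory early in the program, before the first 'import django'. Whether everything in an old egg-based install then works correctly is a separate question.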
I looked through the 3.11 and 3.12 "what's new" documentation (3.11, 3.12) but couldn't find anything obvious. I suspect that this is related to the removal of distutils in 3.12, but I don't know enough to say for sure.
(Also, if I use our usual Django install process, the Ubuntu 24.04 Python 3.12 installs Django in a completely different directory setup than in 3.10; it now winds up in <top level>/local/lib/python3.12/dist-packages. Using 'pip install --prefix ...' does create something where pointing PYTHONPATH at the 'dist-packages' subdirectory appears to work. There's also 'pip install --target', which I'd forgotten about until I stumbled over my old entry.)
All of this makes it even more obvious to me than before that the Python developers expect everyone to use venvs and anything else is probably going to be less and less well supported in the future. Installing system-wide is probably always going to work, and most likely also 'pip install --user', but I'm not going to hold my breath for anything else.
(On Ubuntu 24.04, obviously we'll have to move to a venv based Django installation. Fortunately you can use venvs with programs that are outside the venv.)
2025-01-16
Some stuff about how Apache's mod_wsgi runs your Python apps (as of 5.0)
We use mod_wsgi to host our Django application, but if I understood the various mod_wsgi settings for how to run your Python WSGI application when I originally set it up, I've forgotten it all since then. Due to recent events, exactly how mod_wsgi runs our application and what we can control about that is now quite relevant, so I spent some time looking into things and trying to understand settings. Now it's time to write all of this down before I forget it (again).
Mod_wsgi can run your WSGI application in two modes, as covered in the quick configuration guide part of its documentation: embedded mode, which runs a Python interpreter inside a regular Apache process, and daemon mode, where one or more Apache processes are taken over by mod_wsgi and used exclusively to run WSGI applications. Normally you want to use daemon mode, and you have to use daemon mode if you want to do things like run your WSGI application as a Unix user other than the web server's normal user or use packages installed into a Python virtual environment.
(Running as a separate Unix user puts some barriers between your application's data and a general vulnerability that gives the attacker read and/or write access to anything the web server has access to.)
To use daemon mode, you need to configure one or more daemon processes with WSGIDaemonProcess. If you're putting packages (such as Django) into a virtual environment, you give an appropriate 'python-home=' setting here. Your application itself doesn't have to be in this venv. If your application lives outside your venv, you will probably want to set either or both of 'home=' and 'python-path=' to, for example, its root directory (especially if it's a Django application). The corollary to this is that any WSGI application that uses a different virtual environment, or 'home' (starting current directory), or Python path needs to be in a different daemon process group. Everything that uses the same process group shares all of those.
To associate a WSGI application or a group of them with a particular daemon process, you use WSGIProcessGroup. In simple configurations you'll have WSGIDaemonProcess and WSGIProcessGroup right next to each other, because you're defining a daemon process group and then immediately specifying that it's used for your application.
Within a daemon process, WSGI applications can run in either the main Python interpreter or a sub-interpreter (assuming that you don't have sub-interpreter specific problems). If you don't set any special configuration directive, each WSGI application will run in its own sub-interpreter and the main interpreter will be unused. To change this, you need to set something for WSGIApplicationGroup, for instance 'WSGIApplicationGroup %{GLOBAL}' to run your WSGI application in the main interpreter.
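To check which daemon process group and application group a WSGI script actually ended up in, a small diagnostic WSGI application can report what mod_wsgi puts in the WSGI environment. This is a minimal sketch, not part of our real application. It assumes the 'mod_wsgi.process_group' and 'mod_wsgi.application_group' environ keys. As I understand the mod_wsgi documentation, an empty process group means embedded mode and an empty application group means the main interpreter.

```python
import os
import sys


def application(environ, start_response):
    """Report which mod_wsgi process group and application group we run in."""
    # mod_wsgi puts these keys in the WSGI environ; .get() keeps the script
    # usable under other WSGI servers, where they will be absent.
    process_group = environ.get("mod_wsgi.process_group", "<not set>")
    application_group = environ.get("mod_wsgi.application_group", "<not set>")

    lines = [
        "process group: %r" % (process_group,),
        "application group: %r" % (application_group,),
        "pid: %d" % os.getpid(),
        "sys.prefix: %s" % sys.prefix,
    ]
    body = "\n".join(lines).encode("utf-8")

    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

Pointing a temporary WSGIScriptAlias at a file like this should show whether the settings did what you expected. The sys.prefix line is there because it should be the venv if 'python-home=' took effect.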
Some WSGI applications can cohabit with each other in the same interpreter (where they will potentially share various bits of global state). Other WSGI applications are one to an interpreter, and apparently Django is one of them. If you need your WSGI application to have its own interpreter, there are two ways to achieve this; you can either give it a sub-interpreter within a shared daemon process, or you can give it its own daemon process and have it use the main interpreter in that process. If you need different virtual environments for each of your WSGI applications (or different Unix users), then you'll have to use different daemon processes and you might as well have everything run in their respective main interpreters.
(After recent experiences, my feeling is that processes are probably cheap and sub-interpreters are a somewhat dark corner of Python that you're probably better off avoiding unless you have a strong reason to use them.)
You normally specify your WSGI application to run (and what URL it's on) with WSGIScriptAlias. WSGIScriptAlias normally infers both the daemon process group and the (sub-interpreter) 'application group' from its context, but you can explicitly set either or both. As the documentation notes (now that I'm reading it):
If both process-group and application-group options are set, the WSGI script file will be pre-loaded when the process it is to run in is started, rather than being lazily loaded on the first request.
I'm tempted to deliberately set these to their inferred values simply so that we don't get any sort of initial load delay the first time someone hits one of the exposed URLs of our little application.
For our Django application, we wind up with a collection of directives like this (in its virtual host):
    WSGIDaemonProcess accounts ....
    WSGIProcessGroup accounts
    WSGIApplicationGroup %{GLOBAL}
    WSGIScriptAlias ...
(This also needs a <Directory> block to allow access to the Unix directory that the WSGIScriptAlias 'wsgi.py' file is in.)
If we added another Django application in the same virtual host, I believe that the simple update to this would be to add:
    WSGIDaemonProcess secondapp ...
    WSGIScriptAlias ... process-group=secondapp application-group=%{GLOBAL}
(Plus the <Directory> permissions stuff.)
Otherwise we'd have to mess around with setting the WSGIProcessGroup and WSGIApplicationGroup on a per-directory basis for at least the new application. If we specify them directly in WSGIScriptAlias we can skip that hassle.
(We didn't used to put Django in a venv, but as of Ubuntu 24.04, using a venv seems the easiest way to get a particular Django version into some spot where you can use it. Our Django application doesn't live inside the venv, but we need to point mod_wsgi at the venv so that our application can do 'import django.<...>' and have it work. Multiple Django applications could all share the venv, although they'd have to use different WSGIDaemonProcess settings, or at least different names with the same other settings.)
2025-01-15
(Multiple) inheritance in Python and implicit APIs
The ultimate cause of our mystery with Django on Ubuntu 24.04 is that versions of Python 3.12 before 3.12.5 have a bug where builtin types in sub-interpreters get unexpected additional slot wrappers, and Ubuntu 24.04 has 3.12.3. Under normal circumstances, 'list' itself doesn't have a '__str__' method but instead inherits it from 'object', so if you have a class that inherits from '(list,YourClass)' and YourClass defines a __str__, the YourClass.__str__ is what gets used. In a sub-interpreter, there is a list.__str__ and suddenly YourClass.__str__ isn't used any more.
(mod_wsgi triggers this issue because in a straightforward configuration, it runs everything in sub-interpreters.)
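The normal behavior is easy to see in a regular (main) interpreter. This is a small sketch with made-up class names, not Django's actual code. 'RenderMixin' stands in for YourClass, and the check on list.__dict__ shows that 'list' has its own __repr__ but no __str__ of its own.

```python
class RenderMixin:
    """Hypothetical stand-in for a mixin that supplies __str__."""

    def __str__(self):
        # Renders as an empty string when the list is empty.
        return ", ".join(str(item) for item in self)


class Errors(list, RenderMixin):
    """Same shape as the '(list, YourClass)' case described above."""


# The MRO puts list ahead of the mixin ...
print([cls.__name__ for cls in Errors.__mro__])
# ... but list normally has no __str__ of its own, only __repr__,
print("__str__" in list.__dict__, "__repr__" in list.__dict__)
# so the lookup falls through to the mixin's __str__.
print(repr(str(Errors())))
print(str(Errors(["a", "b"])))
```

In an affected sub-interpreter, the list.__dict__ check would presumably report a __str__ entry, and str() would go to list's version instead.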
This was an interesting bug, and one of the things it made me realize is that the absence of a __str__ method on 'list' itself had implicitly become part of list's API. Django had set up class definitions that were 'class Something(..., list, AMixin)', where the 'AMixin' had a direct __str__ method, and Django expected that to work. This only works as long as 'list' doesn't have its own __str__ method and instead gets it through inheritance from object.__str__. Adding such a method to 'list' would break Django and anyone else counting on this behavior, making the lack of the method an implicit API.
(You can get this behavior with more or less any method that people might want to override in such a mixin class, but Python's special methods are probably especially prone to it.)
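You can simulate the breakage without an affected Python by inserting a list subclass that defines its own __str__, which plays the role of a 'list' that has grown the method. This is only an illustration of the method resolution involved. 'ListWithOwnStr', 'Works', and 'Breaks' are hypothetical names, and the real bug added the method to 'list' itself rather than to a subclass.

```python
class AMixin:
    """Hypothetical mixin with a direct __str__, like the one described above."""

    def __str__(self):
        return "rendered by AMixin"


class ListWithOwnStr(list):
    """Simulates a 'list' that has grown its own __str__ method."""

    def __str__(self):
        return list.__repr__(self)


class Works(list, AMixin):
    """list has no __str__ of its own, so AMixin.__str__ is found."""


class Breaks(ListWithOwnStr, AMixin):
    """ListWithOwnStr.__str__ comes first in the MRO and shadows the mixin."""


print(str(Works()))
print(str(Breaks()))
```

The second class renders an empty instance as '[]', the same symptom that showed up in our forms.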
Before I ran into this issue, I probably would have assumed that where in the class tree a special method like __str__ was implemented was simply an implementation detail, not something that was visible as part of a class's API. Obviously, I would have been wrong. In Python, you can tell the difference and quite easily write code that depends on it, code that was presumably natural to experienced Python programmers.
(Possibly the existence of this implicit API was obvious to experienced Python programmers, along with the implication that various builtin types that currently don't have their own __str__ can't be given one in the future.)
2025-01-13
A mystery with Django under Apache's mod_wsgi on Ubuntu 24.04
We have a long standing Django web application that these days runs under Python 3 and a more modern version of Django. For as long as it has existed, it's had some forms that were rendered to HTML through templates, and it has rendered errors in those forms in what I think of as the standard way:
    {{ form.non_field_errors }}
    {% for field in form %}
      [...]
      {{ field.errors }}
      [...]
    {% endfor %}
This web application runs in Apache using mod_wsgi, and I've recently been working on moving the host this web application runs on to Ubuntu 24.04 (still using mod_wsgi). When I stood up a test virtual machine and looked at some of these HTML forms, what I saw was that when there were no errors, each place that errors would be reported was '[]' instead of blank. This did not happen if I ran the web application on the same test machine in Django's 'runserver' development testing mode.
At first I thought that this was something to do with locales, but the underlying cause is much more bizarre and inexplicable to me. The template operation for form.non_field_errors results in calling Form.non_field_errors(), which returns a django.forms.utils.ErrorList object (which is also what field.errors winds up being). This class is a multiple-inheritance subclass of UserList, list, and django.forms.utils.RenderableErrorMixin. The latter is itself a subclass of django.forms.utils.RenderableMixin, which defines a __str__() special method value that is RenderableMixin.render(), which renders the error list properly, including rendering it as a blank if the error list is empty.
In every environment except under Ubuntu 24.04's mod_wsgi, ErrorList.__str__ is RenderableMixin.render and everything works right for things like 'form.non_field_errors' and 'field.errors'. When running under Ubuntu 24.04's mod_wsgi, and only then, ErrorList.__str__ is actually the standard list.__str__, so empty lists render as '[]' (and had I tried to render any forms with actual error reports, worse probably would have happened, especially since list.__str__ isn't carefully escaping special HTML characters).
I have no idea why this is happening in the 24.04 mod_wsgi. As far as I can tell, the method resolution order (MRO) for ErrorList is the same under mod_wsgi as outside it, and sys.path is the same. The RenderableErrorMixin class is getting included as a parent of ErrorList, which I can tell because RenderableMixin also provides a __html__ definition, and ErrorList.__html__ exists and is correct.
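One way to narrow this sort of thing down is to ask which class in the MRO actually supplies an attribute, instead of only looking at the MRO itself. The helper below is a generic sketch. 'defining_class' is a name I made up, and the toy classes merely imitate the shape of the Django ones so the example runs without Django installed.

```python
def defining_class(cls, name):
    """Return the first class in cls's MRO whose own __dict__ defines name."""
    for klass in cls.__mro__:
        if name in vars(klass):
            return klass
    return None


class RenderableMixin:
    """Toy stand-in: __str__ and __html__ are both aliases for render()."""

    def render(self):
        return ""

    __str__ = render
    __html__ = render


class ErrorListLike(list, RenderableMixin):
    """Toy stand-in for a list subclass that relies on the mixin's __str__."""


for attr in ("__str__", "__html__", "__repr__"):
    print(attr, "comes from", defining_class(ErrorListLike, attr).__name__)
```

In a normal interpreter this reports the mixin for __str__ and __html__ and 'list' for __repr__. Run against the real ErrorList in the misbehaving environment, it would presumably point at 'list' for __str__, which is at least a more specific clue than an unchanged MRO.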
The workaround for this specific situation is to explicitly render errors to some format instead of counting on the defaults; I picked .as_ul(), because this is what we've normally gotten so far. However the whole thing makes me nervous since I don't understand what's special about the Ubuntu 24.04 mod_wsgi and who knows if other parts of Django are affected by this.
(The current Django and mod_wsgi setup is running from a venv, so it should also be fully isolated from any Ubuntu 24.04 system Python packages.)
(This elaborates on a grumpy Fediverse post of mine.)