Apache's mod_wsgi and the Python 2 issue it creates
If you use Apache (as we do) and have relatively casual WSGI-based applications (again, as we do), then Apache's mod_wsgi is often the easiest way to deploy your WSGI application. Speaking as a system administrator, it's quite appealing to not have to manage a separate configuration and a separate daemon (and I still get process separation and different UIDs). But at the moment there is a little problem, at least for people (like us) who use their Unix distribution's provided version of Apache and mod_wsgi rather than building their own. The problem is that any given build of mod_wsgi only supports one version of (C)Python.
(Mod_wsgi contains an embedded CPython interpreter, although generally it's not literally embedded; instead mod_wsgi is linked to the appropriate libpython shared library.)
In the glorious future there will only be (some version of) Python 3, and this will not be an issue. All of your WSGI programs will be Python 3, mod_wsgi will use some version of Python 3, and everything will be relatively harmonious. In the current world, there is still a mixture of Python 2 and Python 3, and if you want to run a WSGI-based program written in a different version of Python than your mod_wsgi supports, you will be sad. As a corollary of this, you just can't run both Python 2 and Python 3 WSGI applications under mod_wsgi in a single Apache.
Some distributions have both Python 2 and Python 3 versions of mod_wsgi available; this is the case for Ubuntu 20.04 (which answers something I wondered about last January). This at least lets you pick whether you're going to run Python 2 or Python 3 WSGI applications on any given system. Hopefully no current Unix restricts itself to only a Python 2 mod_wsgi, since there's an increasing number of WSGI frameworks that only run under Python 3.
(For example, Django last supported Python 2 in 1.11 LTS, which is no longer supported; support stopped some time last year.)
PS: Since I just looked it up, CentOS 7 has a Python 3 version of mod_wsgi in EPEL, and Ubuntu 18.04 has a Python 3 version in the standard repositories.
A semi-surprise with Python's urllib.parse and partial URLs
One of the nice things about urllib.parse (and its Python 2 equivalent) is that it will deal with partial URLs as well as full URLs. This is convenient because there are various situations in a web server context where you may get either partial URLs or full URLs, and you'd like to decode both of them in order to extract various pieces of information (primarily the path, since that's all you can reliably count on being present in a partial URL). However, URLs are tricky things once you peek under the hood; see, for example, URLs: It's complicated.... A proper URL parser needs to deal with that full complexity, and that means that it hides a surprise about how relative URLs will be interpreted.
Suppose, for example, that you're parsing the request URI that Apache hands you in order to extract the request's path. You have to actually parse the request's URI to get this, because funny people can send you full URLs in HTTP GET requests, which Apache will pass through to you.

Now suppose someone accidentally creates a URL for a web page of yours that looks like 'https://example.org//your/page/url' (with two slashes after the host instead of one) and visits it, and you attempt to decode the result of what Apache will hand you:
>>> urllib.parse.urlparse("//your/page/url")
ParseResult(scheme='', netloc='your', path='/page/url', params='', query='', fragment='')
The problem here is that '//ahost.org/some/path' is a perfectly legal protocol-relative URL, so that's what urllib.parse will produce when you give it something that looks like one, which is to say something that starts with '//'. Because we know where it came from, you and I know that this is a relative URL with an extra / at the front, but urlparse() can't make that assumption and there's no way to limit its standard-compliant generality.
If this is an issue for you (as it was for me recently), probably the best thing you can do is check for a leading '//' before you call urlparse() and turn it into just '/' (the simple way is to just strip off the first character in the string). Doing anything more complicated feels like it's too close to trying to actually understand URLs, which is the very job we want to delegate to urlparse() because it's complicated.
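As a sketch of this pre-check (parse_request_path is my own hypothetical name for it, not anything standard), the whole thing comes out to only a few lines:

```python
import urllib.parse

def parse_request_path(uri):
    """Extract the path from a request URI, treating a leading '//' as a
    doubled slash rather than the start of a protocol-relative URL.

    This is just the simple pre-check described above; it deliberately
    doesn't try to understand URLs any more than that.
    """
    if uri.startswith("//"):
        # Collapse all leading slashes down to the single '/' we know
        # this relative URL should start with.
        uri = "/" + uri.lstrip("/")
    return urllib.parse.urlparse(uri).path
```

Full URLs still go straight through to urlparse(), so parse_request_path("https://example.org/a/b") gives '/a/b' just as before.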
PS: Because I tested it just now, the result of giving urlparse() a relative URL that starts with three or more slashes is that it's interpreted as a relative URL, not a protocol-relative URL. The path of the result will have the extra leading slashes stripped.
I should keep track of what Python packages I install through pip
These days I'm increasingly making use of installing Python packages with pip, whether this is into a PyPy environment or with 'pip install --user' for things like python-lsp-server. Having done this for a while, complete with trying to keep up with potential package upgrades, I've come to the conclusion that I should explicitly keep track of what packages I install, recording this in some place I can find it again.
There are two problems (or issues) that push me to this. The first is that as far as I know, Pip doesn't keep track of a distinction between packages that you've asked it to install and the dependencies of those packages. All of the packages show up in 'pip list', and any can show up in 'pip list --outdated'. My understanding is that in the normal, expected use of Pip you'll keep track of this in your project in a requirements file, then use that to build the project's virtualenv. This is not really the model of installing commands, especially commands that have install-time options.
The second issue is that Pip-installed packages are implicitly for a specific version of Python. If you rely on the system Python (instead of your own version) and that version gets upgraded, suddenly 'pip list' will report nothing (and you will in fact have no packages available). At this point you need to somehow recover the list of installed packages and re-install all of them (unless you resort to unclean hacks). Explicitly keeping track of this list in advance is easier than having to dig it out at the time.
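One way to dig such a list out after the fact (keeping it in advance is still the better option) is to dump what the current interpreter can see. This is a sketch using importlib.metadata, which is in the standard library from Python 3.8 on; note that it lists dependencies right along with everything else, which is exactly the limitation discussed above:

```python
import importlib.metadata

def installed_packages():
    """Return sorted 'name==version' lines for every visible installed
    distribution, suitable for saving somewhere you can find again."""
    return sorted(
        f'{dist.metadata["Name"]}=={dist.version}'
        for dist in importlib.metadata.distributions()
    )

if __name__ == "__main__":
    print("\n".join(installed_packages()))
```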
Having an explicit list helps in other situations. Perhaps you started out installing all of your tools under CPython, but now you want to see how well they'll work under PyPy. Perhaps you're building a new PyPy based environment with a new version of PyPy and want to start over from scratch. Perhaps you think package versions and dependencies have gotten snarled and you're carrying surplus packages, so you want to delete everything and start over from scratch.
(Starting over from scratch can also be the easiest way to get the best version of dependencies, since the packages you're directly installing may have maximum version constraints that will trip you up if you just directly 'pip install --upgrade ...' dependencies.)
PS: Possibly there's ways to do all of this with Pip today, especially things like 'upgrade this and all of its dependencies to the most recent versions that are acceptable'. I'm not well versed in Pip, since mostly I use it as a program installer.
Early notes on using the new python-lsp-server (pylsp) in GNU Emacs
When I started with LSP-based Python editing in GNU Emacs, the Python LSP server was pyls. However, pyls is apparently now unmaintained and the new replacement is python-lsp-server, also known as 'pylsp'. I noticed this recently when I looked into type hints a bit, and then when I was editing some Python today, lsp-mode or some sub-component nagged me about it:
Warning (emacs): The palantir python-language-server (pyls) is unmaintained; a maintained fork is the python-lsp-server (pylsp) project; you can install it with pip via: pip install python-lsp-server
The first thing to note about pylsp is that it supports Python 3 only (it says Python 3.6+). It works to some degree if you edit Python 2 code, but I don't fully trust it, so I'm keeping around my current Python 2 version of the older pyls. Pyls may be unmaintained, but at the moment it appears to work okay.
Because of the Python 3 versus 2 issue, I already had a front end 'pyls' script to try to figure out which Python's version of pyls I needed to run. Fortunately pyls and pylsp are currently invoked with the same (lack of) arguments, so I cheated by renaming this script to 'pylsp' and having it run the real pylsp for Python 3 and fall back to pyls for Python 2.
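My real script's decision logic isn't shown here, but a minimal sketch of such a front end might look like the following; the '.python2' per-project marker file is purely a hypothetical convention for this sketch, not what my script actually checks:

```python
import os

def choose_server(project_dir):
    """Pick which LSP server to run: the old 'pyls' for Python 2 trees,
    the new 'pylsp' otherwise. The '.python2' marker file is an invented
    signal for this sketch; real logic could look at anything."""
    if os.path.exists(os.path.join(project_dir, ".python2")):
        return "pyls"
    return "pylsp"
```

The script would then finish with something like os.execvp(server, [server] + sys.argv[1:]) to replace itself with the chosen server, passing all arguments through unchanged.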
Lately I've been running pyls under PyPy, so I started out by installing pylsp this way too. Pylsp (like pyls before it) has some useful-looking third party plugins, and since I was installing from scratch now I decided to install some of them, including mypy-ls.
This is when I found out that unfortunately mypy doesn't run under PyPy. So I switched to installing pylsp using CPython with my usual 'pip install --user'. This worked for pylsp itself and mypy-ls, but pyls-memestra had issues due to memestra basically having to be installed in a virtualenv or other personal install of CPython or PyPy. I dealt with this by removing pyls-memestra; it might be nice but it's not essential.
(Memestra attempts to make a directory under a location that is owned by root if you're running the system CPython or PyPy.)
The result appears to work fine and has no more warning messages in various internal Emacs LSP buffers than I expect, but I haven't used it extensively yet. I'm not sure I'll keep mypy-ls yet, because it does add some extra warnings in some situations. The warnings are valid ones if you're using type annotations, but a potential problem if you're not. Probably it's good for me to get the warnings and maybe start fixing them.
The temptation to start using some Python type hints
I've been hearing about Python 3's optional type hints for some time now (which is not surprising, since they were introduced in Python 3.5, released in 2015, although then improved in several subsequent releases), but for a long time I didn't really pay much attention. However, lately I've been feeling a temptation to try them out in some of my programs, assuming that I can find a suitable one.
One significant driver of this new temptation has been using GNU Emacs with LSP based editing for both Python and Go. My experience has been that LSP-based editing is significantly nicer for my Go than for my Python, partly because I get much better completion suggestions for Go. I suspect that having typing information would improve things here when I'm editing Python.
(Reading around a bit, it appears that pyls has more or less native support for type hints through using Jedi, with optional use of mypy through pyls-mypy. Also, apparently I probably want to switch to python-lsp-server instead of pyls, cf.)
Another reason is particular to my circumstances with Python. I deal with a lot of Python code that I touch only infrequently, when it needs some new feature or a bug fix; most of the time it sits there working away. When it's been at least months since I last looked at the code, going back to it always involves some amount of re-discovery of what functions do, how things flow in the code, and what types various things are. Adding type hints would help speed up that re-learning of my code, and as a bonus it would help catch any typing mistakes that I make as I do my additions, changes, or bug fixes.
(I've definitely observed this advantage when I go back to my Go code, and having enforced types in Go gives me more confidence in changes. To get something like this in Python, I assume I would want to use mypy.)
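As a small illustration of the sort of annotation I mean (this function is invented for the example, not taken from any real program of ours), the hints record exactly the information I otherwise have to re-discover months later:

```python
from typing import Tuple

def parse_login(line: str) -> Tuple[str, int]:
    """Split a 'user:uid' line into the login name and numeric UID.

    The annotations say up front that this takes a string and hands back
    a (str, int) pair, which is what mypy would then check callers against.
    """
    user, uid = line.split(":", 1)
    return user, int(uid)
```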
For various reasons, I'd want to start with my own personal code instead of initially adding any type annotations to our production code. Type hints are still unusual in Python, and until they prove their value to us I need to be conservative in that sort of thing. It is tempting to add some type annotations to a private copy of some of our production code, just to see if mypy reports any type confusions or conflicts that turn up bugs.
PS: I expect type hints to work relatively well for my code because it tends to be more or less statically typed in practice at runtime. Where they're not strictly statically typed, I could probably mostly deal with it with things like typing.Optional.
It's pleasantly easy to install PyPy yourself (from their binaries)
The Python language server is the most substantial Python program I run on our servers, making it an obvious candidate to try running under PyPy for a speedup. The last time around, I optimistically assumed that I would use the Ubuntu packaged version of PyPy. Unfortunately, all of our login servers are still running Ubuntu 18.04 and 18.04 has no packaged version of PyPy 3. Since Python 3 is what I use for much of both my personal and our work code and you have to run pyls under the same version of Python as the code you're working on, this is a bit of a problem. So I decided to try out the PyPy procedures for installing a pre-built PyPy, with the latest release binaries.
This turned out to be just as easy and as pleasant (on Linux) as the documentation presented it. The tarball could be unpacked to put its directory tree anywhere (I put it in my $HOME/lib), and it ran fine on Ubuntu 18.04 and 20.04. I needed pip to install pyls, so I followed their directions to run './pypy-xxx/bin/pypy -m ensurepip', which downloaded everything needed into PyPy's tree and created a ./pypy-xxx/bin/pip program that I could use for everything else. As with virtualenvs, once I installed pyls through pip I could run $HOME/lib/pypy-xxx/bin/pyls and it all just worked.
In theory I think I could go on to use my $HOME/lib versions of PyPy3 and PyPy to create virtualenvs and then install things into those virtualenvs. In practice this is an extra step that I don't need for my purposes. Installing pyls and anything else I want to run under PyPy with './pypy-xxx/bin/pip install ...' already neatly isolates it in a directory hierarchy, just like a virtualenv does.
(Installing PyPy3 was so easy and straightforward that I decided I might as well also install the standard pre-built PyPy2, just so I had a known and up to date quantity instead of whatever Ubuntu had in their PyPy packages. Plus, even if I used the system version, I would have had to make a virtualenv for it. It took almost no extra effort to go all the way to using the pre-built binaries.)
All of this is really how installing pre-built software should work (and certainly how it's documented for PyPy). But I date from an era where it was usually much more difficult and pre-built software was often rather picky about where you put it or wanted to spray bits of itself all over your $HOME (or elsewhere). Right now it's still a bit of a pleasant shock when a pre-built program actually works this easily, whether it's PyPy or Rust.
How I want to use pip with PyPy to install third party programs
After seeing PyPy run a moderate program faster than CPython, I wondered how easy it would be to use PyPy to run pyls (which I use for LSP-based editing for Python in GNU Emacs), since pyls is reasonably big and probably reasonably CPU intensive. Pyls is a third party program that's not packaged by Ubuntu but which is available through PyPI, so normally I install it with 'pip3 install --user ...' so that it goes in $HOME/.local.
I'll start with my conclusion: for PyPy, I want to use a virtualenv to install and manage third party programs (at least for the version and setup of PyPy that Ubuntu packages). I'm not normally much of a fan of virtualenvs for various reasons and I'd avoid them here if I could, but using a virtualenv is less annoyance and more likely to succeed than trying to get a pip for PyPy to coexist with pip for CPython. Perhaps you can make it work with enough sweat, but it's a lot easier to shrug, make a virtualenv or two, and accept 30 MBytes of overhead per virtualenv.
You definitely want one virtualenv for PyPy 2 and a second one for PyPy 3. I think you can put all of the third party commands you want into the single virtualenv rather than having to have one virtualenv for pyls, one for YAPF, and so on. To set up my virtualenvs using the Ubuntu version of PyPy, I followed the PyPy documentation for this:
virtualenv -p /usr/bin/pypy3 $HOME/lib/pypy3-env
virtualenv -p /usr/bin/pypy $HOME/lib/pypy2-env
The Ubuntu packaged version of virtualenv is a Python 3 program, but it still works to set up a Python 2 PyPy virtualenv. This is probably routine to people who're familiar with it, but I'm not.
Once your virtualenvs are set up, you can start installing things with the virtualenv's pip as usual:
$HOME/lib/pypy3-env/bin/pip install python-language-server
(For pyls, you'll need the PyPy development packages to be installed. On Ubuntu 20.04 these are pypy-dev and pypy3-dev.)
You don't need to activate the virtualenv to run commands from it; as I found out earlier, virtual environments transparently add themselves to sys.path. I'm not sure what maintenance you'll need to do to a virtualenv when PyPy changes versions (or changes what version of Python 3 it claims to be). I'll probably get to find out someday.
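This 'no activation needed' behavior is easy to demonstrate. Here's a sketch (using the standard library's venv module rather than virtualenv, and assuming a Unix 'bin' layout) that creates a throwaway virtual environment and runs its interpreter directly:

```python
import os
import subprocess
import tempfile
import venv

def venv_prefix(env_dir):
    """Create a venv at env_dir, then run its python directly (with no
    'activate' step) and report the sys.prefix that interpreter sees."""
    venv.create(env_dir, with_pip=False)
    py = os.path.join(env_dir, "bin", "python")
    result = subprocess.run(
        [py, "-c", "import sys; print(sys.prefix)"],
        capture_output=True, text=True, check=True)
    return result.stdout.strip()

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # The directly-run interpreter reports the venv as its prefix,
        # so everything installed there is on its sys.path.
        print(venv_prefix(os.path.join(tmp, "env")))
```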
Even if your system version of Python 2 doesn't package and supply pip (Fedora now doesn't ship it), your virtualenv appears to magically get it and it works. I don't quite know how this works (although I'm sure I could find out if I dug into it), but I'm happy with the result since it's quite convenient.
(Our Ubuntu 18.04 machines have no standard package for PyPy 3, but that's another issue. Perhaps we'll be able to switch our user login machines over to 20.04 this summer.)
PyPy starts fast enough for our Python 2 commands
Some day, Linux distributions like Ubuntu are not going to package Python 2 even in a limited version, the way they're doing now. One way for us to deal with this would be to migrate all of our remaining little Python 2 programs and scripts to Python 3. Another option is to run them under PyPy, which says that it will always support Python 2.7.
One of the potential issues with PyPy is that its JIT has a high warm-up cost, which means that small, short-running programs are going to be slower, perhaps significantly slower. Most of the Python 2 that we have left is in small administrative commands that are mostly run automatically, where on the one hand I would expect PyPy's overhead to be at its largest and on the other hand we probably don't really care about the overhead if it's not too big. So I decided to do some quick tests.
(I've been hit by the startup overhead of small programs in Python even without PyPy, but it was in an interactive situation.)
I did my tests on one of our Ubuntu 20.04 servers, which has PyPy version 7.3.1, and the results turned out to be more interesting than I expected. The artificial and irrelevant worst case was a Python 3 program that went from about 0.05 second to about 0.17 second (under pypy3) to actually do its work. Our typical small Python 2 commands seem to go from 0.01 or 0.02 second to about 0.07 second or so. The surprising best case was a central program used for managing our password file, where the runtime under PyPy actually dropped from around 0.40 second to 0.33 second. And a heavily multithreaded program that runs a lot of concurrent ssh commands had essentially the same runtime on a different 20.04 machine.
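My quick tests were just shell 'time' runs, but the same sort of measurement can be sketched in Python; here 'interpreter' is whatever you want to compare, for example /usr/bin/pypy versus /usr/bin/python2:

```python
import subprocess
import time

def startup_time(interpreter, runs=5):
    """Best-of-N wall clock time for an interpreter to run an empty
    program, which is dominated by its startup overhead."""
    best = float("inf")
    for _ in range(runs):
        start = time.monotonic()
        subprocess.run([interpreter, "-c", "pass"], check=True)
        best = min(best, time.monotonic() - start)
    return best
```

Taking the best of several runs rather than an average discards warm-up noise like cold caches, which is what you want when comparing bare startup costs.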
(In retrospect, the password file processing program does have to process several thousand lines of text, so perhaps I should not have been surprised that it's CPU-intensive enough for PyPy to speed it up. Somehow it's in my mind as a small, lightweight thing.)
All of this says that PyPy starts (and runs) our Python programs more than fast enough to serve us as an alternate implementation of Python 2 if we need to turn to it.
Packaging Python 2 doesn't mean that Linux distributions support it
One of the reasons I've been optimistic about Python 2's continued afterlife for at least a few more years is that various Linux distributions with long term support have packaged it in versions with support that would last for years to come. Those distributions would provide fixes for any security issues that came up, as they do for all of their packages (more or less), and people running Python 2 elsewhere could take those updated versions of Python 2, recompile them, and use them even on platforms without that sort of support. The recent ctypes security issue was the first serious test of my optimistic belief. I'm afraid to report that it has partially failed.
As I write this, most Linux distributions that still provide Python 2 have provided an updated Python 2 package that fixes this issue; for instance, Fedora is updated. The relatively glaring exception that I know of is Ubuntu in 20.04 LTS. Although Ubuntu had an initial stumble in the updates for 16.04 LTS and 18.04 LTS, they have fixed them by now. Unfortunately there's no sign of any update for 20.04 LTS. Ubuntu knows that an update is needed (per their page for CVE-2021-3177), and they have the code update that they need (since they've fixed this in 18.04 and 16.04, including their fixed fix), but they aren't doing anything.
At one level this has surprised me. At another level, it shouldn't have. All of the Linux distributions have been clear that they want to get rid of Python 2 and are only still providing it reluctantly. In retrospect, it was optimistic to assume that despite this reluctance, all of the distributions would always still fix issues in all versions of Python 2 instead of shrugging and pointing out that in general, Python 2 had explicitly reached the end of its life. What's happened in Ubuntu 20.04 so far may be an accident, but it shouldn't surprise me if some day Linux distributions start doing this deliberately.
(Fortunately I don't think this issue is serious for us, so for now I feel that we're okay even on 20.04.)
PS: Not all Linux distributions are likely to stop updating Python 2. Red Hat Enterprise Linux especially has a serious commitment to long term bug fixes, so I do expect them to keep fixing their version of Python 2 for as long as they provide it in a supported RHEL version. Well, probably. Some things involving Red Hat Enterprise Linux have been shaken up recently.
ctypes security issue and Python 2
In the middle of February, the Python developers revealed that Python had been affected by a buffer overflow security issue, CVE-2021-3177. The relatively full details are covered in ctypes: Buffer overflow in PyCArg_repr, and conveniently the original bug report has a very simple reproduction that can also serve to test your Python to see if it's fixed:
$ python2
Python 2.7.17 (default, Feb 25 2021, 14:02:55)
>>> from ctypes import *
>>> c_double.from_param(1e300)
*** buffer overflow detected ***: python2 terminated
Aborted (core dumped)
(A fixed version will report '<cparam 'd' (1e+300)>' here.)
The official bug report only covers Python 3, because Python 2.7 is not supported any more, but as you can see here the bug is present in Python 2 as well (this is the Ubuntu 18.04 version, which is unfixed for reasons).
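If you want to check an interpreter without risking your current process, the reproduction can be run in a subprocess and the result inspected. This is my own small sketch, not any official test:

```python
import subprocess

def ctypes_repr_fixed(interpreter):
    """Run the CVE-2021-3177 reproduction under 'interpreter' and return
    True if it prints the repr instead of dying with a buffer overflow."""
    proc = subprocess.run(
        [interpreter, "-c",
         "from ctypes import c_double; print(c_double.from_param(1e300))"],
        capture_output=True, text=True)
    # A fixed interpreter exits cleanly and prints <cparam 'd' (1e+300)>;
    # a vulnerable one aborts, giving a non-zero return code.
    return proc.returncode == 0 and "1e+300" in proc.stdout
```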
I'm on record as saying that it was very unlikely for security issues to be discovered in Python 2 after this long. Regardless of how significant this issue is in practice, I was and am wrong. A buffer overflow has lurked in the standard Python library, including Python 2, and was only discovered after official Python support for Python 2 has stopped. There have been other recent security issues in Python 3, per Python security vulnerabilities, and some of them may also apply to Python 2 and be significant for you.
(Linux distributions are still fixing issues like this in Python 2. Well, more or less. Ubuntu hasn't worked out a successful fix for 18.04 and hasn't even tried one for 20.04, but Fedora has fixed the issue.)
This CVE is not an issue for our Python 2 code, where we don't use ctypes. But it does make me somewhat more concerned about our remaining Python 2 programs, for the simple reason that I was wrong about one of my beliefs about Python 2 after its end of support. To use a metaphor, what I thought was a strong, well-inspected pillar has turned out to have some previously unnoticed cracks of a sort that matter, even if they've not yet been spotted in an area that's load-bearing for us. Also, now I should clearly be keeping an eye on Python security issues and testing new ones (if possible) to see if they apply to Python 2. If they do, we'll need to explicitly consider what programs of ours might be affected.
(The answer is often likely to be 'no programs are affected', but we can no longer take for granted that the issues are not serious and don't affect Python 2 or us.)
As far as the severity of this issue goes, on the one hand buffer overruns are quite bad, but on the other hand this is in what is a relatively obscure corner of Python for most people. This is not the sort of Python security issue that would let people break ordinary Python 2 programs (and I still think that those are very unlikely by now). But I'm a bit biased here, since we're not going to drop everything and port all of our remaining Python 2 programs to Python 3 right now (well, not unless we absolutely have to).
(People's views of the severity may vary; these are just mine.)
PS: To be explicit, this issue has not changed my view that it's reasonable (and not irresponsible) to continue running Python 2 programs and code. This is not a great sign for people who use ctypes, but it's not a fatal vulnerability or a major problem sign.