Wandering Thoughts archives

2022-03-29

Fixing Pipx when you upgrade your system Python version

If you use your system's Python for pipx and then upgrade your system and its version of Python, pipx can have a bad problem that renders your pipx managed virtual environments more or less unrecoverable if you do the wrong thing. Fortunately there turns out to be a way around it, which I tested as part of upgrading my office desktop to Fedora 35 today.

Pipx's problem is that it stashes a bunch of stuff in a ~/.local/pipx/shared virtual environment that depends on the Python version. If this virtual environment exists but doesn't work in the new version of Python that pipx is now running with, pipx fails badly. However, pipx will rebuild this virtual environment any time it needs it, and once rebuilt, the new virtual environment works.

So the workaround is to delete the virtual environment, run a pipx command to get pipx to rebuild it, and then tell pipx to reinstall all your pipx environments. You need to do this after you've upgraded your system (or your Python version). What you do is more or less:

# get rid of the shared venv
rm -rf ~/.local/pipx/shared
# get pipx to re-create it
pipx list
# have pipx fix all of your venvs
pipx reinstall-all
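Before you delete ~/.local/pipx/shared, you can check whether it actually was built by a different Python. The standard venv module writes a 'version = X.Y.Z' line into each virtual environment's pyvenv.cfg, so a stale shared venv shows up as a version mismatch. This is a minimal sketch that assumes pipx created the shared venv with the stdlib venv module, and that the path is the one used in this entry (newer pipx releases may keep their data elsewhere). Run it with the same Python that pipx itself runs under.

```python
import sys
from pathlib import Path

# The location used in this entry; other pipx versions may use a different one.
PIPX_SHARED = Path.home() / ".local" / "pipx" / "shared"


def recorded_version(venv_dir):
    """Return the 'version = X.Y.Z' value from a venv's pyvenv.cfg, or None."""
    try:
        text = (Path(venv_dir) / "pyvenv.cfg").read_text()
    except FileNotFoundError:
        return None
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "version":
            return value.strip()
    return None


def is_stale(venv_dir):
    """True if the venv was built by a different Python X.Y than the one running this."""
    version = recorded_version(venv_dir)
    if version is None:
        return False  # missing or unrecognized; pipx will (re)create it anyway
    running = [str(sys.version_info.major), str(sys.version_info.minor)]
    return version.split(".")[:2] != running


if __name__ == "__main__":
    if is_stale(PIPX_SHARED):
        print(f"{PIPX_SHARED} was built for another Python version; remove it and run 'pipx list'")
```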

Perhaps there is an easier way to fix up all of your pipx managed virtual environments than 'pipx reinstall-all', but that's what I went with after my Fedora 35 upgrade and it worked. In any case, I feel that it's not a bad idea to recreate pipx managed virtual environments from scratch every so often just to clean out any lingering cruft.

(It also seems unlikely that there is any better way in general. In one way or another, all of the Python packages have to get reinstalled under the new version of Python. Sometimes you can do this by just renaming files, but any package with a compiled component may need (much) more work. Actually doing the pip installation all over again ensures that all of this gets done right, with no hacks that might fail.)
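To see concretely why installed packages are tied to one Python version, you can ask the interpreter where it looks for packages and what filename suffix it expects on compiled extension modules. This is a small sketch using the standard sysconfig module; the exact values depend on the platform and on how Python was built.

```python
import sys
import sysconfig


def version_bound_locations():
    """Where this interpreter looks for packages, and how compiled modules must be named."""
    paths = sysconfig.get_paths()
    return {
        "version": f"{sys.version_info.major}.{sys.version_info.minor}",
        # On Unix this is normally .../lib/pythonX.Y/site-packages, so a new
        # minor version of Python simply looks in a different directory.
        "purelib": paths["purelib"],
        # Compiled extensions carry an ABI tag in their filename (for example
        # '.cpython-310-x86_64-linux-gnu.so' on Linux), and a different CPython
        # version won't pick up a module built with another version's tag
        # (stable-ABI 'abi3' modules are the exception).
        "ext_suffix": sysconfig.get_config_var("EXT_SUFFIX"),
    }


if __name__ == "__main__":
    for key, value in version_bound_locations().items():
        print(f"{key:>10}: {value}")
```

The first is the 'just rename files' case; the second is why anything with a compiled component generally has to be rebuilt or reinstalled.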

PipxFixingPythonVersion written at 21:50:58

2022-03-19

Some problems that Python's cgi.FieldStorage has

In my entry on our limited use of the cgi module, I praised cgi.FieldStorage as a nice simple way to write Python CGIs that deal with parameters, especially for POST forms. Unfortunately there are some dark sides to cgi.FieldStorage (apart from any bugs it may have), and in fairness I should discuss them. Overall, cgi.FieldStorage is probably safe for internal usage, but I would be a bit wary of exposing it to the Internet in hostile circumstances. The ultimate problem is that in the name of convenience and just working, cgi.FieldStorage is pretty trusting of its input, and on the general web one of the big rules of security is that your input is entirely under the control of an attacker.

So here are some of the problems that cgi.FieldStorage has if you expose it to hostile parties. The first broad issue is that FieldStorage doesn't have any limits:

  • it allows people to upload files to you, whether or not you expected this; the files are written to the local filesystem. Modern versions of FieldStorage do at least delete the files when the Python garbage collector destroys the FieldStorage object.

  • it has no limits on how large a POST body it will accept or how long it will wait to read a POST body in (or how long it will wait to upload files). Some web server CGI environments may impose their own limits on these, especially time, but an attacker can probably at least flood your memory.

    (The FieldStorage init function does have some parameters that could be used to engineer some limits, with additional work like wrapping standard input in a file-like thing that imposes size and time limits. For size limits you can also pre-check the Content-Length.)
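To illustrate the Content-Length pre-check, here is a minimal Python 3 sketch that refuses oversized POST bodies and reads exactly the promised number of bytes. The names (read_post_body, BadRequest) and the 64 KiB cap are made up for illustration. It only deals with size; a time limit would still have to come from the web server's CGI timeout or something similar.

```python
MAX_BODY_BYTES = 64 * 1024  # hypothetical cap; size it to your largest legitimate form


class BadRequest(Exception):
    """Raised for a POST body we refuse to read."""


def read_post_body(environ, stream, max_bytes=MAX_BODY_BYTES):
    """Read exactly CONTENT_LENGTH bytes from stream, refusing oversized bodies.

    Under CGI the web server passes the body length in the CONTENT_LENGTH
    environment variable and the body itself on standard input.
    """
    try:
        length = int(environ.get("CONTENT_LENGTH", ""))
    except ValueError:
        raise BadRequest("missing or malformed Content-Length") from None
    if length < 0:
        raise BadRequest("negative Content-Length")
    if length > max_bytes:
        raise BadRequest(f"body of {length} bytes exceeds the {max_bytes} byte limit")
    chunks = []
    remaining = length
    while remaining > 0:
        chunk = stream.read(remaining)
        if not chunk:
            raise BadRequest("client sent less data than Content-Length promised")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)
```

In a CGI you would call read_post_body(os.environ, sys.stdin.buffer). The resulting bytes can then be parsed directly, or should be usable with FieldStorage by wrapping them in io.BytesIO and passing that as its fp parameter.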

Then there is the general problem that GET and POST parameters are not really like a Python dict (or any language's form of it). All dictionary-like things require unique keys, but attackers are free to feed you duplicate ones in their requests. FieldStorage's behavior here is easy to get wrong: form[key] and getvalue() hand you a list instead of a single value when a parameter is repeated, while getfirst() quietly takes the first one. If something else in your software stack has a different interpretation of duplicate parameters, your CGI and that other component are actually seeing two different requests. This is a classic way to get security vulnerabilities.
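To show how one request can mean different things to different code, here is a small sketch using the standard library's urllib.parse rather than FieldStorage; the parameter names are invented. It shows three plausible readings of a repeated parameter, plus one defensive option: refusing duplicates outright.

```python
from urllib.parse import parse_qs, parse_qsl

QUERY = "user=alice&admin=no&admin=yes"

# Three plausible readings of the same request:
every_value = parse_qs(QUERY)          # keeps every value, as lists
last_wins = dict(parse_qsl(QUERY))     # later pairs overwrite earlier ones
first_wins = {}
for key, value in parse_qsl(QUERY):
    first_wins.setdefault(key, value)  # earlier pairs are kept


def single_value(params, key):
    """Get one value from parse_qs() output, refusing repeated parameters."""
    values = params.get(key, [])
    if len(values) != 1:
        raise ValueError(f"expected exactly one {key!r} parameter, got {len(values)}")
    return values[0]
```

Here last_wins says admin is 'yes' while first_wins says 'no'; two components that disagree like this are exactly the two-different-requests problem.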

(FieldStorage also has liberal parsing by default, although you can change this with an init function parameter. Incidentally, none of the init function parameters are covered in the cgi documentation; you have to read help() or the cgi.py source.)

Broadly speaking, cgi.FieldStorage feels like a product of an earlier age of web programming, one where CGIs were very much a thing and the web was a smaller and ostensibly friendlier place. For a more or less intranet application that only has to deal with friendly input sent from properly programmed browsers, it's still perfectly good and is unlikely to blow up. For general modern Internet usage, well, not so much, even if you're still using CGIs.

(Wandering Thoughts is still a CGI, although with a lot of work involved. So it can be done.)

CGIFieldStorageIssues written at 21:47:21

2022-03-18

Our limited use of Python's cgi module

The recent news is that Python is going to remove some standard library modules (via). This news caught my eye because two of the modules to be removed are cgi and its closely related kin cgitb. We have a number of little CGIs in our environment for internal use, and many of them are written in Python, so I expected to find us using cgi all over the place. When I actually looked, our usage was much lower than I expected, except for one thing.

Some of our CGIs are purely informational; they present some dynamic information on a web page, and don't take any parameters or otherwise particularly interact with people. These CGIs tend to use cgitb so that if they have bugs, we have some hope of catching things. When these CGIs were written, cgitb was the easy way to do something, but these days I would log tracebacks to syslog using my good way to format them.

(It will probably surprise no one that in the twelve years since I wrote that entry, none of our internal CGIs were changed away from using cgitb. Inertia is an extremely powerful force.)
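For the syslog route, here is a minimal Python 3 sketch, assuming a Unix host whose syslog daemon listens on /dev/log: an excepthook that formats the traceback with the standard traceback module and logs it one line per syslog message. It is not necessarily the formatting approach from the entry linked above, and the helper names are made up for illustration.

```python
import logging
import logging.handlers
import sys
import traceback


def make_syslog_logger(name, address="/dev/log"):
    """A logger that sends to the local syslog daemon (Unix socket assumed)."""
    logger = logging.getLogger(name)
    handler = logging.handlers.SysLogHandler(address=address)
    handler.setFormatter(logging.Formatter(f"{name}: %(message)s"))
    logger.addHandler(handler)
    return logger


def traceback_lines(exc_type, exc_value, exc_tb):
    """Format a traceback and split it into non-blank lines, one per log message."""
    text = "".join(traceback.format_exception(exc_type, exc_value, exc_tb))
    return [line for line in text.splitlines() if line.strip()]


def install_syslog_excepthook(logger):
    """Log any uncaught exception's traceback, then fall back to the default hook."""
    def hook(exc_type, exc_value, exc_tb):
        for line in traceback_lines(exc_type, exc_value, exc_tb):
            logger.error(line)
        # The default hook still writes to stderr, which many web servers log too.
        sys.__excepthook__(exc_type, exc_value, exc_tb)
    sys.excepthook = hook
```

A CGI would call install_syslog_excepthook(make_syslog_logger("mycgi")) near the top, where "mycgi" is a placeholder name.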

Others of our CGIs are interactive, such as the CGIs we use for our self-serve network access registration systems. These CGIs need to extract information from submitted forms, so of course they use the ever-popular cgi.FieldStorage class. As far as I know there is and will be no standard library replacement for this, so in theory we will have to do something here. Since we don't want file uploads, it actually isn't that much work to read and parse a standard POST body, or we could just keep our own copy of cgi.py and use it in perpetuity.

(The real answer is that all of these CGIs are still Python 2 and are probably going to stay that way, with them running under PyPy if it becomes necessary because Ubuntu removes Python 2 entirely someday.)
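Here is a minimal sketch of the 'read and parse a standard POST body' route. It is Python 3 (unlike the CGIs described above) and uses urllib.parse.parse_qs from the standard library. It assumes the raw body has already been read from standard input as bytes, and it deliberately rejects multipart/form-data, the encoding browsers use for file uploads. The parse_form name and the max_fields default are illustrative choices.

```python
from urllib.parse import parse_qs

FORM_TYPE = "application/x-www-form-urlencoded"


def parse_form(environ, body, max_fields=100):
    """Parse a urlencoded POST body (bytes) into {name: [values]}.

    Anything else, notably multipart/form-data (used for file uploads),
    is rejected rather than handled.
    """
    mime = environ.get("CONTENT_TYPE", "").split(";", 1)[0].strip().lower()
    if mime != FORM_TYPE:
        raise ValueError(f"unsupported form encoding: {mime or '(none)'}")
    # Non-ASCII characters arrive percent-encoded in this encoding, so a
    # well-formed body is ASCII; UnicodeDecodeError is a ValueError subclass.
    text = body.decode("ascii")
    return parse_qs(text, keep_blank_values=True, strict_parsing=True,
                    max_num_fields=max_fields)
```

With strict_parsing=True, malformed fields raise ValueError instead of being silently dropped, and max_num_fields caps how many fields a client can make you build.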

PS: DWiki, the pile of Python that is rendering Wandering Thoughts for you to read, has its own code to handle GET parameters and POST forms, which is why I know that doing that isn't too much work. A very long time ago DWiki did use cgi.FieldStorage and I had some problems as a result, but that got entirely rewritten when I moved DWiki to being based on WSGI.

CGIModuleOurUsage written at 22:47:48

2022-03-02

A Python program can be outside of a virtual environment it uses

A while ago I wrote about installing modules to a custom location, and in that entry one reason I said for not doing this with a virtual environment was that I didn't want to put the program involved into a virtual environment just to use some Python modules. Recently I realized that you don't have to, because of how virtual environments add themselves to sys.path. As long as you run your program using the virtual environment's Python, it gets to use all the modules you installed in the venv. It doesn't matter where the program is and you don't have to move it from its current location, you just have to change what 'python' it uses.
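The claim is easy to check. A minimal sketch, assuming a Unix-like system and a Python that can create virtual environments: it builds a throwaway venv without pip, drops a tiny made-up module (venvonly_demo) into the venv's site-packages as a stand-in for 'pip install', and then runs a script that lives in an unrelated directory, once with the venv's Python and once with the Python running the sketch.

```python
import os
import subprocess
import sys
import tempfile
import venv
from pathlib import Path


def venv_python(env_dir):
    """Path to the interpreter inside a venv (bin/ on Unix, Scripts\\ on Windows)."""
    env_dir = Path(env_dir)
    if os.name == "nt":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def site_packages_of(python):
    """Ask an interpreter where its own pure-Python packages get installed."""
    code = "import sysconfig; print(sysconfig.get_paths()['purelib'])"
    out = subprocess.run([str(python), "-c", code],
                         capture_output=True, text=True, check=True)
    return Path(out.stdout.strip())


def run_script(python, script):
    """Run a script with a given interpreter; return (exit status, stdout)."""
    proc = subprocess.run([str(python), str(script)], capture_output=True, text=True)
    return proc.returncode, proc.stdout.strip()


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        tmp = Path(tmp)
        env_dir = tmp / "venv"
        # No pip needed here; symlinks match what 'python3 -m venv' does on Unix.
        venv.create(env_dir, with_pip=False, symlinks=(os.name != "nt"))
        py = venv_python(env_dir)

        # Stand-in for 'pip install': put a hypothetical module into the venv.
        site = site_packages_of(py)
        site.mkdir(parents=True, exist_ok=True)
        (site / "venvonly_demo.py").write_text("GREETING = 'hello from the venv'\n")

        # The program lives somewhere else entirely, outside the venv.
        prog_dir = tmp / "elsewhere"
        prog_dir.mkdir()
        script = prog_dir / "prog.py"
        script.write_text("import venvonly_demo\nprint(venvonly_demo.GREETING)\n")

        return run_script(py, script), run_script(sys.executable, script)
```

Only the interpreter differs between the two runs: the venv's Python finds the module, while the other one fails with an ImportError.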

The full extended version of this is that if you have your program set up to run using '#!/usr/bin/env python3', you can change what Python and thus what virtual environment you use simply by changing the $PATH that it uses. The downside of this is that you can accidentally use a different Python than you intended because your $PATH isn't set up the way you thought it was, although in many cases this will result in immediate and visible problems because some modules you expected aren't there.

(One way this might happen is if you run the program using the system Python because you're starting it with a default $PATH. One classical way this can happen is running things from crontab entries.)

Another possible use for this, especially in the $PATH based version, is assembling a new virtual environment with new, updated versions of the modules you use in order to test your existing program with them. You can also use this to switch module versions back and forth in live usage just by changing the $PATH your program runs with (or by repeatedly editing its #! line, but that's more work).
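Switching by $PATH comes down to which python3 the '#!/usr/bin/env python3' lookup finds first. Here is a small sketch with hypothetical helper names: one function computes which interpreter that lookup would pick for a given $PATH, one builds a $PATH with a chosen venv in front, and a guard refuses to run when the program isn't under a venv's Python at all (the crontab case). Comparing sys.prefix with sys.base_prefix is the standard way to tell a venv's Python from its base Python.

```python
import os
import shutil
import sys


def env_python_for(path_value, name="python3"):
    """What '#!/usr/bin/env python3' would run, given this $PATH value."""
    return shutil.which(name, path=path_value)


def with_venv_first(env_dir, path_value):
    """A $PATH that puts a venv's bin directory in front of everything else."""
    bindir = os.path.join(env_dir, "Scripts" if os.name == "nt" else "bin")
    return bindir + os.pathsep + path_value


def running_in_venv():
    """True when this interpreter is a venv's Python rather than the base one."""
    return sys.prefix != sys.base_prefix


def require_venv():
    """Fail loudly (e.g. under cron's minimal $PATH) instead of half-working."""
    if not running_in_venv():
        sys.exit(f"{sys.argv[0]}: running under {sys.executable}, "
                 "not a virtual environment; check $PATH")
```

To try a program against a freshly built venv, you could run it with subprocess.run([program], env={**os.environ, "PATH": with_venv_first(new_venv, os.environ["PATH"])}), where program and new_venv are whatever paths you are using.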

Realizing this makes me much more likely in the future to just use virtual environments for third party modules. The one remaining irritation is that the virtual environment is specific to the Python version, but there are various ways of dealing with that. This is one of the cases where I think we're going to want to use 'pip freeze' (in advance) and then exactly reproduce our previous install in a new virtual environment. Or maybe we can get 'python3 -m venv --upgrade <venv-dir>' to work, although I'm not going to hold my breath on that one.

(A quick test suggests that upgrading the virtual environment doesn't work, at least for going from the Ubuntu 18.04 LTS Python 3 to the Ubuntu 20.04 LTS Python 3. This is more or less what I expected, given what would be involved, so building a new virtual environment from scratch it is. I can't say I'm particularly happy with this limitation of virtual environments, especially given that we always have at least two versions of Python 3 around because we always have two versions of Ubuntu LTS in service.)
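The 'pip freeze' in advance, then rebuild from scratch approach can be sketched as below. The helper names are hypothetical, and the commands are the ordinary 'python -m pip freeze', 'python -m venv' and 'pip install -r' ones. The freeze has to happen while the old venv's Python still exists and works.

```python
import os
import subprocess
from pathlib import Path


def venv_bin(env_dir, program):
    """Path to a program inside a venv (bin/ on Unix, Scripts\\ on Windows)."""
    return str(Path(env_dir) / ("Scripts" if os.name == "nt" else "bin") / program)


def freeze(env_dir):
    """Record a venv's exact package versions; do this before the old Python goes away."""
    result = subprocess.run([venv_bin(env_dir, "python"), "-m", "pip", "freeze"],
                            capture_output=True, text=True, check=True)
    return result.stdout


def rebuild_commands(new_env, new_python, requirements_file):
    """Commands that build a fresh venv with new_python and reinstall the frozen set."""
    return [
        [str(new_python), "-m", "venv", str(new_env)],
        [venv_bin(new_env, "python"), "-m", "pip", "install", "-r", str(requirements_file)],
    ]


def rebuild(new_env, new_python, requirements_file):
    """Run the rebuild commands, stopping at the first failure."""
    for cmd in rebuild_commands(new_env, new_python, requirements_file):
        subprocess.run(cmd, check=True)
```

For example, with made-up paths, you would write freeze("/opt/venvs/tools") to a requirements file on the old system, then call rebuild("/opt/venvs/tools-new", "/usr/bin/python3", "requirements.txt") on the new one. Exact pins can fail to install if an old version doesn't support the newer Python, in which case that pin has to be relaxed by hand.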

VenvsWithProgramsOutside written at 21:25:28



This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.