I should keep track of what Python packages I install through pip

June 28, 2021

These days I'm increasingly installing Python packages through pip, whether into a PyPy environment or with 'pip install --user' for things like python-lsp-server. Having done this for a while, complete with trying to keep up with potential package upgrades, I've come to the conclusion that I should explicitly keep track of what packages I install, recording the list somewhere I can find it again.

There are two problems (or issues) that push me to this. The first is that, as far as I know, Pip doesn't keep track of the distinction between packages you've asked it to install and the dependencies of those packages. All of them show up in 'pip list', and any of them can show up in 'pip list --outdated'. My understanding is that in the normal, expected use of Pip you keep track of this in a per-project requirements file, then use that to build the project's virtualenv. This model doesn't really fit installing commands, especially commands like python-lsp-server that have install-time options.
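A minimal sketch of what I mean by explicitly keeping track: a hand-maintained file that lists only the packages you deliberately asked for, in pip's requirements format. The filename here is my own invention, not any pip convention:

```shell
# Record only the packages you asked pip for, one per line, in
# requirements-file format (the filename is arbitrary).
cat > "$HOME/.pip-packages" <<'EOF'
python-lsp-server[all]
pyflakes
EOF

# Reinstalling everything you care about then becomes one command
# (shown but not run here, since it needs the network):
# pip install --user -r "$HOME/.pip-packages"
```

Anything that shows up in 'pip list' but not in this file is, presumably, a dependency rather than something you asked for.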

The second issue is that Pip installed packages are implicitly for a specific version of Python. If you rely on the system Python (instead of your own version) and that version gets upgraded, suddenly 'pip list' will report nothing (and you will in fact have no packages available). At this point you need to somehow recover the list of installed packages and re-install all of them (unless you resort to unclean hacks). Explicitly keeping track of this list in advance is easier than having to dig it out at the time.
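You can see this version tie directly, because 'pip install --user' puts packages in a site-packages directory whose path embeds the Python minor version:

```shell
# Print the per-user site-packages directory that 'pip install --user'
# installs into. The path embeds the Python minor version, e.g.
# ~/.local/lib/python3.9/site-packages, which is why upgrading Python
# makes pip suddenly see an empty (different) directory.
python3 -c 'import site; print(site.getusersitepackages())'
```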

Having an explicit list helps in other situations. Perhaps you started out installing all of your tools under CPython, but now you want to see how well they'll work under PyPy. Perhaps you're building a new PyPy based environment with a new version of PyPy and want to start over from scratch. Perhaps you think package versions and dependencies have gotten snarled and you're carrying surplus packages, so you want to delete everything and start over from scratch.

(Starting over from scratch can also be the easiest way to get the best version of dependencies, since the packages you're directly installing may have maximum version constraints that will trip you up if you just directly 'pip install --upgrade ...' dependencies.)

PS: Possibly there are ways to do all of this with Pip today, especially things like 'upgrade this and all of its dependencies to the most recent versions that are acceptable'. I'm not well versed in Pip, since mostly I use it as a program installer.


Comments on this page:

By Eldad at 2021-06-28 05:08:11:

There is a nice tool to scratch at least one of the itches you described here; it's called pipx:

https://github.com/pypa/pipx

From the description: "pip is a general-purpose package installer for both libraries and apps with no environment isolation. pipx is made specifically for application installation, as it adds isolation yet still makes the apps available in your shell: pipx creates an isolated environment for each application and its associated packages."

Anything you install with pipx can be later upgraded with a simple "pipx upgrade-all", while "pipx list" will show you what you have installed and it has some other nice features too.

By John Wiersba at 2021-06-28 13:39:38:

Chris, I believe that if you use system python (or any other similar language like perl), then current best practice is not to use pip, but instead to use your system package manager to install additional packages. If, instead, you're installing into a virtual environment, then you use pip install --user. This will solve your "second issue".

By ckester at 2021-06-28 14:47:33:

What's the best practice when the program or version of it that you need isn't available yet via the system's package manager?

By cks at 2021-06-28 15:28:04:

The theoretical best practice is to build a separate virtualenv for each program you want to install. I believe that pipx automates this, based on its description (thanks Eldad for mentioning it). But without automation this is a lot of work, so the practical 'best practice' is to use 'pip install --user', which (on Unix systems) installs it to $HOME/.local/bin and $HOME/.local in general.
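As a sketch, the per-program virtualenv approach looks like this; the directory layout is my own choice, not a standard (pipx picks its own locations when it automates this):

```shell
# One virtualenv per program you want to install.
python3 -m venv "$HOME/venvs/pylsp"

# Install the program into its own venv
# (shown but not run here, since it needs the network):
# "$HOME/venvs/pylsp/bin/pip" install python-lsp-server

# Expose just the command, not the whole venv, on your PATH:
# ln -s "$HOME/venvs/pylsp/bin/pylsp" "$HOME/.local/bin/pylsp"
```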

Both virtualenvs and 'pip install --user' (which is what I use) have the problem that they're implicitly tied to the version of Python you're using, which is most commonly the system CPython (or in some cases PyPy).

(System packages sidestep this by being rebuilt for new versions of CPython or PyPy and then automatically upgraded alongside your system CPython, if the packaging is competently done.)

I strongly recommend looking into the idea of "lockfiles". Within pip these are known as constraints files, and they tend to be more comprehensive than requirements files. With a requirements file you'll likely only record your direct dependencies, leaving their transitive (indirect) dependencies unrestricted, which results in non-reproducible environments. To solve this, you can keep pairs of requirements and constraints files: the requirements file holds your direct, loosely restricted dependencies, while the constraints file pins the whole dependency tree to exact versions. To automate generating, updating, and managing the constraints files, look into a project called pip-tools.
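The requirements plus constraints pairing described above can be sketched like this. The filenames follow common convention but are otherwise arbitrary, and the pinned versions are purely illustrative:

```shell
# Direct dependencies, loosely restricted:
cat > requirements.txt <<'EOF'
python-lsp-server>=1.0
EOF

# The whole dependency tree, pinned exactly. Normally you'd generate
# this (e.g. with pip-tools' pip-compile) rather than write it by hand;
# these version numbers are made up for illustration.
cat > constraints.txt <<'EOF'
python-lsp-server==1.1.0
ujson==4.0.2
EOF

# Install with both; the constraints file caps every transitive dep
# (shown but not run here, since it needs the network):
# pip install --user -r requirements.txt -c constraints.txt
```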

