The problem I have with Pip's dependency version handling

November 28, 2021

Python's pip package manager has a system where main programs and packages can specify the general versions of dependencies that they want. When you install a program through pip (either directly into a virtual environment or with a convenient tool like pipx), pip resolves the general version specifications to specific versions of the packages and installs them too. Like many language package managers, pip follows what I'll call a maximal version selection algorithm; it chooses the highest currently available version of each dependency that satisfies all constraints. Unfortunately I have come to feel that this is a bad choice, at least for programs, for two reasons. One of the reasons is general and the other is specific to pip's current capabilities and tooling.

The general reason is that it makes the installed set of dependencies not automatically reproducible. If I install the Python LSP server today and you install it a week from now, we may well not wind up with the same versions of everything even if the Python LSP server project hasn't released a new version. All it takes is for a direct or indirect dependency to release a new version in the intervening week that's compatible with the version restrictions. Your pip install will pick up that new version, following pip's maximal version selection.
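To make the drift concrete, here is a toy sketch of maximal version selection. It is not pip's real resolver; it assumes plain dotted-integer versions and a single 'at least this, below that' constraint, and the index contents are made up for an imaginary dependency.

```python
from typing import List, Optional, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '1.4.2' into (1, 4, 2). Only plain dotted integers are handled."""
    return tuple(int(part) for part in text.split("."))


def select_maximal(available: List[str], lower: str, upper: str) -> Optional[str]:
    """Return the highest available version v with lower <= v < upper, or None."""
    low, high = parse_version(lower), parse_version(upper)
    candidates = [v for v in available if low <= parse_version(v) < high]
    return max(candidates, key=parse_version, default=None)


# Hypothetical index contents for an imaginary dependency.
index_today = ["1.0.0", "1.1.0", "1.2.0", "2.0.0"]
# A week later, a compatible 1.3.0 has been released.
index_next_week = index_today + ["1.3.0"]

# Same constraint (>=1.0.0, <2.0.0), different install dates.
mine = select_maximal(index_today, "1.0.0", "2.0.0")
yours = select_maximal(index_next_week, "1.0.0", "2.0.0")
print(mine, yours)
```

Nothing about the program or its stated requirements changed between the two calls; only the set of available versions did, and that alone is enough to give us different installs.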

This is theoretically great, since you're getting the latest and thus best versions of everything. It is not necessarily practically great, since as we've all experienced, sometimes the very latest versions of things are not in fact the best versions, or at least the best versions in the context you're using them. If nothing else, you're getting a different setup than I am, which may wind up with confusing differences in behavior.

(For instance, your Python LSP server environment might have a new useful capability that mine doesn't. You'll tell me 'just do <X>', and I'll say 'what?'.)
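One way to at least see such differences is to compare what 'pip freeze' prints in each environment. The following is a minimal sketch that assumes the usual 'name==version' lines and skips anything else (editable installs, direct URL references); the package names and versions are invented for illustration.

```python
from typing import Dict, Optional, Tuple


def parse_freeze(text: str) -> Dict[str, str]:
    """Map package name -> version from 'name==version' lines; skip other lines."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.strip()
    return pins


def diff_pins(
    mine: Dict[str, str], yours: Dict[str, str]
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return {name: (my_version, your_version)} for packages that differ.

    None marks a package that is missing on one side.
    """
    names = sorted(set(mine) | set(yours))
    return {n: (mine.get(n), yours.get(n)) for n in names if mine.get(n) != yours.get(n)}


# Invented freeze output from two installs of the same program.
my_freeze = "alpha==1.2.0\nbeta==0.9.1\n"
your_freeze = "alpha==1.3.0\nbeta==0.9.1\ngamma==2.0.0\n"

differences = diff_pins(parse_freeze(my_freeze), parse_freeze(your_freeze))
for name, (my_version, your_version) in differences.items():
    print(f"{name}: mine={my_version} yours={your_version}")
```

This only tells us that our setups differ, not which one is better or how to bring them together.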

The specific reason is that once I have had pip install my version of something, pip doesn't really seem to provide a good way to update it to the versions of everything I'd get if I reinstalled today. If it did, it would at least be easy for you and me to get the same versions of everything in our installs of the Python LSP server, which would let us get rid of these problems (or at least let me see your problems, if more recent package versions have new problems). Pip has some features to try to do this, but in practice they don't seem to work very well for me. I'm left to do manual inspection with 'pip list --outdated', manual upgrades of things with 'pip install --upgrade', and then use of 'pip check' afterward to make sure that I haven't screwed up and upgraded something too far.
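The manual inspection step can be scripted a little. This is a rough sketch that assumes 'pip list --outdated --format=json' emits a JSON list of objects with 'name', 'version' and 'latest_version' keys; the helper names are my own, and the sample data is invented so the parsing can be tried without network access.

```python
import json
import subprocess
import sys
from typing import List, Tuple


def outdated_packages(pip_json: str) -> List[Tuple[str, str, str]]:
    """Parse pip's JSON into (name, installed_version, latest_version) tuples."""
    return [(e["name"], e["version"], e["latest_version"]) for e in json.loads(pip_json)]


def run_pip_outdated() -> str:
    """Run 'pip list --outdated' for this interpreter's environment (needs network)."""
    result = subprocess.run(
        [sys.executable, "-m", "pip", "list", "--outdated", "--format=json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


# Invented sample in the assumed output shape; in a real environment
# you would use run_pip_outdated() instead.
sample = (
    '[{"name": "alpha", "version": "1.2.0",'
    ' "latest_version": "1.3.0", "latest_filetype": "wheel"}]'
)
rows = outdated_packages(sample)
for name, installed, latest in rows:
    print(f"{name}: {installed} -> {latest}")
```

As far as I can tell, the 'latest' version reported this way is simply the newest one available, not necessarily the newest one that the rest of the environment's version restrictions allow, which is why the 'pip check' step afterward still matters.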

Pip is not going to change its general approach of maximal version selection (I think only Go has been willing to go that far). But I hope that someday either pip or additional tools will have a good way to bring existing installs up to what they would be if reinstalled today.

(Pipx has its 'reinstall' option, but that's a blunt hammer and I'm not sure it works in all cases. I suppose I should try it someday on my Python LSP server installation, which has various additional optional packages installed too.)


Comments on this page:

By Joseph at 2021-11-29 04:22:40:

I thought if you ran pip freeze and then pip install from the resulting requirements file then that ensures you get the exact versions as specified in that file. Would that solve the problem in your case?

By remyabel at 2021-11-29 08:47:01:

The reality is pip is not suitable for installing end user programs. It's primarily for dependency management. When using a Linux distro package manager, this is acceptable because the dependencies and python packages are curated and designed to work together like any other package. When using pip install, especially before pip had a dependency resolver, you get a mess of libraries that may be incompatible with each other and cannot upgrade cleanly. This is why I always recommend using pipx for end user programs and pip solely for dependency management, but pipx isn't perfect and is only a band-aid.

All the other programming language package managers (npm, go, cargo) have the same issue as they follow basically the same model. You end up writing hacky scripts or using hacky workarounds to keep the programs up to date...and yet developers complain about how Linux package managers do things.

It's interesting to see the different approaches. PHP's Composer does maximal selection by default, but it has a CLI option to get minimal selection instead. (Presumably, for testing that "what I have declared I need" really works; hopefully, all your transitive dependencies have also tested this.)

Exact versions used are then written out to a composer.lock file, so that composer install can use it to install those exact versions. composer upgrade recomputes the active dependencies, updates the lock file, and installs them all.

pip freeze is interesting in theory, but in my (admittedly limited) experience, the lack of an upgrade story leaves you with the same old problems down the road.

By cks at 2021-11-29 10:46:38:

The problem with 'pip freeze' et al is that it's not a natural way to install a program or a package into a virtual environment. If you want a program like the Python LSP server or a package like PyTorch, most people will be following instructions that say 'do pip install ...', not instructions that say 'download or copy this requirements.txt then ...'. You can make it so that people install exactly the same versions of all packages, but it's not what they're going to do normally and in practice you're going to need to build both tools and social expectations.

By shiftless at 2021-11-30 07:24:46:

I would suggest you take a look at pipenv, if you haven't already. It wraps up the virtual environment and manages specific dependency requirements on top of pip. When you install a package it pins it in a Pipfile.lock, from which other deployments can install instead of from the Pipfile (which is similar to requirements.txt). We used this at my first job at a Python web development company and I still use it for personal projects.

Last modified: Sun Nov 28 23:34:38 2021