2024-01-30
Putting a Python executable in venvs is probably a necessary thing
When I wrote about getting the Python LSP server working with venvs in a brute force way, Ian Z aka nobrowser commented (and I'm going to quote rather than paraphrase):
I'd say that venvs themselves are "aesthetically displeasing". After all, having a separate Python executable for every project differs from having a separate LSP in degree only.
On Unix, this separate executable is normally only a symbolic link, although other platforms may differ. The venv will also normally have its own copy of pip, setuptools, and some other things, which can amount to 20+ Mbytes even on Linux. However, having thought about it, I don't think there's any good option other than for the venv to have its own (nominal) copy of Python. The core problem is that venvs are very convenient when they're more or less transparently activated.
A Python venv is marked by a special file in the root of the venv, pyvenv.cfg. There are two ways that Python could plausibly decide when to automatically activate a venv without you having to set any environment variables: it could look around the environment of the Python executable you ran for this marker (which is what it does today), or it could look around the environment of your current directory, traversing up the filesystem to see if it can find a pyvenv.cfg (in much the same way that version control systems look for their special .git or .hg directory to mark the repository root).
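To make the two options concrete, here is a rough sketch of both lookups. The function names are hypothetical and this is not how the interpreter is actually written (the real search happens in its startup code). It only assumes what PEP 405 describes: the marker is looked for next to the executable or one directory above it. The executable's path is deliberately not resolved through symlinks, since otherwise a symlinked venv Python would lead back to the system Python.

```python
from pathlib import Path
from typing import Optional

MARKER = "pyvenv.cfg"


def venv_from_executable(executable: Path) -> Optional[Path]:
    """Sketch of the executable-based lookup (roughly today's behavior).

    Checks the executable's own directory and its parent for the marker.
    The path is used as given; symlinks are not followed.
    """
    exe_dir = executable.parent
    for candidate in (exe_dir, exe_dir.parent):
        if (candidate / MARKER).is_file():
            return candidate
    return None


def venv_from_cwd(start: Path) -> Optional[Path]:
    """Sketch of the hypothetical directory-based lookup.

    Walks from 'start' up to the filesystem root, the way a version
    control system searches for .git or .hg.
    """
    for directory in (start, *start.parents):
        if (directory / MARKER).is_file():
            return directory
    return None
```

The first function's answer depends only on which Python you ran. The second function's answer depends on where you happen to be standing, which is the source of the trouble.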
The problem with automatically activating a venv based on what you find in the current directory and its parents is that it makes Python programs (and the Python interpreter) behave differently depending on where you are when you run them, including random system utilities that just happen to be written in Python. If the program requires any packages beyond the standard library, it may well fail outright because those packages aren't installed in the venv, and if they are installed in the venv they may not be the version the program needs or expects. This isn't a particularly good experience and I'm pretty confident that people would be very unhappy if this was what Python did with venvs.
The other option is to not automatically activate venvs at all and always require you to set environment variables (or the local equivalent). The problem with this is that it's a terrible experience for actually using venvs to, for example, deploy programs as encapsulated entities. You can't just ship the venv and have people run programs that have been installed into its bin/ subdirectory; now they need cover scripts to set the venv environment variables (which might be automatically generated by pip or whatever, but still).
So on the whole embedding the Python interpreter seems the best choice to me. That creates a clear logic to which venv is automatically activated, if any, that can be predicted by people; it's the venv whose Python you're running. Of course I wish it didn't take all of that disk space for extra copies of pip and setuptools, but you can't have everything.
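If you want to see this logic from the inside, a small script can report which venv (if any) the running interpreter has activated. This is only a sketch; active_venv is a made-up helper name. It relies on the documented behavior that sys.prefix differs from sys.base_prefix when Python is running in a venv.

```python
import sys


def active_venv():
    """Return the venv root if this interpreter is running in a venv, else None."""
    # In a venv, sys.prefix points at the venv while sys.base_prefix
    # points at the Python installation the venv was created from.
    if sys.prefix != sys.base_prefix:
        return sys.prefix
    return None


if __name__ == "__main__":
    print("executable:", sys.executable)
    print("venv:", active_venv() or "(none)")
```

Running this with a venv's bin/python should report that venv without any environment variables being set, and running it with the system Python should report none, regardless of your current directory.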