How I think I want to drop modern Python packages into a single program

September 16, 2020

For reasons beyond the scope of this blog entry, I'm considering augmenting the Python program we use to log email attachment information for Exim so that it uses oletools to peer inside MS Office files for indications of bad things. Oletools is not packaged by Ubuntu as far as I can see, and even if it were, the packaged version would be an older one, so we would need to add the oletools Python packages ourselves.

The official oletools install instructions talk about using either pip or setup.py. As a general rule, we're very strongly against installing anything system-wide except through Ubuntu's own package management system, and the environment our Python program runs in doesn't really have a home directory to use pip's --user option, so the obvious and simple pip invocations are out. I've used a setup.py approach to install a large Python package into a specific directory hierarchy in the past (Django), and it was a big pain, so I'd like not to do it again.

(Nor do we want to learn about how to build and maintain Python virtual environments, and then convert how we run this Python program to use one.)

After looking through pip's help output I found the 'pip install --target <directory>' option and tested it a bit. This appears to do more or less what I want, in that it installs oletools and all of its dependencies into the target directory. The target directory is also littered with various metadata, so we probably don't want it to be the directory where the program's normal source code lives. This means we'll need to arrange to run the program with $PYTHONPATH set to the target directory, but that's a solvable problem.
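As a rough illustration of the "solvable problem" part, one alternative to setting $PYTHONPATH from outside is for the program to add the target directory to sys.path itself before importing anything from it. This is only a sketch under assumptions: the directory /opt/attachlogger/pylib and the names TARGET_DIR and add_target_dir are hypothetical, not anything from our actual setup.

```python
import sys

# Hypothetical location of the 'pip install --target' directory.
TARGET_DIR = "/opt/attachlogger/pylib"


def add_target_dir(path=TARGET_DIR):
    """Put the pip target directory at the front of sys.path, only once."""
    if path not in sys.path:
        sys.path.insert(0, path)
    return path


add_target_dir()
# From here on, 'import oletools' would be searched for in TARGET_DIR
# before the system-wide locations.
```

The effect is roughly that of putting the directory first in $PYTHONPATH, but it lives in the program rather than in whatever starts the program, which may or may not be what you want.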

(This 'pip install' invocation does write some additional pip metadata to your $HOME. Fortunately it actually does respect the value of the $HOME environment variable, so I can point that at a junk directory and then delete it afterward. Or I can make $HOME point to my target directory so everything is in one place.)
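The junk $HOME approach could be scripted along these lines. This is a sketch rather than a tested procedure: the function names build_pip_call and install_into are made up for illustration, and it assumes pip is available as a module for the Python that runs the script.

```python
import os
import subprocess
import sys
import tempfile


def build_pip_call(package, target_dir, home_dir):
    """Return (argv, env) for 'pip install --target' with $HOME overridden."""
    argv = [sys.executable, "-m", "pip", "install",
            "--target", target_dir, package]
    env = dict(os.environ)
    # pip's extra metadata goes here instead of the real home directory.
    env["HOME"] = home_dir
    return argv, env


def install_into(package, target_dir):
    """Install package into target_dir, throwing away what pip writes to $HOME."""
    with tempfile.TemporaryDirectory() as junk_home:
        argv, env = build_pip_call(package, target_dir, junk_home)
        subprocess.run(argv, env=env, check=True)
        # junk_home and its contents are deleted when the 'with' block exits.
```

Something like install_into("oletools", "/opt/attachlogger/pylib") would then do the install, with the path again being a hypothetical one. To keep everything in one place instead, pass the target directory itself as home_dir.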

All of this is not quite as neat and simple as dropping an oletools directory tree in the program's directory, in the way that I could deal with needing the rarfile module, but then again oletools has a bunch of dependencies and pip handles them all for me. I could manually copy them all into place, but that would actually create a sufficiently cluttered program directory that I prefer a separate directory even if it needs a $PYTHONPATH step.

(Some people will say that setting $PYTHONPATH means that I should go all the way to a virtual environment, but that would be a lot more to learn and it would be more opaque. But looking into this a bit did lead to me learning that Python 3 now has standard support for virtual environments.)
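For reference, the standard support in question is the venv module in the Python 3 standard library. A minimal sketch of creating an environment from Python follows; the helper name make_venv is hypothetical, and with_pip=False is chosen here only to keep the example from needing pip bootstrapped into the new environment.

```python
import os
import tempfile
import venv


def make_venv(env_dir):
    """Create a bare virtual environment in env_dir and return its config file path."""
    venv.create(env_dir, with_pip=False)
    # Every venv gets a pyvenv.cfg at its top level.
    return os.path.join(env_dir, "pyvenv.cfg")
```

This is the programmatic counterpart of running 'python3 -m venv <directory>' from the command line.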


Comments on this page:

By Todd at 2020-09-17 06:59:51:

OMG pipenv is a lifesaver for me. It won't be suitable for you, probably, but it works well for me.

Since it was brought up… pipenv has been quite painful.

In particular, it considers the Python minor version as a "major" semantic version, so a Pipfile.lock that calls for 3.6 won't run on 3.8. Which means, every Python minor version is now a new Python 3 transition, unless you build your own tooling around it to avoid a flag day.

(My thoughts in greater depth: on 3.6/3.8 problems in our Ubuntu 20.04 update, and on why no packaging tools look good. I did choose pipenv and we are using it, but I hate it so much.)

You could also go the other direction and use PyOxidizer (https://github.com/indygreg/PyOxidizer). One of the things it does is embed all your python dependencies inside the executable it builds, and changes the package importer to load from there. If you use the loader independently of PyOxidizer, it can load from a resource file instead. (https://pyoxidizer.readthedocs.io/en/latest/oxidized_importer.html) This is significantly faster as well, since the bytecode is already generated and all your modules are in one file rather than scattered across dozens of files.

By wombatPM at 2020-09-17 14:09:40:

Virtual environments are the way to go. If your server is running an LTS version, the system Python may be woefully out of date. Virtual environments allow you to completely isolate your application without going down the "build an executable" path (where the exe is just a zip archive with a special boot loader).

wombatpm

By Gabriel A Devenyi at 2020-09-17 20:17:50:

virtualenvs are definitely the way to go here: https://packaging.python.org/guides/installing-using-pip-and-virtual-environments/

We constantly use these in scientific software to isolate conflicting version stacks and to keep clean snapshots of the versions used for various projects.

Written on 16 September 2020.

Last modified: Wed Sep 16 23:54:46 2020