How Python makes it hard to write well structured little utilities

December 14, 2017

I'll start with the tweets, where I sort of hijacked something glyph said with my own grump:

@glyph: Reminder: even ridiculous genius galaxy brain distributed systems space alien scientists can't figure out how to make and ship a fucking basic python executable. Not only do we need to make this easy we need an AGGRESSIVE marketing push once actually viable usable tools exist. <link>

@thatcks: As a sysadmin, it’s a subtle disincentive to writing well structured Python utilities. The moment I split my code into modules, my life gets more annoying.

The Zen of Python strongly encourages using namespaces, for good reasons. There are a number of sources of namespaces (classes, for example), but (Python) modules are one big one. Modules are especially useful in their natural state because they also split up your code between multiple files, leaving each file smaller, generally more self-contained, and hopefully simpler. With an 'everything in one file' collection of code, it's a little too easy to have it turn mushy and fuzzy on you, even if in theory it has classes and so on.

This works fine for reasonably sized projects, like Django web apps, where you almost certainly have a multi-stage deployment process and multiple artifacts involved anyway (this is certainly the case for our one Django app). But speaking from personal experience, it rapidly gets awkward if you're a sysadmin writing small utility programs. The canonical ideal running form of a small utility program is a single self-contained artifact that will operate from any directory; if you need it somewhere, you copy the one file and you're done.

(The 'can be put anywhere' issue is important in practice, and if you use modules Python can make it annoying, because your program has to be able to find its modules on Python's module search path no matter where it has been put.)
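To make the search path issue concrete, here is a minimal sketch of one common workaround: the program works out where it really lives and puts a private module directory next to itself on sys.path. The function name add_private_lib_dir, the 'lib' subdirectory, and 'mymodule' are all hypothetical names for illustration, not anything from a real tool.

```python
import os
import sys


def add_private_lib_dir(script_path, subdir="lib"):
    """Put <real directory of script>/<subdir> at the front of sys.path.

    The 'lib' layout is an assumption made for this sketch.
    """
    # realpath() follows symlinks, so a symlink to the program in some
    # bin directory still finds the modules next to the real file.
    here = os.path.dirname(os.path.realpath(script_path))
    libdir = os.path.join(here, subdir)
    if libdir not in sys.path:
        sys.path.insert(0, libdir)
    return libdir


if __name__ == "__main__":
    # sys.argv[0] is the path the program was invoked by.
    add_private_lib_dir(sys.argv[0])
    # import mymodule  # hypothetical module that would live in lib/
```

Even with this, deploying the utility means copying a program plus a directory of modules and keeping them together, which is exactly the annoyance compared to copying one file.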

One part of this awkwardness is my long standing reluctance to use third-party modules. When I've sometimes given in on that, it's been for modules that were already packaged for the only OS where I intended to use the program, and the program only ever made sense to run on a few machines.

But another part of it is that I basically don't modularize the code I write for my modest utilities, even when it might make sense to break it up into separate little chunks. This came into clear view for me recently when I wound up writing the same program in Python and then Go (for local reasons). The Python version is my typical all in one file small utility program, but the Go version wound up split into seven little files, which I think made each small concern easier for me to follow even if there's more Go code in total.

(With that said, the Go experience here has significant warts too. My code may be split into multiple files, but it's all in the same Go package and thus the same namespace, and there's cross-contamination between those files.)

I would like to modularize my Python code here; I think the result would be better structured and it would force me to be more disciplined about cross-dependencies between bits of the code that really should be logically separate. But the bureaucracy required to push the result out to everywhere we need it (or where I might someday want it) means that I don't seriously consider it until my programs get moderately substantial.

I've vaguely considered using zip archives, but for me it's a bridge too far. It's not just that this requires a 'compilation' step (and seems likely to slow down startup even more, when it's already too slow). It's also that, for me, packing a Python program up in a binary format loses some of the important sysadmin benefits of using Python. You can't just look at a zip-packaged Python program to scan how it works, look for configuration variables, read the comment that tells you where the master copy is, or read documentation at its start; you have to unpack your artifact first. A zip archive packed Python utility is less of a shell script and more of a compiled binary.
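For reference, a minimal sketch of what that 'compilation' step could look like using the standard library's zipapp module (Python 3.5 and later). The function name build_zip_utility and the default interpreter line are assumptions for illustration; the source directory is assumed to hold the modules plus a __main__.py entry point.

```python
import zipapp


def build_zip_utility(source_dir, target, interpreter="/usr/bin/env python3"):
    """Pack source_dir (which must contain a __main__.py) into one file."""
    # create_archive() writes a '#!' line followed by the zip data; when an
    # interpreter is given it also marks the result executable on POSIX.
    zipapp.create_archive(source_dir, target, interpreter=interpreter)
    return target
```

The result is a single file you can copy around and run, but apart from the '#!' line its contents are zip data, which is the readability problem described above.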

(It also seems likely that packing our Python utility programs up in zip files would irritate my co-workers more than just throwing all the code into one plain-text file. Code in a single file is just potentially messy and less clear (and I can try to mitigate that); a zip archive is literally unreadable as is no matter what I do.)


Comments on this page:

PEX, the default CLI packaging solution in my organization, seems like a reasonable compromise, but again, in my experience it adds unacceptable execution latency and ends up making CLI tools really annoying to use.

Btw, Go CLI executables are as opaque as zip/pex, but libraries like codegansta/cli make them pretty documentation friendly.

vim and emacs both open zip files automagically by default on modern systems

At Google we had a nifty thing called "par files." With the release of Bazel some more utilities have leaked, and one of them is the thing that makes par files:

https://github.com/google/subpar

Obviously this still has your startup time issue. However, you do end up with all your dependencies (as long as there are none that use modules written in C, sadly; the internal one handled that) wrapped up in a single file.

I haven't tried the external one so far, but it did make modularising code much less frustrating.

Written on 14 December 2017.

Last modified: Thu Dec 14 17:45:09 2017