2022-05-24
Some notes on providing Python code as a command line argument
I've long known about CPython's '-c' argument, which (in the words
of the manual page) lets you "specify the command [for CPython] to
execute". Until recently, I thought it had to be a single statement,
or at least a single line of Python code (which precluded a number
of things). It turns out that this isn't the case; both CPython and
PyPy will accept a command line argument for -c that contains
embedded newlines, in the style of providing command line code to
Unix tools like awk.
For example:
python -c 'import sys

if len(sys.argv) > 1:
    print("arguments:", sys.argv[1:])
else:
    print("no arguments")' "$@"
(For various reasons, you still might want to make this code importable, although I haven't done so here.)
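In case it helps to see it, here's a sketch of what an 'importable' structure would look like under -c, with the work pushed into a main() function (just an illustration; for code this small it's arguably not worth the bother):

python -c 'import sys

def main(args):
    if args:
        print("arguments:", args)
    else:
        print("no arguments")

if __name__ == "__main__":
    main(sys.argv[1:])' "$@"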
If you're directly supplying the code on the command line, as I am here, you have a choice (in a Bourne shell script or environment). You can quote the entire code with single quotes and not use a literal single quote in the Python code, or you can quote with double quotes and carefully escape several special characters but get to use single quotes. If you want to avoid all of this, you need to put the code into a shell variable:
pyprog="$(cat <<'EOF' [....] EOF )" python -c "$pyprog" ...
As you'd expect, '__name__' in the command line code is the usual
'__main__'. As the manual page covers, all further command line
arguments are passed in sys.argv, with sys.argv[0] set to '-c'.
Since the code doesn't have a file name (which is what would normally
go in sys.argv[0]), this seems like a decent choice, and immediately
passing further arguments to the code is convenient.
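A quick way to see this for yourself is to have the command line code print sys.argv:

python -c 'import sys
print(sys.argv)' one two

This prints ['-c', 'one', 'two'].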
Although this makes it possible to have a Python program embedded into a shell script in the same way that you can do this with awk (and thus implicitly helps enable Python as a filter in a shell script), I personally don't find the idea too appealing, at least for Python code of any substance. The problem isn't the need to take extra care with embedding the Python code in your shell script, although that's not great. The real problem is that embedding Python code this way means you miss out on all sorts of tools that are in the Python programming ecology, because they only work on separate Python code.
(If I had to write something this way, I would be tempted to develop it in a separate file that the shell script invoked with 'python <filename>' instead of 'python -c', and then only embed the code into the shell script and switch to 'python -c ...' at the last moment.)
PS: Now that I know how to do this it's a little bit tempting to
try out small amounts of Python code in places where awk doesn't
quite have the functions and power I'd like (or at least doesn't
make the functions as easy as Python does). On the other hand, awk
doesn't make you think about character set conversion issues.
Probably I wouldn't use this to parse and reformat smartctl's JSON,
though. That's likely to be enough code that I'd want to use the
usual Python tools on it.
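As an illustration of the sort of small thing I have in mind (the specific job here is made up), a pipeline step could count how often the first field of each line occurs and report the counts in descending order, which is the kind of job where Python's Counter is handier than what awk gives you:

... | python -c 'import sys
from collections import Counter

# count first-field occurrences, most common first
counts = Counter(line.split()[0] for line in sys.stdin if line.strip())
for name, n in counts.most_common():
    print(n, name)'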
2022-05-16
Python programs as wrappers versus filters of other Unix programs
Sometimes I wind up in a situation, such as using smartctl's JSON output, where I want to use a Python program to process and transform the output from another Unix command. In a situation like this, there are two ways of structuring things. I can have the Python program run the other command as a subprocess, capture its output, and process it, or I can have a surrounding script run the other command and pipe its output to the Python program, with the Python program acting as a (Unix) filter. I've written programs in both approaches depending on the situation.
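As a minimal sketch of the structural difference (with a made-up 'somecmd --json' standing in for the real command), the wrapper runs the other command itself while the filter only sees its standard input:

import json
import subprocess
import sys

def load_as_wrapper():
    # Wrapper: run the other command ourselves and capture its output.
    proc = subprocess.run(["somecmd", "--json"],
                          capture_output=True, text=True, check=True)
    return json.loads(proc.stdout)

def load_as_filter():
    # Filter: a surrounding shell script does 'somecmd --json | thisprog',
    # so all we see here is the JSON arriving on standard input.
    return json.load(sys.stdin)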
All of which sort of raises the question: what makes me choose one option or the other? One reason for choosing the wrapper approach is the ease of copying the result to other places; a Python wrapper is only one self-contained thing to copy around to our systems, while a shell script that runs a Python filter is at least two things (and then the shell script has to know where to find the Python program). And in general, a Python wrapper program makes the whole thing feel like there are fewer moving parts (that it runs another Unix command as the program's starting point is sort of an implementation detail that people don't have to think about).
(The self-contained nature of wrappers pushes me toward wrappers for things that I expect to copy to systems only on an 'as needed' basis, instead of having them installed as part of system setup or the like.)
One reason I reach for the filter approach is if I have a certain amount of logic that's most easily expressed in a shell script, for example selecting what disks to report SMART data on and then iterating over them. Shell scripts make expanding file name glob patterns very easy; Python requires more work for this. I have to admit that how the idea evolved also plays a role; if I started out thinking I had a simple job of reformatting output that could be done entirely in a shell script, I'm most likely to write the Python as a filter that drops into it, rather than throw the shell script away and write a Python wrapper. Things that start out clearly complex from the start are more likely to be a Python wrapper instead of a filter used by a shell script.
(The corollary of this is if I'm running the other command once with more or less constant arguments, I'm much more likely to write a wrapper program instead of a filter.)
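To make the glob comparison concrete, the Python side of selecting disks might look something like this (the /dev/sd? pattern is just an example of the kind of selection I mean):

import glob

# roughly the shell's 'for disk in /dev/sd?; do ... done', with an explicit
# sort because glob.glob() doesn't promise any particular ordering
for disk in sorted(glob.glob("/dev/sd?")):
    print(disk)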
I believe that there are (third party) Python packages that are intended to make it easy to write shell script like things in Python (and I think I was even pointed at one once, although I can't find the reference now). In theory I could use these and native Python facilities to write more Python programs as wrappers; in practice, I'm probably going to take the path of least resistance and continue to do a variety of things as shell scripts with Python programs as filters.
I don't know if writing this entry is going to get me to be more systematic and conscious about making this choice between a wrapper and a filter, but I can hope so.
PS: Another aspect of the choice is that it feels easier (and better known) to adjust the settings of a shell script by changing commented environment variables at the top of the script than making the equivalent changes to global variables in the Python program. I suspect that this is mostly a cultural issue; if we were more into Python, it would probably feel completely natural to us to do this to Python programs (and we'd have lots of experience with it).
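For what it's worth, the closest Python equivalent I can think of is a block of module-level settings at the top of the program, perhaps with environment variable overrides; the names here are invented for illustration:

import os

# settings; edit these or set the environment variables to adjust behavior
SMART_DISKS = os.environ.get("SMART_DISKS", "/dev/sd?")
VERBOSE = os.environ.get("VERBOSE", "") != ""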