2022-07-30
Python is my default choice for scripts that process text
Every so often I wind up writing something that needs to do more than can readily be handled in some Bourne shell, awk, or other basic Unix scripting tools. When this happens, the language I most often turn to is Python, and Python is especially my default choice when the work involves processing text in some way (or, often, generating text). For example, if I want to analyze the output of some command and generate Prometheus metrics from it, Python is often my choice. These days this means Python 3, despite its warts in handling non-Unicode input (which usually don't come up in this context).
(A lot of what these programs do could be summarized as string processing with logic.)
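As an illustration of the Prometheus case, here is a minimal sketch, not a real exporter. It assumes `df -P`-style output, supplied as a hardcoded sample string rather than captured by running the command, and the metric names `df_size_bytes` and `df_used_bytes` are hypothetical. It emits lines in Prometheus's text exposition format but does not escape label values, which a real script should do.

```python
# Hypothetical sample of 'df -P' style output; a real script would
# capture this with subprocess.run([...], capture_output=True, text=True).
SAMPLE = """\
Filesystem     1024-blocks     Used Available Capacity Mounted on
/dev/sda1         41152736 12345678  26694196      32% /
tmpfs              8192000        0   8192000       0% /dev/shm
"""

def df_to_metrics(text):
    """Turn df -P style output into Prometheus text-format lines."""
    lines = []
    # Skip the header line; every other line describes one filesystem.
    for row in text.splitlines()[1:]:
        fields = row.split()
        if len(fields) < 6:
            continue
        # Naive whitespace split: assumes no spaces in device or mount names.
        dev, size_kb, used_kb, mount = fields[0], fields[1], fields[2], fields[5]
        labels = 'device="%s",mountpoint="%s"' % (dev, mount)
        # df -P reports 1024-byte blocks; convert to bytes.
        lines.append("df_size_bytes{%s} %d" % (labels, int(size_kb) * 1024))
        lines.append("df_used_bytes{%s} %d" % (labels, int(used_kb) * 1024))
    return lines

if __name__ == "__main__":
    print("\n".join(df_to_metrics(SAMPLE)))
```

Most of the work is splitting, picking out fields, and formatting them back into strings, which is the "string processing with logic" shape.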
In theory there's no obvious reason that my language of choice couldn't be, say, Go. But in practice, Python has much less friction than something like Go while still having enough structure and capabilities to be better than a much more limited tool like awk. One part of this is Python's casualness about typing, especially typing in dicts. In Python, you can shove anything you want into a dict and it's completely routine to have dicts with heterogeneous values (usually your keys are homogeneous, e.g. all strings). This might be madness in a large program, but for small, quickly written things it's a great speedup.
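A small sketch of what this looks like in practice. The input format (a user name and a status word per line) and the `summarize` function are made up for illustration; the point is that one dict happily holds an int, a set, and a list under string keys, with no declarations.

```python
def summarize(lines):
    # Values are deliberately mixed: an int, a set, and a list.
    summary = {"total": 0, "users": set(), "errors": []}
    for line in lines:
        user, status = line.split()
        summary["total"] += 1
        summary["users"].add(user)
        if status != "ok":
            summary["errors"].append(line)
    return summary
```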
(Some of the need for this can be lessened with dataclasses or attrs. And Python lets you scale up from basic dicts to those, or to basic classes used as little more than records, as you decide they make your code simpler.)
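As a sketch of that scaling up, here is a hypothetical summary record (a total count, a set of users, and a list of error lines, from a made-up "user status" line format) written with the standard library's `dataclasses` instead of a plain dict. Note the `field(default_factory=...)` for the mutable defaults, so each instance gets its own set and list.

```python
from dataclasses import dataclass, field

@dataclass
class Summary:
    # Hypothetical record: a typed, named replacement for a plain dict.
    total: int = 0
    users: set = field(default_factory=set)
    errors: list = field(default_factory=list)

    def add(self, line):
        # Same made-up "user status" line format as a dict version would use.
        user, status = line.split()
        self.total += 1
        self.users.add(user)
        if status != "ok":
            self.errors.append(line)
```

Typos in attribute names now fail loudly (`s.totl` raises `AttributeError`) instead of quietly creating a new dict key, which is often the moment this change pays off.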
Another area where Python reduces friction is in the lack of explicit error handling while still not hiding errors; exceptions ensure that while you may not deal with errors well, you will deal with them one way or another. Again this isn't necessarily what you want in a bigger, more structured program, but in the small it's quite handy to not have to ornament every 'int(...)' or whatever with some sort of error check.
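A minimal sketch of that style, assuming a made-up input format of a name followed by a byte count on each line (`total_bytes` and `main` are hypothetical names). The parsing code has no error checks at all; one coarse handler at the top catches the resulting exceptions. Even without that handler, the script would die with a traceback rather than silently produce a wrong total.

```python
import sys

def total_bytes(lines):
    # Deliberately no per-call checks: int() raises ValueError on a
    # non-numeric field, and a line with too few fields raises IndexError.
    return sum(int(line.split()[1]) for line in lines)

def main(lines):
    # One coarse handler turns bad input into an error message and a
    # non-zero status instead of silently wrong output.
    try:
        print(total_bytes(lines))
    except (ValueError, IndexError) as e:
        print("bad input: %s" % e, file=sys.stderr)
        return 1
    return 0
```

A script built this way would typically end with `sys.exit(main(sys.stdin))`.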
In general, Python is (surprisingly) good at pulling strings apart, shuffling them around, and putting them back together, while still staying structured enough to let me follow what the code does even when I come back to it later. Compact, low-ceremony inline string formatting is often quite useful (I use '%' because I'm old-fashioned).
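A small illustration of pulling a string apart and reassembling it with '%' formatting. The 'key=value' input format and the `report` function are made up for this example.

```python
def report(line):
    # Pull space-separated 'key=value' words apart into a dict of strings.
    fields = dict(item.split("=", 1) for item in line.split())
    # Reassemble in a different order with old-style '%' formatting:
    # host left-justified in 8 columns, milliseconds right-justified in 5.
    return "%-8s %5d ms  (%s)" % (fields["host"], int(fields["ms"]), fields["status"])
```

For example, `report("status=ok ms=42 host=web1")` gives `web1` padded to eight columns, then `42` right-justified in five, then `ms  (ok)`.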
Python certainly isn't the only language that can be used in this way; Perl and Ruby are two other obvious examples, and more modern people would probably reach for JavaScript. But Python is the one that I've wound up latching on to and sticking with.
I do find it a bit amusing and ironic that despite all of the issues in Python 3 with Unicode and IO (and my gripes surrounding that), it's what I normally use for processing text. In theory, I risk explosions; in practice, it works because I'm in a UTF-8 capable environment with well formed input (often just plain ASCII, which is the most common case for log files and command output).
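When input isn't valid UTF-8, the "explosion" is a `UnicodeDecodeError`. If it ever matters, one standard-library way to defuse it is the 'surrogateescape' error handler, which carries undecodable bytes through as lone surrogate code points and restores them on encoding. This is a sketch, not something the approach above requires; the sample bytes and the function names are made up for illustration.

```python
def decode_strict(raw):
    # Default 'strict' handling; returns None instead of raising,
    # just so the failure is easy to see.
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return None

def decode_lenient(raw):
    # 'surrogateescape' maps each undecodable byte 0xNN to U+DCNN
    # instead of raising, so the original bytes can be recovered.
    return raw.decode("utf-8", errors="surrogateescape")

raw = b"user caf\xe9 logged in"   # a Latin-1 byte, not valid UTF-8
text = decode_lenient(raw)
```

For standard input, `sys.stdin.reconfigure(errors="surrogateescape")` (Python 3.7+) applies the same handler to the whole stream.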