The temptation of writing shell scripts, illustrated

April 23, 2022

It's an article of faith in many quarters that you shouldn't write anything much as a shell script and should instead use a proper programming language. I generally agree with this in theory, but recently I had a vivid experience of why this doesn't necessarily work out for me in practice, as I wrote (and then rewrote) a shell script that really should be a program in, say, Python.

A systemd-based Linux system can be set to track how much memory is being used by each logged-in user and by system services, and we configure many of our machines to do so. Systemd calls this MemoryAccounting and it's actually implemented using Linux cgroup memory accounting (which exists in both cgroup v2 and cgroup v1 versions). Because it's implemented with cgroups, the actual memory usage is visible under /sys/fs/cgroup and you can read it out directly by looking at various files. Recently we had an incident where I wound up wanting a convenient way to get a nice view of per-user and per-service memory usage, and it occurred to me that I could present this in the style of nice, easily readable disk space usage:

  43.7G  /
  39.5G  /u
  35.6G  /u/<someone>
   4.4G  /system
   1.9G  /system/auditd
 601.9M  /u/cks
 277.3M  /system/cron
 259.0M  /system/systemd-journald
  89.3M  /init
[...]

This started out looking very easy. All I had to do was read some files and report the contents. Well, and humanize the raw number of bytes to make things more readable. And transform the names of the directories where the files were, to give things like '/u/cks' instead of 'user.slice/user-NNN.slice' (which requires looking up the login name for Unix uids). And skip things with no memory usage. And handle both cgroup v1 (used on most of our machines) and cgroup v2 (now used on Ubuntu 22.04). And maybe descend several levels deep into the hierarchy to get interesting details; for instance, users may have multiple sessions with widely differing memory usage. And if we're going to descend several levels deep, perhaps we should skip lower levels that have the same usage as their parent.
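(To make two of those steps concrete, here is a minimal Python sketch of humanizing byte counts and transforming cgroup names. This is not my script; the function names and the exact renaming rules are assumptions made up for illustration, guessed from the sample output above.)

```python
import pwd
import re


def humanize(nbytes):
    """Format a byte count with one decimal place and a binary unit suffix."""
    value = float(nbytes)
    for unit in ("B", "K", "M", "G", "T"):
        # Stop at the first unit where the value fits, or at the largest unit.
        if value < 1024 or unit == "T":
            return f"{value:.1f}{unit}"
        value /= 1024


def login_for_uid(uid):
    """Map a Unix uid to a login name, falling back to the number itself."""
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        return str(uid)


def pretty_name(cgroup_path, lookup=login_for_uid):
    """Turn 'user.slice/user-NNN.slice' into '/u/<login>' and so on.

    The renaming rules here are hypothetical, not taken from the real script.
    """
    parts = []
    for comp in cgroup_path.strip("/").split("/"):
        if not comp:
            continue
        m = re.fullmatch(r"user-(\d+)\.slice", comp)
        if m:
            parts.append(lookup(int(m.group(1))))
        elif comp == "user.slice":
            parts.append("u")
        elif comp == "system.slice":
            parts.append("system")
        else:
            # Drop the systemd unit type suffix from everything else.
            parts.append(re.sub(r"\.(service|scope|slice)$", "", comp))
    return "/" + "/".join(parts)
```

(A report line could then be printed with something like print(f"{humanize(n):>7}  {pretty_name(path)}"). The lookup parameter exists only so that the uid to login mapping can be replaced in tests.)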

However, I didn't start out realizing all of these needs and nice things right away. I started out with something very simple that could just give a few easy to get numbers for user.slice, system.slice, and the root of the hierarchy. Extending that to system slices and user slices looked simple, and it wasn't too hard to transform UIDs into login names with the id command, and so on and so forth as one issue after another surfaced, including deciding how to look deeper into bits of the hierarchy. Taken one by one, almost every issue looked simple to solve on its own in a shell script, but the end result of putting all of these together is a shell script that almost certainly would have been easier to write in a programming language.

My rewrite came when I realized that I could turn the problem of looking through the hierarchy inside out, by using find to walk the entire cgroup hierarchy looking for anything that had memory usage. This gave me a whole new set of fun name transformation problems, and also showed another problem of shell scripts, which is that the result is now too slow because it has to keep invoking sed and other things on tons of names. But once again, each step toward the end result looked simple and approachable as just another bit of shell or sed mangling.
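(For comparison, here is a rough Python sketch of the same inside-out approach: walk the whole hierarchy, read the usage file in each directory, and keep only the interesting entries. It is a sketch under assumptions, not a translation of my script. The names read_usage and walk_usage are invented, and it assumes the usage lives in 'memory.current' on cgroup v2 and 'memory.usage_in_bytes' on cgroup v1, with the root being /sys/fs/cgroup or /sys/fs/cgroup/memory respectively.)

```python
import os

# Usage file names to try in order: cgroup v2 first, then cgroup v1.
USAGE_FILES = ("memory.current", "memory.usage_in_bytes")


def read_usage(dirpath):
    """Return the memory usage in bytes for one cgroup directory, or None."""
    for name in USAGE_FILES:
        try:
            with open(os.path.join(dirpath, name)) as f:
                return int(f.read().strip())
        except (OSError, ValueError):
            continue
    return None


def walk_usage(root):
    """Return (usage, path) pairs for the hierarchy under root, largest first.

    Directories with no usage file or zero usage are skipped, as are
    children whose usage is identical to their parent's.
    """
    root = os.path.normpath(root)
    usage_by_dir = {}
    results = []
    # os.walk is top-down by default, so a parent is always seen
    # before its children.
    for dirpath, _dirnames, _filenames in os.walk(root):
        usage = read_usage(dirpath)
        usage_by_dir[dirpath] = usage
        if not usage:
            continue
        if usage_by_dir.get(os.path.dirname(dirpath)) == usage:
            continue
        rel = os.path.relpath(dirpath, root)
        results.append((usage, "/" if rel == "." else "/" + rel))
    results.sort(reverse=True)
    return results
```

(Everything here happens inside one process, with no sed or other external commands run per name. The returned paths are still raw cgroup names, so they would need the name transformations applied before printing.)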

Meanwhile, starting out what felt like a simple thing in Python, Go, or any of the other alternatives looked like a much bigger investment of my effort. Python, Go and so on need more structure and often don't have quite as simple and convenient methods of doing various shell-like things. The problem wasn't obviously too big for a shell script when I started writing the first bits of what I've named 'memdu', so I didn't want to go all the way to a Python program, and anyway shell scripts are more 'lightweight' than Python, never mind Go. And so I slid into the temptation of shell scripts, where every individual step looks easy enough but at the end, I probably would have been better off starting out in something else.

(Hopefully I will take all of this as a learning experience and motivate myself to rewrite the script in Python. But on the other hand, the resulting shell script is working and I'm lazy.)

Written on 23 April 2022.