The temptation of writing shell scripts, illustrated
It's an article of faith in many quarters that you shouldn't write anything much as a shell script and should instead use a proper programming language. I generally agree with this in theory, but I recently had a great illustration of why it doesn't necessarily work out for me in practice, as I wrote (and then rewrote) a shell script that really should be a program in, say, Python.
A systemd-based Linux system can be set to track how much memory is being used by each logged in user and by system services, and we configure many of our machines to do so. Systemd calls this MemoryAccounting and it's actually implemented using Linux cgroup memory accounting (there's also the cgroups v1 memory accounting). Because it's implemented with cgroups, the actual memory usage is visible under /sys/fs/cgroup and you can read it out directly by looking at various files. Recently we had an incident where I wound up wanting a convenient way to get a nice view of per-user and per-service memory usage, and it occurred to me that I could present this in the style of nice, easily readable disk space usage:
    43.7G /
    39.5G /u
    35.6G /u/<someone>
     4.4G /system
     1.9G /system/auditd
   601.9M /u/cks
   277.3M /system/cron
   259.0M /system/systemd-journald
    89.3M /init
   [...]
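Reading the raw numbers is the easy part in any language. The following is a minimal Python sketch, not the actual script, and its function names are hypothetical. It assumes the usual /sys/fs/cgroup mount point, that cgroup v1's memory controller is mounted at /sys/fs/cgroup/memory, and that a cgroup.controllers file at the top signals the unified cgroup v2 hierarchy. Per-cgroup usage is in memory.usage_in_bytes under v1 and in memory.current under v2.

```python
import os

# Assumption: the cgroup filesystem is mounted in the usual place.
CGROUP_ROOT = "/sys/fs/cgroup"


def memory_root(root=CGROUP_ROOT):
    """Return (hierarchy directory, usage file name) for memory accounting.

    A cgroup.controllers file at the top means the unified cgroup v2
    hierarchy, where each cgroup reports its usage in memory.current.
    Otherwise assume cgroup v1, with the memory controller mounted under
    memory/ and usage reported in memory.usage_in_bytes.
    """
    if os.path.exists(os.path.join(root, "cgroup.controllers")):
        return root, "memory.current"
    return os.path.join(root, "memory"), "memory.usage_in_bytes"


def read_usage(cgdir, usage_file):
    """Return one cgroup's memory usage in bytes, or None if it can't be read."""
    try:
        with open(os.path.join(cgdir, usage_file)) as f:
            return int(f.read())  # int() ignores the trailing newline
    except (OSError, ValueError):
        return None
```

One wrinkle: under cgroup v2 the root cgroup has no memory.current, so read_usage() returns None there, and a top-level '/' total would have to come from somewhere else.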
This started out looking very easy. All I had to do was read some files and report the contents. Well, and humanize the raw number of bytes to make things more readable. And transform the names of the directories where the files were, to give things like '/u/cks' instead of 'user.slice/user-NNN.slice' (which requires looking up the login name for Unix uids). And skip things with no memory usage. And handle both cgroup v1 (used on most of our machines) and cgroup v2 (now used on Ubuntu 22.04). And maybe descend several levels deep into the hierarchy to get interesting details; for instance, users may have multiple sessions with widely differing memory usage. And if we're going to descend several levels deep, perhaps we should skip lower levels that have the same usage as their parent.
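As an illustration, two of those chores, humanizing byte counts and turning cgroup directory names into short names, might look like this as a Python sketch. The display rules are assumptions inferred from the sample output: user.slice becomes /u, user-NNN.slice becomes the login name for UID NNN (looked up with the standard pwd module instead of id), and .slice, .service and .scope suffixes are otherwise dropped. Sizes use powers of 1024 with one decimal place. All function names are hypothetical.

```python
import pwd


def humanize(nbytes):
    """Format a byte count like '43.7G' or '601.9M', using powers of 1024."""
    if nbytes < 1024:
        return f"{nbytes}B"
    value = float(nbytes)
    for unit in ("K", "M", "G", "T"):
        value /= 1024
        if value < 1024 or unit == "T":
            return f"{value:.1f}{unit}"


def login_for_uid(uid):
    """Look up the login name for a UID, falling back to the number itself."""
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        return str(uid)


def pretty_name(relpath):
    """Turn a cgroup path relative to the hierarchy root into a short name.

    Hypothetical rules inferred from the sample output:
    user.slice -> u, user-NNN.slice -> login name for UID NNN,
    and .slice/.service/.scope suffixes are dropped elsewhere.
    """
    if relpath in ("", "."):
        return "/"
    parts = []
    for part in relpath.split("/"):
        if part == "user.slice":
            parts.append("u")
        elif part.startswith("user-") and part.endswith(".slice"):
            uid = part[len("user-"):-len(".slice")]
            parts.append(login_for_uid(int(uid)) if uid.isdigit() else part)
        else:
            for suffix in (".slice", ".service", ".scope"):
                if part.endswith(suffix):
                    part = part[: -len(suffix)]
                    break
            parts.append(part)
    return "/" + "/".join(parts)
```

For example, pretty_name("system.slice/cron.service") gives "/system/cron", and humanize(3 * 1024**3) gives "3.0G".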
However, I didn't start out realizing all of these needs and nice
things right away. I started out with something very simple that
could just give a few easy to get numbers for user.slice, system.slice,
and the root of the hierarchy. Extending that to system slices and
user slices looked simple, and it wasn't too hard to transform UIDs
into login names with the
id command, and so on and so forth as
one issue after another surfaced, including deciding how to look
deeper into bits of the hierarchy. Taken one by one, almost every
issue looked simple to solve on its own in a shell script, but the
end result of putting all of these together is a shell script that
almost certainly would have been easier to write in a programming
language.
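Looking deeper into the hierarchy while skipping levels that merely repeat their parent's usage is a short recursive function in Python. This is a minimal sketch with hypothetical names: it yields a cgroup only if its usage is nonzero and differs from its parent's, and it stops max_depth levels below the starting point. The usage_file argument is assumed to be memory.current on cgroup v2 or memory.usage_in_bytes on cgroup v1.

```python
import os


def read_usage(cgdir, usage_file):
    """Return one cgroup's memory usage in bytes, or None if it can't be read."""
    try:
        with open(os.path.join(cgdir, usage_file)) as f:
            return int(f.read())
    except (OSError, ValueError):
        return None


def descend(cgdir, usage_file, max_depth, parent_usage=None, depth=0):
    """Yield (path, usage) for cgdir and the cgroups below it.

    Stops max_depth levels below the starting cgroup. A cgroup is skipped
    (but still descended into) if its usage is zero, unreadable, or exactly
    equal to its parent's, since then it adds no information.
    """
    usage = read_usage(cgdir, usage_file)
    if usage and usage != parent_usage:
        yield cgdir, usage
    if depth >= max_depth:
        return
    # Collect subdirectories first so the directory handle is closed
    # before recursing.
    with os.scandir(cgdir) as it:
        subdirs = sorted(e.path for e in it if e.is_dir(follow_symlinks=False))
    for sub in subdirs:
        yield from descend(sub, usage_file, max_depth, usage, depth + 1)
```

On a cgroup v2 system, descend("/sys/fs/cgroup/user.slice", "memory.current", 2) would cover each user's slice and the sessions inside it.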
My rewrite came when I realized that I could turn the problem of
looking through the hierarchy inside out, by using
find to walk
the entire cgroup hierarchy looking for anything that had memory
usage. This gave me a whole new set of fun name transformation
problems, and also showed another problem of shell scripts, which
is that the result is now too slow because it has to keep invoking
sed and other things on tons of names. But once again, each step
toward the end result looked simple and approachable as just another
bit of shell or sed mangling.
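For comparison, here is a sketch of the same inside-out approach in a single Python process. os.walk plays the part of find, and the name rewriting that needed repeated sed invocations becomes one precompiled regular expression applied in memory. The suffix-stripping rule is a simplified, hypothetical stand-in for the real transformations (it doesn't map UIDs to login names, for instance), and results are sorted largest first to match the sample output.

```python
import os
import re

# Hypothetical display rule: drop systemd unit suffixes from each component.
_SUFFIX = re.compile(r"\.(slice|service|scope)(?=/|$)")


def walk_usage(root, usage_file):
    """Walk the whole cgroup tree once (like find) and return
    (usage, name) pairs for every cgroup with nonzero usage, largest first."""
    results = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if usage_file not in filenames:
            continue
        try:
            with open(os.path.join(dirpath, usage_file)) as f:
                usage = int(f.read())
        except (OSError, ValueError):
            continue
        if usage == 0:
            continue  # skip cgroups with no memory usage
        rel = os.path.relpath(dirpath, root)
        name = "/" if rel == "." else "/" + _SUFFIX.sub("", rel)
        results.append((usage, name))
    # Largest usage first; ties broken by name for stable output.
    results.sort(key=lambda item: (-item[0], item[1]))
    return results
```

On a cgroup v2 system this would be called as walk_usage("/sys/fs/cgroup", "memory.current"); on cgroup v1, as walk_usage("/sys/fs/cgroup/memory", "memory.usage_in_bytes").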
Meanwhile, starting out what felt like a simple thing in Python,
Go, or any of the other alternatives looked like a much bigger
investment of my effort. Python, Go and so on need more structure
and often don't have quite as simple and convenient methods of doing
various shell-like things. The problem wasn't obviously too big for
a shell script when I started writing the first bits of what I've
called 'memdu', so I didn't want to go all the way to a Python
program, and anyway shell scripts are more 'lightweight' than Python,
never mind Go. And so I slid into the temptation of shell scripts,
where every individual step looks easy enough but at the end, I
probably would have been better off starting out in something else.
(Hopefully I will take all of this as a learning experience and motivate myself to rewrite the script in Python. But on the other hand, the resulting shell script is working and I'm lazy.)
Written on 23 April 2022.