2023-04-30
Os.walk, the temptation of hammers, and the paralysis of choice
I have a shell script to give me a hierarchical, du-like report of memory usage broken down by Linux cgroup. Even back when I wrote it, it really needed to be something other than a shell script, and a recent addition made it quite clear that the time had come (the shell script version is now both slow and inflexible). So as is my habit, I opened up a 'memdu.py' in my editor and started typing. Some initial functions were easy, until I got to here:
def walkcgroup(top): for dirpath, dirnames, filenames in os.walk(top, topdown=True): if memstatfile not in filenames: dirnames[:] = [] continue
Then I stopped typing because I realized I had a pile of choices of make about exactly how this program was going to be structured, and maybe I didn't want to use os.walk(), as shown by the very first thing I wrote inside the for loop.
The reason I started writing code with os.walk() is because it's the obvious hammer to use when you want to walk all over a directory tree, such as /sys/fs/cgroup. But on the other hand, I realized that I'm not just visiting each directory; I'm in theory constructing a hierarchical tree of memory usage information. What os.walk() gives you is basically a linear walk, so if you want a tree reconstructing it is up to you. It's also more awkward to cut off walking down the tree if various conditions are met (or not met), especially if one of the conditions is 'my memory usage is the same as my parent's memory usage'. If what I want is really a tree, then I should probably walk the directory hierarchy myself (and pass each step its parent node, already loaded with memory information, and so on).
On the third hand, the actual way this information will be printed out is as a (sorted) linear list, so if I build a tree I'll have to linearize it later. Using os.walk() linearizes it for me in advance, and I can then readily sort it into some order. I do need to know certain information about parents, but I could put that in a dict that maps (dir)paths to their corresponding data object (since I'm walking top down I know that the parent will always be visited before the children).
A lot of these choices come down to what will be more convenient to code up, and these choices exist at all because of the hammer of os.walk(). Given the hammer, I saw the problem as a nail even though maybe it's a screw, and now I've realized I can't see what I have. Probably the only way to do so is to write one or the other version of the code and see how it goes. Why haven't I done that, and instead set aside the whole of memdu.py? That's because I don't want to 'waste time' by writing the 'wrong' version, which is irrational. But here I am.
(Having written this entry I've probably talked myself into going ahead with the os.walk version. If the code starts feeling awkward, much of what I've built will probably be reusable for a tree version.)
PS: This isn't the first time I've been blinded by a Python feature.