The obviousness of inheritance blinded me to the right solution

October 28, 2018

This is a Python programming war story.

I recently wrote a program to generate things to drive low disk space alerts for our ZFS filesystems in our in-progress Prometheus monitoring system. ZFS filesystems are grouped together into ZFS pools, and in our environment it makes sense to alert on low free space in either or both (ZFS filesystems can run out of space without their pool running out of space). Since we have a lot of filesystems and many fewer pools, it also makes sense to be able to set a default filesystem alert level on a per-pool basis (and then perhaps override it for specific filesystems). The actual data that drives Prometheus must be on a per-object basis, so one thing the program has to do is expand those default alert levels out to be specific alerts for every filesystem in the pool without a specific alert level.

When I began coding the Python to parse the configuration file and turn it into a data representation, I started by thinking about the data representation. It seemed intuitively clear and obvious that a ZFS pool and a ZFS filesystem are almost the same thing, except that a ZFS pool has a bit more information, and therefor they should be in a general inheritance relationship with a fundamental base class (written here using attrs):

@attr.s
class AlertObj:
  name = attr.ib()
  level = attr.ib()
  email = attr.ib()

class FSystem(AlertObj):
  pass

@attr.s
class Pool(AlertObj):
  fs_level = attr.ib()

I wrote the code and it worked, but the more code I wrote, the more awkward things felt. As I got further and further in, I wound up adding ispool() methods and calling them here and there, and there was a tangle of things operating on this and that. It all just felt messy. Something was wrong but I couldn't really see what at the time.

For unrelated reasons, we wound up wanting to significantly revise how we drove low disk space alerts and rather than modify my first program, I opted to start over from scratch. One reason for this was because with the benefit of a little bit of distance from my own code, I could see that inheritance was the wrong data model for my situation. The right natural data representation was to have two completely separate sets of objects, one set for directly set alert levels, which lists both pools and filesystems, and one for default alert levels (which only contains pools because they're the only thing that creates default alert levels). The objects all have the same attributes (they only need name, level, and email).

This made the processing logic much simpler. Parsing the configuration file returns both sets of objects, the direct set and the defaultable set. Then we go through the second set and for each pool entry in it, we look up up all of the filesystems in that pool and add them to the first set if they aren't already there. There is no Python inheritance in sight and everything is obviously right and straightforward.

In the new approach, it would also be relatively easy to add default alert levels that are driven by other sorts of things, for instance an idea of who owns a particular entity (pools are often owned collectively by groups, but individual filesystems may be 'owned' and used by specific people, some of whom may not care unless their filesystems are right out of space). The first version's inheritance-based approach would have just fell over in the face of this; a default alert level based on ownership has no 'is-sort-of-a' relationship with ZFS filesystems or pools at all.

I've always known that inheritance wasn't always the right answer, partly because I have the jaundiced C programm's view of object orientation; all of OO's fundamental purpose is to make my code simpler, and if it doesn't do that I don't use it. In theory this should have made me skip inheritance here; in practice, inheritance was such an obvious and shiny hammer that once I saw some of it, I proceeded to hit all of my code with it no matter what.

(If nothing else, the whole experience serves me as a useful learning experience. Maybe the next time around I will more readily listen to the feeling that my code is awkward and maybe something is wrong.)

Written on 28 October 2018.
« What 'dependency' means in Unix init systems is underspecified
Link: HiDPI on dual 4K monitors with Linux »

Page tools: View Source, Add Comment.
Search:
Login: Password:
Atom Syndication: Recent Comments.

Last modified: Sun Oct 28 00:49:22 2018
This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.