Why I wind up writing real parsers for my sysadmin tools
There is a common habit in sysadmin tools of using ad-hoc methods to
extract information out of the less than immediately helpful output of
the vendor's programs. Bang together some sed, some awk, some grep, and
so on, and you can quickly get what you need, generally in something
that you can still understand once the dust settles.
I do this for some tools, in some situations. But increasingly I am writing real parsers for things with complicated output. The problem with an ad-hoc parser that just recognizes simple things and grabs them is that it is too dangerous, because it makes an optimistic assumption: that anything it doesn't specifically recognize and pick out is unimportant.
When I am parsing complex output for really important things, I do not want to make this assumption. I want it to be the other way around; instead of assuming that anything I did not specifically code for is harmless and can be ignored, I assume that anything I do not recognize is dangerous and means that the parser should abort. At a minimum, the presence of unrecognized things means that I did not understand the output of what I'm parsing as well as I thought I did.
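To make the difference concrete, here is a minimal sketch of the strict approach in Python. The output format is invented for illustration (a `pool:` header line followed by `disk NAME STATE` lines), and the names `parse_status` and `UnrecognizedOutput` are hypothetical; a real vendor tool's output will look different. The point is only the shape of the loop: every line must match a rule I wrote, and anything else aborts the parse.

```python
import re


class UnrecognizedOutput(Exception):
    """Raised when the parser meets a line it has no rule for."""


# Hypothetical line formats, made up for this example.
HEADER_RE = re.compile(r"^pool:\s+(?P<pool>\S+)$")
DISK_RE = re.compile(r"^disk\s+(?P<name>\S+)\s+(?P<state>ONLINE|DEGRADED|FAULTED)$")


def parse_status(text):
    """Return (pool, {disk: state}); raise on anything unexpected."""
    pool = None
    disks = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            # Blank lines are explicitly known to be harmless.
            continue
        m = HEADER_RE.match(line)
        if m:
            if pool is not None:
                raise UnrecognizedOutput(
                    f"line {lineno}: second pool header: {line!r}")
            pool = m.group("pool")
            continue
        m = DISK_RE.match(line)
        if m:
            disks[m.group("name")] = m.group("state")
            continue
        # No rule matched: assume it matters and refuse to go on.
        raise UnrecognizedOutput(
            f"line {lineno}: unrecognized line: {line!r}")
    if pool is None:
        raise UnrecognizedOutput("no pool header seen")
    return pool, disks
```

In this sketch, a line like `errors: 3 data errors`, or a disk in a state I never listed, raises `UnrecognizedOutput` instead of being silently skipped. A grep-style version that only picked out the `disk` lines it understood would carry on as if nothing were wrong.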
(I should note that this doesn't make my programs any better; in fact, it sometimes makes them worse, as they die on harmless things. But it makes me more confident about what they're doing. Sysadmin tools definitely need to adhere to the 'first, do no harm' precept.)
As a consequence, all of my serious sysadmin tools lately have been
written in Python. While it's not impossible to write real parsers in
sed, awk, and so on, it's too painful and too much work to make me
interested.
(Yes, people have done amazingly impressive things in awk and sed,
but I'm lazy. Plus, I have more confidence in my ability to test Python
code.)