Why I wind up writing real parsers for my sysadmin tools

September 21, 2008

There is a common habit in sysadmin tools of using ad-hoc methods to extract information out of the less-than-immediately-helpful output of the vendor's programs. Bang together some sed, some awk, some grep, and so on, and you can quickly get what you need, generally in something that you can still understand once the dust settles.

I do this for some tools, in some situations. But increasingly I am writing real parsers for things with complicated output. The problem is that an ad-hoc optimistic parser that just recognizes simple things and grabs output is too dangerous, because it makes an optimistic assumption: it assumes that anything it doesn't specifically recognize and pick out is unimportant.
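To make the danger concrete, here is a minimal sketch of the optimistic style, written in Python rather than sed or awk so it can be run directly. The report format, `REPORT`, and `disk_states_optimistic` are hypothetical, invented for illustration; they are not any real vendor tool's output or the author's code.

```python
# A deliberately optimistic extractor, in the style of a quick grep/awk pipeline.
# The report format is hypothetical, invented for illustration only.
REPORT = """\
Volume vol0
  Disk sda state=online
  Disk sdb state=online
  Spare sdc state=failed
"""


def disk_states_optimistic(text):
    """Return {disk: state} for each 'Disk' line; silently skip all other lines."""
    states = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 3 and fields[0] == "Disk" and fields[2].startswith("state="):
            states[fields[1]] = fields[2][len("state="):]
        # Anything else (including the failed spare) falls through unnoticed.
    return states


print(disk_states_optimistic(REPORT))
# {'sda': 'online', 'sdb': 'online'}; the failed spare never shows up
```

The code does exactly what it was written to do. The problem is that a line it was not written for carried the one piece of information that mattered.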

When I am parsing complex output for really important things, I do not want to make this assumption. I want it to be the other way around; instead of assuming that anything I did not specifically code for is harmless and can be ignored, I assume that anything I do not recognize is dangerous and means that the parser should abort. At a minimum, the presence of unrecognized things means that I did not understand the output of what I'm parsing as well as I thought I did.
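The opposite stance can be sketched as a parser that treats any line it cannot classify as fatal. This is a minimal illustration under assumptions: the report format, `KNOWN_STATES`, `UnrecognizedOutput`, and `parse_report_strict` are hypothetical names invented here.

```python
# A pessimistic parser: every line must match a known form, or we abort.
# The report format and the set of states are hypothetical, for illustration.
KNOWN_STATES = {"online", "offline", "degraded", "failed"}


class UnrecognizedOutput(Exception):
    """Raised when the parser meets a line it was not written to understand."""


def parse_report_strict(text):
    """Parse a hypothetical report into {volume: {disk: state}}.

    Recognized line forms; anything else raises UnrecognizedOutput:
      'Volume NAME'
      '  Disk NAME state=STATE'   (STATE must be in KNOWN_STATES)
    """
    volumes = {}
    current = None  # the disk map of the most recent Volume line
    for lineno, line in enumerate(text.splitlines(), 1):
        fields = line.split()
        if not fields:
            continue  # blank lines are known to be harmless
        if fields[0] == "Volume" and len(fields) == 2:
            current = volumes.setdefault(fields[1], {})
            continue
        if fields[0] == "Disk" and len(fields) == 3 and current is not None:
            key, sep, state = fields[2].partition("=")
            if key == "state" and sep and state in KNOWN_STATES:
                current[fields[1]] = state
                continue
        # Fell through every known form: refuse to guess.
        raise UnrecognizedOutput(f"line {lineno}: {line!r}")
    return volumes


good = "Volume vol0\n  Disk sda state=online\n  Disk sdb state=online\n"
print(parse_report_strict(good))
# {'vol0': {'sda': 'online', 'sdb': 'online'}}

try:
    parse_report_strict(good + "  Spare sdc state=failed\n")
except UnrecognizedOutput as err:
    print("aborting:", err)
# aborting: line 4: '  Spare sdc state=failed'
```

The cost is that a harmless new line, such as an informational banner, also stops the program until someone teaches the parser about it.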

(I should note that this doesn't make my programs any better; in fact, it sometimes makes them worse, as they die on harmless things. But it makes me more confident about what they're doing. Sysadmin tools definitely need to adhere to the 'first, do no harm' precept.)

As a consequence, all of my serious sysadmin tools lately have been written in Python. While it's not impossible to write real parsers in sed, awk, and so on, it's too painful and too much work to interest me.

(Yes, people have done amazingly impressive things in awk and sed, but I'm lazy. Plus, I have more confidence in my ability to test Python code.)


Comments on this page:

From 192.35.79.70 at 2008-09-22 10:41:58:

What is your favorite strategy for writing parsers in Python?

   -- John L. Clark

By cks at 2008-09-22 11:37:22:

So far I have been writing ad-hoc, more or less recursive-descent parsers with very simple hand tokenization (for most of what I parse, split() is basically all that's necessary). I haven't looked very much at any of the parsing packages for Python, because the hassle of using them seemed greater than the hassle of doing it by hand.

(The degree of hassle is larger for system tools, where I want the program to be self-contained and especially not require installing non-standard packages.)
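For illustration, a parser of the kind described in this reply (recursive descent, with str.split() as the entire tokenizer, standard library only) might look like the following minimal sketch. The nested report format, its grammar, and the `Parser` and `ParseError` names are hypothetical, chosen only to show the shape of such a parser. It also keeps the strict stance of the main entry and raises on any token it does not expect.

```python
class ParseError(Exception):
    """Raised on any token the grammar does not allow."""


class Parser:
    """Recursive-descent parser for a hypothetical nested storage report.

    Grammar (invented for illustration):
      report   := pool*
      pool     := 'pool' NAME children
      children := '{' vdev* '}'
      vdev     := ('mirror' | 'raidz') children | 'disk' NAME STATE
    """

    def __init__(self, text):
        # Hand tokenization: this made-up format puts whitespace around every
        # token, braces included, so str.split() is the whole tokenizer.
        self.tokens = text.split()
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.peek()
        if tok is None:
            raise ParseError("unexpected end of input")
        self.pos += 1
        return tok

    def expect(self, want):
        tok = self.advance()
        if tok != want:
            raise ParseError(f"expected {want!r}, got {tok!r} at token {self.pos - 1}")

    def parse_report(self):
        pools = {}
        while self.peek() is not None:
            self.expect("pool")
            name = self.advance()
            pools[name] = self.parse_children()
        return pools

    def parse_children(self):
        self.expect("{")
        children = []
        while self.peek() != "}":
            children.append(self.parse_vdev())  # recursion happens here
        self.expect("}")
        return children

    def parse_vdev(self):
        tok = self.peek()
        if tok in ("mirror", "raidz"):
            self.advance()
            return (tok, self.parse_children())
        if tok == "disk":
            self.advance()
            name = self.advance()
            state = self.advance()
            return ("disk", name, state)
        # Strict by design: an unknown token aborts rather than being skipped.
        raise ParseError(f"unrecognized token {tok!r} at token {self.pos}")


text = """
pool tank {
  mirror {
    disk sda online
    disk sdb online
  }
  disk sdc online
}
"""
print(Parser(text).parse_report())
# {'tank': [('mirror', [('disk', 'sda', 'online'), ('disk', 'sdb', 'online')]), ('disk', 'sdc', 'online')]}
```

Each grammar rule maps onto one method, which is most of the appeal of writing this by hand: there is nothing to install, and the structure of the code mirrors the structure of the output being parsed.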

Written on 21 September 2008.

Last modified: Sun Sep 21 22:50:23 2008