Wandering Thoughts archives

2020-09-16

How I think I want to drop modern Python packages into a single program

For reasons beyond the scope of this blog entry, I'm considering augmenting our Python program that logs email attachment information for Exim so that it uses oletools to peer inside MS Office files for indications of bad things. Oletools is not packaged by Ubuntu as far as I can see, and in any case a packaged version would be an older one, so we would need to add the oletools Python packages ourselves.

The official oletools install instructions talk about using either pip or setup.py. As a general rule, we're very strongly against installing anything system-wide except through Ubuntu's own package management system, and the environment our Python program runs in doesn't really have a home directory to use pip's --user option, so the obvious and simple pip invocations are out. I've used a setup.py approach to install a large Python package into a specific directory hierarchy in the past (Django), and it was a big pain, so I'd like not to do it again.

(Nor do we want to learn about how to build and maintain Python virtual environments, and then convert how we run this Python program to use one.)

After some looking at pip's help output I found the 'pip install --target <directory>' option and tested it a bit. This appears to do more or less what I want, in that it installs oletools and all of its dependencies into the target directory. The target directory is also littered with various metadata, so we probably don't want it to be the directory where the program's normal source code lives. This means we'll need to arrange to run the program with $PYTHONPATH set to the target directory, but that's a solvable problem.

(This 'pip install' invocation does write some additional pip metadata to your $HOME. Fortunately it actually does respect the value of the $HOME environment variable, so I can point that at a junk directory and then delete it afterward. Or I can make $HOME point to my target directory so everything is in one place.)
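As a rough illustration of the install step, here is a minimal Python sketch that builds the 'pip install --target' command together with an environment where $HOME points at a throwaway directory. The function names and the idea of driving pip through subprocess are assumptions made for the example; running the same command by hand from a shell works just as well.

```python
import os
import subprocess
import sys
import tempfile


def pip_install_command(package, target_dir, home_dir):
    """Return (argv, env) for 'pip install --target' with $HOME redirected."""
    argv = [sys.executable, "-m", "pip", "install", "--target", target_dir, package]
    env = dict(os.environ)
    # pip's extra metadata goes under $HOME, so point it somewhere disposable.
    env["HOME"] = home_dir
    return argv, env


def install(package, target_dir):
    """Run the install with a junk $HOME that is deleted afterward."""
    with tempfile.TemporaryDirectory() as junk_home:
        argv, env = pip_install_command(package, target_dir, junk_home)
        subprocess.run(argv, env=env, check=True)
```

Calling install("oletools", some_directory) would do the real work; passing the target directory itself as home_dir to pip_install_command() corresponds to the 'everything in one place' variant.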

All of this is not quite as neat and simple as dropping an oletools directory tree in the program's directory, in the way that I could deal with needing the rarfile module, but then again oletools has a bunch of dependencies and pip handles them all for me. I could manually copy them all into place, but that would actually create a sufficiently cluttered program directory that I prefer a separate directory even if it needs a $PYTHONPATH step.
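The $PYTHONPATH step can be sketched as a tiny Python launcher that puts the pip target directory first on $PYTHONPATH and then runs the real program. This is only one way to do it (a two-line shell wrapper would also work), and the function names and arguments here are hypothetical.

```python
import os
import sys


def env_with_pythonpath(target_dir, base_env=None):
    """Return a copy of the environment with target_dir first on $PYTHONPATH."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("PYTHONPATH")
    if existing:
        env["PYTHONPATH"] = target_dir + os.pathsep + existing
    else:
        env["PYTHONPATH"] = target_dir
    return env


def exec_program(program_path, target_dir):
    """Replace this process with the real program under the adjusted environment."""
    env = env_with_pythonpath(target_dir)
    os.execve(sys.executable, [sys.executable, program_path], env)
```

Keeping the environment construction separate from the exec makes it easy to check what the program will see without actually launching it.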

(Some people will say that setting $PYTHONPATH means that I should go all the way to a virtual environment, but that would be a lot more to learn and it would be more opaque. But looking into this a bit did lead to me learning that Python 3 now has standard support for virtual environments.)

python/PipDropInInstall written at 23:54:46

Why I write recursive descent parsers (despite their issues)

Today I read Laurence Tratt's "Which Parsing Approach?", which has a decent overview of how parsing computer languages (including little domain specific languages) is not quite the well solved problem we'd like it to be. As part of the article, Tratt discusses how recursive descent parsers have a number of issues in practice and recommends using other things, such as an LR parser generator.

I have a long standing interest in parsing, I'm reasonably well aware of the annoyances of recursive descent parsers (although some of the issues Tratt raised hadn't occurred to me before now), and I've been exposed to parser generators like Yacc. Despite that, my normal approach to parsing any new little language for real is to write a recursive descent parser in whatever language I'm using, and Tratt's article is not going to change that. My choice here is for entirely pragmatic reasons, because to me recursive descent parsers generally have two significant advantages over all other real parsers.

The first advantage is that almost always, a recursive descent parser is the only or at least easiest form of parser you can readily create using only the language's standard library and tooling. In particular, parsing LR, LALR, and similar formal grammars generally requires you to find, select, and install a parser generator tool (or more rarely, an additional package). Very few languages ship their standard environment with a parser generator (or a lexer, which is often required in some form by the parser).

(The closest I know of is C on Unix, where you will almost always find some version of lex and yacc. Not entirely coincidentally, I've used lex and yacc to write a parser in C, although a long time ago.)

By contrast, a recursive descent parser is just code in the language. You can obviously write that in any language, and you can build a little lexer to go along with it that's custom fitted to your particular recursive descent parser and your language's needs. This also leads to the second significant advantage, which is that if you write a recursive descent parser, you don't need to learn a new language, the language of the parser generator, and also learn how to hook that new language to the language of your program, and then debug the result. Your entire recursive descent parser (and your entire lexer) are written in one language, the language you're already working in.
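To illustrate the 'just code in the language' point, here is a minimal sketch of a hand-written lexer and recursive descent parser in Python for integer arithmetic with '+', '-', '*', and parentheses. The grammar and all of the names are invented for this example, and it is not meant to represent any particular standard process; it only shows that each grammar rule becomes one ordinary method.

```python
DIGITS = "0123456789"


def tokenize(text):
    """Turn text into a list of (kind, value) tokens, ending with ('end', None)."""
    tokens = []
    pos = 0
    while pos < len(text):
        ch = text[pos]
        if ch.isspace():
            pos += 1
        elif ch in DIGITS:
            start = pos
            while pos < len(text) and text[pos] in DIGITS:
                pos += 1
            tokens.append(("num", int(text[start:pos])))
        elif ch in "+-*()":
            tokens.append((ch, ch))
            pos += 1
        else:
            raise SyntaxError(f"unexpected character {ch!r} at offset {pos}")
    tokens.append(("end", None))
    return tokens


class Parser:
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos][0]

    def advance(self):
        token = self.tokens[self.pos]
        self.pos += 1
        return token

    def expect(self, kind):
        if self.peek() != kind:
            raise SyntaxError(f"expected {kind!r}, found {self.peek()!r}")
        return self.advance()

    # expr := term (('+' | '-') term)*
    def expr(self):
        value = self.term()
        while self.peek() in ("+", "-"):
            op, _ = self.advance()
            rhs = self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    # term := factor ('*' factor)*
    def term(self):
        value = self.factor()
        while self.peek() == "*":
            self.advance()
            value = value * self.factor()
        return value

    # factor := NUMBER | '(' expr ')'
    def factor(self):
        if self.peek() == "(":
            self.advance()
            value = self.expr()
            self.expect(")")
            return value
        _, number = self.expect("num")
        return number


def evaluate(text):
    """Parse and evaluate a whole expression, rejecting trailing junk."""
    parser = Parser(text)
    value = parser.expr()
    parser.expect("end")
    return value
```

Everything here uses only built-in features of the language, and the lexer is shaped around exactly what this one parser needs.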

If I were routinely working in a language that had a well respected de facto standard parser generator and lexer, and regularly building parsers for little languages for my programs, it would probably be worth mastering these tools. The time and effort required to do so would be more than paid back in the end, and I would probably have a higher quality grammar too (Tratt points out how recursive descent parsers hide ambiguity, for example). But in practice I bounce back and forth between two languages right now (Go and Python, neither of which has such a standard parser ecology), and I don't need to write even a half-baked parser all that often. So writing another recursive descent parser using my standard process for this has been the easiest way to do it every time I needed one.

(I've developed a standard process for writing recursive descent parsers that makes the whole thing pretty mechanical, but that's a discussion for another entry or really a series of them.)

PS: I can't comment about how easy it is to generate good error messages in modern parser generators, because I haven't used any of them. My experience with my own recursive descent parsers is that it's generally straightforward to get decent error messages for the style of languages that I create, and usually simple to tweak the result to give clearer errors in some specific situations.
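As a small illustration of the error message point, here is a hedged Python sketch of a hand-written parser for a made-up '[name, name, ...]' list syntax. The syntax, the ParseError class, and the function name are all assumptions for the example. Generic errors come from one expect() helper, and one specific mistake (a trailing comma) gets its own clearer message simply by adding a check at the spot in the code where it can happen.

```python
class ParseError(Exception):
    """A parse failure that remembers where in the input it happened."""

    def __init__(self, message, text, offset):
        pointer = " " * offset + "^"
        super().__init__(f"{message} at column {offset + 1}\n  {text}\n  {pointer}")
        self.offset = offset


def parse_name_list(text):
    """Parse '[name, name, ...]' and return the list of names."""
    pos = 0

    def skip_space():
        nonlocal pos
        while pos < len(text) and text[pos].isspace():
            pos += 1

    def peek():
        skip_space()
        return text[pos] if pos < len(text) else ""

    def expect(ch, what):
        nonlocal pos
        if peek() != ch:
            raise ParseError(f"expected {what}", text, pos)
        pos += 1

    def name():
        nonlocal pos
        skip_space()
        start = pos
        while pos < len(text) and (text[pos].isalnum() or text[pos] == "_"):
            pos += 1
        if pos == start:
            raise ParseError("expected a name", text, pos)
        return text[start:pos]

    expect("[", "'[' to start the list")
    names = []
    if peek() != "]":
        names.append(name())
        while peek() == ",":
            comma_at = pos
            pos += 1
            if peek() == "]":
                # Tailored message for one specific, easy to make mistake.
                raise ParseError("trailing comma is not allowed before ']'", text, comma_at)
            names.append(name())
    expect("]", "',' or ']'")
    if peek() != "":
        raise ParseError("unexpected text after ']'", text, pos)
    return names
```

Because the parser is plain code, the special case costs three lines and sits right next to the grammar rule it belongs to.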

programming/WhyRDParsersForMe written at 00:33:22

