2012-12-22
Packaging in compiled versus interpreted languages
As a result of reading this, I wound up thinking about my different feelings about packaging in Go and Python. In the process I realized that it came down to a relatively fundamental difference in what packaging needs in compiled languages versus interpreted languages.
Let me put it this way:
In compiled languages you only need outside packages at build time.
The great problem of packaging in interpreted languages is that your program needs whatever packages it depends on to be around and findable at runtime. This drastically complicates packaging and system design, because you and/or the person you're sending your program to may have no system permissions or may not want to install the package system-wide, and on shared machines the program may get run by someone other than the person who set it up. Even if you do have system permissions, there may be multiple package management systems in action.
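To make the runtime dependency concrete, here is a minimal Python sketch. It only illustrates that the interpreter goes looking for a package at the moment the program runs; the helper name `is_findable` and the missing package name are made up for illustration.

```python
import importlib.util
import sys


def is_findable(package_name: str) -> bool:
    """Return True if the running interpreter can locate a top-level package."""
    # find_spec consults the import machinery (including sys.path) right now,
    # at runtime, which is exactly when an interpreted program needs its packages.
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    # 'json' ships with Python; the second name is a hypothetical placeholder.
    for name in ("json", "hypothetical_missing_package_xyz"):
        print(name, "findable:", is_findable(name))
    # The directories searched can differ depending on who runs the program and how.
    print("search path:", sys.path)
```

The answer depends on the environment of whoever runs the program, not on the environment of whoever wrote or installed it, which is the crux of the problem.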
Or in short, a program in an interpreted language that uses packages is almost certainly not a self-contained entity. This is certainly true of Python, which has created a moderate disincentive to using packages (and in fact all modularity) in my Python programs.
This is not a problem in Go. Go only needs the packages at build time and it's perfectly happy to have you pile them up alongside the rest of the source code. After everything is compiled and bound together into a binary you wind up with a single object that you can easily pass around. This is just the same in pretty much any compiled language, of course. In a compiled language you have to go out of your way to not allow 'outside' code packages to be part of the source tree (or general build environment) or to remain as external dependencies after the program has been compiled. Really, this is pretty fundamental to what compiling code and linking a binary does.
(You can create external dependencies with compiled languages; that's what shared libraries are. But generally static libraries and static linking are easier.)
The practical upshot is that Python packaging gives me heartburn, especially as a sysadmin, and Go packaging leaves me indifferent (and may leave me enthused if I ever actually manage to write a Go program).
PS: yes, packages in compiled languages can throw this advantage away. There are any number of libraries and so on for C (and C++) that really insist on being installed as system-wide shared libraries and then being used that way in the build process for programs. There are even a few that can't be built as usable static libraries even if you try hard.
2012-12-19
Part of good awk programming is getting the clause order right
Today I wanted to extract IP addresses from a recording file that had them in a particular format, in repeated stanzas that looked like:
<date>
---HTTP---
<IP>
[...]
---SOMETHING---
[possibly other IPs]
I wanted all the HTTP IPs. I'm sure that somewhere there is a convenient multiline grep but since I didn't have one handy, I reached for awk. As it turns out, solving this problem concisely in awk makes a good example of a possibly underappreciated art in awk programming, namely choosing the order of your clauses.
Here's the awk program I came up with:
/^---/ { p = 0; }
p { print $0; }
/^---HTTP/ { p = 1; }
Because awk evaluates clauses in order, you can exploit the ordering of clauses as a deliberate part of program logic (and you can also blow your foot off with the wrong choice of order).
Printing of the current line is controlled by p, a flag variable. If
p is set, the line is printed; if p is unset, we output nothing. We
set p when we see the '---HTTP' lead-in, but since we don't want to
print the lead-in itself (just the following IP addresses) we make this
the last clause. The IP addresses end with a '---SOMETHING---' line
(but we don't bother to match that much), which causes us to turn off
p; since we don't want that line printed, this clause is the first
one. Turning off p in the first clause when the line is '---HTTP---'
is harmless, because it will get turned on by the final clause. Since
it's harmless we don't need a longer match or more complex conditional
logic (and this means that we don't actually care which other thing
comes after the stanza we care about, so long as there is one).
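For readers who don't write awk regularly, the same clause-by-clause behaviour can be sketched in Python. This is only an emulation of the three clauses under stated assumptions: the function name `http_lines` and the sample stanza (date and addresses) are made up for illustration. Note the three independent `if` statements, which mirror awk running every matching clause, in order, for each line.

```python
import re


def http_lines(lines):
    """Emulate the three awk clauses, applied in order to every input line."""
    p = False  # the flag variable from the awk program
    out = []
    for line in lines:
        # Clause 1: any line starting with '---' turns printing off.
        if re.match(r"---", line):
            p = False
        # Clause 2: collect the line if the flag is (still) set.
        if p:
            out.append(line)
        # Clause 3: the '---HTTP' lead-in turns printing on for later lines.
        if re.match(r"---HTTP", line):
            p = True
    return out


# Made-up sample stanza; the date and addresses are placeholders.
sample = [
    "2012-12-19",
    "---HTTP---",
    "192.0.2.10",
    "192.0.2.11",
    "---SOMETHING---",
    "198.51.100.7",
]
print(http_lines(sample))  # ['192.0.2.10', '192.0.2.11']
```

On the '---HTTP---' line, clause 1 clears the flag and clause 3 sets it again, so neither marker line is ever collected.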
This also shows the flaw of awk. This program is too clever by half, since it's just indirectly expressing the simple logic:
if (line is '---HTTP---')
    p = 1
elif (line starts with '---')
    p = 0
elif (p)
    print $0
In the name of golfing the awk program a bit I've embedded the elif
logic here in the ordering of my clauses, where it's much harder to see
than if it was written out simply and plainly.
(For even more fun, you can vary the order of the clauses to control whether the start or end markers will be printed. This would be a much more visible change in the clear version of the program.)
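As a rough illustration of how visible that change becomes when the logic is written out, here is a Python sketch of the explicit if/elif form with marker printing as options. The function name `extract` and the `print_start` and `print_end` parameters are hypothetical, and this is not an exact emulation of every possible awk clause ordering (for example, a second '---HTTP' line inside an open stanza is treated as a start marker here, not an end marker).

```python
def extract(lines, print_start=False, print_end=False):
    """Explicit if/elif form of the logic, with marker printing as options."""
    printing = False
    out = []
    for line in lines:
        if line.startswith("---HTTP"):
            printing = True
            if print_start:
                out.append(line)  # keep the start marker
        elif line.startswith("---"):
            if printing and print_end:
                out.append(line)  # keep the end marker
            printing = False
        elif printing:
            out.append(line)
    return out


# Made-up sample stanza; the date and addresses are placeholders.
stanza = ["2012-12-19", "---HTTP---", "192.0.2.10", "---SOMETHING---", "198.51.100.7"]
print(extract(stanza))
print(extract(stanza, print_start=True))
print(extract(stanza, print_end=True))
```

Here the decision to keep a marker is an explicit branch, where in the awk version it is implied by which clause comes before the print clause.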