2006-04-29
Another little script: field
In the footsteps of the first little script, here's
something I call field:
#!/bin/sh
fn=""
for i in "$@"; do
    nf='$'$i
    if [ -z "$fn" ]; then
        fn="$nf"
    else
        fn="$fn, $nf"
    fi
done
exec awk "{print $fn}"
(The exec is another way around the Linux bash issue.)
You give it one or more field numbers, whereupon it reads standard input
and prints just those fields to standard output. I wrote it because I
got tired of typing "awk '{print $7}'" and the like all the time.
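To make the string-building concrete, here is a small sketch that assembles the awk program the same way the script does, but prints it instead of running it. The function name build_print is made up for this illustration and is not part of field.

```shell
#!/bin/sh
# build_print is a hypothetical helper for illustration only: it builds
# the awk program the way field does, but prints it instead of exec'ing awk.
build_print() {
    fn=""
    for i in "$@"; do
        nf='$'$i
        if [ -z "$fn" ]; then
            fn="$nf"
        else
            fn="$fn, $nf"
        fi
    done
    printf '%s\n' "{print $fn}"
}

build_print 1 3
# prints: {print $1, $3}

# Feeding the generated program to awk picks out those fields:
printf 'alpha beta gamma\n' | awk "$(build_print 1 3)"
# prints: alpha gamma
```

The single quotes around the dollar sign are what keep the shell from expanding it, so awk gets to see a literal $1.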
Given field and howmany, we can now write
what I'll call countup:
#!/bin/sh
field "$@" | howmany
Typical usage is things like 'countup 1 </var/log/web-xfers | sed 10q'
to show me the top 10 IPs for today's web requests. (Today the big
source is our internal Google search appliance. Although the third
most active source was 200.55.156.97, poking us for security holes.
Badly, which is the most annoying thing about it.)
(An ongoing index for all of my little scripts is here.)
Sidebar: an irritation
The irritating bit about field is all of the work it has to go through
to generate the print instructions for awk. It feels like there should be
a nice short Perl equivalent, but the closest I've come is the not quite
correct:
#!/usr/bin/perl -w
use strict;
my @l;
while (<STDIN>) {
    chomp; @l = split;
    print join(" ", map {$l[$_]} @ARGV), "\n";
}
If you give this a field that doesn't exist in some (or all) records, you get complaints about 'Use of uninitialized value in ...'.
(There are more complicated constructs that will stop this, but I am interested in something compact, no larger than the Bourne shell script, and my Perl is sufficiently rusty that I can't see one right now.)
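For comparison, awk treats a reference to a field past the end of the record as an empty string and says nothing about it, which is why the shell version gets away without any checks. A minimal demonstration on made-up input:

```shell
#!/bin/sh
# The second record has only two fields, so $3 does not exist there.
# awk prints an empty string for it and writes nothing to standard error.
printf 'a b c\nd e\n' | awk '{print $1, $3}'
```

The second output line is just 'd' followed by the output field separator.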
2006-04-23
The sort of command line I can wind up typing
Here's the sort of command line I can wind up typing:
spam/wtd daemon | expsyslog.pl | tcpwrhits -i | sed 10q | while (read foo) { n=`{echo $foo}; echo '| '^$n '|' `{checksmtp $n(2)}; }
(This will probably be wrapped in your browser, but it is all one line.)
I wrote that on the fly, in one pass, although I had to think it through a bit. I'm not going to claim that this is a typical Unix command line, but I do think it's the sort of thing experienced Unix users wind up doing every so often. It's also a good illustration of the density of little custom scripts in real environments.
What it produces is more or less the 'count / IP address / why' table from this week's spam summary, in DWikiText form ready to be dumped into place.
(The shell syntax will look a bit strange, since I use rc.)
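For people who don't read rc, the loop at the end could be sketched in Bourne shell roughly as follows. This assumes each input line is a count followed by an IP address, and it uses a stand-in checksmtp function with made-up output so that it runs anywhere; table_rows is likewise a name invented for the illustration.

```shell
#!/bin/sh
# Stand-in for the real, site-specific checksmtp; the reply text is made up.
checksmtp() {
    echo "would-be verdict for $1"
}

# Turn 'count ip' lines into '| count | ip | why' table rows,
# which is what the rc while loop produces.
table_rows() {
    while read -r count ip; do
        echo "| $count | $ip | $(checksmtp "$ip")"
    done
}

# Made-up sample input in place of the real pipeline's output.
printf '26 192.0.2.10\n7 198.51.100.4\n' | table_rows
```

In the rc version, '| '^$n glues the prefix onto each element of the list $n, and $n(2) picks out the second element (the IP address).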
Sidebar: an index of commands
| spam/wtd | Spits out the last week's worth of some sort of log, in this case the daemon syslogs. |
| expsyslog.pl | Expands 'last message repeated N times' in syslog logs. |
| tcpwrhits | Generates a report of IP addresses rejected at connection time by our SMTP frontend (and how many times each was rejected). |
| checksmtp | Tells me what our SMTP frontend would do for a connection from a given IP address. |
(Except for expsyslog.pl, all of these are completely specific to
our environment and thus pretty uninteresting.)
2006-04-19
A gotcha with analyzing syslog logs
Consider the following command:
; grep 'gabba gabba hey' /var/log/messages | wc -l
This looks like a perfectly sane and sensible way of doing syslog log
analysis: fish out a pattern, then count or otherwise crunch it up. It's
just the sort of thing Unix tools are built for, and I do it all the
time. (My howmany program often gets used here.)
Unfortunately, as I was reminded recently, this appealingly simple approach is not quite correct. It can undercount things, because syslogd likes to save space and time by condensing repeated messages into things like:
Apr 16 03:49:34 gpu last message repeated 26 times
That's a convenient space savings when people are pounding on your mail server, but it does make producing accurate counts in log summaries a bit more complicated than it looks. In fact it was this very line that poked me about the whole issue, after I shuffled our SMTP filtering so that the CBL would be checked first. Once I saw it while skimming the raw logs, it became clear to me that I wasn't going to get usefully accurate stats unless I did something.
My approach is crude and brute force: I now have a filter that gets fed logs and de-summarizes syslogd's summarization, expanding 'last message repeated N times' into N more copies of the previous line. The expanded logs can then be fed to other commands, which get to stay straightforward while now being correct. This is not quite accurate (the timestamps will be wrong, but then you can't recover the timestamps anyways) but is close enough for counting purposes.
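The idea can be sketched in a few lines of awk wrapped in a shell function. This is a hypothetical stand-in, not expsyslog.pl itself; it assumes the classic 'month day time host message' layout shown above and a log from a single host, and the name expand_repeats is made up.

```shell
#!/bin/sh
# expand_repeats: a hypothetical stand-in, NOT the real expsyslog.pl.
# Assumes the classic syslog layout: month day time host message...
expand_repeats() {
    awk '
    # A summary line: "<month> <day> <time> <host> last message repeated N times"
    NF == 9 && $5 == "last" && $6 == "message" && $7 == "repeated" && $9 == "times" {
        # Print N more copies of the previous real line, if there was one.
        if (seen)
            for (i = 0; i < $8 + 0; i++)
                print prev
        next
    }
    # Any other line is passed through and remembered.
    { prev = $0; seen = 1; print }
    ' "$@"
}

# Made-up sample log.
printf '%s\n' \
    'Apr 16 03:49:30 gpu smtpd: gabba gabba hey' \
    'Apr 16 03:49:34 gpu last message repeated 2 times' \
    'Apr 16 03:50:00 gpu sshd: something else' |
    expand_repeats | grep -c 'gabba gabba hey'
# prints: 3
```

The summary line itself is dropped, and the copies carry the timestamp of the original message, which is the inaccuracy mentioned above.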
(If you're just looking for whether or not certain messages show up at all you don't need to worry about this; you are guaranteed at least one copy of each message, even if syslogd then swallows a lot more.)
The Perl script I use for this is just large enough to not fit in this entry. So instead you can find it as expsyslog.pl; see the comments for usage. Typically it's something like:
; expsyslog.pl /var/log/messages | grep 'gabba gabba hey' | wc -l
(Yes, I write Perl scripts every so often. For some jobs you just can't beat its conciseness and convenient command of Unix idioms.)
2006-04-11
A little script: howmany
There are a lot of Unix idioms that I use often enough to have turned
into little shell scripts. Here's the first of them; I call it howmany:
#!/bin/sh
trap '' 13
sort "$@" | uniq -c | sort -nr
(The trap is necessary because of a Linux bash issue.)
Howmany does what its name says: it counts up how many times each line of its standard input appears, then shows them from most to least frequent with the count. It usually gets used at the end of pipelines that crunch logs and the like.
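As a quick illustration, here is the same pipeline as a shell function (without the trap), run on some made-up input:

```shell
#!/bin/sh
# Function version of the script's pipeline, for illustration only.
howmany() {
    sort "$@" | uniq -c | sort -nr
}

# Made-up input: b appears three times, a twice, c once.
printf '%s\n' b a b c b a | howmany
# Shows b with a count of 3, then a with 2, then c with 1; the exact
# padding in front of each count depends on your uniq.
```

The first sort is there because uniq only collapses adjacent duplicate lines.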
2006-04-01
The best April Fools joke I've seen here
My department has always done a lot of different things. Many years ago we were called 'Computing Services' ('University of Toronto Computing Services' if you were being formal), and one of those things was a bunch of modems on a Sun 4. Of course, we had a greeting banner:
U of T Computing Services.
login:
(This is roughly what you saw if you dialed in.)
One April 1st, a co-worker changed this to 'U of T Confusing
Services'.
The best part of this (as far as I'm concerned, because I like subtle April Fools' jokes) is that hardly anyone noticed; after all, who really reads a login banner that you see routinely?
The Sun 4 is long gone, and so are most of our modems. But one modem remains, and to this day (even after several changes of hardware and operating system, and two departmental renamings) it still greets callers with:
U of T Confusing Services.
login: