awk idiom: getting fields backwards from the end of the line
It's easy in awk to get fields counting from the start of a line; the first field is $1, the second field is $2, and so on. But periodically I'm not interested in fields at the start of the line, I'm interested in a field at the end of the line; it's much easier to see that it's the third-last field than to carefully count how many fields it is from the start.
(And sometimes you have a variable number of fields in a line but you know that what you want is always the Nth field from the end, in which case counting up from the front doesn't help at all. One common case is when a logical field can sometimes contain whitespace, so awk will turn it into a variable number of fields.)
awk has a way out: '$' is actually an operator (a very high precedence one), so it takes expressions as well as just numbers, and the NF variable is the number of fields in the current line. Because I keep forgetting this: the last field in the line is '$NF', and '$(NF-1)' is the second last field. (Because awk counts fields from 1 instead of from 0, unlike Python and Perl.) So, for example, awk '{ print $(NF-2) }' prints the third-last field of each line, assuming the line has at least three fields.
(Okay, technically you can use the $[ magical variable to make Perl 1-based, or in fact arbitrarily based. Don't.)
Reading Unix manpages
One of the important skills for Unix programming is the ability to parse manpages carefully. This is not as easy as it looks, because manpages are traditionally written in a style where everything is important and you have to think carefully about the implications of the exact wording used.
(This can be useful for other things than Unix manpages, since a lot of specifications are written in the same style.)
For example, today I was emailed a comment on my Python socket module irritation entry pointing out the existence of the .makefile() method function, which is documented as:
Return a file object associated with the socket. [...] The file object references a dup()ped version of the socket file descriptor, so the file object and socket object may be closed or garbage-collected independently.
Thinking about how I would use this, one of the things I found myself wondering about was what would happen if you dup()ped a socket file descriptor and called shutdown() on only one of the file descriptors. (Bearing in mind that you have to close() all of the file descriptors for a socket before the socket goes away.)
So I consulted the manpage. The Linux shutdown(2) manpage contains the following description (emphasis mine):
The shutdown call causes all or part of a full-duplex connection on the socket associated with fd to be shut down.
(Similar wording appears in the Solaris and FreeBSD manual pages.)
Once I put on my spec reading hat, it was clear that saying 'the socket associated with fd' instead of something like 'the file descriptor fd' was important. Thus shutdown(2) is not like close() and has an immediate effect when called, no matter how many times the file descriptor has been dup()ped.
(And some quick Python later, I had confirmed this.)
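A check along these lines can be sketched in a few lines of Python. This is not the original test, just a minimal illustration; it assumes a local socket.socketpair() behaves the same way as a network connection for this purpose, and the function names are made up for the example.

```python
import socket

def shutdown_on_dup():
    """Call shutdown() on a dup()ped descriptor; return what the peer reads."""
    a, b = socket.socketpair()
    a_dup = a.dup()  # a second file descriptor for the same underlying socket
    try:
        a_dup.shutdown(socket.SHUT_WR)
        # 'a' has been neither shut down nor closed, but shutdown() acts on
        # the socket itself, so the peer sees end-of-file right away.
        b.settimeout(2.0)
        return b.recv(16)
    finally:
        for s in (a, a_dup, b):
            s.close()

def close_on_dup():
    """Call close() on a dup()ped descriptor; return what the peer reads."""
    a, b = socket.socketpair()
    a_dup = a.dup()
    try:
        a_dup.close()  # only releases this one descriptor
        # The socket is still alive through 'a', so data still flows.
        a.sendall(b"still open")
        b.settimeout(2.0)
        return b.recv(16)
    finally:
        a.close()
        b.close()

if __name__ == "__main__":
    print("peer read after shutdown() on the dup:", shutdown_on_dup())
    print("peer read after close() on the dup:", close_on_dup())
```

If the manpage reading is right, the first function should return an empty bytes object (end-of-file) even though the original socket object was never touched, while the second should return the data that was sent.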