Wandering Thoughts archives

2006-07-25

An awk idiom: getting fields backwards from the end of the line

It's easy in awk to get fields counting from the start of a line; the first field is $1, the second field is $2, and so on. But periodically I'm not interested in fields at the start of the line, I'm interested in a field at the end of the line; it's much easier to see that it's the third-last field than to carefully count how many fields it is from the start.

(And sometimes you have a variable number of fields in a line but you know that what you want is always Nth field from the end, in which case counting up from the front doesn't help at all. One common case is when a logical field can sometimes have whitespace, so awk will turn it into a variable number of fields.)

Fortunately awk has a way out: '$' is actually an operator (a very high precedence one), so it takes expressions as well as just numbers, and the NF variable is the number of fields in the current line.

Because I keep forgetting this: the last field in the line is '$(NF)', and '$(NF-1)' is the second last field. (Because awk counts fields from 1 instead of from 0, unlike Python and Perl.)

(Okay, technically you can use the $[ magical variable to make Perl 1-based, or in fact arbitrarily based. Don't.)

programming/AwkLastFieldIdiom written at 16:21:29;

Reading Unix manpages

One of the important skills for Unix programming is the ability to parse manpages carefully. This is not as easy as it looks, because manpages are traditionally written in a style where everything is important and you have to think carefully about the implications of the exact wording used.

(This can be useful for other things than Unix manpages, since a lot of specifications are written in the same style.)

For example, today I was emailed a comment on my Python socket module irritation entry pointing out the existence of the .makefile() method function, which:

Return a file object associated with the socket. [...] The file object references a dup()ped version of the socket file descriptor, so the file object and socket object may be closed or garbage-collected independently.

Thinking about how I would use this, one of the things I found myself wondering about was what would happen if you dup()ped a socket file descriptor and called shutdown() on only one of the file descriptors. (Bearing in mind that you have to close() all of the file descriptors for a socket before the socket goes away.)

So I consulted the manpage. The Linux shutdown(2) manpage contains the following description (emphasis mine):

The shutdown call causes all or part of a full-duplex connection on the socket associated with fd to be shut down.

(Similar wording appears in the Solaris and FreeBSD manual pages.)

Once I put on my spec reading hat, it was clear that saying 'the socket associated with fd' instead of something like 'the file descriptor fd' was important. Thus shutdown(2) is not like close() and has an immediate effect when called, no matter how many times the file descriptor has been dup()ped.

(And some quick Python later, I had confirmed this.)

programming/ReadingManpages written at 01:02:55;


Page tools: See As Normal.
Search:
Login: Password:

This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.