Wandering Thoughts archives

2008-10-12

An irritating awk limitation: getting a range of fields

Writing things in awk has a number of little irritations that come up every so often. One of them is that it has no built-in way to retrieve a range of input fields as a string, i.e. there is no equivalent of what in Python one could write as something like 'r = " ".join(input[2:])' (which turns everything from the third field onwards back into a single string).

Of course, you can do this with an awk function. But it's irritating to have to keep including that function in my awk programs (especially when they are tiny programs that are written inline in a larger shell script), and it points out a deeper weakness in awk, which is that awk has no really good way to manipulate how lines are split into fields.

Take the example from yesterday and consider the sed invocation, which only exists because of this awk issue. What we really want to do is split each line into two fields: the first word of the line, and then everything else; then we will print the second field and ignore the first one. However, you can't do this in awk (or at least not very easily).

(To beat people to the obvious approach: yes, you can assign an empty string to $1 and then use $0, but that leaves a space at the front of the new $0, which sometimes matters.)
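
To make the leading space visible, here is a small sketch of that approach. The blank_first_field name is a hypothetical helper made up for this illustration, and the brackets in the output are only there to show where the string starts and ends. It assumes the default FS and OFS.

```shell
# blank_first_field is a hypothetical helper name used only for this demo.
# It blanks $1, which makes awk rebuild $0 from the fields joined by OFS.
blank_first_field() {
    printf '%s\n' "$1" | awk '{ $1 = ""; print "[" $0 "]" }'
}

blank_first_field 'alpha beta   gamma'
# prints "[ beta gamma]"
```

Note that the rebuilt $0 also has the run of spaces between 'beta' and 'gamma' squeezed down to a single space, because awk rejoins the fields with OFS.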

Sidebar: the necessary awk function

Here's the necessary awk function:

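function fieldstr(s, e,   i, r) {
    # Join fields s through e (inclusive) with single spaces.
    # i and r are locals, declared awk-style as extra parameters.
    if (e > NF) e = NF      # clamp the end to the last field
    r = $(s)
    for (i = s+1; i <= e; i++)
        r = r " " $(i)
    return r
}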

If you drop the error checking, the function body can be written as a reasonable one-liner without being too offensive. (In my version I put the 'return r' on a second line, because otherwise it looked too crammed in.)
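
As a usage sketch, this is roughly what including the function in a small inline awk program inside a shell script looks like. The rest_of_line wrapper name is hypothetical, invented for this example; the awk function itself is the one from the sidebar.

```shell
# rest_of_line is a hypothetical wrapper name for this illustration.
# It prints everything from the second field onwards for each input line.
rest_of_line() {
    awk '
    function fieldstr(s, e,   i, r) {
        if (e > NF) e = NF
        r = $(s)
        for (i = s+1; i <= e; i++)
            r = r " " $(i)
        return r
    }
    { print fieldstr(2, NF) }'
}

printf 'one two three four\n' | rest_of_line
# prints "two three four"
```

Unlike the approach of blanking $1, the result has no leading space. The fields are still rejoined with single spaces, so any original spacing between them is not preserved.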

programming/AwkFieldAccessLimitation written at 02:19:36

