2008-10-13
The complexity of not lying to Makefiles
So, suppose that you are tired of lying to your Makefiles and want to fix the problem (in the abstract; you can't
completely fix your Makefiles, because it's not fixable in anything like
ordinary make). What does this take?
What you really need to capture is the state of the 'build inputs'. The build inputs are whatever goes into creating the output, which is a lot more than just the source files involved; it also includes things like the text of the execution rule, the Makefile variables that go into it, and even the programs that are run to create things.
(Actually, you don't care about the programs themselves so much as their
'interfaces' (in the programming language sense). You don't need to
rebuild something just because the version of cp changed, because the
old and the new version still have the same interface and thus will
produce the same results.)
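As a rough illustration of what capturing the state of build inputs might mean, here is a minimal shell sketch. It is not how make works and not anything from this entry. It records one checksum covering a rule's command text and the contents of its input files, and reruns the command only when that checksum changes. The helper names build_sig and rebuild_if_changed and the .sig stamp file are hypothetical. The sketch deliberately ignores the programs the command runs, and it only covers Makefile variables insofar as their values appear in the command text it is given.

```shell
#!/bin/sh
# Hypothetical sketch: rebuild an output when a signature of its build
# inputs (rule text plus input file contents) differs from last time.

# build_sig CMD FILE...: print one checksum covering the command text and
# each input file's name and contents.
build_sig() {
    cmd=$1; shift
    {
        printf 'rule: %s\n' "$cmd"
        for f in "$@"; do
            printf 'input: %s ' "$f"
            cksum < "$f"
        done
    } | cksum
}

# rebuild_if_changed OUT CMD FILE...: run CMD if OUT is missing or if the
# signature differs from the one saved in OUT.sig (hypothetical stamp file).
rebuild_if_changed() {
    out=$1; cmd=$2; shift 2
    new=$(build_sig "$cmd" "$@")
    old=$(cat "$out.sig" 2>/dev/null || true)
    if [ ! -e "$out" ] || [ "$new" != "$old" ]; then
        sh -c "$cmd"
        printf '%s\n' "$new" > "$out.sig"
        echo "rebuilt $out"
    else
        echo "$out is up to date"
    fi
}

# Demo in a scratch directory.
cd "$(mktemp -d)"
echo hello > in.txt
rebuild_if_changed out.txt 'tr a-z A-Z < in.txt > out.txt' in.txt                    # rebuilt
rebuild_if_changed out.txt 'tr a-z A-Z < in.txt > out.txt' in.txt                    # up to date
rebuild_if_changed out.txt 'tr a-z A-Z < in.txt | sed "s/^/> /" > out.txt' in.txt    # rule changed: rebuilt
```

Note that the third call rebuilds even though in.txt did not change, because the rule text is part of the signature. Ordinary make, which only compares timestamps of files, would not notice that change.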
Capturing the state of build inputs is, in a sense, the easy bit; there are a lot of ways to capture and summarize the state of things. The difficult bit is figuring out just what is a build input. As my entry illustrates, you cannot rely on people to do this; it is too much work, especially if you are trying to be really thorough and truly solve the problem. Unfortunately, I don't think that anyone really knows how to do automated build analysis to capture this information, which leaves one somewhat out in the cold on the whole problem.
(My feeling is that automated analysis is only really practical in a captive build environment, but captive build environments rarely last for very long. Sooner or later they grow the ability to run external programs, and that's it for your ability to completely understand all build rules.)
2008-10-12
An irritating awk limitation: getting a range of fields
Writing things in awk has a number of little irritations that come up
every so often. One of them is that it has no built-in way to retrieve
a range of input fields as a string, i.e. there is no equivalent of what
in Python one could write as something like 'r = " ".join(line.split()[2:])'
(which turns everything from the third field onwards back into a single
string).
Of course, you can do this with an awk function. But it's irritating to
have to keep including that function in my awk programs (especially
when they are tiny programs that are written inline in a larger shell
script), and it points out a deeper weakness in awk, which is that
awk has no really good way to manipulate how lines are split into
fields.
Take the example from yesterday and
consider the sed invocation, which only exists because of this awk
issue. What we really want to do is split each line into two fields:
the first word of the line, and then everything else; then we will print
the second field and ignore the first one. However, you can't do this in
awk (or at least not very easily).
(To beat people to the obvious approach: yes, you can assign an empty
string to $1 and then use $0, but that puts a space at the front of
the new $0, which is sometimes important.)
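To make the difference concrete, here is a small sketch contrasting the obvious approach with a regular-expression workaround that is not from this entry. Instead of touching fields, it deletes the first word (plus the blanks around it) directly from $0 with sub(), so $0 is never rebuilt from fields and the rest of the line keeps its original spacing. The function names are hypothetical, and the regular expression assumes awk's default whitespace field splitting; a different FS would need a different pattern.

```shell
#!/bin/sh
# Hypothetical helpers contrasting two ways to drop the first word of a line.

# The obvious approach: assigning to $1 makes awk rebuild $0 from all the
# fields joined with OFS, so the result starts with a space and runs of
# whitespace between the remaining fields are collapsed to single spaces.
drop_first_assign() {
    awk '{ $1 = ""; print }'
}

# Workaround (assumes default FS): strip leading blanks, the first word, and
# the blanks after it from $0 itself; the remainder is left untouched.
drop_first_sub() {
    awk '{ sub(/^[ \t]*[^ \t]+[ \t]*/, ""); print }'
}

printf 'alpha  beta   gamma\n' | drop_first_assign   # " beta gamma"
printf 'alpha  beta   gamma\n' | drop_first_sub      # "beta   gamma"
```

The pattern uses [ \t] instead of a POSIX character class such as [[:blank:]] because some older awks, such as mawk 1.3.3, do not support the classes.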
Sidebar: the necessary awk function
Here's the necessary awk function:
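function fieldstr(s, e,    i, r) {
    # Return fields s through e of the current record joined by single
    # spaces, with e clamped to NF. i and r are locals (the usual awk
    # convention of extra parameters set off by extra spaces).
    if (e > NF) e = NF
    r = $(s)
    for (i = s+1; i <= e; i++)
        r = r " " $(i)
    return r
}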
With no error checking, you can make this a sensible one-liner function
body without being too offensive. (In my version I put the 'return r'
on a second line because otherwise it looked too crammed in.)
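For illustration, here is roughly what a one-line version might look like when it is dropped into a tiny inline awk program in a shell script. This is a sketch, not my actual version. The wrapper name rest_from is hypothetical, and instead of skipping the NF check entirely, it folds that check into the loop condition.

```shell
#!/bin/sh
# Hypothetical wrapper: rest_from N prints fields N..NF of each input line,
# rejoined with single spaces, using a one-line fieldstr() in inline awk.
rest_from() {
    awk -v start="$1" '
        function fieldstr(s, e,    i, r) { r = $s; for (i = s+1; i <= e && i <= NF; i++) r = r " " $i; return r }
        { print fieldstr(start, NF) }'
}

printf 'one two three four\n' | rest_from 3    # "three four"
```

Like the multi-line version, this rejoins fields with single spaces. Any original runs of blanks or tabs between them are not preserved.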