How we lie to our Makefiles

September 24, 2008

I spent part of today updating the system that generates data files for our mail system to add a new computed data file. This system is built around a Makefile, and so I added a new rule, which looks like this:

$(CL): $(DL) $(AUXFILE)
    crunch-dl $(DL) $(AUXFILE) >$(CL).new
    mv -f $(CL).new $(CL)

(The rule is done in two stages so that we don't update the $(CL) file if the crunch-dl script fails. If we just wrote directly to $(CL), we could be left with an incomplete or empty $(CL) that make thinks is up to date because it has a current timestamp.)

Those dependencies are a lie. Well, a lie by omission. The generated file doesn't just depend on the data files, it also depends on the crunch-dl script that processes the data files; change the script and you may well change the generated output. (You may not, but this is true of changing any dependency.)
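An honest version of the rule would list the script as a prerequisite as well. This is only a minimal sketch: the CRUNCH variable and its path are made up for illustration, and CL, DL and AUXFILE are assumed to be defined elsewhere in the Makefile, as they are for the rule above.

```make
# Hypothetical location of the script; not taken from the real Makefile.
CRUNCH = /usr/local/sbin/crunch-dl

# Listing $(CRUNCH) as a prerequisite makes make regenerate $(CL)
# whenever the script itself changes, not just the data files.
$(CL): $(DL) $(AUXFILE) $(CRUNCH)
	$(CRUNCH) $(DL) $(AUXFILE) >$(CL).new
	mv -f $(CL).new $(CL)
```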

I commit this lie regularly, even routinely, although I am not sure why. Partly I think that it is a mindset; I think of Makefile rules in general as declaring the inputs that are turned into the output, and the script is not an 'input' as such so it gets left out. Partly I think that it is convention and habit, how I've always done things and seen them done, and thus that it would simply feel (and look) wrong to add the script as a dependency (especially because it isn't in the current directory).

(Of course, my conventions are mostly formed by reading and writing Makefiles for C programs, which have relatively little call for explicitly depending on the compiler. And once you start thinking about depending on the compiler, you also have to think about depending on important compiler flags, and most people don't want to go there.)
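For people who do want to go there, one common approach with GNU make is to record the compiler command and flags in a file and make every object depend on that file. The following is a sketch under assumptions, not something from our Makefiles: the build-flags file name and the CC and CFLAGS values are invented for illustration, and the fragment is meant to sit below an existing default target. It notices a change in the compiler command or flags, not a change to the compiler binary itself, and it would need more careful quoting if the flags contained single quotes.

```make
CC = cc
CFLAGS = -O2 -Wall

# Rewrite build-flags only when the compiler command or flags differ
# from what was recorded last time, so that its timestamp moves only
# on a real change.
build-flags: FORCE
	@flags='$(CC) $(CFLAGS)'; \
	if test "$$flags" != "`cat $@ 2>/dev/null`"; then \
		echo "$$flags" >$@; \
	fi

# Every object file depends on the recorded flags as well as its source.
%.o: %.c build-flags
	$(CC) $(CFLAGS) -c -o $@ $<

# FORCE has no recipe and no matching file, so anything that depends
# on it has its recipe run on every invocation of make.
FORCE:
```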


Comments on this page:

From 130.217.250.13 at 2008-09-24 01:32:53:

With Makefiles you need to lie if you want to use things like $^ in your rule. (Unless there's a better way?)

From 71.116.111.180 at 2008-09-25 00:02:00:

Lots of points.

First. I like the nmake that is part of the AT&T AST toolkit. This is going to sound like an advert. If you use the standard rules (they are not "builtin" in the sense that they are in most other "make" programs), then it automatically adds a dependency on the C compiler or, to be more accurate, on the value of the CC variable, which in turn depends on the C compiler. Nmake maintains the state of almost everything, including the values of variables, even ones set on the make command line!

Second. To respond to the first comment, with most make programs you can select parts of $^. For example, with GNU make you can use one of the functions "word", "wordlist", "filter", or "filter-out". With nmake you can use the :O operator.
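As a sketch of the GNU make approach, the rule from the entry could list the script as a prerequisite and still use $^ for the data files. The CRUNCH variable and its path are hypothetical, and CL, DL and AUXFILE are assumed to be defined as in the original Makefile.

```make
# Hypothetical location of the script.
CRUNCH = /usr/local/sbin/crunch-dl

# $^ is every prerequisite, including the script; filter-out drops the
# script again so that only the data files are passed as arguments.
$(CL): $(DL) $(AUXFILE) $(CRUNCH)
	$(CRUNCH) $(filter-out $(CRUNCH),$^) >$@.new
	mv -f $@.new $@
```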

Third. It would be better to write the example as 2 rules.

  ${CL}: ${CL}.new
      mv $< $@
  ${CL}.new: crunch-dl ${DL} ${AUXFILE}
      $^ > $@

as this allows a make implementation to do as much as possible in parallel (OK, nothing in this case). In general it is good if a recipe updates exactly $@ and nothing else. It is also good to use make's built-in variables in the recipes, as then things like VPATH do not confuse the issue.

However, this shows up a different problem. Most people do not want to have "." in their PATH, so you want to say ./crunch-dl if crunch-dl is in the current directory. But if crunch-dl is resolved to a file in a different directory, and in particular if it is an absolute filename, then you do not want to prefix ./ to crunch-dl. With, say, GNU make you could do something like

      case $< in /*) $^ > $@ ;; *) ./$^ > $@ ;; esac

but nmake has operators (either :P=A to convert it to an absolute path, or :P=E to conditionally add ./) to handle this problem.

Fourth. An exception to the rule "only update $@" is widely used in builds like that of gcc. The problem it is trying to solve involves things like header files that are generated by programs: if the regenerated header is unchanged, you do not want everything that depends on it to be recompiled. To avoid this, the rules are written something like this.

  tree.h: tree.time
        true
  tree.time: make_tree_h
        $^ > tmpfile
        cmp -s tmpfile tree.h || cp tmpfile tree.h
        touch $@

If make_tree_h is updated, then tree.time needs to be built. As a side effect tree.h will be updated if it is changed. If make_tree_h is not updated then the recipe associated with tree.time is not run. Usually tree.h will be older than tree.time, so the "true" recipe will be run. As this does not update tree.h this lack of change will propagate up the tree. Of course one could have written

  tree.h: make_tree_h
        $^ > $@.tmp
        cmp -s $@.tmp $@ || cp $@.tmp $@
        rm -f $@.tmp

but if "make_tree_h" takes significant resources to run then it is better to run "true" instead. Most makefiles which use this make it less obvious as they run the command silently so it does not appear in the normal make output.

The nmake advantage here is that you can write the rule in the more natural second way, but nmake will remember that the rule was run and did not update the target file, and will not run it again unless some other reason requires it. Nmake remembers state.

OK, enough advert. The software is available under an open source license and you can get it from http://public.research.att.com/sw/download/

Lucent have a forked version available with commercial support, and have extensive documentation.

Icarus Sparry.

From 70.18.185.222 at 2008-09-29 19:53:56:

Third. It would be better to write the example as 2 rules.

 ${CL}: ${CL}.new
     mv $< $@
 ${CL}.new: crunch-dl ${DL} ${AUXFILE}
     $^ > $@

That won't work, unless you have a fancy version of make that will delete the target if the build rule fails. Suppose that crunch-dl is prone to failure. A simple way to work with that might be to run make repeatedly until it succeeds.

But see what you've done? If crunch-dl fails the first time, the second time it won't get run because make will think that ${CL}.new is all up-to-date, and it will just move the bad version into place.
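For what it's worth, GNU make can be told to behave this way with its .DELETE_ON_ERROR special target. A sketch of the two-rule version using it follows. It assumes GNU make, the CRUNCH variable and its path are hypothetical, and CL, DL and AUXFILE are assumed to be defined elsewhere.

```make
# GNU make only: delete a target whose recipe fails after changing it.
.DELETE_ON_ERROR:

# Hypothetical location of the script.
CRUNCH = /usr/local/sbin/crunch-dl

$(CL): $(CL).new
	mv $< $@

# If crunch-dl exits non-zero, make removes the partial $(CL).new,
# so the next run starts over instead of installing a bad file.
$(CL).new: $(CRUNCH) $(DL) $(AUXFILE)
	$^ > $@
```

This only addresses the failure case. Because the mv removes $(CL).new, make will still see it as missing and rerun crunch-dl on every invocation.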

Last modified: Wed Sep 24 00:03:37 2008