You probably don't want to use Make to build your generated files

December 17, 2013

When I started working here, one of the early things I worked on was generating various data files (such as a file of all valid local email addresses). Partly because I was very clever back then and partly because we were doing this on ancient, underpowered Solaris machines, I did this the obviously efficient way: I used a Makefile to control the whole process and made sure that it did only what it absolutely needed to. If everything was up to date, the whole process did nothing.

(In the process I found a Solaris make bug, which perhaps should have been a warning sign.)

Since then I have come around to the view that this is almost always being too clever. There are two problems. The first is that it's very hard for your Makefile to be completely accurate, and when inaccuracy sneaks in, files don't get (re)built as they should. This is very frustrating and leads to the other issue, which is that sometimes you want to force a rebuild no matter what. For example, perhaps you think that the output file has become corrupted somehow and you want to replace it with the current version. You can sort of handle this with Make, of course; you can provide a 'make clean' target and so on and so forth. But all of this is extra work, both for you to create a better Makefile and for everyone who uses this Make-based system (and it's probably still going to go wrong every so often).

The truth is that a Makefile-based system is almost always optimizing something that doesn't matter on modern systems. Unless the generation process is very expensive for some reason, you're not going to notice doing all of it every time and therefore you're not saving anything worthwhile by only doing part of it. It's much easier to rip out the Makefile and replace it with a simple script that always generates everything from scratch. At most, optimize the final update of the live versions so that you skip doing anything if the newly generated files are identical to the existing files.
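(As an illustration, here is a minimal sketch of that 'regenerate everything, only update if different' approach in Python. This is not the actual script we use; the function names, the idea of passing in a list of addresses, and the 0644 file mode are all assumptions made up for the example.)

```python
import os
import tempfile


def generate_address_file(addresses):
    """Build the complete file contents from scratch (hypothetical generator).

    The real data source is not shown; callers pass in whatever they fetched.
    """
    return "".join(addr + "\n" for addr in sorted(set(addresses)))


def install_if_changed(new_content, live_path):
    """Replace live_path with new_content unless it is already identical.

    Returns True if the live file was updated, False if it was left alone.
    """
    try:
        with open(live_path, "r") as live:
            if live.read() == new_content:
                return False  # nothing changed, so don't touch the live file
    except FileNotFoundError:
        pass  # no live version yet; fall through and install one

    # Write to a temporary file in the same directory so that the final
    # os.replace() is a rename within one filesystem.
    live_dir = os.path.dirname(os.path.abspath(live_path))
    fd, tmp_path = tempfile.mkstemp(dir=live_dir, prefix=".new-")
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(new_content)
        os.chmod(tmp_path, 0o644)  # assumed mode; mkstemp() creates 0600 files
        os.replace(tmp_path, live_path)
    except BaseException:
        os.unlink(tmp_path)  # don't leave a stray temporary file behind
        raise
    return True
```

(A driver script would then just call something like install_if_changed(generate_address_file(addresses), path) on every run. There is no dependency tracking to get wrong, and forcing a rebuild is the same as any other run.)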

My repeated experience is that the result is simpler, easier to follow, and easier to do things with. As a sysadmin you have the reassuring knowledge that if you run the script, new versions of the files will get generated using the current procedure and (if necessary) pushed to the final destinations. You don't have to try to remember what magic bits might need to be poked so that this really happens, because it always happens.

The exception, where this is not being too clever, is when the full generation process is so expensive and time consuming that it is worthwhile (or even utterly necessary) to optimize it as much as possible. Even then you might want to consider ways of speeding up the whole process before you start skipping parts of it most of the time.

(This is where people replace shell scripts with Perl or Python or even Go and, at a higher level, try to see if there's some better way to get and process the information that the system is operating on. Note that, as usual, it's almost always better to optimize the algorithms before you optimize the code.)
