How we do and document machine builds
I've written before about our general Ubuntu install system and I've mentioned before that we have documented build procedures but we don't really automate them. But I've never discussed how we do reproducible builds and so on. Basically we do them by hand, but we do them systematically.
Our Ubuntu login and compute servers are essentially entirely built through our standard install system. For everything else, the first step is a base install with the same system. As part of this base install we make some initial choices, like what sort of NFS mounts this machine will have (all of them, only our central administrative filesystem, etc).
After the base install we have a set of documented additional steps; almost all of these steps are either installing additional packages or copying configuration files from that central filesystem. We try to make these steps basically cut and paste, often with the literal commands to run interleaved with an explanation of what they do. An example is:
* install our Dovecot config files:
    cd /etc/dovecot/conf.d/
    rsync -a /cs/site/machines/aviary/etc/dovecot/conf.d/*.conf .
Typically we do all of this over an SSH connection, so we are literally cutting and pasting from the setup documentation to the machine.
(In theory we have a system for automatically installing additional Ubuntu packages only on specific systems. In practice there are all sorts of reasons that this has wound up relatively disused; for example it's tied to the hostname of what's being installed and we often install new versions of a machine under a different hostname. Since machines rarely have that many additional packages installed, we've moved away from preconfigured packages in favour of explicitly saying 'install these packages'.)
We aren't neurotic about doing everything with cut and paste; sometimes it's easier to describe an edit to a configuration file in prose than to try to write commands that make it automatically (especially since those commands are usually not simple). There can also be steps like 'recover the DHCP files from backups or copy them from the machine you're migrating from', which require a bit of hand attention and decisions based on the specific situation you're in.
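To illustrate why scripting even a small edit is usually not simple, here is a minimal sketch of a helper that sets a single 'key = value' line in a configuration file. Everything in it is hypothetical and not taken from our actual build instructions: the function name set_config_line, the scratch file, and the settings are made up, and it assumes that keys contain no regular expression metacharacters and that values contain no '|', '&' or backslash characters.

```shell
#!/bin/sh
# Hypothetical helper: make sure "key = value" is set in a config file.
# It has to cope with the key being set already, being present but
# commented out, or being missing entirely.
set_config_line() {
    file=$1; key=$2; value=$3
    if grep -Eq "^[#[:space:]]*${key}[[:space:]]*=" "$file"; then
        # Rewrite every line that sets (or comments out) the key.
        # Writing back with cat keeps the file's owner and permissions.
        sed -E "s|^[#[:space:]]*${key}[[:space:]]*=.*|${key} = ${value}|" \
            "$file" > "$file.new" &&
            cat "$file.new" > "$file" &&
            rm -f "$file.new"
    else
        # The key is not mentioned at all, so append it.
        printf '%s = %s\n' "$key" "$value" >> "$file"
    fi
}

# Demonstration on a scratch file (made-up settings).
conf=$(mktemp)
printf '#listen_port = 25\nbanner = hello\n' > "$conf"
set_config_line "$conf" listen_port 2525   # uncomments and changes the value
set_config_line "$conf" max_clients 10     # appended as a new line
cat "$conf"
rm -f "$conf"
```

Even this ignores things a person would handle without thinking, such as a key that appears twice or a setting that only makes sense inside a particular section of the file, which is part of why a sentence of prose is often the better choice.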
(This setup documentation is also a good place to discuss general issues with the machine, even if it's not strictly build instructions.)
When we build non-Ubuntu machines the build instructions usually follow a very similar form: we start with 'do a standard base install of <OS>' and then we document the specific customizations for the machine or type of machine; this is what we do for our OpenBSD firewalls and our CentOS-based iSCSI backends. Setup of our OmniOS fileservers is sufficiently complicated and picky that a bunch of it is delegated to a couple of scripts. There's still a fair number of by-hand commands, though.
In theory we could turn any continuous run of cut and paste commands into a shell script; for most machines this would probably cover at least 90% of the install. Despite what I've written in the past, doing so would have various modest advantages; for example, it would make sure that we would never skip a step by accident. I don't have a simple reason for why we don't do it except 'it's never seemed like that much of an issue', given that we build and rebuild this sort of machine very infrequently (generally we build them once every Ubuntu version or every other Ubuntu version, as our servers generally haven't failed).
(I think part of the issue is that it would be a lot of work to get a completely hands-off install for a number of machines, per my old entry on this. Many machines have one or two little bits that aren't just running cut & paste commands, which means that a simple script can't cover all of the install.)
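As a rough sketch of what such a partial script could look like, the following turns a run of cut and paste steps into a script that stops at the first failing command and pauses at the bits that need hand attention. This is an illustration under assumptions, not a script we have: the run and manual helpers, the DRY_RUN variable, and the dovecot-imapd package choice are all hypothetical, and the steps simply reuse the examples mentioned above rather than describing one real machine. By default it only prints what it would do.

```shell
#!/bin/sh
# Sketch: a documented run of cut and paste steps as one script.
# DRY_RUN=1 (the default here) only prints the commands; run it as
# DRY_RUN=0 to actually execute them.
set -eu

DRY_RUN=${DRY_RUN:-1}

# run: show the command, then execute it unless this is a dry run.
# Because of 'set -e', a failing command stops the whole script,
# so no later step runs on top of a broken earlier one.
run() {
    printf '+ %s\n' "$*"
    if [ "$DRY_RUN" != 1 ]; then
        "$@"
    fi
}

# manual: describe a by-hand step and wait for the person doing the
# build to confirm that it is done.
manual() {
    printf 'MANUAL: %s\n' "$*"
    if [ "$DRY_RUN" != 1 ]; then
        printf 'Press Enter when done: '
        read -r reply
    fi
}

build_steps() {
    # Hypothetical package choice, for illustration only.
    run apt-get install -y dovecot-imapd
    # The config copy from the example step earlier.
    run rsync -a /cs/site/machines/aviary/etc/dovecot/conf.d/*.conf /etc/dovecot/conf.d/
    # A step that needs judgement stays a prompt instead of a command.
    manual "recover the DHCP files from backups or copy them from the old machine"
}

build_steps
```

The design choice here is to keep the manual steps in the same file as the commands, so the script still reads top to bottom like the build documentation and the by-hand bits can't be skipped by accident.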