Wandering Thoughts archives

2021-09-21

Why we care about being able to (efficiently) reproduce machines

One of the broad meta-goals of system administration over the past few decades has been working to be able to reliably reproduce machines, ideally efficiently. It wasn't always this way (why is outside the scope of this entry), but conditions have changed enough that this has become a concern for more and more people. As a result, it's a somewhat quiet driver of any number of things in modern (Unix) system administration.

There are a number of reasons why you would want to reproduce a machine. The current machine could have failed (whether it's hardware or virtual) so you need to reconstruct it. You might want to duplicate the machine to support more load or add more redundancy, or to do testing or experimentation. Sometimes you might be reproducing a variant of the machine, such as to upgrade the version of the base Linux or other Unix it uses. The better you can reproduce your machines, the more flexibility you have with all of these, and the more confidence you can have that you understand your machine and what went into it.

One way of reproducing a machine is to take careful notes of everything you ever do to the machine, from the initial installation onward. Then, when you want to reproduce the machine, you just repeat everything you ever did to it. However, this suffers from the same problem as replaying a database's log on startup in order to restore its state; replaying all changes isn't very efficient, and it gets less efficient as time goes on and more changes build up.

(You may also find that some of the resources you used are no longer available or have changed their location or the like.)
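As a rough illustration of why replaying your notes scales badly, here is a minimal sketch in Python. The log format and the names in it (`replay`, the `pkg:` and `file:` keys) are invented for this example and don't correspond to any real tool. The point is only that the replay does work proportional to everything you ever did, even when the end result is small.

```python
# Hypothetical change log: each entry is (item, value).
# A value of None means the item was later removed.
def replay(log):
    """Rebuild machine state by repeating every recorded change in order."""
    state = {}
    steps = 0
    for item, value in log:
        steps += 1
        if value is None:
            state.pop(item, None)
        else:
            state[item] = value
    return state, steps


log = [
    ("pkg:nginx", "1.18"),
    ("file:/etc/motd", "hello"),
    ("pkg:nginx", "1.20"),      # supersedes the first entry
    ("file:/etc/motd", None),   # undoes the second entry
    ("pkg:postfix", "3.5"),
]

final_state, steps = replay(log)
print(f"{steps} steps replayed to reach {len(final_state)} items of final state")
```

Here three of the five steps are wasted work, since they were later overridden or undone, and that ratio generally gets worse the longer a machine lives.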

The goal of being able to efficiently reproduce machines has led system administration to a number of technologies. One obvious broad area is systems where you express the machine's desired end state and the system makes whatever changes are necessary to get there. If you need to reproduce the machine, you can immediately jump from the initial state to your current final one without having to go through every intermediate step.

(The virtual machine approach where VMs are immutable once created can be seen as another take on this. By forbidding post-creation changes, you fix and limit how much work you may need to "replay".)
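The desired end state idea can be sketched in a few lines of Python. This is a toy model under simplifying assumptions, not how any particular configuration management system works: machine state is just a dictionary of package names to versions, and `plan_changes` and `apply_changes` are hypothetical helpers made up for this example.

```python
def plan_changes(current, desired):
    """Work out the actions needed to move `current` to `desired`."""
    actions = []
    for name in sorted(desired):
        if name not in current:
            actions.append(("install", name, desired[name]))
        elif current[name] != desired[name]:
            actions.append(("change", name, desired[name]))
    for name in sorted(current):
        if name not in desired:
            actions.append(("remove", name, None))
    return actions


def apply_changes(current, actions):
    """Return a new state with the planned actions applied."""
    result = dict(current)
    for verb, name, value in actions:
        if verb == "remove":
            result.pop(name, None)
        else:
            result[name] = value
    return result


desired = {"openssh": "8.4", "nginx": "1.20", "postfix": "3.5"}
fresh_install = {"openssh": "8.4"}

actions = plan_changes(fresh_install, desired)
converged = apply_changes(fresh_install, actions)
print(actions)
```

The plan is computed from where the machine is now and where you want it to be, so a fresh install goes straight to the final state regardless of how many intermediate states the original machine passed through. Running `plan_changes` again on the converged machine produces no actions.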

There are two important and interrelated ways of making reproducing a machine easier (and often more efficient). The first is to decide to allow some degree of difference between the original version and the reproduction; you might decide that you don't need exactly the same versions of every package or to have every old kernel. The second is to systematically work out what you care about on the machine and then only exactly reproduce that, allowing other aspects of the machine to vary within some acceptable range.

(In practice you'll eventually need to do the second because you're almost certain to need to change the machine in some ways, such as to apply security updates to packages that are relevant to you. And when you "reproduce" the machine using a new version of the base Unix, you very much need to know what you really care about on the machine.)
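One way to make "what you care about" concrete is to write it down as a check. The following Python sketch assumes, purely for illustration, that a machine can be summarized as a dictionary of package versions, and that you sort what matters into two groups: things that must match exactly, and things that merely have to be present at some version. The function name and the package versions are made up.

```python
def check_reproduction(original, candidate, exact, present):
    """List the ways `candidate` fails to reproduce what we care about."""
    problems = []
    for name in sorted(exact):
        if candidate.get(name) != original.get(name):
            problems.append(
                f"{name}: want {original.get(name)!r}, got {candidate.get(name)!r}"
            )
    for name in sorted(present):
        if name not in candidate:
            problems.append(f"{name}: missing")
    return problems


original = {"postfix": "3.5.6", "nginx": "1.20.1", "linux-image": "5.4.0-84"}
candidate = {"postfix": "3.5.6", "nginx": "1.20.2", "linux-image": "5.4.0-86"}

# We insist on the exact postfix version but accept any nginx and kernel.
relaxed = check_reproduction(
    original, candidate, exact={"postfix"}, present={"nginx", "linux-image"}
)
# If we had insisted on the exact nginx version too, the reproduction fails.
strict = check_reproduction(
    original, candidate, exact={"postfix", "nginx"}, present={"linux-image"}
)
print(relaxed)
print(strict)
```

The same candidate machine is an acceptable reproduction under the relaxed policy and a failed one under the strict policy, which is why you need to know what you really care about before you can say whether a reproduction succeeded.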

sysadmin/ReproducibleMachinesWhy written at 23:47:00

