2013-06-22
Automatedly overwriting changed files is not a feature
A commentator on an earlier entry wrote, in (small) part, about the advantages of automated configuration management systems:
2) Enforced Consistency and Change Management: With every box picking stuff up from chef on a scheduled basis, changes to important functions are automatically set back to what they should be, rather than someone's fiddle or tweak. [...]
I've seen this view expressed in any number of places, to the point where it seems to be common wisdom in some sections of the sysadmin world. I think it's a terrible mistake.
If people are modifying local files on individual machines, what you have is a failure of process. Something has gone wrong. This should not be happening. Cheerfully eradicating those changed files does two things: it covers up the evidence of a process failure, and it probably breaks parts of your environment.
(After all, we should assume that people actually had a reason for making the changes they did to a local file, and that they wanted (or needed) the results of those changes. If you have people randomly editing configuration files on random machines for fun, you have even bigger problems.)
It's my belief that automated configuration management should not be silently covering up the evidence of a process failure, for all of the obvious reasons. Silently overwriting local changes with the canonical master version sounds good in theory, but it should not be the default behavior in practice. It's better to warn when a local change is detected, although that takes more work (there's a sketch of this below).
(Another way to have this happen is for some other program or automated system on a local machine to be fiddling around with the file. One frequent offender is package updates.)
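To make 'warn, don't overwrite' concrete, here is a minimal Python sketch of what a deploy step could look like. This is not how chef or any other real tool behaves; the paths, the state file, and the function names are all invented for illustration. The idea is to record the checksum of the version you last deployed, so you can tell local drift apart from an updated master copy:

    #!/usr/bin/env python3
    # Minimal sketch of "warn instead of overwrite"; not any real
    # tool's behavior. Paths and names are invented for illustration.
    import hashlib
    import shutil
    import sys
    from pathlib import Path

    def sha256(path: Path) -> str:
        return hashlib.sha256(path.read_bytes()).hexdigest()

    def deploy(master: Path, target: Path, state: Path) -> None:
        # 'state' records the checksum of the version we last deployed,
        # which lets us tell local drift apart from an updated master.
        last_deployed = state.read_text().strip() if state.exists() else None
        if target.exists() and last_deployed and sha256(target) != last_deployed:
            # Someone (or something, e.g. a package update) changed the
            # file since we last wrote it. Surface the process failure
            # instead of quietly erasing the evidence.
            print(f"WARNING: {target} was modified locally; not overwriting.",
                  file=sys.stderr)
            return
        shutil.copy2(master, target)
        state.write_text(sha256(target) + "\n")

    if __name__ == "__main__":
        # Hypothetical example invocation:
        deploy(Path("/srv/config-master/ntp.conf"),
               Path("/etc/ntp.conf"),
               Path("/var/lib/deploy-state/ntp.conf.sha256"))

A real system would want more than this (central reporting of the drift, an explicit forced-deploy override, and so on), but the important design decision is that drift produces a loud warning instead of silently vanishing.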
Sidebar: on not shooting the sysadmin
At this point it's popular to blame the person who made the local change (and to say that overwriting their change will serve to teach them not to do that). This is a failure too. People are rational, so that sysadmin made their change because they thought it was either the right thing to do or at least necessary, even though it was wrong. You should treat this as a serious process failure, because it demonstrates that this sysadmin somehow wound up with an incorrect picture of your local environment.
By the way, one of the ways that people wind up with incorrect pictures of the local system environment is that the local system environment is too complex for mere fallible humans to actually keep track of. This gives you fragile complexity.
(In this specific case, one thing to do is to have a label in all of your configuration files mentioning where the master version of the file is located. Then people at least have something that will remind them, possibly in a high-stress emergency situation, about how to do things the right way.)
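As an illustration, here is a small Python sketch that checks deployed files for such a label. The '# MASTER:' header format is an assumption made up for this example, not a standard convention; use whatever fits your environment:

    #!/usr/bin/env python3
    # Sketch of a checker for "where is the master copy" labels. The
    # '# MASTER:' header format is invented for illustration.
    import sys
    from pathlib import Path

    LABEL = "# MASTER:"  # e.g. '# MASTER: /srv/config-master/ntp.conf'

    def has_master_label(path: Path) -> bool:
        # Only scan the first few lines; the label belongs in the header.
        with path.open() as f:
            return any(line.startswith(LABEL) for line, _ in zip(f, range(10)))

    if __name__ == "__main__":
        missing = [p for p in map(Path, sys.argv[1:]) if not has_master_label(p)]
        for p in missing:
            print(f"{p}: no {LABEL!r} header", file=sys.stderr)
        sys.exit(1 if missing else 0)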