Automatedly overwriting changed files is not a feature

June 22, 2013

A commentator on an earlier entry wrote, in (small) part, about the advantages of automated configuration management systems:

2) Enforced Consistency and Change Management: With every box picking stuff up from chef on a scheduled basis, changes to important functions are automatically set back to what they should be, rather than someone's fiddle or tweak. [...]

I've seen this view expressed in any number of places, to the point where it seems to be common wisdom in some sections of the sysadmin world. I think it is making a terrible mistake.

If people are modifying local files on individual machines, what you have is a failure of process. Something has gone wrong. This should not be happening. Cheerfully eradicating those changed files does two things: it covers up the evidence of a process failure and it probably breaks parts of your environment.

(After all, we should assume that people actually had a reason for making the changes they did to a local file and they wanted (or needed) the results of those changes. If you have people randomly editing configuration files on random machines for fun, you have even bigger problems.)

It's my belief that automated configuration management should not be silently covering up the evidence of a process failure, for all of the obvious reasons. Silently overwriting local changes with the canonical master version sounds good in theory but should not be the default behavior in practice. It's better to warn when a local change is detected, although that takes more work.
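
As a rough illustration of "warn rather than overwrite", independent of any particular tool, a check along these lines could run from cron; the master-copy path and the report address are made up for the example:

# compare the live file against the canonical master copy (illustrative paths)
if ! cmp -s /etc/ntp.conf /srv/config-master/etc/ntp.conf; then
    # report the drift instead of silently replacing the local file
    diff -u /srv/config-master/etc/ntp.conf /etc/ntp.conf | \
        mail -s "config drift on $(hostname): /etc/ntp.conf" sysadmins
fi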

(Another way to have this happen is for some other program or automated system on a local machine to be fiddling around with the file. One frequent offender is package updates.)

Sidebar: on not shooting the sysadmin

At this point it's popular to blame the person who made the local change (and to say that overwriting their change will serve to teach them not to do that). This is a failure too. People are rational, so that sysadmin was doing something that they thought was either the right thing or at least necessary despite it being wrong. You should treat this as a serious process failure because it demonstrates that somehow this sysadmin wound up with an incorrect picture of your local environment.

By the way, one of the ways that people wind up with incorrect pictures of the local system environment is that the local system environment is too complex for mere fallible humans to actually keep track of. This gives you fragile complexity.

(In this specific case, one thing to do is to have a label in all of your configuration files mentioning where the master version of the file is located. Then people at least have something that will remind them, possibly in a high stress emergency situation, about how to do things the right way.)
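
As a small sketch of such a label (the tool name and path are whatever applies locally; the ones below are only an example), the top of a managed file might carry:

# MANAGED FILE - do not edit by hand.
# Master copy: cfengine masterfiles, /var/cfengine/masterfiles/etc/ntp.conf (example path).
# Local changes will be overwritten on the next configuration management run.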


Comments on this page:

From 24.165.55.105 at 2013-06-22 04:58:59:

I'm the original commenter, so just a quick response from my perspective:

"If people are modifying local files on individual machines, what you have is a failure of process. Something has gone wrong. This should not be happening. Cheerfully eradicating those changed files does two things; it covers up the evidence of a process failure and it probably breaks parts of your environment."

Absolutely it's a failure of process if someone is changing files on a server, for any number of reasons.

Everyone doing things on the server should know that it's handled by a config management tool. If they don't, then should they even be in a position to make changes? The config file should have a banner at the top (if you're following standard practice) warning you that it's generated by the software and that changes made won't stick. It's not always the case, but that's a question of self/department discipline.

"It's my belief that automated configuration management should not be silently covering up the evidence of a process failure, for all of the obvious reasons. Silently overwriting local changes with the canonical master version sounds good in theory but should not be the default behavior in practice. It's better to warn when a local change is detected, although that takes more work."

I can't speak for other tools, but Chef has some Report Handlers built in that can alert you to a number of conditions, including things like files changing, so it's pretty straightforward to make sure it doesn't happen silently:

http://docs.opscode.com/essentials_handlers.html

Joshua Timberman, one of Opscode's employees, wrote a ruby gem specifically to focus down on the changes: http://jtimberman.housepub.org/blog/2011/04/24/a-simple-report-handler/

"People are rational, so that sysadmin was doing something that they thought was either the right thing or at least necessary despite it being wrong"

Do you think a rational person goes to a server knowing it's managed by a config management tool, and then changes a managed file by hand whilst ignoring the banner at the head of the file that tells them their changes will be replaced? If so, then your idea of a rational person and mine are miles apart :)

Yes, there may be a very good reason for making the change they want to make, but if there is, then it's surely good enough to commit and do properly within the appropriate workflow. It's not like the workflow is particularly onerous either. It takes three commands to change the file properly with Chef:

$EDITOR foo.erb
knife cookbook upload $bar

then, on the server in question, if you want to speed up the rollout:

chef-client

In that process you really should have a "git commit" and "git push" too for everyone else's benefit, and/or to tie in to your change management process as appropriate.
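
Put together (the cookbook, template and commit message below are invented for the example), the whole flow with the git steps folded in looks roughly like:

$EDITOR cookbooks/ntp/templates/default/ntp.conf.erb
git commit -am "allow the new monitoring host to query ntp"
git push
knife cookbook upload ntp
chef-client    # run on the target server to pick the change up right away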

From 99.236.92.95 at 2013-06-22 11:30:24:

"Do you think a rational person goes to a server knowing it's managed by a config management tool, and then changes a managed file by hand whilst ignoring the banner at the head of the file that tells them their changes will be replaced? If so, then your idea of a rational person and mine are miles apart :)"

Not every file managed by cfengine in our environment is labelled as such. Yes, they should be, but they aren't.

Also, that person might be me, and the change that person might need might be a host-based firewall exception, and that person might have root on the box but not on the cfengine server, and that person might want that change now (Friday afternoon), not Monday mid-morning. So that person might edit iptables, find it blown away less than an hour later, re-do the change, and then do something like

chattr +i /etc/sysconfig/iptables

and refuse to feel guilty about the resulting emails the unix admins got all weekend. :)

Not that I speak from experience or anything.

Yeah, that's still a process failure, but that is the counterexample, I think.

Still, that was better than the other guy on HIS managed system who didn't know it was managed by cfengine; once he realised his iptables changes were being blown away by something, he wrote a cron job to replace his changes, and set it to run every minute.

Mike

From 50.43.24.254 at 2013-06-22 11:49:45:

In our environment, we typically chmod 444 files 'owned' by configuration management.

From 87.79.78.105 at 2013-06-22 21:34:22:

"Do you think a rational person goes to a server knowing it's managed by a config management tool, and then changes a managed file by hand whilst ignoring the banner at the head of the file that tells them their changes will be replaced? If so, then your idea of a rational person and mine are miles apart :)"

But if that person does not have any considered reason (even if it’s not a good reason) to make that change, then – well then let me quote you back at yourself:

"If they don't, then should they even be in a position to make changes?"

Your overall position is self-consistent only under the assumption that whoever could make such changes will never have a considered reason to make them, and therefore never would make them.

In other words, it requires buying into an ideology. Ideologies aren’t bad; even proselytising for them isn’t bad; but buying into one to such an extent that you become unaware that it isn’t universal – that is problematic.

Aristotle Pagaltzis

From 129.10.115.55 at 2013-06-24 11:40:20:

The very first time I ever heard about puppet from someone, they explained how puppet would enforce changes to files by correcting them to a known good state.

I said, "wow, that's awesome! How does it alert you that the file changed?"

They looked at me like I had a second head, and said, "why would you want that? It just brings it into compliance"

I said, "because if it was out of compliance, I want to know", and they said, "oh, I don't know anyone using it like that".

Granted, this was in 2009ish? But still...how is that not an obvious use case off the bat? It's like electric fence, except way better.

--Matt Simmons
