Wandering Thoughts archives

2012-03-28

How I (once) did change management with scripts

When I read Philip Hollenback's latest entry and it mentioned someone doing (change/system) management through shell scripts (instead of, say, Puppet), my first thought was 'hey, I've done that'. So I might as well write up how I did it, either for someone else to use or so that people can marvel at the crazy person.

(Now, a disclaimer: this was more than half a decade ago, and some of my memories of the fine details have undoubtedly faded (ie, are now wrong).)

The basic environment this happened in was a lab environment with (at its height) on the order of a hundred essentially identical PC machines running Linux (this is the same environment where we needed efficient update distribution). Most of the system management was handled through packages and automatic package updates, but every so often there was something that was best handled in a shell script.

Each separate change was a separate little shell script, all of which lived in a common directory (actually one directory for each OS release). Script filenames started with a sequence number (eg they had names like '01-fix-something'), and scripts were run in sequence. The driver system kept track of which scripts had already succeeded and did not re-run them; a script that exited with a failure status would be retried the next time the driver system ran. The driver system ran once a day or (I believe) immediately after system boot, and processed scripts after applying package updates. Scripts were expected to check whether they were applicable before doing anything and exit if they weren't (with status 0 if they were definitely not applicable to this system, or with status 1 if they should be retried the next time).
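
To make the mechanics concrete, here is a minimal sketch of what such a driver could look like. This is a reconstruction of the general idea, not the original code; the directory paths and the marker-file scheme for tracking successes are invented for illustration.

    #!/bin/sh
    # Sketch of a change-script driver (hypothetical paths throughout).
    SCRIPTS=/local/changes         # the per-OS-release script directory
    DONE=/var/lib/changes-done     # marker files for scripts that succeeded

    mkdir -p "$DONE"
    # The glob expands in sorted order, which runs scripts in sequence.
    for s in "$SCRIPTS"/[0-9]*; do
        [ -x "$s" ] || continue
        n=${s##*/}
        # Skip scripts that have already succeeded on this machine.
        [ -e "$DONE/$n" ] && continue
        if "$s"; then
            # Status 0 covers both 'applied' and 'definitely not applicable'.
            touch "$DONE/$n"
        fi
        # A nonzero status leaves no marker, so the script is retried
        # on the next driver run.
    done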

(If I were doing this again, I think I would make the driver stop running further scripts once one failed. In our case all of the scripts were basically independent, so it didn't matter.)

There was no mechanism to rerun a script if it changed; if I changed a script and wanted to have it rerun, I needed to give it a new sequence number. If a script became unnecessary for some reason, it was just removed.
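
For illustration, an individual change script following these conventions might look something like this; the name and the change it makes are entirely hypothetical:

    #!/bin/sh
    # 07-resolv-timeout: a hypothetical change script.
    # Exit 0 if the change is applied or definitely not applicable here;
    # exit 1 if it should be retried on the next driver run.

    # Definitely not applicable on machines without this file.
    [ -e /etc/resolv.conf ] || exit 0

    # Already applied; nothing more to do.
    grep -q '^options timeout:2' /etc/resolv.conf && exit 0

    echo 'options timeout:2' >>/etc/resolv.conf || exit 1
    exit 0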

All of this is actually quite short and simple to implement, and it worked quite well within its modest goals. Writing scripts was not particularly difficult; they were automatically executed for you, all machines were kept in sync, and a newly (re)installed machine would automatically pick up all of the current customizations. These days, you would put the entire directory of scripts into a VCS (and you might distribute it by having the workstations check out a copy from the central repo).
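
For instance (a hypothetical arrangement, assuming the scripts directory is a git checkout and that the driver is installed as 'run-changes'), each workstation could run a small wrapper from cron:

    #!/bin/sh
    # Hypothetical cron wrapper: refresh the scripts, then run the driver.
    cd /local/changes && git pull -q
    exec /local/sbin/run-changes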

sysadmin/MyScriptChangeManagement written at 23:08:46

Ultimately, abuse issues have to be handled by humans

Time and time again, people have tried to create entirely automated systems for detecting, identifying, and dealing with spam on their services. Time and time again, they've ultimately failed; their systems may stop a great deal of spam, but enough still gets through.

(Not infrequently the spam that gets through looks, from the outside, as if it should be trivial to recognize. I think there is a deep reason for this, which we'll get to.)

There is a shallow reason and a deep reason for this failure. The shallow reason is that humans (and spammers are humans) will relentlessly game any set of automated rules until they find weaknesses, and then drive as many trucks as possible through whatever weaknesses they've found. If your service is at all popular, there will be far more smart spammers trying to game the automation than there are smart people writing it, placing your automation writers in an arms race they almost certainly cannot win. The deep reason is that you are guaranteed to have weaknesses, because it's essentially impossible to make automated rules as smart as they need to be; the fundamental problem of spam is stopping bad content while letting good content through, whatever 'bad' and 'good' are, and deciding that is one reason you need people.

(As for why spam that gets through automated systems often looks obvious to people: there's no reason for spammers to add variety once they've gotten past the automation. In fact their spam can be blindingly obvious so long as it evades the automated checks.)

All of this means that places really do need humans to handle their abuse issues; automation can help by catching the obvious things, but it will never entirely replace humans paying attention. The corollary is that places need not just some people but enough people for the volume of abuse they get. This is an extremely unpopular view, since abuse is a cost center and everyone loves the idea of automating cost centers to make them go away, but by this point we have plenty of experience that this just doesn't work for abuse.

(A further corollary is that anyone who relies on automation instead of staffing up their abuse department to adequate levels is not actually serious about spam, regardless of what they say. They may not be actively for spam and spammers on their service, but to use the fine George Orwell phrase, they are objectively pro-spam. Application to various Silicon Valley firms is left as an exercise for the reader.)

spam/HumanAbuseHandling written at 00:52:53

