The problems of operations and sysadmin heroism

March 22, 2012

Back in "DevOps and the blame problem" I noted that operations has a problem getting praised, because people generally feel that the computers should just work. This leads to what I call the heroism problem for ops.

In practice, ops can easily get praised in exactly one situation: when it's clear to everyone that something exceptional is going on, that it is not just business as usual. In short, you get praised if you (visibly) fix a panic situation, and the more exceptional your efforts to fix the panic situation the better. Put together a solution with chewing gum and baling wire after staying up all night? All the better.

Everyone can probably see the perverse incentives that this creates. If you are rewarded for cleaning up after floods but not recognized for building flood prevention instead, pretty soon you start losing enthusiasm for trying to argue your bosses into funding that flood prevention. And in a real way this is a lot like the DevOps blame problem; when you reward some things and penalize others, you have told operations what your priorities are whether you like it or not.

But it gets worse, because here's the thing: this heroism is often attractive. Not just attractive because you're rewarded for it; intrinsically attractive. Heroism means that you get to make a difference in a challenging situation, one that stretches you and calls on all of your ingenuity and cleverness. It is troubleshooting writ large. We can all see that opportunities for heroism are the seeds of great stories, not stories of disasters but stories of triumphs against the odds. Who doesn't want to be part of that?

(This is especially the case if your routine job is not challenging, exciting, or even very interesting.)

Heroism is also corrosive in the long term. It is directly corrosive to lives; it is a young person's game. It is corrosive to engagement. If you have constant opportunities for heroism, people will burn out because very few people can be adrenalized all of the time; if you mostly don't have opportunities for heroism yet heroism is the only really rewarding thing about the job, people are going to check out. And, I think, it is corrosive to your ops culture. When heroism is the rewarding thing, you are implicitly creating a group of troubleshooters instead of anything else (it's certainly what you're encouraging people to get good at). Troubleshooters are certainly useful, but a well-rounded ops environment needs more than that.

