A multilevel view of DevOps (with more balance)

September 28, 2011

Phil Hollenback in a comment on my earlier entry:

My take is that DevOps Means Don't Be An A-Hole. That's kind of a complement to what you are saying I think - devops is about improving communications.

I don't think it's this simple. Instead I think that the broad label of 'DevOps' is a reaction to three different sets of organizational problems, which I am going to list in decreasing order of severity.

First is the blame problem, where developers get blamed for not delivering features and operations gets blamed for services not being available. If you have the blame problem it trumps everything else, because it makes the interests of developers and operations fundamentally opposed to each other: every change that developers push out to deliver features is a risk to the availability that operations gets blamed for. All of the communication in the world cannot fix that.

Next are problems of ignorance, where development and operations don't understand each other's environments, problems, and constraints. It's stereotypical for sysadmins (me included) to see this as primarily a development problem where developers have ignored issues like installation, logging, performance, and operational reliability, but I'm sure that there are things about development that operations doesn't get. Fixing this requires education and possibly making development and operations care enough about each other's problems to get that education.

(See Ted Dziuba for some ways to make developers care.)

The final set of problems are cultural ones, where the two groups are assholes to each other mostly because that's how they've always behaved and it's just how operations and development deal with each other. If this is your only 'DevOps' problem, you can indeed fix it with just better communication and more respect.

(The quibble here is that sometimes cultural issues have roots in things like the amount of respect and pay that goes to various groups, because people are people.)

Each level of organizational problem creates the levels of problems below it (unless you're really lucky and have a staff of selfless, motivated saints). An organization with the blame problem is going to have cultural problems and almost certainly ignorance problems; an organization with ignorance issues is going to grow cultural ones. It follows that you need to fix problems from the top down in order to make lasting changes.

(Yes, this is a more balanced view of DevOps than my first entry on it. Sometimes I run a little bit too hard with an idea.)


Comments on this page:

From 86.26.110.69 at 2011-09-30 19:02:52:

Sorry to bang on, but you keep mentioning only two groups here, Dev and Ops.

There is a third group which ideally should be affiliated with neither of the other two.

QA/Test.

And I write this as a committed Ops person who has unwittingly become involved in blame games for too long on too many critical systems.

In almost all cases, neither Dev nor Ops has been directly responsible or entirely culpable, the root cause commonly being inadequate or non-existent testing.

I always say in any failure mode, 'why wasn't that caught in testing', and if it's not feasible to test for it, it's too complex and should be re-architected.

From 72.231.223.88 at 2011-10-01 09:32:12:

I can't help but perceive insistence on the need for a separate QA team as just another variation of the "blame game."

Really, testing is as much the responsibility of the developer as the operational considerations.

In my organization, we are approaching DevOps (without calling it that) by spreading the responsibilities, expertise, and access across the three traditional groups. While individuals may have their areas of expertise, every team has all the skillsets, and is judged as a unit. If the team's product fails, they all fail.

This is an approach to solving the blame game by forging small, tight teams that won't point the fingers at each other, but will instead work together to succeed by any means necessary. Small groups will tend to self-correct by sharing knowledge and splitting tasks based on strengths. To make this work the team needs to be healthy and effective, which is easier said than done.

Written on 28 September 2011.


Last modified: Wed Sep 28 11:37:40 2011