The downside of automation versus the death of system administration
February 18, 2012
Back in AutomationDownside I discussed how one downside of automation was that either you had to spend time learning all of the extra layers it introduced or you'd become a push-button monkey. There's a consequence of this that I didn't mention in that entry.
This push-button monkey status is the silent downside of the future death of system administration that I sketched out recently. All of those sysadmin-less developers doing their own deployments from canned recipes aren't going to know what's really going on in all of the layers if something goes wrong. This is fine as long as everything works, but when things go off the rails, well, you have issues.
(This is not just an issue of plain lack of knowledge, either, or to put it another way the lack of knowledge is a feature. One point of this is to save the developers from having to spend the time to learn all of the specialized knowledge that's needed to understand the full stack.)
I wouldn't count on this to save your regular sysadmin job, though. If this future comes to pass, things are going to work most of the time, and most of the time when they don't work the developers are going to be able to figure it out on their own fast enough (even if it's not as fast as a sysadmin would). Far fewer places are going to be big enough that things go wrong often enough for a full-time 'sysadmin' who understands the full deployment stack to make sense. Especially in the constrained environment of a small company, people will make do, and if things blow up every so often that's okay as long as they don't blow up too badly.
(You might question the idea that canned automation will work right most of the time, but I think that it will in specific environments such as deploying to a given cloud setup. And to a large extent, the degree to which my sketch of the death of system administration comes to pass depends on how routinely reliable such pre-written recipes are.)
Traditional sysadmins will probably be horrified at the mistakes that will result from people not knowing all of the fine details and charging ahead anyways. But on a pragmatic level most of the resulting problems won't and don't matter very much over the long run (although they'll be awkward and embarrassing at the time, just as they are today for the companies that run into them). Especially in a future where automation mostly works, you'll need a real long-tail event to seriously damage an otherwise sound company.
(Perhaps people should care more about the possibility of long-tail events. But it's a hard argument to make, especially when a company is having to choose between a sysadmin to mitigate a rare risk and another developer to accelerate their growth.)
(I have more thoughts on this area circling in my head, but trying to write some of them down has made it clear that they're not clear yet.)
Sidebar: clarifying what I mean by the push-button monkey stuff
Taken from a comment on AutomationDownside:
This is exactly the situation where you've been reduced to a push-button monkey. You don't actually understand what's going on; you just know how to achieve certain results. What turns people into push-button monkeys isn't that they don't know what to do, it's that they don't know enough about how things really work to do anything other than push the buttons. In particular, they don't know enough to troubleshoot problems except by rote.
Suppose you put a new version of the configuration into the magic host/path spot but the change you wanted isn't appearing on the web servers (or isn't appearing on some web servers). Unless you understand the automation that distributes the files, you don't know where to start looking for problems or even what problems there might be.
(Well, you might have a troubleshooting checklist that someone has prepared for you. But if it's a problem that hasn't been foreseen, you are once again up the creek.)
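To make the 'isn't appearing on some web servers' case a bit more concrete, here is a minimal Python sketch of the very first thing you might check: which servers actually have the new configuration. Everything in it is an assumption for illustration. The function name, the host names, and the idea that you have already pulled each server's copy of the file by some means are all hypothetical, and it says nothing about how any particular distribution system works.

```python
import hashlib
from typing import Dict, List, Optional


def find_stale_hosts(expected: bytes,
                     deployed: Dict[str, Optional[bytes]]) -> List[str]:
    """Return the hosts whose copy of the config differs from `expected`.

    `deployed` maps a (hypothetical) host name to the file contents
    fetched from that host, or None if the file was missing there.
    """
    want = hashlib.sha256(expected).hexdigest()
    stale = []
    for host, content in sorted(deployed.items()):
        # A missing file counts as stale, same as mismatched contents.
        if content is None or hashlib.sha256(content).hexdigest() != want:
            stale.append(host)
    return stale


if __name__ == "__main__":
    # Made-up example data: the new config and what three servers have.
    new_config = b"MaxClients 200\n"
    copies = {
        "web1": b"MaxClients 200\n",
        "web2": b"MaxClients 150\n",  # still has the old version
        "web3": None,                 # file never arrived
    }
    print(find_stale_hosts(new_config, copies))
```

Note that this only tells you where the change didn't land. Working out why it didn't land there still requires understanding the automation that distributes the files, which is exactly the knowledge a push-button monkey lacks.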