A downside of automation

February 15, 2012

Right now in the sysadmin world it probably qualifies as heresy to say bad things about the idea of automating your work. But unfortunately for us, there actually are downsides to doing so even if we don't notice them a lot of the time.

The one I'm going to talk about today is that when you automate something, you increase the number of things that people in your team need to know. Suppose that you get tired of maintaining your Apache configuration files by hand, so now you put them in a Chef configuration. You've gone from a situation where all you need to know to configure your Apache is Apache configuration itself to a situation where now you need to know Apache configuration, using Chef, and how you're using Chef to configure your Apache. Any time you automate, you go from needing to know just one thing, the underlying thing you're dealing with, to needing to know three or so things; you still need to know the underlying thing, but now you also need to know the automation system in general and how you're using it in particular.

(You can condense this by one layer of knowledge if you're not using a general automation system, because then the last two layers collapse into one. But you probably don't want to do that.)
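
To make the layers concrete, here's a deliberately artificial sketch in Python (this is not real Chef code, just a stand-in for the sort of templating that any automation layer does). The Apache knowledge lives in the template text; the automation knowledge lives in everything wrapped around it:

    # A stand-in for what a configuration management template does.
    # Layer 1: the Apache knowledge (the template contents).
    VHOST_TEMPLATE = """\
    <VirtualHost *:80>
        ServerName {name}
        DocumentRoot {docroot}
    </VirtualHost>
    """

    # Layers 2 and 3: the templating machinery in general, plus your
    # local conventions for what data feeds it and where it goes.
    def render_vhost(name, docroot):
        return VHOST_TEMPLATE.format(name=name, docroot=docroot)

    print(render_vhost("www.example.com", "/srv/www"), end="")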

This can of course be compounded on itself further. Are you auto-generating DHCP configurations from an asset database and then distributing them through Puppet? Well, you've got a lot of layers to know about.
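
(To illustrate just one of those layers: before Puppet even enters the picture, something has to turn asset data into DHCP configuration. Here's a toy sketch in Python, with the asset file format and field names entirely invented:

    # Hypothetical generator: assets.csv -> dhcpd host entries.
    # The CSV columns (hostname, mac, ip) are invented for illustration.
    import csv

    def dhcp_entries(asset_file):
        with open(asset_file) as f:
            for row in csv.DictReader(f):
                yield ("host {hostname} {{\n"
                       "    hardware ethernet {mac};\n"
                       "    fixed-address {ip};\n"
                       "}}\n").format(**row)

    with open("dhcpd.conf.generated", "w") as out:
        out.writelines(dhcp_entries("assets.csv"))

Distributing the result and restarting dhcpd are then two more layers on top of this one.)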

Some people will say that you don't really need to know all of these layers (especially once you reach the level of auto-generated things and other multi-layer constructs). The drawback of this is that not knowing all of the layers turns you into a push-button monkey: you no longer actually understand your system; you can just push buttons to get results as long as everything works (or doesn't go too badly wrong).

All of this suggests a way to decide when automation is going to be worth it: just compare the amount of time that it'll take for people to learn the automation system and how you're using it with how much time they would spend doing things by hand. You can also compare more elaborate automation systems to less elaborate ones this way (and new elaborate 'best practices' systems to the simple ones you already have).
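
In back-of-the-envelope form (every number here is invented, so plug in your own), the comparison is simple arithmetic:

    # Toy break-even calculation; all figures are made up.
    people = 4
    learn_hours_per_person = 20        # the system plus your local usage of it
    manual_hours_per_change = 0.5
    automated_hours_per_change = 0.1
    changes_per_year = 200

    learning_cost = people * learn_hours_per_person
    yearly_saving = changes_per_year * (manual_hours_per_change -
                                        automated_hours_per_change)
    print("break-even after %.1f years" % (learning_cost / yearly_saving))

If the break-even point is further out than the expected lifetime of the system (or of your staff's tenure), the automation probably isn't worth it on time grounds alone.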

(One advantage of using a well known automation system such as Chef or Puppet is that you can hope to hire people who already know the automation system in general, cutting out one of the levels of learning. This is also a downside of having your own elaborate automation system; you are guaranteed that new people will have to learn it.)

By the way (and as is traditional), the people who designed and built the automation system are in a terrible position to judge how complex it is and how hard it is to learn, or even to see this issue. If you built it, you're usually not going to see the system as complex or hard to keep track of, because to you it isn't; as the builder, you're just too close to the system and too immersed in it to see it from an outside perspective.

PS: Automation can have other benefits in any particular situation that are strong enough to overcome this disadvantage (including freeing sysadmins from drudgery that will burn them out). But it's always something to remember.

(This is closely related to the cost of automation but is not quite the same thing; in that entry I was mostly talking about locally developed automation instead of using standard automation tools.)


Comments on this page:

From 195.26.247.141 at 2012-02-15 04:25:01:

I'm one of those who likes to auto-generate the DHCP and DNS databases from Puppet configs, with several layers in between for the generation and collection of this data.

Whilst this is more complex, it is fantastically flexible and you only have to learn the concepts once. Any additional uses of these generated resources will work in a similar enough fashion that you don't need to re-learn anything.

I consider this akin to learning a new programming language which has a decent chunk more expressive power to it, so that you can write at a higher level. Understanding the lower layers becomes less important as the language and libraries become more standardised (as you alluded to in your "what will kill system administration" post).

Sysadmins need to get used to learning about automation, as it is most definitely becoming a very important skill to have -- similar growth has already happened with virtualization.

From 67.7.117.247 at 2012-02-15 08:38:25:

Being able to have a push button monkey do the work is a very nice feature of automation though. For instance, we have an Apache server that for reasons I won't go into has a very complex setup with a large number of proxies and redirects. In order to add a new redirect, several files have to be touched and the ordering in those files matters, and there are multiple Apache processes running, only one of which needs to be restarted. Some of the admins have a very good understanding of how the system functions, but a lot of the admins don't have any reason to work on the system regularly.

We use Puppet to automate all of this, so from the admin perspective all adding a redirect entails is editing a file and adding four lines. This is both nicer for me on a regular basis and allows the other admins who don't often touch the system to treat it as mostly a black box. They do need to understand a little bit of Puppet, but they don't have to understand the complexities of the underlying system. They don't even have to understand that it is Apache underneath.
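
As a rough sketch of the idea (the format below is invented for illustration; it's not our actual Puppet code), the admin-facing entry and its expansion look something like this:

    # The admins only ever write a small entry like this...
    entry = {"path": "/docs", "target": "http://backend.example.com/docs/"}

    # ...and the automation layer expands it into the real Apache
    # directives, touching the right files in the right order and
    # restarting only the Apache process that needs it.
    def expand(e):
        return "Redirect permanent %(path)s %(target)s\n" % e

    print(expand(entry), end="")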

From 184.71.238.42 at 2012-02-15 11:32:36:

The more you abstract, the further you get from the metal, and the larger and more complex you can build your systems. If you lose sight of what really happens underneath, well, you'll be in a far worse position than before you had abstraction.

I've seen too many cases where some magical enterprise app, built in house, totally grinds the production systems and networks into the ground because everyone along the path forgot how the underlying pieces actually work. You really do need a good understanding of the hardware, OS, and network at the lower levels to use such a system effectively without knocking the whole thing over.

http://phasorburn.com/

From 198.189.14.2 at 2012-02-15 11:33:17:

Reminds me of RFC 1925 item 6a: It is always possible to add another level of indirection.

-- Edward Berner

From 69.158.15.67 at 2012-02-15 18:52:32:

All of this suggests a way to decide when automation is going to be worth it: just compare the amount of time that it'll take for people to learn the automation system and how you're using it with how much time they would spend doing things by hand.

The above makes it sound like automation is an either-or scenario: either one decides to do automation for everything (including app config files) or one doesn't do it at all.

In reality each site needs to decide where to draw the automation line (a fact that I'm sure Chris knows, but that I'm simply stating to make it explicit for posterity's sake). So while some places may in fact automate their Apache config (e.g., large web companies like Google, Facebook, etc.), smaller places may simply decide on a "webserver" class in Chef/Puppet that ensures the proper packages are installed, and stop there.

Automation could still be used to make sure that things like syslog, NTP, resolver, Kerberos, workstation SMTP mail server, and NSS/LDAP configuration files are properly set up. So some applications which are cookie cutter for a large swath of machines (like NTP) could be managed via automation, but one-off systems may not be.

Going back to the Apache example: while it may not be a good idea to code up the Apache config as a (Chef) recipe, it may be useful to have the file in a central place and have Chef distribute it to the web server/s.

This way the config is in a central location (hopefully under version control) with all the other config files. If you ever need to clone the machine (perhaps to test new software) it's easy to create another instance since everything should (mostly?) be in one place. If your web servers are redundant, it's also easier to push out the config from one place instead of remembering to edit the file on two machines (perhaps making typos that cause them to de-sync in behavior).
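
A minimal sketch of that push step (hostnames and paths invented; in practice you'd let Chef's own file distribution do this instead of hand-rolling it):

    # Toy "push a hand-written config from a central place" script.
    import subprocess

    WEB_SERVERS = ["web1.example.com", "web2.example.com"]
    CONFIG = "configs/apache/httpd.conf"   # kept centrally, under VCS

    for host in WEB_SERVERS:
        subprocess.run(["scp", CONFIG, host + ":/etc/httpd/httpd.conf"],
                       check=True)
        subprocess.run(["ssh", host, "apachectl", "graceful"], check=True)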

So going back to this line:

You've gone from a situation where all you need to know to configure your Apache is Apache configuration itself to a situation where now you need to know Apache configuration, using Chef, and how you're using Chef to configure your Apache.

Or you could be in a situation where all you need to know to configure Apache is Apache configuration, but you just do it at this particular host/path and the changes get pushed out to the web server/s in question. (And also remember to check the changes into the VCS with a meaningful comment. :)

By cks at 2012-02-21 15:07:49:

Belated response time and a marker for other people: I put a reply to this issue as a sidebar in AutomationDownsideII and the conversation is continuing there.
