My view of the difference between 'pets' and 'cattle'

March 2, 2015

A few weeks ago I wrote about how all of our important machines are pets. When I did that I did not precisely define how I view the difference between pets and cattle, partly because I thought it was obvious. Subsequent commentary in various places showed me that I was wrong about this, so now I'm going to nail things down.

To me the core distinction is not in whether you hand-build machines or have them automatically configured. Obviously when you have a large herd of cattle you cannot hand-build them, but equally obviously the current best practice is to use automated setups even for one-off machines and in small environments. Instead the real distinction is how much you care about each individual machine. In the cattle approach, any individual machine is more or less expendable. Does it have problems? Your default answer is to shoot it and start a new one (which your build automation and scaling systems should make easy). In the pet approach each individual machine is precious; if it has problems you attempt to nurse it back to health, just as you would with a loved pet, and building a new one is only a last resort even if your automation means that you can do this rapidly.

If you don't have build automation and so on, replacing any machine is time-consuming, so you wind up with pets by default. But even if you do have fast automated builds, you can still have pets because of things like machines having local state of some sort. Sure, you have backups and so on of that state, but you fall back to hand care because restoring a machine to full service, state included, is slower than a plain rebuild that only gets the software up.
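To make the distinction concrete, here is a minimal sketch of it as a decision rule. Everything in it is my own illustration: the `Machine` fields, `is_pet()` and `default_response()` are hypothetical names, not anything from a real tool, and real environments will have fuzzier answers than three booleans.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    REPLACE = "replace"  # shoot it and start a new one
    NURSE = "nurse"      # try to fix the existing machine in place


@dataclass
class Machine:
    # All fields are hypothetical labels made up for this illustration.
    name: str
    has_build_automation: bool           # can we rebuild it quickly?
    slow_to_restore_state: bool = False  # local state that takes a while to restore
    precious: bool = False               # we simply care about this particular machine


def is_pet(m: Machine) -> bool:
    """A machine is a pet if we care about it individually, or if
    replacing it is slow enough that we wind up caring by default."""
    return m.precious or not m.has_build_automation or m.slow_to_restore_state


def default_response(m: Machine) -> Action:
    """What we reach for first when the machine has problems."""
    return Action.NURSE if is_pet(m) else Action.REPLACE


if __name__ == "__main__":
    machines = [
        Machine("web-17", has_build_automation=True),
        Machine("fileserver", has_build_automation=True, slow_to_restore_state=True),
        Machine("old-box", has_build_automation=False),
    ]
    for m in machines:
        print(m.name, default_response(m).value)
```

Note that how the machine was built never appears as a deciding factor on its own. Build automation only matters here because lacking it makes replacement slow; an automatically built machine with slow-to-restore local state is still a pet.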

(This view of pets versus cattle is supported by, eg, the discussion here. The author of that email clearly sees the distinction not in how machines are created but in significant part in how machines with problems are treated. If machines are expendable, you have cattle.)

It's my feeling that there are any number of situations where you will naturally wind up with a pet model unless you're operating at a very big scale, but that's another entry.
