The big trick of running lots of systems
July 4, 2005
There are a lot of rules for running large-scale systems with lots of machines. I'll probably be writing up my own version of them at some point. But they all really come down to one big trick:
Never deal with machines one by one.
That's it. Everything else is in the implementation details. (Of course, the devil is always in the details.)
But what does it mean? More or less what it says: you should never deal with machines one by one, ideally not even if one of them is exploding. Dealing with machines one by one is somewhat like trying to get through a swamp on foot; you can make progress, but oh so very slowly, and slogging through the mud is very tiring.
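As a rough illustration of what this can look like in practice, here is a minimal Python sketch. The names `run_everywhere`, `hosts`, and `action` are made up for this example. What `action` actually does (run a command over SSH, push a file, check a daemon) is left as an assumption. The point is only the shape: the set of machines is data, one operation is applied to all of them, and the machines that misbehave come back as a list instead of being visited by hand.

```python
from concurrent.futures import ThreadPoolExecutor


def run_everywhere(hosts, action, max_workers=8):
    """Apply one action to every host and split the outcome into
    (results, failures), both keyed by host name.

    `action` is a hypothetical callable taking a host name; it stands in
    for whatever per-machine work needs doing.
    """
    results = {}
    failures = {}

    def attempt(host):
        # Catch errors per host so one broken machine cannot stop the run.
        try:
            return host, action(host), None
        except Exception as exc:
            return host, None, exc

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves the order of `hosts` in its output.
        for host, value, error in pool.map(attempt, hosts):
            if error is None:
                results[host] = value
            else:
                failures[host] = error

    return results, failures
```

Calling `run_everywhere(all_hosts, some_check)` then hands back two dictionaries. Even the "exploding" machine is dealt with as an entry in `failures` rather than as a special trip into the swamp.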
This deep principle underlies a lot of large-scale system administration tools, including things like LDAP, NIS, and automounters. (Which are just ways of making it so that you don't have to worry about
(Like the best big tricks, this is in some ways a very Zen thing, so it's hard to find much to say about it that doesn't feel like belaboring the obvious.)