Link: Getting Real About Distributed System Reliability
Jay Kreps' "Getting Real About Distributed System Reliability" is a very interesting discussion of the reliability of distributed systems in the real world. He patiently explains that a number of the assumptions normally made when reasoning about that reliability are in fact wrong in practice, especially the assumption that failures are independent. I'm not going to try to summarize his entry beyond that; go read it instead.
(I suspect that his logic extends to all real systems, not just distributed ones, and in any case he has given me a lot to think about.)
By the way, several of the links in his entry are themselves worth following and reading carefully.
(I believe I got this from my Twitter stream but I cannot find the original source now.)