Link: Getting Real About Distributed System Reliability

March 25, 2012

Jay Kreps' "Getting Real About Distributed System Reliability" is a very interesting discussion of how reliable distributed systems are in the real world. He patiently explains that a number of the assumptions normally made when reasoning about reliability are in fact wrong in practice, especially the assumption that failures are independent. I'm not going to try to summarize his entry beyond that; go read it instead.

(I suspect that his logic extends to all real systems, not just distributed ones, and in any case he has given me a lot to think about.)

By the way, several of the links in his entry are themselves worth following and reading carefully.

(I believe I got this from my Twitter stream but I cannot find the original source now.)


Last modified: Sun Mar 25 01:07:53 2012