You don't necessarily know what matters for performance
One obvious lesson I could draw from the switch problem at the heart of our recent disk performance issue is that low-level details matter, sometimes a great deal. While this is true, I think it's a superficial thing to take away from the experience. The lesson I really want to take away is a more general one:
I don't know in advance what matters for performance.
Until I had this experience, you could probably have sat me down with the specifications for both switches, asked me whether any of their differences mattered for our iSCSI performance, and I would have gone through the specs and confidently told you no. Obviously I would have been wrong.
What this convinces me of is that it is impossible in practice to fully understand system performance in advance. This is not a matter of incomplete knowledge (the specifications would have given me enough information); it is instead an aspect of fragile complexity. The systems we use and build have so many interactions between their parts that we simply can't hold a complete picture in our heads, even when we have all the information we need (which is often not the case). We can reason (and measure) backwards from actual performance to see how it arises, but in practice we cannot reason forward from specifications and components to reliably predict system performance. When we try, we always simplify our picture of the system in order to make it tractable to reason about. Often this simplification works, but often is not always.
Reasoning backwards from performance to cause works better, but it isn't foolproof either. We're just as eager to create simplifications when we work backwards as when we work forwards. Fundamentally, humans are rationalizing animals; we want to tell ourselves stories that make order out of chaos, and we're not very picky about what those stories look like. So the real validation is always measurement (made with as few preconceptions as possible).
PS: a great example of this is the performance effects of shouting at your disk drives.
Written on 27 November 2012.