You don't necessarily know what matters for performance

November 27, 2012

One of the obvious lessons I could learn from the switch problem at the heart of our recent disk performance issue is that low-level details matter, sometimes a lot. While this is true, I think it's a superficial thing to take away from this learning experience. The lesson I really want to take away is a more general one:

I don't know in advance what matters for performance.

Until I had this experience you probably could have sat me down with the switch specifications for both switches, asked me if there were any differences that mattered for our iSCSI performance, and I would have gone through the specs and confidently told you no. Obviously I would have been wrong.

What this convinces me of is that it is in practice impossible to fully understand system performance in advance. This is not an issue of incomplete knowledge (the specifications would have given me enough knowledge); it is instead an aspect of fragile complexity. The systems we use and build have enough interactions between their parts that we simply can't hold a complete picture in our heads even if we have all the information we need available (which is often not the case). We can reason (and measure) backwards from actual performance to see how it arises, but in practice we cannot reason forward from specifications and components to always predict the system performance. When we try to do this, we always simplify the picture of the system in order to make it tractable to reason about. Often this simplification works, but often is not always.

Reasoning backwards from performance to cause works better, but it's not foolproof. We're just as eager to create simplifications when we work backwards as when we work forwards. Fundamentally humans are rationalizing animals; we want to tell ourselves stories that make order out of chaos and we're not too picky about what those stories look like. So the real validation is always measuring (with as few preconceptions as possible).
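To make 'measure instead of predicting' concrete at a very small scale, here is a minimal sketch in Python. It is not taken from our disk or switch work; the two candidate functions and the helper names (`measure`, `rank_by_measurement`) are made up for illustration. The point is only that the ranking comes out of the measurements, not out of what I expected beforehand.

```python
import statistics
import time


def measure(func, repeats=5):
    """Run func() several times and return the median wall-clock seconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def rank_by_measurement(candidates, repeats=5):
    """Return (name, median_seconds) pairs sorted fastest first."""
    timings = [(name, measure(func, repeats)) for name, func in candidates.items()]
    return sorted(timings, key=lambda pair: pair[1])


# Hypothetical candidates: two ways of doing the same work.
def join_strings():
    return "".join(str(i) for i in range(20000))


def concat_strings():
    out = ""
    for i in range(20000):
        out += str(i)
    return out


if __name__ == "__main__":
    candidates = {"join": join_strings, "concat": concat_strings}
    for name, seconds in rank_by_measurement(candidates):
        print(f"{name}: {seconds * 1e3:.2f} ms")
```

I have deliberately not written down which one wins. That depends on the interpreter, the version, and the machine, which is rather the point.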

PS: a great example of this is the performance effects of shouting at your disk drives.

Comments on this page:

From at 2012-11-27 16:59:55:


This is parallel to the rules about code performance that I have derived from programming experience:

  1. If you do not profile it, you have no idea why it performs as it does.
  2. If you try to guess where the bottleneck will be, you will always be wrong. No exceptions.

The latter rule might seem too heavy-handed, but practice has taught me that no, even coming from a programmer with solid optimisation experience gauging a toy code example, a lot of guesses will be bunk. With real code? No chance.

Colour me unsurprised that the sysadmin equivalent turns out exactly the same.

Aristotle Pagaltzis

Written on 27 November 2012.

Last modified: Tue Nov 27 00:07:40 2012