My new view on why you need to profile code

November 30, 2012

The traditional story of why you want to profile your code when you're optimizing it is that people are terrible at predicting what their code is actually doing. It's not just that people mispredict where the time is going; often they can't even predict things like which functions are called a lot, how many times loops run, and so on. This view offers what I'll call a fantasy of perfect knowledge: the idea that if we really understood what our code was doing and how it was being used, if we already knew the code flow information that profiling uncovers, we could predict all of this performance stuff in advance. The job of profiling is to fill in for our imperfect and mistaken information.
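As a small illustration of the 'what functions are called a lot' part, here is a minimal sketch using Python's standard cProfile module. Everything in it is made up for the example: normalize(), count_matches() and the call_counts() helper are hypothetical names, and reading raw entries with getstats() is just one convenient way to get at call counts (most people would simply print the profiler's report instead).

```python
import cProfile


def normalize(word):
    # Hypothetical helper: cheap per call, easy to overlook.
    return word.strip().lower()


def count_matches(words, targets):
    # At a glance this "normalizes the words", but the calls sit inside
    # the inner loop, so normalize() runs 2 * len(words) * len(targets) times.
    hits = 0
    for target in targets:
        for word in words:
            if normalize(word) == normalize(target):
                hits += 1
    return hits


def call_counts(func, *args):
    # Run func under the profiler and return {function name: total calls}.
    profiler = cProfile.Profile()
    profiler.runcall(func, *args)
    counts = {}
    for entry in profiler.getstats():
        code = entry.code
        # Built-in functions are reported as strings, Python functions
        # as code objects.
        name = code if isinstance(code, str) else code.co_name
        counts[name] = counts.get(name, 0) + entry.callcount
    return counts


if __name__ == "__main__":
    words = [" Alpha", "beta"] * 50   # 100 words
    targets = ["alpha"] * 20          # 20 targets
    counts = call_counts(count_matches, words, targets)
    print("normalize() calls:", counts["normalize"])
```

With 100 words and 20 targets, normalize() is called 4000 times for a single call to count_matches(), which is the sort of number that people tend not to guess from reading the code.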

Several years ago I read a Malcolm Gladwell article where he drew a distinction between puzzles and mysteries that has stuck with me ever since. To use his terms, in a puzzle you don't have enough information and your challenge is to get more while in a mystery you have plenty of information and your challenge is to make sense of it all. In these terms, the traditional story of profiling and optimization is that it is a puzzle; you resort to profiling because you have incomplete and wrong information about what your program is actually doing.

At one level this is undeniably true, but it's also incomplete. I now believe that profiling is actually a mystery inside a puzzle. Even if you 'solved' the puzzle, gaining perfect knowledge of your code's real behavior, you still would not be able to accurately predict the code's performance, because our programs and our systems are too complex for all of their interactions to be fully predicted in advance (at least by real people in any useful way). The fundamental reason to profile is that it cuts straight through this; it measures what actually happens, removing the need to solve the mystery.

This is a stronger and much more pessimistic take on profiling than the usual one, but I've come around to the view that it's the real situation. We just don't usually think about it because we generally feel we don't have sufficiently good information about what our program is doing (and we usually blame failures of prediction on lack of information).

(Although this is not quite how Aristotle Pagaltzis put it in his comment on this entry, I suspect this matches up with his opinion on the situation.)

Written on 30 November 2012.