What performance anomalies mean
I'm currently engaged in a slow-moving effort to improve the performance
os.walk() filesystem walking function (I have a
long-standing interest in it and recently
wrote about optimizing filesystem
walking on Unix). As part of the work I've been benchmarking several
variants of my code, and I stumbled over a performance anomaly where
os.walk() is often faster than what should be a slightly
optimized version of it.
It's awfully easy to dismiss minor performance anomalies, especially if they happen in the early stages of optimizing something (in what you already expect to be only a minor improvement at best). But to do so is to miss what they mean:
Performance anomalies are a sign that you don't understand something important about what's really going on in your system.
You might ask why this matters, especially if your full optimizations are still faster than the original code.
We don't optimize code by making random changes and seeing if they speed the code up. Instead, we have a mental model of what is making the code slow and thus what can be done to make it faster. A performance anomaly means that some part of this mental model is wrong; either we don't actually understand why the existing code is slow, or we don't understand something about the runtime environment that makes our new code slower than we think it ought to be.
(I am going to assume that you have a clear explanation for why your new
code ought to be faster, such as 'it does only one
stat() instead of
Understanding performance anomalies is especially important in modern high level languages because those languages do so much under the surface, hidden behind their high levels of abstraction. But you can't do deep optimizations without understanding what's happening down in the depths of your runtime environment; performance is one of the things that always leaks through abstractions. So a mysterious performance anomaly is a sign that you don't understand what's behind the abstractions as well as you thought you did.
(PS: this doesn't mean that I understand why my little
optimization turned out to be often slower. I haven't had the time to
look into it, especially because I expect it to be difficult to track