Speed surprises in reimplementing the .find() string method
In yesterday's entry, I mentioned that I was surprised by the relative performance of my three reimplementations of the .find() string method.
The src[i:].startswith(sub) version was the slowest, and why is obvious in hindsight: it's summed up clearly in the first bit, where we create a big new string (src[i:]) on every iteration of the loop. No wonder it's slow; creating big strings is slow in general, because you both churn object memory and copy a lot of data.
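A sketch of what that slowest version probably looks like (the function name and loop structure here are my assumptions; the original post only gives the comparison expression):

```python
def find_slice(src, sub):
    # Hypothetical reconstruction of the slow version: src[i:] builds
    # a brand new string of everything from i onward on every single
    # iteration, which is where all the time goes.
    for i in range(len(src) - len(sub) + 1):
        if src[i:].startswith(sub):
            return i
    return -1
```

For a long src, each src[i:] slice copies nearly the whole remaining string, so the total work is quadratic in len(src) even before any comparison happens.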
Now it's easy to see why the brute force src[i:i+len(sub)] == sub is much faster; with it, we are only creating a relatively small new string on every iteration of the loop, especially with my rather small substrings.
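The brute force version might look something like this (again, the function name and surrounding loop are my reconstruction, not quoted from the original):

```python
def find_bruteforce(src, sub):
    # Faster version: the slice is only len(sub) characters long, so
    # the temporary string created each iteration stays small when the
    # substring being searched for is small.
    n = len(sub)
    for i in range(len(src) - n + 1):
        if src[i:i + n] == sub:
            return i
    return -1
```

The per-iteration cost is now proportional to len(sub) rather than to the remaining length of src, which is why it wins so decisively when sub is short.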
The real surprise to me is that the
src.startswith(sub, i) version is
not the fastest, and not by a small margin (it's generally more than
twice as slow as the fastest version). By all rights it should be by
far the winner, since it requires no object creation or data copying
and should boil down to just a
memcmp(). I don't know what the Python
runtime is doing, but clearly something is dropping the ball.
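For completeness, a sketch of the version that in theory should win outright, since startswith() accepts an optional start offset and so needs no temporary string at all (function name is my assumption):

```python
def find_startswith(src, sub):
    # In-theory-fastest version: startswith(sub, i) compares in place
    # starting at offset i, creating no temporary string, yet in
    # practice it measured slower than the brute force slicing version.
    for i in range(len(src) - len(sub) + 1):
        if src.startswith(sub, i):
            return i
    return -1
```

One plausible (but unverified) contributor is the per-call overhead of the method call itself, including argument processing for the extra offset parameter, which can dominate when the actual comparison is a handful of characters.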
The really interesting part of this for me is that I started with the
startswith() version; I added the brute force version
only out of a sense of completeness, because it was the most C-like
approach, and I expected it to be the slowest.
Written on 01 March 2008.