2008-03-30
Docstrings versus comments in my code
One of the great not-quite arguments in the Python world is between docstrings and comments, specifically which one you should use in your code. My answer is that I use both (more comments than docstrings), but I use them for different things.
My comments are primarily written as internal documentation: how a function operates, why it operates that way, the high level logic and structure of the code, and so on. Docstrings, when I write them, are external documentation, covering things like how to use a module, a class, or a function.
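As a small illustration of that split, here is a sketch (the median() function is a made-up example, not taken from any real code of mine): the docstring says how to call the function and what comes back, while the comments explain how and why the body works the way it does.

```python
def median(values):
    """Return the median of a non-empty sequence of numbers.

    For an even number of values, return the mean of the two middle ones.
    """
    # Sort a copy so the caller's sequence is left untouched.
    ordered = sorted(values)
    mid = len(ordered) // 2
    # Odd length: there is a single middle element. Even length: average
    # the two elements on either side of the midpoint.
    if len(ordered) % 2:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2
```

Someone calling help(median) sees only the docstring; the comments are for whoever reads the source.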
Since I mostly write Python code for myself, my comments shade into a bit of external documentation; often I stick a little summary of the function into a comment instead of a docstring. Partly this is because I feel that comments are less formal than docstrings; as docstrings actually are external documentation, I feel the need to make them decent external documentation.
(And sometimes I hijack docstrings for entirely unrelated purposes, since they can be easily looked at from inside Python. DWiki's code uses docstrings to describe how to use text macros, for example, because it means that I can just embed usage descriptions into the actual macro functions instead of having to maintain separate help text somewhere else.)
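The general idea can be sketched as follows. This is not DWiki's actual code; the macro names, their signatures, and the MACROS registry are hypothetical, invented only to show how usage text can be pulled from docstrings at runtime.

```python
import inspect


# Hypothetical macro functions; each docstring doubles as its usage text.
def macro_upper(text):
    """Upper(text): expand to text converted to upper case."""
    return text.upper()


def macro_repeat(text, count="2"):
    """Repeat(text, count): expand to text repeated count times."""
    return text * int(count)


# Hypothetical registry mapping macro names to their implementations.
MACROS = {"Upper": macro_upper, "Repeat": macro_repeat}


def macro_help():
    """Build help text from the docstrings of the registered macros."""
    lines = []
    for name in sorted(MACROS):
        # inspect.getdoc() returns the cleaned-up docstring, or None.
        lines.append(inspect.getdoc(MACROS[name]) or name + ": (undocumented)")
    return "\n".join(lines)
```

Because the help text lives in the function itself, adding a macro to the registry automatically adds it to the output of macro_help().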
What I see as the formality of docstrings also makes me nervous about writing them in the 'wrong' format, or at least a less than helpful and clear format. These days there does seem to be a relatively consistent docstring format for things, but I don't think it's very well described anywhere I could easily find (PEP 257 does not qualify).
(This entry is sort of inspired by reading this.)
2008-03-01
Speed surprises in reimplementing the .find() string method
In yesterday's entry, I mentioned that I
was surprised by the relative performances of my three reimplementations
of the .find() string method.
The src[i:].startswith(sub) version was the slowest, and why is
obvious in hindsight: the src[i:] at the start of the expression
creates a big new string on every iteration of the loop. No
wonder it's slow; creating big strings is slow in general, because you
both churn object memory and copy a lot of
data around.
Now it's easy to see why the brute force src[i:i+len(sub)] == sub is
much faster; with it, we are only creating a relatively small new string
every iteration around the loop, especially with my rather small sub
string.
The real surprise to me is that the src.startswith(sub, i) version is
not the fastest, and not by a small margin (it's generally more than
twice as slow as the fastest version). By all rights it should be by
far the winner, since it requires no object creation or data copying
and should boil down to just a memcmp(). I don't know what the Python
runtime is doing, but clearly something is dropping the ball.
The really interesting part of this for me is that I started with
the one-argument startswith() version; I added the brute force version
only out of a sense of completeness, because it was the most C-like
approach, and I expected it to be the slowest.
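
For reference, the three approaches look roughly like this. This is a reconstruction from the expressions quoted above, not the original code; the function names and the time_all() helper are made up for illustration, and it is written for current Python, so relative timings may well differ from what I saw.

```python
import timeit


def find_slice_startswith(src, sub):
    # Builds a new string holding the whole rest of src on every iteration.
    for i in range(len(src) - len(sub) + 1):
        if src[i:].startswith(sub):
            return i
    return -1


def find_slice_compare(src, sub):
    # Brute force: builds only a len(sub)-sized string on every iteration.
    n = len(sub)
    for i in range(len(src) - n + 1):
        if src[i:i + n] == sub:
            return i
    return -1


def find_startswith_offset(src, sub):
    # No slicing at all; startswith() is told where to start comparing.
    for i in range(len(src) - len(sub) + 1):
        if src.startswith(sub, i):
            return i
    return -1


def time_all(src, sub, number=100):
    # Hypothetical helper: seconds taken by `number` calls of each version.
    funcs = (find_slice_startswith, find_slice_compare, find_startswith_offset)
    return {f.__name__: timeit.timeit(lambda: f(src, sub), number=number)
            for f in funcs}
```

All three return the same index that src.find(sub) would, or -1 if there is no match; only the amount of work done per iteration differs.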