Wandering Thoughts archives

2008-02-29

An illustration of the speed advantage of Python builtins

As a followup to my entry on the two sorts of languages, I decided to actually try implementing the string .find() method in Python to see how much slower it is. The Python version uses the brute force C style approach, roughly:

def find(src, sub):
    # Try each possible starting offset in turn.
    for i in range(0, len(src)-len(sub)+1):
        if src[i:i+len(sub)] == sub:
            return i
    return -1

I actually did three versions, each with a different if test; the other two versions used src[i:].startswith(sub) and src.startswith(sub, i). The last strikes me as pretty close to cheating, since it delegates a great deal of the work to builtins.
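For concreteness, the three variants might look something like the sketch below. The function names are made up for illustration, and this is a reconstruction from the description above rather than the exact code that was benchmarked.

```python
def find_slice(src, sub):
    """Brute-force find using a slice comparison at each offset."""
    n = len(sub)
    for i in range(len(src) - n + 1):
        if src[i:i + n] == sub:
            return i
    return -1


def find_tail_startswith(src, sub):
    """Brute-force find that slices off the tail, then calls startswith()."""
    for i in range(len(src) - len(sub) + 1):
        if src[i:].startswith(sub):
            return i
    return -1


def find_startswith_at(src, sub):
    """Brute-force find using startswith() with a start offset (no slicing)."""
    for i in range(len(src) - len(sub) + 1):
        if src.startswith(sub, i):
            return i
    return -1
```

Note that the second variant copies the entire remainder of the string on every iteration, while the first copies only len(sub) characters and the third copies nothing at all.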

I benchmarked each version on various lengths of src strings against a five character sub string, none of which contained the sub string or any characters from it. I'll skip making a big table, and just say that the fastest of the three versions ranged between 2.9 times slower than builtin .find() (for 8 characters) and 373 times slower (at 32K), and was already 35 times slower at only 128 characters.

The other versions were worse, sometimes much worse; at its slowest, the worst of the three versions was 17,554 times slower than builtin .find(), and at only 1K it was still 458 times slower (versus a mere 174 times slower for the fastest version).

(I was surprised by which version turned out to be the slowest, although in hindsight I shouldn't have been.)
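A rough way to repeat this sort of measurement is sketched below using the standard timeit module. The particular strings, sizes, and repeat counts are assumptions chosen for illustration, not the original benchmark setup, and the ratios you get will depend on your Python version and machine.

```python
import timeit


def py_find(src, sub):
    """Brute-force search; the slice-comparison variant."""
    n = len(sub)
    for i in range(len(src) - n + 1):
        if src[i:i + n] == sub:
            return i
    return -1


def slowdown(size, sub="abcde", number=200):
    """Return how many times slower py_find is than str.find for one size."""
    # src shares no characters with sub, so every search fails
    # and has to scan the whole string.
    src = "x" * size
    t_py = min(timeit.repeat(lambda: py_find(src, sub),
                             number=number, repeat=3))
    t_builtin = min(timeit.repeat(lambda: src.find(sub),
                                  number=number, repeat=3))
    return t_py / t_builtin


if __name__ == "__main__":
    for size in (8, 128, 1024, 32 * 1024):
        print(f"{size:>6} chars: {slowdown(size):.1f}x slower")
```

Taking the minimum of several repeats is the usual timeit advice, since the higher timings mostly reflect interference from other activity on the machine.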

BuiltinsSpeedIllustration written at 23:43:19

2008-02-18

Coding paralysis

DWiki's comment system is acceptable as it stands (the proof is in the pudding, in that some people are willing to use it), but it needs to be improved. Specifically, it's been clear to me for a while that there should be 'your name' and 'your website' fields so that people can conveniently identify themselves.

(And so I can have a better idea of who's leaving comments. The current DWiki comment system dates to when it was a half-hearted addition glued on to something I expected to be used primarily as an internal wiki.)

This is a bit of a project, since the comment storage format needs to change (and thus the code needs to deal with comments in either format), but it's not too much work, at least in theory. In practice I have been not doing this for some time; I've wound up in coding paralysis, where I know what I want to do but I can't get moving on it.

At its heart, my coding paralysis is because I don't feel enthused enough about the changes I want to make; they are feeling too much like work and not enough like fun. At the same time I've thought enough about them that they've become locked in my mind as the 'next step' I want to do, so I don't even think about other DWiki changes that might be more fun.

Probably this really means that I need to find something completely different to code up and obsess over.

Sidebar: some gory details

One reason for my coding paralysis is scope creep; I've let what I want to achieve get too big. For example, I know that I want to add OpenID support someday, and I don't really want to make two significant changes to the comment handling code, so I should really do that as part of the significant revision. But adding OpenID is a bunch of work, and is going to require a bunch of thinking about how best to add it to DWiki's processing models; it's effectively a significant project on its own, and yet I'm trying to wedge it into my first change.

Another reason is that a chunk of the coding is grunge work, out of proportion to the coolness of the feature it adds. Adding fields to comments requires a new comment storage format and dealing sensibly with both old and new comments, which is a bunch of boring code (and I didn't put version information into the comment storage format; bad me).
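To illustrate the kind of dual-format code involved, here is a minimal sketch. Both storage formats in it are invented for the example and are not DWiki's actual ones: the 'old' format is assumed to be just the comment body, and the 'new' format is assumed to start with a 'version: ' line followed by header lines, a blank line, and then the body. The function and field names are hypothetical too.

```python
# Hypothetical formats, invented for illustration; not DWiki's real ones.
# Old: the comment body only.
# New: a "version: N" line, more "key: value" lines, a blank line, the body.

VERSION_PREFIX = "version: "


def parse_comment(raw):
    """Parse a stored comment in either format into a dict."""
    if not raw.startswith(VERSION_PREFIX):
        # Old-style comment: no header, so everything is body.
        return {"version": 0, "name": "", "website": "", "body": raw}
    # New-style comment: the header ends at the first blank line.
    header, _, body = raw.partition("\n\n")
    fields = {}
    for line in header.split("\n"):
        key, _, value = line.partition(": ")
        fields[key] = value
    return {
        "version": int(fields.get("version", "0")),
        "name": fields.get("name", ""),
        "website": fields.get("website", ""),
        "body": body,
    }
```

The weakness of this sketch is exactly the missing version information: an old comment whose text happens to begin with 'version: ' would be misread as a new one, so real code would need a more reliable way to tell the two formats apart.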

CodingParalysis written at 23:08:20

2008-02-13

A consequence of Python's 'computer science' nature

Here is a thesis: Python being a very 'computer science' language is a good part of the polarized reactions it often gets.

The necessary flipside of being regular and predictable is that Python is also rigid and what I will call 'solid': it is not just rigid in separate, independent pieces, it is rigid all through, which comes naturally from the rigorous expansion of its core concepts through the language (and from how they tend to interlock).

If the core concepts resonate with you, then the solid uniformity means that you will probably like a whole lot of Python. However, if the core does not click with you, then that very uniformity means that there is very little or nothing to interest you, nothing that feels right to you. This is unlike what happens with more flowing and flexible languages, ones with a lot of idioms and approaches mixed together, where you may dislike most of the whole but find bits that are in your style anyways.

(I think of Perl as an example of a flowing, flexible language; for example, you can write Perl code in all sorts of equally idiomatic ways, and two people can have very different styles of Perl code.)

PolarizingPython written at 23:07:53



This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.