Chris's Wiki :: blog/python/MoreRegexpPerformance Commentshttps://utcc.utoronto.ca/~cks/space/blog/python/MoreRegexpPerformance?atomcommentsDWiki2007-02-01T20:53:38ZRecent comments in Chris's Wiki :: blog/python/MoreRegexpPerformance.From 67.169.137.35 on /blog/python/MoreRegexpPerformancetag:CSpace:blog/python/MoreRegexpPerformance:fc1e8804f0df926652aebd9f8d2e282519cf89faFrom 67.169.137.35<div class="wikitext"><p>OK, I'm coming across this long after you wrote it, but for some explanation of what's going on (and incidentally, why grep's regular expression engine does not choke on this), see <a href="http://swtch.com/~rsc/regexp/regexp1.html">http://swtch.com/~rsc/regexp/regexp1.html</a></p>
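<p>(A small illustration of the linked article's point, not from the original comment: the classic pathological pattern <code>a?a?...aa...a</code> matched against <code>aa...a</code> forces a backtracking engine like Python's <code>re</code> into exponential work, while a Thompson-NFA engine like grep's stays linear. The timing loop below is my own sketch.)</p>

```python
import re
import time

def time_match(n):
    # "a?" * n followed by "a" * n, matched against n a's: every
    # optional must ultimately match empty, and a backtracking engine
    # discovers this only after trying ~2^n combinations.
    pat = re.compile("a?" * n + "a" * n)
    text = "a" * n
    start = time.perf_counter()
    matched = pat.match(text) is not None
    return time.perf_counter() - start, matched

for n in (5, 10, 15, 20):
    elapsed, matched = time_match(n)
    print(f"n={n:2d}  matched={matched}  elapsed={elapsed:.4f}s")
```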
</div>2007-02-01T20:53:38ZBy Chris Siebenmann on /blog/python/MoreRegexpPerformancetag:CSpace:blog/python/MoreRegexpPerformance:a91ed8c2b9d360f98ac44656ec83e1ca7d5497e2Chris Siebenmann<div class="wikitext"><p>I think I wasn't explicit enough in the entry (and the first entry): most
of the measurements and ratios were for the pure miss / fail case.
(Pure fail cases are important for spam scanners, among other things.)</p>
<p>The current rebench testbeds use strings about 22K long, for various
fuzzy reasons including that I believe it's roughly typical for what
I work with. (Also, I had to pick some length, because I wasn't
interested enough to run a huge cross-matrix and analyze the results.)</p>
</div>2006-02-27T08:38:15ZFrom 84.58.50.182 on /blog/python/MoreRegexpPerformancetag:CSpace:blog/python/MoreRegexpPerformance:14c7933d8fe27b5c80923548655540832e5277f9From 84.58.50.182<div class="wikitext"><p>The poor performance of alternations has been addressed to a certain extent in Perl 5.9.3. I'd really like to see the Perl benchmarks redone using blead perl, as it would give a good idea of the speedup that can be expected in Perl 5.10. It should be considerable (proportional to the number of alternations). There is also room for more improvement there (Aho-Corasick matching), but the hairiness of the engine got to me, and I haven't got back into it to complete it. Also, you should add pure fail cases to your test suites, for short, medium, long and very long strings. I think Perl will do even better in those cases than the others, with blead performing much better. -- demerphq</p>
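<p>(A minimal sketch of the kind of pure-fail benchmark suggested here, written by the editor rather than the commenter; the word list is invented, and the sizes roughly bracket the ~22K strings mentioned elsewhere in the thread. Every search is a guaranteed miss, so the whole haystack must be scanned each time.)</p>

```python
import re
import time

# Hypothetical alternation of literal words; any such list shows the effect.
pat = re.compile("viagra|casino|lottery|rolex")

for label, size in [("short", 64), ("medium", 1024),
                    ("long", 22 * 1024), ("very long", 256 * 1024)]:
    haystack = "x" * size  # contains none of the words: a pure fail case
    start = time.perf_counter()
    for _ in range(50):
        assert pat.search(haystack) is None
    elapsed = time.perf_counter() - start
    print(f"{label:9s} ({size:6d} chars): {elapsed:.4f}s for 50 misses")
```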
</div>2006-02-27T01:33:56ZBy Chris Siebenmann on /blog/python/MoreRegexpPerformancetag:CSpace:blog/python/MoreRegexpPerformance:dab1a0673c02f7ef782fb35148f983cf4532cb94Chris Siebenmann<div class="wikitext"><p>I thought about trying to put some sort of table of results in, but
couldn't come up with a layout and a set of data to include that I
really liked when I was writing the entry. Perhaps it really needs
graphs, in which case <a href="https://utcc.utoronto.ca/~cks/space/dwiki/DWiki">DWiki</a> needs to grow inline images or something.</p>
<p>My other concern is that some of the effects seem to be CPU and/or
version dependent, and I felt nervous about throwing in exact numbers
without either lots of qualifications (and possibly extra ones) or
figuring out what was the important variable. (That's why the entry
is loaded with qualifications like 'at least' or very large and vague
ranges.)</p>
</div>2006-02-23T18:45:56ZBy DanielMartin on /blog/python/MoreRegexpPerformancetag:CSpace:blog/python/MoreRegexpPerformance:789f0aa097ac964365dd4d11c6e37f08f0e694d0DanielMartin<div class="wikitext"><p>The only thing I'd add to this is the actual tables of results themselves. It's one thing to <em>say</em> that a regexp of <code>(A|B)</code> performs poorly; it's another thing entirely to stare the two-orders-of-magnitude different numbers in the face.</p>
<p>Unfortunately, I probably won't have time to make a pretty HTML page out of the results since work is about to take over my life for the next week, but I think that having the numbers presented well would help.</p>
</div>2006-02-21T12:55:57Z