A fun little regular expression bug
The problem with regular expressions is the same problem as computer programming in general: the computer will faithfully do exactly what you told it to do, regardless of whether or not this was what you actually wanted.
(The other problem with regular expressions is that they are a crappy programming language, not in power (they've got plenty of power) but in terms of being able to read and write them. (Okay, one of the other problems with regexps. There are more.))
My example is a regexp that has to match lines of text that look like this:
- foo bar: more text
The simple regular expression to match this and to group the 'foo bar' and the 'more text' bits is:
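A minimal Python sketch of that simple pattern, assuming the line format shown above (the names `SIMPLE` and `split_simple` are illustrative, not taken from DWiki):

```python
import re

# Assumed pattern: "- ", a greedy first group, ": ", then the rest of the line.
SIMPLE = re.compile(r"^- (.*): (.*)$")

def split_simple(line):
    """Return (label, text) for a matching line, or None if it does not match."""
    m = SIMPLE.match(line)
    return m.groups() if m else None

print(split_simple("- foo bar: more text"))  # ('foo bar', 'more text')
```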
However, this matches too much if you have a 'blah: more' in the 'more text' bit, because regular expression quantifiers are greedy by default; the first group swallows everything up to the last ': ' on the line. Fine, no problem:
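A sketch of the tightened pattern in Python, assuming the fix is simply to forbid colons in the first group (which is what the description further down implies). The greedy version is repeated for comparison, and the names are illustrative:

```python
import re

# Greedy version: the first group runs up to the LAST ': ' on the line.
GREEDY = re.compile(r"^- (.*): (.*)$")
# Assumed fix: the first group may not contain any colon at all.
NO_COLON = re.compile(r"^- ([^:]+): (.*)$")

def split_with(pattern, line):
    """Return the two groups for a matching line, or None."""
    m = pattern.match(line)
    return m.groups() if m else None

line = "- foo bar: blah: more"
print(split_with(GREEDY, line))    # ('foo bar: blah', 'more')
print(split_with(NO_COLON, line))  # ('foo bar', 'blah: more')
```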
Unfortunately, this version fails to match a line like:
- [[foo http://bar]]: baz.
This is because the regular expression as written requires that the 'foo bar' portion not merely have no ': ' bits in it, but that it have no colons in it at all. (If it does have a ':' but no following space, the group ends but the following ': ' required by the regexp isn't there.)
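The failure can be reproduced with a short Python sketch, assuming the colon-free first group `[^:]+` described above. It also shows where the group stops and what follows it, which is why the required ': ' can never match:

```python
import re

# Assumed pattern: first group may not contain any colon.
NO_COLON = re.compile(r"^- ([^:]+): (.*)$")

line = "- [[foo http://bar]]: baz."

# The whole pattern fails to match.
print(NO_COLON.match(line))  # None

# Matching only the start shows where the first group is forced to end.
prefix = re.match(r"^- ([^:]+)", line)
stopped_at = prefix.group(1)
next_two = line[prefix.end():prefix.end() + 2]
print(stopped_at)  # [[foo http
print(next_two)    # :/   (the pattern needs ': ' here)
```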
My quick on the spot solution was to club in an exception:
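One way to write such an exception, sketched in Python. This is a reconstruction that assumes a negative lookahead on '\s'; the exact pattern and the names here are not DWiki's:

```python
import re

# Assumed pattern: the first group is a run of either non-colon characters
# or a colon that is NOT followed by whitespace (negative lookahead).
PATCHED = re.compile(r"^- ((?:[^:]|:(?!\s))+): (.*)$")

def split_patched(line):
    """Return (label, text) for a matching line, or None."""
    m = PATCHED.match(line)
    return m.groups() if m else None

print(split_patched("- [[foo http://bar]]: baz."))  # ('[[foo http://bar]]', 'baz.')
print(split_patched("- foo bar: blah: more"))       # ('foo bar', 'blah: more')
```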
(This says that the first group is allowed to contain colons, as long as they are not followed by a space. Yes, I could probably write that as ':\S' instead. I was in a hurry; I count it lucky that I wrote '\s' instead of just ' '.)
I did it this way because I was very much in a 'quick fix, patch the existing regexp' mode at the time. In retrospect, the whole thing might be better written using a non-greedy regexp match:
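A sketch of the non-greedy approach in Python, assuming the same line format as above (names are illustrative). The lazy `.*?` makes the first group stop at the first ': ' on the line, so colons without a following space stay inside it:

```python
import re

# Assumed pattern: lazy first group, so it ends at the FIRST ': ' on the line.
LAZY = re.compile(r"^- (.*?): (.*)$")

def split_lazy(line):
    """Return (label, text) for a matching line, or None."""
    m = LAZY.match(line)
    return m.groups() if m else None

print(split_lazy("- foo bar: blah: more"))       # ('foo bar', 'blah: more')
print(split_lazy("- [[foo http://bar]]: baz."))  # ('[[foo http://bar]]', 'baz.')
```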
(Disclaimer: the regexps are slightly simplified from the real ones that DWiki uses, because the real ones are encrusted with some internal concerns that obfuscate them a bit.)