2011-10-27
A Wikitext formatting mistake that I made here
There are a number of things about DWiki that I would change if I were writing it from scratch again. I'm not entirely sure that I'd completely replace its dialect of wikitext with a standard one such as Markdown, but there are certainly aspects of it that I would redo. In particular, one part of DWiki's wikitext dialect has turned out to be a terrible mistake.
You see, when I was designing my wikitext dialect, for some reason I decided that digits at the start of a line should be one of the things that started a numbered list entry. No doubt this struck me as sensible at the time; it lets you write the vaguely traditional format of:
    0 some numbered list
    0 another entry on it
    00 an inner numbered list
    1 any number will do
(I may have copied this from somewhere or I may have thought it was a good thing to go along with the more common '#' at the start of lines.)
Then I started writing paragraphs with free-floating numbers in them, when for example I discuss 'Fedora 14' or '404 errors'. And reflowing the paragraphs. When you reflow paragraphs with free-floating numbers in them, at some point you are going to reflow a paragraph so that the number is at the start of a line. At that point my wikitext design mistake bit me in the rear, because the parser (such as it is) immediately declared the reflowed line to be the start of a numbered list.
(I've seen this issue bite people leaving comments, too.)
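To make the failure concrete, here is a minimal sketch in Python. It is not DWiki's actual code: the regexp, the `naive_line_kind` function, and the sample sentence are all made up for illustration, and it assumes a matcher that looks at each line in isolation.

```python
import re
import textwrap

# Hypothetical rule: digits followed by a space at the start of a line
# begin a numbered list entry. Not DWiki's real pattern.
NUMBERED_ENTRY = re.compile(r"^\d+ ")

def naive_line_kind(line):
    """Classify one line with no knowledge of the lines around it."""
    return "numbered-list" if NUMBERED_ENTRY.match(line) else "text"

paragraph = "Yesterday I spent a while looking at 404 errors in the logs."

# Reflowing to a narrower width happens to push '404' to the start of a line.
reflowed = textwrap.fill(paragraph, width=38).splitlines()
kinds = [naive_line_kind(line) for line in reflowed]

for line, kind in zip(reflowed, kinds):
    print(f"{kind:14} {line}")
```

As a single line the paragraph is classified as plain text, but after reflowing its second line starts with '404' and is taken as a list entry, even though nothing about the text itself changed.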
One of the things that this says to me is that designing a wikitext dialect is hard partly because you don't necessarily find out your mistakes until you've started writing a substantial and varied amount of text in your new dialect, at which point it's difficult to change. (Well. Sort of. It's at least annoying.)
Another thing I take from this is that I really want to use some sort of real parser to parse wikitext, instead of hand-built regexp-based processing. My hope is that using a real parser would make it easier to express things like 'numbered lists can't begin when you're in paragraph mode'.
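The rule itself can at least be sketched as a small line-based state machine. This is only an illustration of the rule, not a real parser and not DWiki's code; the state names, the `classify_lines` function, and the choice that a blank line ends a paragraph are assumptions.

```python
import re

# Hypothetical pattern for a numbered list entry: digits, then a space.
NUMBERED_ENTRY = re.compile(r"^\d+ ")

def classify_lines(lines):
    """Label each line; a numbered list may not begin inside a paragraph."""
    state = "start"  # one of "start", "paragraph", "list"
    kinds = []
    for line in lines:
        if not line.strip():
            # A blank line ends whatever block we were in.
            state = "start"
            kinds.append("blank")
        elif state != "paragraph" and NUMBERED_ENTRY.match(line):
            state = "list"
            kinds.append("numbered-list")
        elif state == "list":
            # Non-entry text right after an entry continues that entry.
            kinds.append("list-continuation")
        else:
            # Any other text starts or continues a paragraph, including
            # lines that happen to begin with digits.
            state = "paragraph"
            kinds.append("text")
    return kinds

sample = [
    "Yesterday I spent a while looking at",
    "404 errors in the logs.",
    "",
    "0 some numbered list",
    "0 another entry on it",
]
kinds = classify_lines(sample)
```

With this rule the reflowed '404 errors' line stays part of its paragraph, while the entries after the blank line are still recognized as a list. The cost is that a list must now be separated from a preceding paragraph by a blank line.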
(I have some vague opinions on why parsing wikitext is hard with traditional parsers, but it's possible that I just haven't seen the trick to doing it easily.)
(Probably someday I will bite the bullet by taking this 'feature' out of my wikitext parser, then finding and fixing all current pages that use it. So far that's been just a bit more annoying than living with the current situation.)
2011-10-02
Another comment spam precaution that no longer works out
I use a number of comment spam precautions here (although most of the work is done by only one). However, every so often one of these clever tricks turns out to be not just useless but a bad idea. One of my most elaborate comment spam precautions here is signing the comment form with various information about the IP address that fetched it. When I came up with this precaution four years ago, it was clear from my logs that spammers were fetching my 'add a comment' page from one IP address, sitting on it, and then submitting comments from another IP; adding the precaution caught a certain number of spammers with no false positives that I could see.
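For readers who haven't seen this sort of precaution, here is a rough sketch of the general idea in Python. It is not the code used here: the secret key, the function names, and the choice to sign the IP address plus an issue timestamp with HMAC-SHA256 are all assumptions made for illustration.

```python
import hashlib
import hmac

# Hypothetical server-side secret; a real one would be random and private.
SECRET_KEY = b"example-secret-not-a-real-key"

def sign_form(ip_address, issued_at):
    """Return a token tying the form to the IP that fetched it."""
    message = f"{ip_address}|{issued_at}".encode("utf-8")
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()

def verify_form(token, ip_address, issued_at):
    """Check the token against the IP that is submitting the form."""
    expected = sign_form(ip_address, issued_at)
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(token, expected)

# The form is fetched from one address (documentation-range IPs used here).
token = sign_form("192.0.2.10", 1317513600)

same_ip = verify_form(token, "192.0.2.10", 1317513600)     # accepted
other_ip = verify_form(token, "198.51.100.7", 1317513600)  # rejected
```

The check cannot tell why the address changed. Anything that fetches the form from one IP and submits it from another is rejected, whatever the reason for the change.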
Well. That situation has now changed. It's been some time since this precaution has prevented any spam; if spammers are still doing this at all, they're tripping over other precautions first (almost always my honeypot form field). Unfortunately I've now seen two instances where this precaution seems to have misfired, preventing real people from posting actual comments. So out it goes; I can live with inactive and useless comment spam precautions, but not ones that give false positives.
(Unfortunately I fumbled some code when I did this the first time. For semi-obvious reasons testing this case is kind of tricky, but I really should have tried harder.)
I'm not sure why people are hopping between significantly different IP addresses. My current theory is some sort of proxies, possibly for mobile devices and smartphones; if the proxy choice is basically random per request and the proxies are on multiple subnets, it would match what I saw in the logs. The alternate theory is that ISP DHCP servers are giving out significantly divergent IP addresses when people have flaky lines and keep disconnecting and reconnecting.