2006-11-15
A little regexp thing to remember about \b (and \w)
A lot of documentation of Perl-style regular expressions describes \b as 'matching a word boundary' or similar phrases. Life would be simpler if the documentation used the phrase 'identifier boundary' instead, because \b's idea of word characters includes underscores. Thus \b and \w's idea of word characters makes a lot of sense for picking out identifiers in languages like C, but not necessarily so much sense for things like picking out words in written text.
(The same thing applies to GNU grep's --word-regexp option.)
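As a small illustration of the underscore behavior, here is a sketch using grep's -w (the short form of --word-regexp). The sample text and variable names are made up for the example, and it assumes a grep that supports -o, as GNU grep does.

```shell
# Hypothetical sample text, chosen only for illustration.
text='max_len is the max len'

# With -w, 'max' must be a whole "word". Since '_' counts as a word
# character, the 'max' inside 'max_len' does not match.
whole_word_hits=$(printf '%s\n' "$text" | grep -o -w 'max' | wc -l | tr -d ' ')

# Without -w, both occurrences of 'max' match.
plain_hits=$(printf '%s\n' "$text" | grep -o 'max' | wc -l | tr -d ' ')

echo "with -w: $whole_word_hits, without -w: $plain_hits"
```

Someone reading 'word' in the everyday sense might expect both counts to be the same here.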
Saying that this is documented if you read the full description of \b is no excuse. The problem is that 'word' is a dangerously loaded term to use, because it invites people to think that they know what it means and not read carefully (especially if they are skimming to refresh their memory). If the documentation used 'identifier' instead, people would not be led astray by their intuition about what a word is.
(This is a general problem with giving any technical definition a name that's a common term; people have to know, remember, or even realize that the common term doesn't mean what they think it means. For example, X Windows got a lot of people grumpy by inverting the way people thought about clients and servers, so in X the 'server' is on your desk and the 'client' is that big compute server over in the machine room.)
Why the Bourne shell is not my favorite language
The difference between
for i in "a b"; do
  mv -f $i $i-UBUNTU
  ...
done
and
FOO="a b"
for i in $FOO; do
  mv -f $i $i-UBUNTU
  ...
done
is subtle (in visual appearance) and easy to accidentally forget, but important.
(Fortunately I am doing test installations in VMware these days, so a mistake is less tedious than it used to be.)
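To make the difference between the two loops concrete, here is a minimal sketch that only counts iterations instead of running mv. The counter names are invented for the example, and it assumes a Bourne-style shell (sh or bash) with the default IFS.

```shell
# Quoted literal: "a b" is a single word, so the loop body runs once.
count_literal=0
for i in "a b"; do
    count_literal=$((count_literal + 1))
done

# Unquoted variable: $FOO is split on whitespace, so the loop body
# runs twice, once for 'a' and once for 'b'.
FOO="a b"
count_var=0
for i in $FOO; do
    count_var=$((count_var + 1))
done

echo "quoted literal: $count_literal, unquoted variable: $count_var"
```

In the first form the single value of $i is 'a b', and because $i is unquoted inside the loop, the mv command itself would then see it split into separate arguments.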
2006-11-02
Knowing things versus being able to prove them
Recently there has been a small contretemps over how well (or not) IDEs can support refactoring for dynamic languages. To stereotype and condense, on one side are Java people who feel that without the guarantees provided by static typing, many sorts of automated refactorings are impossible to do reliably and so worthless; on the other side are Tim Bray and dynamic language people, who feel that various things can get them close enough, and besides there is a laundry list of situations where static languages don't have reliable refactorings either.
Recently I had a thought about the whole debate: one way to see it is in a conflict between knowing things and being able to prove them. In refactorings, the Java people want proof; Tim Bray and company are willing to settle for merely knowing things. And the heuristics and possibility for errors involved in merely knowing things give the Java people violent hives.
This also explains why the Java people don't see an equivalence between their refactoring imperfections and the dynamic language refactoring imperfections (something that irritates the dynamic language side of the debate a certain amount). The Java problem areas are already outside of what Java can prove, so people should already know that they're on their own in them in general; refactoring errors are just par for the course.
The difference between knowing things and being able to prove them also comes up a lot in computer security. A lot of security people are very hung up on the perfection of being able to prove things, and feel that anything less is not something that should be accepted.