Wandering Thoughts archives

2006-11-15

A little regexp thing to remember about \b (and \w)

A lot of documentation of Perl-style regular expressions describes \b as 'matching a word boundary' or similar phrases. Life would be simpler if the documentation used the phrase 'identifier boundary' instead, because \b's idea of word characters includes underscores. Thus \b and \w's idea of word characters makes a lot of sense for picking out identifiers in languages like C, but not necessarily so much sense for things like picking out words in written text.

(The same thing applies to GNU grep's --word-regexp option.)
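
A quick way to see this for yourself is with grep's -w, the short form of --word-regexp. This is only a minimal sketch; the sample text and the variable names are made up for illustration.

```shell
# Two sample lines: one that looks like an identifier, one that is plain text.
sample='foo_bar
foo bar'

# With -w, the match must not be adjacent to a word character, and the
# underscore counts as one. So the "foo" in "foo_bar" is not a whole word.
matched=$(printf '%s\n' "$sample" | grep -w 'foo')

printf '%s\n' "$matched"
```

Only the 'foo bar' line should come out; if you were thinking in terms of words in written text, you might well have expected both.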

Saying that this is documented if you read the full description of \b is no excuse. The problem is that 'word' is a dangerously loaded term to use, because it invites people to think that they know what it means and not read carefully (especially if they are skimming to refresh their memory). If the documentation used 'identifier' instead, people would not be led astray by their intuition about what a word is.

(This is a general problem with giving any technical definition a name that's a common term; people have to know, remember, or even realize that the common term doesn't mean what they think it means. For example, the X Window System made a lot of people grumpy by inverting the way they thought about clients and servers: in X, the 'server' is on your desk and the 'client' is that big compute server over in the machine room.)

RegexpWordMatching written at 23:24:44

Why the Bourne shell is not my favorite language

The difference between

for i in "a b"; do
	mv -f $i $i-UBUNTU
	...
done

and

FOO="a b"
for i in $FOO; do
	mv -f $i $i-UBUNTU
	...
done

is subtle (in visual appearance) and easy to accidentally forget, but important.

(Fortunately I am doing test installations in VMware these days, so a mistake is less tedious than it used to be.)
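
For anyone skimming past the difference: the quoted literal "a b" stays a single word, so the first loop runs once with i set to 'a b', while the unquoted expansion of $FOO is split on whitespace, so the second loop runs twice, with 'a' and then 'b'. A minimal sketch that counts the words instead of moving files around, assuming the default IFS (count_args is a hypothetical helper, not anything from the original script):

```shell
# Hypothetical helper: report how many arguments it was given.
count_args() {
	echo "$#"
}

FOO="a b"

# A quoted literal stays one word, so a for loop over it runs once.
quoted=$(count_args "a b")

# An unquoted expansion is split on whitespace, so a for loop over it
# runs once per resulting word.
unquoted=$(count_args $FOO)

echo "quoted: $quoted, unquoted: $unquoted"
```

The counts correspond to how many times each for loop body would run.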

BourneNonFavourite written at 18:00:01

2006-11-02

Knowing things versus being able to prove them

Recently there has been a small contretemps over how well (or not) IDEs can support refactoring for dynamic languages. To stereotype and condense, on one side are Java people who feel that without the guarantees provided by static typing, many sorts of automated refactorings are impossible to do reliably and so worthless; on the other side are Tim Bray and dynamic language people, who feel that various things can get them close enough, and besides there is a laundry list of situations where static languages don't have reliable refactorings either.

Recently I had a thought about the whole debate: one way to see it is as a conflict between knowing things and being able to prove them. In refactorings, the Java people want proof; Tim Bray and company are willing to settle for merely knowing things. And the heuristics and possibility of errors involved in merely knowing things give the Java people violent hives.

This also explains why the Java people don't see an equivalence between their refactoring imperfections and the dynamic language refactoring imperfections (something that irritates the dynamic language side of the debate a certain amount). The Java problem areas are already outside of what Java can prove, so people should already know that they're on their own in them in general; refactoring errors are just par for the course.

The difference between knowing things and being able to prove them also comes up a lot in computer security. A lot of security people are very hung up on the perfection of being able to prove things, and feel that anything less is not something that should be accepted.

KnowledgeVersusProof written at 22:58:24

