Wandering Thoughts archives

2006-11-15

A little regexp thing to remember about \b (and \w)

A lot of documentation of Perl-style regular expressions describes \b as 'matching a word boundary' or uses similar phrases. Life would be simpler if the documentation said 'identifier boundary' instead, because \b's idea of word characters is the same as \w's: letters, digits, and underscores. That makes a lot of sense for picking out identifiers in languages like C, but much less sense for things like picking out words in written text.
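
A quick way to see this is to have grep print every run of \w characters it finds. This is only a sketch: word_tokens is a made-up helper name, and it assumes GNU grep, where -o and \w are available as extensions.

```shell
# Print each maximal run of "word" characters on its own line.
# Assumes GNU grep: -o (print only the match) and \w are extensions.
word_tokens() {
    grep -o -E '\w+' || true   # '|| true' so no matches is not an error
}

printf '%s\n' "don't rename max_len" | word_tokens
# Prints:
#   don
#   t
#   rename
#   max_len
```

The C-style identifier max_len survives in one piece, while the ordinary English word "don't" is cut in two at the apostrophe.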

(The same thing applies to GNU grep's --word-regexp option.)
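
The grep case is easy to check from a shell. The following is a small sketch, assuming a grep that supports -w (GNU grep's short form of --word-regexp); whole_word_lines is a hypothetical helper name used only for this illustration.

```shell
# Print the input lines in which $1 appears as a whole "word",
# by grep -w's definition of a word.
whole_word_lines() {
    grep -w -- "$1" || true   # '|| true' so no matches is not an error
}

printf '%s\n' 'foo bar' 'foo_bar' 'foo-bar' 'foo2' | whole_word_lines foo
# Prints:
#   foo bar
#   foo-bar
```

Neither 'foo_bar' nor 'foo2' matches, because the underscore and the digit count as part of the word, while the hyphen in 'foo-bar' ends it.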

Saying that this is documented if you read the full description of \b is no excuse. The problem is that 'word' is a dangerously loaded term to use, because it invites people to think that they know what it means and not read carefully (especially if they are skimming to refresh their memory). If the documentation used 'identifier' instead, people would not be led astray by their intuition about what a word is.

(This is a general problem with giving any technical definition a name that's a common term; people have to know, remember, or even realize that the common term doesn't mean what they think it means. For example, X Windows got a lot of people grumpy by inverting the way people thought about clients and servers, so in X the 'server' is on your desk and the 'client' is that big compute server over in the machine room.)

programming/RegexpWordMatching written at 23:24:44

Why the Bourne shell is not my favorite language

The difference between

for i in "a b"; do
	mv -f $i $i-UBUNTU
	...
done

and

FOO="a b"
for i in $FOO; do
	mv -f $i $i-UBUNTU
	...
done

is subtle (in visual appearance) and easy to accidentally forget, but important.
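
(For the record, the difference is word splitting. The quoted "a b" is a single word, so the first loop runs once with i set to 'a b', and the unquoted $i then expands the mv into 'mv -f a b a b-UBUNTU'. The unquoted $FOO is split into two words, so the second loop runs twice, once for each file. The sketch below counts loop passes instead of moving anything; the counter names are made up for the illustration, and it assumes a POSIX shell with the default IFS.)

```shell
FOO="a b"

# Quoted literal: "a b" is a single word, so the body runs once.
quoted=0
for i in "a b"; do
    quoted=$((quoted + 1))
done

# Unquoted variable: $FOO is split on whitespace, so the body runs twice.
split=0
for i in $FOO; do
    split=$((split + 1))
done

echo "quoted literal: $quoted pass(es); unquoted variable: $split pass(es)"
# Prints: quoted literal: 1 pass(es); unquoted variable: 2 pass(es)
```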

(Fortunately I am doing test installations in VMWare these days, so a mistake is less tedious than it used to be.)

programming/BourneNonFavourite written at 18:00:01


