A little regexp thing to remember about \b (and \w)

A lot of documentation for Perl-style regular expressions describes \b as 'matching a word boundary' or something similar. Life would be simpler if the documentation said 'identifier boundary' instead, because \b's idea of word characters (which is the same as \w's) includes underscores as well as letters and digits. Thus \b and \w make a lot of sense for picking out identifiers in languages like C, but not necessarily so much sense for things like picking out words in written text.
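
As a small illustration of the difference, here is a sketch using Python's re module, which follows the Perl-style meaning of \b and \w. The sample strings and variable names are made up for the example, and the letters-only pattern at the end is just one possible approach for written text, not a recommendation from any documentation.

```python
import re

text = "set max_len and foo_bar2 here"

# \w+ counts the underscore (and digits) as word characters,
# so C-style identifiers come out whole.
identifiers = re.findall(r"\w+", text)
print(identifiers)

# \b sees no boundary between '_' and 'l', so 'len' is not
# matched as a separate word inside 'max_len'.
len_as_word = re.search(r"\blen\b", text)
print(len_as_word)

# One possible approach for written text (an assumption, not a standard
# idiom): match runs of letters only, by excluding non-word characters,
# digits, and the underscore.
prose = "the so_called fix"
prose_words = re.findall(r"[^\W\d_]+", prose)
print(prose_words)
```

The first two results are what you want when you are hunting for identifiers in code. The last pattern splits 'so_called' into two words, which is closer to what intuition about written text suggests.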

(The same thing applies to GNU grep's --word-regexp option.)

Saying that this is documented if you read the full description of \b is no excuse. The problem is that 'word' is a dangerously loaded term to use, because it invites people to think that they know what it means and not read carefully (especially if they are skimming to refresh their memory). If the documentation used 'identifier' instead, people would not be led astray by their intuition about what a word is.

(This is a general problem with giving any technical definition a name that's a common term; people have to know, remember, or even realize that the common term doesn't mean what they think it means. For example, X Windows got a lot of people grumpy by inverting the way people thought about clients and servers, so in X the 'server' is on your desk and the 'client' is that big compute server over in the machine room.)

This is part of CSpace, and is written by ChrisSiebenmann.

Written on 15 November 2006.
Last modified: Wed Nov 15 23:24:44 2006