Wandering Thoughts archives

2018-07-28

Word-boundary regexp searches are what I usually want

I'm a person of relatively fixed, slow-to-change habits as far as Unix commands go. Once I've gotten used to doing something one way, that's generally it, and worse, many of my habits fossilized many years ago. All of this is a long-winded lead-in to explaining why I have only recently gotten around to really using the '\b' regular expression escape. This is a real pity, because now that I have, my big reaction is 'what took me so long?'

Perhaps unsurprisingly, it turns out that I almost always want to search for full words, not parts of words. This is true whether I'm looking for words in text, words in my email, or functions, variables, and the like in code. In the past I adopted various hacks to deal with this, or just put up with the irritation of excessive matches, but now I've converted over to word-boundary searches and they do a much better job of finding what I actually want. It removes another small, invisible point of friction and, like other such small changes before it, has had an outsized impact on how I feel about things.
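To make 'excessive matches' concrete, here is a small sketch using Python's re module. Python is just a convenient stand-in for grep and less here, and the sample line is made up. A plain search for 'log' also hits 'catalog', 'blog', and 'logged', while '\blog\b' only matches 'log' standing on its own.

```python
import re


def count_matches(pattern: str, text: str) -> int:
    """Count the non-overlapping matches of a regular expression in text."""
    return len(re.findall(pattern, text))


# Hypothetical sample line: 'log' appears once as a word and three times
# inside other words ('catalog', 'blog', 'logged').
SAMPLE = "the log says catalog and blog entries were logged"

# Plain search: matches 'log' anywhere, including inside other words.
print(count_matches(r"log", SAMPLE))      # 4

# Word-boundary search: \b requires a word/non-word transition (or the
# start/end of the string) on each side, so only the standalone 'log' hits.
print(count_matches(r"\blog\b", SAMPLE))  # 1
```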

(In retrospect, this is part of what our convention for writing logins in documentation was doing. Searching for '<LOGIN>' instead of 'LOGIN' vastly reduced the chance that you'd run into the login embedded in another word.)

There are a couple of ways of doing word-boundary searches (somewhat depending on the program). The advantage of '\b' is that it works pretty universally; it's supported by at least (GNU) grep, ripgrep, and less, and it's at least worth trying in almost anything that supports modern (or 'PCRE') regular expressions, which is a lot of things. Grep and ripgrep also support the -w option for doing this, which is especially useful because it works with fgrep.

(I reflexively default to fgrep, partly so I don't have to think about special characters in my search string.)
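GNU grep's manual defines -w as requiring the match to be at the start of the line or preceded by a non-word constituent character, and at the end of the line or followed by one, where word constituents are letters, digits, and the underscore. That differs slightly from wrapping the search string in '\b' when the string itself begins or ends with a non-word character such as '-'. Below is a minimal Python sketch of fgrep -w style matching under that definition. The function name whole_word_lines and the sample lines are hypothetical, and restricting \w to ASCII is an assumption that roughly matches grep in the C locale.

```python
import re


def whole_word_lines(needle: str, lines: list) -> list:
    """Return the lines containing `needle` as a literal whole word.

    Approximates `grep -Fw`: the literal must be at the start of the line or
    preceded by a non-word character, and at the end of the line or followed
    by a non-word character. re.ASCII limits \\w to [A-Za-z0-9_].
    """
    # re.escape treats the needle as a fixed string, as fgrep (grep -F) does;
    # lookarounds stand in for grep's "preceded/followed by" word test.
    pattern = re.compile(r"(?<!\w)" + re.escape(needle) + r"(?!\w)", re.ASCII)
    return [line for line in lines if pattern.search(line)]


# Hypothetical input lines.
LINES = ["grep -v foo", "see catalog", "my log file", "log_rotate ran", "prefix-log"]

print(whole_word_lines("log", LINES))  # ['my log file', 'prefix-log']
print(whole_word_lines("-v", LINES))   # ['grep -v foo']

# Wrapping '-v' in \b fails here: there is no word boundary between the
# space and the '-', since neither is a word character.
print(re.search(r"\b-v\b", "grep -v foo"))  # None
```

The lookaround form only asks that the neighbouring characters not be word characters, which is why it still works for needles with punctuation at either end.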

Per this SO question and its answers, in vim I'd need to use '\<' and '\>' for the beginning and end of words. I'm sure vim has a reason for having two of them. Emacs supports '\b', although I don't do regular expression searches in Emacs often enough to remember how to invoke them (since I just looked it up, the documentation tells me it's C-M-s and C-M-r, which ought to be reasonably memorable given that plain searches are C-s and C-r).
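Vim's documentation describes '\<' as matching where the next character is the first character of a word, and '\>' as matching where the previous character is the last character of a word, so each one anchors only one side. A rough Python translation using lookarounds shows what one-sided anchors make easy: prefix and suffix searches. Treating \w as a stand-in for vim's 'iskeyword' setting is an assumption, and the names WORD_START and WORD_END are hypothetical.

```python
import re

# Approximate Python equivalents of vim's one-sided word anchors.
# vim decides what a word character is via 'iskeyword'; \w is a stand-in.
WORD_START = r"(?<!\w)(?=\w)"  # like vim's \<  (next char starts a word)
WORD_END = r"(?<=\w)(?!\w)"    # like vim's \>  (previous char ends a word)

SAMPLE = "the log says catalog and blog entries were logged"

# Words that begin with 'log' (vim: \<log\w*).
print(re.findall(WORD_START + r"log\w*", SAMPLE))
# ['log', 'logged']

# Words that end with 'log' (vim: \<\w*log\>).
print(re.findall(WORD_START + r"\w*log" + WORD_END, SAMPLE))
# ['log', 'catalog', 'blog']
```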

PS: Before I started writing this entry, I didn't know about -w in grep and ripgrep, or how to do this in vim (and I would have only been guessing about Emacs). Once again, doing some research has proven beneficial.

PPS: I care about less because less is often my default way of scanning through pretty much anything, whether it's a big text file or code. Grep and company may tell me what files have some content and a bit of its context, but less is what lets me poke around, jump back and forth, and so on. Perhaps someday I will get a better program for this purpose, but probably not soon.

sysadmin/RegexpWordBoundaryGood written at 00:23:55
