GNU Emacs and the case of special space characters
One of the things I've had to wrestle with due to my move to reading my email with MH-E in GNU Emacs is that any number of Emacs modes involved in this like to be helpful by reformatting and annotating your email messages in various ways. Often it's not obvious to an outsider what mode (or code) is involved. For what I believe are historical reasons, a lot of MIME handling code has wound up in GNUS (also), which was originally a news reader; some of the code and variables have 'gnus' prefixes while others have 'mm' or 'mml' prefixes. In MH-E (and I believe most things that use Emacs' standard GNUS-based MIME handling), by default you will get nominally helpful things like message fontisizing and maybe highlighting of certain whitespace that the code thinks you might care about. I mostly don't want this, so I have been turning it off where I saw it and could identify the cause.
(As far as message fontisizing goes, sometimes I don't object to it but I very much object to the default behavior of hiding the characters that triggered the fontisizing. I don't want bits of message text hidden on me so that I have to reverse engineer the actual text from visual appearance changes that I may or may not notice and understand.)
Recently I was reading an email message and there was some white space in it that Emacs had given red underlines, causing me to get a bit irritated. People who are sufficiently familiar with GNU Emacs have already guessed the cause, and in fact the answer was right there in what I saw from Leah Neukirchen's suggestion of looking at (more or less) 'C-u C-x ='. What I was seeing was GNU Emacs' default handling of various special space characters.
(I was going to say that this was a non-breaking space, but it turns out not to be; instead it was U+2002, 'en space'. A true non-breaking space is U+00A0.)
As covered in How Text Is Displayed, Emacs normally displays these special characters and others with the (Emacs) nobreak-space face, which (on suitable displays) renders the character as red with a (red) underline. Since all space variants have nothing to render, you get a red underline. As covered in the documentation, you can turn this off generally or for a buffer by setting nobreak-char-display to nil, which I definitely won't be doing generally but might do for MH-E mail buffers, since my environment generally maps special space characters to a plain space if I paste them into terminals and the like.
(A full list of Emacs font faces is in Standard Faces.)
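Turning off nobreak-char-display only for MH-E mail buffers might look something like the following. This is a minimal sketch, not a tested configuration: it assumes messages are shown in buffers using mh-show-mode (so that mh-show-mode-hook runs for them) and that a buffer-local value of nobreak-char-display is honored by redisplay. The helper name my-mh-show-plain-spaces is made up for illustration.

```elisp
;; Sketch only: assumes MH-E shows messages in mh-show-mode and that a
;; buffer-local nobreak-char-display takes effect for that buffer.
(defun my-mh-show-plain-spaces ()   ; hypothetical helper name
  "Stop highlighting special space characters in this buffer."
  (setq-local nobreak-char-display nil))

(add-hook 'mh-show-mode-hook #'my-mh-show-plain-spaces)
```

Using setq-local keeps the global default untouched, so special spaces would still be marked in every other buffer.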
Zero-width spaces (should I ever encounter any in email or elsewhere) are apparently normally displayed using Glyphless Character Display's 'thin-space' method, along with other glyphless characters, and are Unicode U+200B. It's not clear to me if these will display with a red underline in my environment (see this emacs.stackexchange question and answers).
Some testing suggests that zero width spaces may hide out without a visual marker (based on using 'C-x 8 RET' aka 'insert-char' to enter a zero-width space, a key binding which I also found out about through this exercise). At this point I am too lazy to figure out how to force zero-width spaces to be clearly visible.
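One possible starting point for making zero-width spaces visible is to override the glyphless display method for just that character. This is an untested sketch that assumes the glyphless-char-display char-table is consulted for U+200B in your Emacs build and display; the 'hex-code method is one of the documented glyphless display methods.

```elisp
;; Untested sketch: ask Emacs to draw U+200B (zero width space) as a
;; boxed hex code instead of using the default thin-space method.
;; This changes the global char-table, so it applies to all buffers.
(set-char-table-range glyphless-char-display #x200B 'hex-code)
```

Customizing glyphless-char-display-control afterward may rebuild this table and undo the override, so changing the whole 'format-control' group there could be the more durable route.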
PS: Other spaces known by insert-char include U+2003 (em space), U+2007 (figure space), U+2005 (four per em space), U+200A (hair space), U+3000 (ideographic space), U+205F (medium mathematical space), U+2008 (punctuation space), U+202F (narrow non-breaking space), and more. It's slightly terrifying. Most of the spaces render in the same way. I probably won't remember any of these Unicode numbers, but maybe I can remember C-u C-x = and that 'nobreak-space' as an Emacs face is an important marker.
PPS: Having gone through all of this, it's somewhat tempting to write some ELisp that will let me flip back and forth between displaying these characters in some clearly visible escaped form and displaying them 'normally' (showing as (marked) spaces and so on). That way I could normally see them very clearly, but make them unobtrusive if I had to deal with something that was full of them in a harmless way. This is one of the temptations of GNU Emacs (or in general any highly programmable environment).
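A rough sketch of such a toggle, using a buffer-local display table to show each special space as a bracketed code point, could look like this. Every name starting with my- is hypothetical, the list of code points is just the ones mentioned in this entry, and the code assumes the buffer has no other display table that it would clobber when switching back.

```elisp
;; Hypothetical names throughout; nothing here is part of Emacs or MH-E.
(defvar my-special-spaces
  '(#x00A0 #x2002 #x2003 #x2005 #x2007 #x2008
    #x200A #x200B #x202F #x205F #x3000)
  "Code points to show in escaped form.")

(defvar-local my-escaped-spaces-active nil
  "Non-nil when the escaped display is on in this buffer.")

(defun my-toggle-escaped-spaces ()
  "Toggle showing special spaces as [U+XXXX] in the current buffer."
  (interactive)
  (if my-escaped-spaces-active
      ;; Assumption: no other display table was in use in this buffer.
      (setq buffer-display-table nil
            my-escaped-spaces-active nil)
    (let ((table (make-display-table)))
      (dolist (ch my-special-spaces)
        ;; A display table entry is a vector of the characters to draw.
        (aset table ch (vconcat (format "[U+%04X]" ch))))
      (setq buffer-display-table table
            my-escaped-spaces-active t)))
  ;; Ask for the buffer's windows to be redrawn with the new setting.
  (force-window-update (current-buffer)))
```

Running M-x my-toggle-escaped-spaces once would turn the escaped form on for the current buffer and running it again would return to the normal display. Only the display changes; the buffer text itself is left alone.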