Keyboard keys and characters

October 10, 2023

When I wrote about my understanding of completion in Emacs, I mentioned that on demand completion is accessible through what Emacs calls "M-TAB", a notation that means 'Meta-<tab>', where Meta is normally your Alt key(s). Over on the Fediverse, Jason P mentioned that this completion is also accessible through "C-M-i", Ctrl-Meta-i. This will raise the eyebrows of some people, because Ctrl-i is also a tab; in ASCII (and thus Unicode), tab is literally Ctrl-i. What's going on here is a distinction in modern graphical environments between keys and characters. In fact there are often at least three layers involved.

At the lowest level, your keyboard typically generates scancodes when keys are pressed. On a normal keyboard, each physical key has an associated scancode (or two, one when it's pressed and one when it's released), and it always generates that scancode. In your operating system and graphical environment, these scancodes may then get remapped for various reasons to create, for example, an X keycode. At this point these are broadly just numbers, although there is a default meaning associated with them based on standards (i.e., everything knows that the key normally labeled 'a' on a USB keyboard will generate a certain scancode).

(Some people use special keyboards with firmware (such as QMK) that has things like 'layers' that allow them to change the scancodes generated by physical keys on the fly.)

Then your graphical environment will take these numbers and assign a meaning to them; in X, these are 'keysyms'. The reason we assign meanings at this layer is that it handles things like different keyboard layouts, such as QWERTY versus Dvorak, national keyboards with alternate symbols associated with them, and people's desire to remap bits of their keyboards (for example, making Caps Lock do something other than Caps Lock). Graphical programs, such as Emacs in graphical mode, tend to do key binding based on keysyms, although often at a slightly abstract layer where there is just 'Alt' instead of 'Left Alt' and 'Right Alt'. At the keysym layer (and the scancode layer), there is a clear distinction between the TAB key and the 'i' key hit with Ctrl held down (or active), and so programs like Emacs can bind them separately.

(Programs can also do internal mappings and translations so that, for example, you don't have to separately bind 'TAB' and 'Ctrl-i', or so that 'Ctrl-Return' is treated like Return unless you have a special binding set for it. These translations are often program specific; one program may treat Ctrl-Return as Return by default, and another one may reject it as 'no binding set'.)
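As a rough illustration of the keysym layer, here is a toy model in Python. Everything in it is made up for the sketch: the keycode numbers, the layout tables, and the command names are hypothetical and do not correspond to real X keycodes or to how Emacs stores its bindings. It only shows the shape of the idea, which is that a layout turns a number into a keysym and bindings are then looked up by keysym plus modifiers.

```python
# Hypothetical keycode numbers, invented for this sketch (not real X keycodes).
KEYCODE_TAB = 1
KEYCODE_I_POSITION = 2  # the physical key labeled 'i' on a QWERTY keyboard

# A layout maps keycodes to keysyms. On Dvorak, the physical key that is
# 'i' on QWERTY produces 'c' instead.
LAYOUTS = {
    "qwerty": {KEYCODE_TAB: "Tab", KEYCODE_I_POSITION: "i"},
    "dvorak": {KEYCODE_TAB: "Tab", KEYCODE_I_POSITION: "c"},
}

# Bindings are keyed by (modifiers, keysym), so Tab and Ctrl-i stay distinct.
# The command names are placeholders.
BINDINGS = {
    (frozenset(), "Tab"): "indent-line",
    (frozenset({"Ctrl"}), "i"): "show-info",
}


def lookup(layout, keycode, modifiers=()):
    """Translate a keycode to a keysym for this layout, then find its binding."""
    keysym = LAYOUTS[layout][keycode]
    return BINDINGS.get((frozenset(modifiers), keysym), "no binding set")


print(lookup("qwerty", KEYCODE_TAB))
print(lookup("qwerty", KEYCODE_I_POSITION, {"Ctrl"}))
print(lookup("dvorak", KEYCODE_I_POSITION, {"Ctrl"}))
```

In this toy, the same physical key with Ctrl held finds a binding on the QWERTY layout and none on the Dvorak one, because the lookup happens after the layout has assigned a meaning to the key.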

Then we have the character layer, where key presses ultimately generate characters (usually UTF-8 these days). At the character layer, TAB and Ctrl-i (and Ctrl-Shift-i) are generally indistinguishable, as they all generate the ASCII (and Unicode) character 9. You could make them generate different characters if you wanted to, but then you'd have to decide what the other character is. Various parts of the intermediate keysym layer aren't representable any more at the character level; generally all modifiers (like Alt, Meta, and Ctrl) are lost unless they can be transformed into ASCII control characters. At the keysym layer you can generally easily tell the difference between Ctrl-Return, Shift-Return, and plain Return, but not at the character level.
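To make the 'tab is literally Ctrl-i' point concrete, here is a small Python sketch of the traditional ASCII rule, where holding Ctrl keeps only the low five bits of the key's character code. The function name ctrl_char is my own, and real terminals and toolkits have extra special cases that this ignores.

```python
def ctrl_char(key: str) -> str:
    """Return the ASCII control character for Ctrl plus the given key.

    This follows the traditional rule of keeping only the low five bits
    of the character code; it is a sketch, not a full terminal model.
    """
    code = ord(key)
    if not 0x40 <= code <= 0x7E:  # '@' through '~'
        raise ValueError(f"no simple control character for {key!r}")
    return chr(code & 0x1F)


# Ctrl-i and Ctrl-Shift-i both collapse to character 9, which is tab.
assert ctrl_char("i") == "\t"
assert ctrl_char("I") == "\t"
assert ord(ctrl_char("i")) == 9
# The same rule makes Ctrl-m a carriage return and Ctrl-[ an Escape.
assert ctrl_char("m") == "\r"
assert ctrl_char("[") == "\x1b"
```

Once the key press has been turned into character 9, nothing in the character itself records whether it came from the TAB key or from 'i' with Ctrl held down.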

One reason the character layer matters is that it is what you get in terminal windows, including in remote logins over SSH. Programs like Emacs that want to both work in a terminal and take advantage of richer keysym bindings get to adopt various workarounds. One long-standing Emacs workaround is that an Escape character gives the next character the Meta marker; 'ESC a' is M-a. Emacs also has a system where C-x @ <character> applies various modifiers to the next real key; this is sufficiently awkward that you probably only want to do it in desperation, or if your terminal program automatically generates the right prefix for you. I believe that people who expect to use Emacs in terminal windows try not to wind up depending on key bindings that don't work well in a character environment.
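Here is a minimal Python sketch of the 'Escape gives the next character the Meta marker' convention, applied to a stream of bytes as a program might read them from a terminal. This is not how Emacs implements it; the function names and the Emacs-style key names it prints are my own simplification, and it only handles single bytes (no UTF-8 decoding and no multi-character escape sequences such as those sent by arrow keys).

```python
def name_byte(byte: int) -> str:
    """Give an Emacs-style name to a single byte of terminal input."""
    if byte == 0x09:
        return "TAB"
    if byte == 0x0D:
        return "RET"
    if byte == 0x1B:
        return "ESC"
    if byte == 0x7F:
        return "DEL"
    if byte < 0x20:
        # Other control characters: undo the 'low five bits' rule.
        return "C-" + chr(byte | 0x40).lower()
    if byte < 0x80:
        return chr(byte)
    return f"\\x{byte:02x}"  # non-ASCII bytes are left as hex in this sketch


def describe_keys(data: bytes) -> list[str]:
    """Turn raw input bytes into key names, treating ESC as a Meta prefix."""
    keys = []
    meta = False
    for byte in data:
        if byte == 0x1B and not meta:
            meta = True  # remember the prefix and wait for the next byte
            continue
        name = name_byte(byte)
        keys.append("M-" + name if meta else name)
        meta = False
    if meta:
        keys.append("ESC")  # a trailing ESC with nothing after it
    return keys


print(describe_keys(b"\x1ba"))   # ESC followed by 'a'
print(describe_keys(b"\x1b\t"))  # ESC followed by character 9
```

The second call is the terminal version of the M-TAB and C-M-i overlap: both key combinations arrive as Escape followed by character 9, so at this layer there is only one thing to bind.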

(Perhaps partly because of Emacs, many Unix terminal emulators translate 'Alt-<key>' into the sequence 'Escape <key>'. Other modifiers are either not translated at all or are translated to create standard ASCII characters. In the past, terminal emulators sometimes set the 8th bit in characters to represent Alt (or Meta), but that died off when UTF-8 became common.)
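The two ways of sending Alt-<key> can be compared in a few lines of Python. The function names are mine, and the explanation for why the 8th bit approach faded is my own reading rather than something the history above spells out: a byte with the high bit set is already meaningful in 8-bit character sets and is not valid as a standalone byte in UTF-8.

```python
def meta_esc(key: str) -> bytes:
    """Encode Alt-<key> as Escape followed by the key."""
    return b"\x1b" + key.encode("ascii")


def meta_8bit(key: str) -> bytes:
    """Encode Alt-<key> the old way, by setting the 8th bit of the character."""
    return bytes([ord(key) | 0x80])


esc_form = meta_esc("a")
high_bit_form = meta_8bit("a")

assert esc_form == b"\x1ba"
assert high_bit_form == b"\xe1"

# In Latin-1, that byte is already an ordinary character.
assert high_bit_form.decode("latin-1") == "\u00e1"  # a with acute accent

# In UTF-8, the same character is two bytes, and a lone 0xE1 is invalid.
assert "\u00e1".encode("utf-8") == b"\xc3\xa1"
try:
    high_bit_form.decode("utf-8")
except UnicodeDecodeError:
    print("a lone 0xE1 byte is not valid UTF-8")
```

The Escape form stays plain ASCII, so it passes through a UTF-8 terminal unchanged.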

Written on 10 October 2023.