The X Window System and the curse of NumLock

May 14, 2024

In X, like probably any graphical environment, there are a variety of layers to keys and characters that you type. One of the layers is the input events that the X server sends to applications. As covered in the Xlib manual, these contain a keycode, representing the nominal physical key, a keysym, representing what is nominally printed on the key, and a bitmap of the modifiers currently in effect, which are things like 'Shift' or 'Ctrl'. The separation between keycodes and keysyms lets you do things like remap your QWERTY keyboard to Dvorak; you tell X to change what keysyms are generated for a bunch of the keycodes. Programs like GNU Emacs read the state of the modifiers to determine what you've typed (from their perspective), so they can distinguish 'Ctrl-Return' from plain 'Return'.
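As a rough illustration of the modifier bitmap, here is a minimal Python sketch that decodes an event's state field into modifier names. The bit positions are the core X11 protocol ones; which key sits behind Mod1 through Mod5 is a per-server setting, so treating Mod1 as Alt and Mod2 as NumLock is only the common convention and is an assumption here. The function name describe_state is made up for this example.

```python
# Modifier bits as defined by the core X11 protocol.
SHIFT_MASK = 1 << 0
LOCK_MASK = 1 << 1     # CapsLock (or ShiftLock)
CONTROL_MASK = 1 << 2
MOD1_MASK = 1 << 3     # usually Alt (assumption: server-dependent)
MOD2_MASK = 1 << 4     # usually NumLock (assumption: server-dependent)
MOD3_MASK = 1 << 5
MOD4_MASK = 1 << 6
MOD5_MASK = 1 << 7

MODIFIER_NAMES = [
    (SHIFT_MASK, "Shift"),
    (LOCK_MASK, "Lock"),
    (CONTROL_MASK, "Control"),
    (MOD1_MASK, "Mod1"),
    (MOD2_MASK, "Mod2"),
    (MOD3_MASK, "Mod3"),
    (MOD4_MASK, "Mod4"),
    (MOD5_MASK, "Mod5"),
]


def describe_state(state):
    """Turn a modifier bitmap into a readable string (hypothetical helper)."""
    names = [name for mask, name in MODIFIER_NAMES if state & mask]
    return "+".join(names) if names else "(none)"


# 'Ctrl-Return' and plain 'Return' share a keycode and differ only in state.
print(describe_state(CONTROL_MASK))
print(describe_state(0))
```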

Ordinary modifiers are normally straightforward, in that they are additional keys that are held down as you type the main key. Control, Shift, and Alt all work this way (by default). However, some modifiers are 'sticky', where you tap their key once to turn them on and then tap their key again to turn them off. The obvious example of this is Caps Lock (unless you turn its effects off, remapping its physical key to be, say, another Ctrl key). Another example, one that many X users have historically wound up quietly cursing, is NumLock. People wind up cursing NumLock, and I have a program to control its state, because of how X programs (such as window managers) often do their key and mouse button bindings.

(There are also things that will let you make non-sticky modifier keys into sticky keys.)

Suppose, for example, that you have a bunch of custom fvwm mouse bindings that are for things like 'middle mouse button plus Alt', 'middle mouse button plus Shift and Alt', 'plain right mouse button on the root', and so on. Fvwm and most other X programs will normally (have to) interpret this completely literally; when you create a binding for 'middle mouse plus Alt', the state of the current modifiers must be exactly 'Alt' and nothing else. If the X server has NumLock on for some reason (such as you hitting the key on the keyboard), the state of the current modifiers will actually be 'NumLock plus Alt', or 'NumLock plus Alt and Shift', or just 'NumLock' (instead of 'no modifiers in effect'). As a result, fvwm will not match any of your bindings and nothing will happen as you're poking away at your keyboard and your mouse.
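The literal matching described above can be sketched in a few lines of Python. This is a toy model, not fvwm's code: the binding table, the action names, and both lookup functions are hypothetical, and the Alt and NumLock bit assignments assume the usual Mod1 and Mod2 convention.

```python
SHIFT = 1 << 0     # ShiftMask
ALT = 1 << 3       # Mod1Mask; assumed to be Alt
NUMLOCK = 1 << 4   # Mod2Mask; assumed to be NumLock

# (mouse button, exact modifier state) -> action name (all hypothetical).
BINDINGS = {
    (2, ALT): "move-window",
    (2, ALT | SHIFT): "resize-window",
    (3, 0): "root-menu",
}


def lookup_exact(button, state):
    """Literal matching: the state must equal the binding's modifiers."""
    return BINDINGS.get((button, state))


def lookup_ignoring(button, state, ignored=NUMLOCK):
    """Strip the ignored modifier bits before matching."""
    return BINDINGS.get((button, state & ~ignored))


print(lookup_exact(2, ALT))               # NumLock off
print(lookup_exact(2, ALT | NUMLOCK))     # NumLock on: no binding matches
print(lookup_ignoring(2, ALT | NUMLOCK))  # NumLock masked out first
```

Masking after the fact only helps if the event reaches the program in the first place. Passive grabs made with XGrabButton or XGrabKey for a specific modifier mask are matched by the X server itself, so the program never sees the click when an extra modifier such as NumLock is set.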

Of course, this can also happen with CapsLock, which has the same sticky behavior. But CapsLock has extremely obvious effects when you type ordinary characters in terminal windows, editors, email, and so on, so it generally doesn't take very long before people realize they have CapsLock on. NumLock doesn't normally change the main letters or much of anything else; on some keyboard layouts, it may not change anything you can physically type. As a result, having NumLock on can be all but invisible (or completely invisible on keyboards with no NumLock LED). To make it worse, various things have historically liked 'helpfully' turning NumLock on for you, or starting in a mode with NumLock on.

(X programs can alter the current modifier status, so it's possible for NumLock to get turned on even if there is no NumLock key on your keyboard. The good news is that this also makes it possible to turn it off again. A program can also monitor the state of modifiers, so I believe there are ones that give you virtual LEDs for some combination of CapsLock, ScrollLock, and NumLock.)

So the curse of NumLock in X is that having NumLock on can cause mysterious key binding failures in various programs, while often being more or less invisible. And for X protocol reasons, I believe it's hard for window managers to tell the X server 'ignore NumLock when considering my bindings' (see, for example, the discussion of IgnoreModifiers in the fvwm3 manual).
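One workaround that window managers commonly use, as far as I know, is to register each binding once for every combination of the modifiers they want to ignore, since a grab with a specific modifier mask only matches that exact mask. The Python sketch below only computes the list of masks that would have to be grabbed; it makes no X calls, the function name grab_masks is hypothetical, and the Alt and NumLock bit assignments are the usual convention rather than a guarantee.

```python
from itertools import combinations

LOCK = 1 << 1      # LockMask (CapsLock)
ALT = 1 << 3       # Mod1Mask; assumed to be Alt
NUMLOCK = 1 << 4   # Mod2Mask; assumed to be NumLock


def grab_masks(base, ignored):
    """Return every modifier mask that must be grabbed so that a binding
    on `base` still fires whatever the state of the `ignored` bits."""
    masks = []
    for count in range(len(ignored) + 1):
        for combo in combinations(ignored, count):
            extra = 0
            for bit in combo:
                extra |= bit
            masks.append(base | extra)
    return masks


# One 'Alt' binding turns into four grabs when two lock modifiers are ignored.
for mask in grab_masks(ALT, [LOCK, NUMLOCK]):
    print(bin(mask))
```

The number of grabs doubles with each ignored modifier, which is part of why this is awkward to do for every binding.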


Comments on this page:

By franklin at 2024-05-15 10:27:22:
"The separation between keycodes and keysyms lets you do things like remap your QWERTY keyboard to Dvorak"

But the server continuing to send raw keycodes makes this difficult sometimes. Like, qemu's -k option used to make things look like QWERTY to the guest, but they've stopped that; so now I've got to change the guest's layout to Dvorak too, or run the console through something like xtigervncviewer that doesn't look at the keysyms (and can use a Unix-domain socket).

I think the "proper" way to handle mappings and modifiers would be to run semi-arbitrary code to translate events. Unfortunately, running Turing-complete code in the server—which then would've been running as root, maybe on a different machine—could be dangerous, and Berkeley Packet Filter wasn't invented for 5 years after X11.

Common sense would suggest that a key-press event doesn't need to include the state of number-lock; it's either 6 or right-arrow, for example, without needing to know where that came from. But maybe if it were kana-lock, the clients would care; I can kind of understand how history and a desire for genericity led us here.

By cks at 2024-05-15 12:17:45:

The technical detail not mentioned in my entry is that the X server only sends programs the keycode (plus modifiers) in the actual X events. Translating from keycodes to keysyms is done in the (client) programs, although the X server also sends you a translation mapping that you're supposed to use. This keeps the X server simple and makes the events minimal at the cost of pushing work to the clients, which was a good tradeoff when X was new (and everyone could be assumed to be using the same libraries to do this translation).
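To make the client-side translation concrete, here is a deliberately simplified Python sketch. The tables and the function keycode_to_keysym are hypothetical, the keycode numbers are only illustrative (they are what a typical Linux X server uses for those physical keys), and the real rules also involve Lock, NumLock, and keyboard groups, which are ignored here.

```python
SHIFT_MASK = 1 << 0

# Toy version of the keycode -> keysym table the server hands to clients.
# Each entry is (unshifted keysym name, shifted keysym name).
QWERTY = {
    39: ("s", "S"),       # illustrative keycode for the QWERTY 'S' key
    10: ("1", "exclam"),  # illustrative keycode for the '1' key
}

# Remapping to Dvorak changes the table, not the keycodes in the events.
DVORAK = dict(QWERTY)
DVORAK[39] = ("o", "O")


def keycode_to_keysym(keymap, keycode, state):
    """Pick a keysym for an event's keycode and modifier state."""
    unshifted, shifted = keymap[keycode]
    return shifted if state & SHIFT_MASK else unshifted


# The same event (keycode 39, no modifiers) under two different tables.
print(keycode_to_keysym(QWERTY, 39, 0))
print(keycode_to_keysym(DVORAK, 39, 0))
```

A program that looks only at the keycode in the event, and never consults the table, sees the same value under both layouts.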

There are programs (such as games) that care about what physical key was hit, not what keysym it maps to given the modifier state, so you do want to provide relatively raw keycode access. Whether a window manager cares about the state of NumLock even for keys whose regular result is affected by it is really a design decision and I think it could go either way.

(Some people will say 'if I define a binding for Alt + right arrow, this should work for both the physical right arrow or the NumLock right arrow'. Some people will want only the physical right arrow.)

A virtual machine environment like Qemu is always going to be an odd case. It sort of wants to get keysyms and then turn them back into keycodes, since it has to provide keycodes to the virtual machine it's running. Unfortunately this translation is lossy in the face of certain things, which can leave you not being able to generate certain keycodes at all in the virtual machine (which can be a problem).

By franklin at 2024-05-15 13:43:30:

The client-side translation is odd and interesting.

"There are programs (such as games) that care about what physical key was hit, not what keysym it maps to given the modifier state, so you do want to provide relatively raw keycode access."

No, I really don't. Or, at least, I've never seen a good example of when I would, though I accept your point that there may be cases when I might want to tell the two right-arrows apart (perhaps only at a "system" level, like for window-manager bindings). Still, if I were using F7 or something weird like that as my right-arrow key, I wouldn't want any program knowing that the key used to be F7; I'd just need qemu to know whether to send the code for number-pad-right or independent-right.

Regarding games, Mednafen is a particular annoyance in this regard: "All default key mappings are by scancode, so you'll need to press the keys corresponding to the appropriate positions on the standard U.S keyboard layout". So for the cheat menu that's documented as ALT+C, I have to press ALT+J. It's ridiculous; if I knew of an easy way (e.g. $LD_PRELOAD) to make X clients see remapped scancodes too, I'd use it. I looked into this briefly and couldn't find one.

By cks at 2024-05-15 14:53:38:

One half-formed thought is that X perhaps should have a third layer of mapping, between keycodes and keysyms, so that you went keycodes → keylabel (what is nominally printed on the keycap) → keysym (what is generated given the current modifiers). Remapping the keyboard layout would mostly happen at the keylabel layer, and programs like games and Mednafen would work at this level too.
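A speculative Python sketch of what such a three-layer scheme might look like follows. Nothing here exists in X: the layout tables, the function names, and the keycode numbers are all hypothetical, chosen so that the two keycodes stand for the physical keys in the QWERTY 'I' and 'C' positions.

```python
SHIFT_MASK = 1 << 0

# Hypothetical layer 1: keycode -> keylabel, set by the keyboard layout.
QWERTY_LABELS = {31: "i", 54: "c"}
DVORAK_LABELS = {31: "c", 54: "j"}


def keylabel(layout, keycode):
    """Layer 1: what is nominally printed on the keycap."""
    return layout[keycode]


def keysym(label, state):
    """Layer 2: what is generated given the current modifiers."""
    return label.upper() if state & SHIFT_MASK else label


def cheat_menu_pressed(layout, keycode, alt_down):
    """A game binding 'Alt+C' at the keylabel layer, not the keycode layer."""
    return alt_down and keylabel(layout, keycode) == "c"


# Under Dvorak, the key labelled 'C' is the one in the QWERTY 'I' position.
print(cheat_menu_pressed(DVORAK_LABELS, 31, True))
print(cheat_menu_pressed(DVORAK_LABELS, 54, True))
```

In this model a layout change only swaps the label table, so a binding on the label 'c' follows the remapped key instead of staying on a fixed physical position.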

On at least Linux, there are now tools that remap keys at a level below the X server ('evdev', the input events system), and I believe this will change the keycodes the X server will send clients. A 2021 article on this area I saw a pointer to on the Fediverse recently is Key Remapping in Linux — 2021 Edition.

Written on 14 May 2024.

Last modified: Tue May 14 23:15:38 2024