Wandering Thoughts archives

2023-12-18

In Go, constant variables are not used for optimization

Recently I wrote about partially emulating #ifdef with build tags and consts, exploiting Go's support for dead code elimination, and I said that this technique didn't work with variables. That's actually a somewhat interesting result. To see why, let's start with a simple Go program, where the following code is the entire program:

package main

import "fmt"

var doThing bool

func main() {
    fmt.Println("We may or may not do the thing.")

    if doThing {
        fmt.Println("We did the thing.")
    }
}

Here, 'doThing' is a boolean variable that is left at its zero value (false), and isn't exported on top of being in the 'main' package. There's nothing in the Go specification that allows the false value of 'doThing' to ever change. Despite this, if you inspect the resulting code, the if and its call to 'fmt.Println()' are still present. If you go in with a debugger and manually set doThing to true, this code will run.

If you feed a modern C compiler a similar program, with 'doThing' declared as a static int, what you get back is code that has optimized out the code guarded by 'doThing'. The C compiler knows that the rules of the C abstract machine don't permit 'doThing' to change, so it has optimized accordingly. Functionally your 'static int doThing;' is now a constant, so the C compiler has then proceeded to do dead code elimination. The C compiler doesn't care that you could, for example, go in with a debugger and want to change the value of 'doThing', because the existence of debuggers is not included in the C abstract machine.

(This focus of C optimization on the C abstract machine and nothing beyond it is somewhat controversial, to put it one way.)

Go could have chosen to optimize this case in the same way as C compilers do, but for whatever reasons the Go developers didn't choose to do so. One possible motivation to not do this is the case of debuggers, where you can manually switch 'doThing' on at runtime. Another possible motivation is simply to speed up compiling Go code and to keep the compiler simpler. A C compiler needs a certain amount of infrastructure so that it knows that the static int 'doThing' never has its value changed, and then to propagate that knowledge through code generation; Go doesn't.
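For contrast, declaring 'doThing' as a constant does let the Go toolchain eliminate the guarded code, which is exactly what the #ifdef emulation relies on. A minimal sketch of the const version, with the guarded output pulled into a function so the difference in behavior is easy to check:

```go
package main

import "fmt"

// doThing is now a constant instead of a variable; the compiler
// knows the guarded branch below is dead and will eliminate it
// (along with anything only that branch uses).
const doThing = false

// message returns what main will print; with doThing constant
// false, the first return is dead code.
func message() string {
	if doThing {
		return "We did the thing."
	}
	return "We did not do the thing."
}

func main() {
	fmt.Println(message())
}
```

Unlike the variable version, there is nothing here for a debugger to flip at runtime; the branch is simply gone from the generated code.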

Well, actually, that's a bit of a white lie. The normal Go toolchain doesn't do all of this with these constant variables, but there's also gccgo, a Go implementation that's a frontend for GCC (alongside C, C++, and some others). Since gccgo is built on top of GCC, it can inherit all of GCC's C-focused optimizations, such as recognizing constant variables, and if you invoke gccgo with a high enough optimization level, it will optimize the 'doThing' guarded code out just like C (this omits the first call to fmt.Println to make the generated code slightly clearer).

(There have been some efforts to build a Go toolchain based on LLVM, and I'd expect such a toolchain to also optimize this Go code the way gccgo does.)

GoKeepsConstantVariables written at 21:09:55

2023-12-14

Partially emulating #ifdef in Go with build tags and consts

Recently on the Fediverse, Tim Bray wished for #ifdef in Go code:

I would really REALLY like to have #ifdef in this Go code I’m working on - there’s this fairly heavyweight debugging stuff that I regularly switch in to chase a particular class of problems, but don’t want active in production code. #ifdef would have exactly the right semantics. Yeah, I know about tags.

Thanks to modern compiler technology in the Go toolchain, we can sort of emulate #ifdef through the use of build tags combined with some other tricks. How well the emulation works depends on what you want to do; for some things it's almost perfect and for other things it's going to be at best awkward.

The basic idea is to take advantage of build tags combined with dead code elimination (DCE). We'll use tagged files to define a constant, say doMyDebug, to either true or false:

$ cat ifdef-debug.go
//go:build !myrelease
package ...

const doMyDebug = true

$ cat ifdef-release.go
//go:build myrelease
package ...

const doMyDebug = false

Now you can use 'if doMyDebug { .... }' in your regular code as a version of #ifdef. The magic of dead code elimination in Go will remove all of your conditional debugging code if you build with the 'myrelease' tag and so define 'doMyDebug' as false. Go's dead code elimination is smart enough to eliminate not just the code itself but also any data (such as strings) that's only used by that code, any functions called only by that code (directly or indirectly), any data used only by those functions, and so on (although none of this can be exported from your package).

This works fine for one #ifdef equivalent. It works less fine if you want a number of them, all controlled independently, because then you need two little files per flag, which makes the clutter add up fast. You can confine the mess by creating an internal package to hold all of them, say 'internal/ifdef', and then importing it in the rest of your code and using 'if ifdef.DoMyDebug { ... }' (the name has to be capitalized since now it has to be exported).

Where this starts to not work so well is if you want to do more than put debugging code into functions. A semi-okay case is if you want to keep some completely separate additional data (in separate data structures) when debugging is turned on. Here your best bet is probably to put all of the definitions and functions in your conditionally built 'ifdef-debug.go', with function stubs in ifdef-release.go, and call the necessary functions from your regular code. You don't need to make these calls conditional; Go is smart enough to inline and then erase function calls to empty functions (or functions that in non-debugging mode return a constant 'all is okay' result, and then it will DCE the never-taken code branch). This requires you to keep the stub versions of the functions in sync with the real versions; the more such functions you have, the worse things are probably going to be.
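A sketch of the stub approach, with illustrative names of my own invention. In a real project, debugNote and debugCheck would have full implementations in ifdef-debug.go, and the empty or constant-result stubs shown here would live in ifdef-release.go behind the build tag; regular code calls them unconditionally:

```go
package main

import "fmt"

// debugNote is the release-build stub: an empty function that Go
// inlines and then erases at the call sites.
func debugNote(what string) {}

// debugCheck is the release-build stub of a checker that always
// reports 'all is okay', so callers' error branches become dead code.
func debugCheck() error { return nil }

// work calls the debug hooks unconditionally; in a release build
// both calls (and the error branch) compile away to nothing.
func work() (string, error) {
	debugNote("starting work")
	if err := debugCheck(); err != nil {
		return "", err
	}
	return "done", nil
}

func main() {
	s, _ := work()
	fmt.Println(s)
}
```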

Probably the worst case is if you want to conditionally augment some of your data structures with extra fields that are only present (and used) when debugging is defined on. If you can have the fields always present but only touched when debugging is defined on, this is relatively straightforward (since we can protect all the code with 'if ...' and then let DCE eliminate it all when debugging is off). However, if you want the fields to be completely gone with debugging off (so that they don't take up any memory and so on), then life is at best rather awkward. In the straightforward version you need to duplicate the definition of these structures in equivalents of ifdef-debug.go and ifdef-release.go, and only access the extra debugging fields through functions that are also in ifdef-debug.go (and stubbed out in ifdef-release.go). This will probably significantly distort your code structure and make things harder to follow and more error prone.

A less aesthetic version of adding extra data to data structures only when debugging is on is to put all of the debugging data fields into a separate struct type, and then put an instance of the entire struct type in your main data structure. For example:

type myCoreType struct {
   [...]
   dd      extraDebugData
   [...]
}

The real definition of extraDebugData and its fields is in your ifdef-debug.go file, along with the functions that manipulate it. Your ifdef-release.go stub file has an empty 'struct {}' definition of extraDebugData (and stub versions of all its functions). Note that you don't want to put this extra data at the end of your core struct, because a zero-sized field at the end of a struct has a non-zero size. It may also be more difficult to get a minimally-sized myCoreType structure that doesn't have alignment holes with debugging on, depending on what debugging fields you're adding. This still has the disadvantage that you can't manipulate these extra debugging fields in line with the rest of your code; you have to call out to separate functions that can be stubbed out.

(The reason for this is that even though the code may never be executed, Go still requires it to be valid and to not do things like access struct fields that don't exist.)

A variation of this with extra memory overhead that allows for inline code is to always define the real extraDebugData struct but use a pointer to it in myCoreType. Then you can set the pointer and manipulate its fields through regular code guarded by 'if doMyDebug' (or perhaps 'if doMyDebug && obj.dd != nil'), and have it all eliminated when doMyDebug is constant false. This creates a separate additional allocation for the extraDebugData structure in debug mode and means your release builds have an extra pointer field in myCoreType that's always nil.
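A minimal sketch of that pointer variant, again with made-up names. extraDebugData is always defined, but myCoreType only allocates one when doMyDebug is true; with doMyDebug constant false, the guarded code is eliminated and dd simply stays nil:

```go
package main

import "fmt"

const doMyDebug = true

// extraDebugData holds debug-only fields; it always exists, so
// code touching it inline is always valid Go.
type extraDebugData struct {
	opCount int
}

type myCoreType struct {
	name string
	dd   *extraDebugData // stays nil in release builds
}

func newCore(name string) *myCoreType {
	c := &myCoreType{name: name}
	if doMyDebug {
		c.dd = &extraDebugData{}
	}
	return c
}

// operate updates the debug data inline, guarded by the constant
// (and a nil check for safety).
func (c *myCoreType) operate() {
	if doMyDebug && c.dd != nil {
		c.dd.opCount++
	}
}

func main() {
	c := newCore("demo")
	c.operate()
	c.operate()
	if doMyDebug {
		fmt.Println("ops:", c.dd.opCount)
	}
}
```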

All of this only works with constants for the doMyDebug names, not with variables, which means you can't inject the 'ifdef' values on the command line through the Go linker's support for setting string variable values with -X. You have to use build tags and constants in order to get the dead code elimination that makes this more or less zero cost when you're building a release version.

(I suggest that you make the default, no tags version of your code be the one with everything enabled and then specifically set build tags to remove things. I feel that this is more likely to play well with various code analysis tools and editors, because by default (with no tags set) they'll see full information about fields, types, functions, and so on.)

PS: There are probably other clever ways to do this.

GoPartialIfdefWithConsts written at 23:20:12

2023-11-27

Go's API stability and making assumptions, even in semi-official code

Right now, there are some Go programs, such as errcheck, that will work if you build them with Go 1.21 but fail if you build them with the latest Go development version. In fact they will panic deep in the Go standard library, specifically in go/types. Given Go's promises about compatibility, you might expect that this is an error in the development version of Go, especially since the path to the panic goes through golang.org/x/tools, which is also written by the Go developers (although it's not stable and its own API isn't covered by compatibility guarantees). However, it's not a Go problem (cf, also). Instead, it shows either how tricky API compatibility is in practice or alternately how almost anyone can fall prey to Hyrum's law (obligatory xkcd).

The problem is actually more or less in black and white in the code of golang.org/x/tools, although I had to stare at the diff in the change request for a while in order to see it. The relevant old code of golang.org/x/tools was:

// types.SizesFor always returns nil or a *types.StdSizes.
response.dr.Sizes, _ = sizes.(*types.StdSizes)

I must inform you that in the Go development version, the assertion in the comment is no longer true. As a result, the type assertion here usually or always fails, resulting in response.dr.Sizes being nil all the time, and then you later get a panic downstream.

(If the type assertion was a single argument one, it would panic. But since it's the two argument form, it's returning a nil, stored in response.dr.Sizes, and a boolean false that gets thrown away. I believe the error case was handled separately earlier, so in practice response.dr.Sizes was expected to be non-nil.)

However, this is not an API break in Go because types.SizesFor() has never promised to return a types.Sizes interface object with a specific underlying type, including a *types.StdSizes. It just happened to always do so until relatively recently, and originally golang.org/x/tools was written assuming that (undocumented) behavior instead of restricting itself to the public API promises.

The current version of golang.org/x/tools has been updated to properly handle this in CL 516917. But to use the fixed version, other projects need to update their Go module version of golang.org/x/tools to the recent one (and possibly deal with any API changes that matter to them). Since golang.org/x/tools is still a major version 0 module, you can't even blame Go's minimal version selection algorithm for this; it's never even theoretically safe to update a thing on major version 0, except maybe between patch levels.

Sidebar: The contributing design decision

The specific panic in go/types happened because the methods of types.*StdSizes had an implicit requirement that they not be called on a nil pointer value. When you have methods on a pointer to a type ('pointer methods'), you always have to decide how to handle a nil pointer. Sometimes they work, sometimes you explicitly check for the nil and return errors, and sometimes you let Go panic because this is outside your API (or you've chosen an API without error returns on everything). This decision isn't necessarily well documented.

(By Hyrum's law, if your API doesn't document anything either way and works on nil pointer values in some version, changing it so that it does panic in a later version is at least an implicit API change. It's probably not much of an API change in practice if the results for a nil pointer value were unusable garbage.)
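The choice is easy to see with a toy type of my own; the names here are invented for illustration. One method simply dereferences its receiver and so panics on nil, while the other makes the nil-handling decision explicit:

```go
package main

import "fmt"

type counter struct{ n int }

// Value dereferences its receiver, so calling it on a nil
// *counter panics; nothing documents this, it just falls out of
// the implementation.
func (c *counter) Value() int { return c.n }

// SafeValue makes the nil-handling decision explicit and part of
// the method's behavior.
func (c *counter) SafeValue() int {
	if c == nil {
		return 0
	}
	return c.n
}

func main() {
	var c *counter // a nil pointer value
	fmt.Println(c.SafeValue())
	defer func() {
		if recover() != nil {
			fmt.Println("Value() panicked on a nil *counter")
		}
	}()
	c.Value()
}
```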

GoAPIStabilityAndAssumptions written at 23:30:42

2023-11-19

Third party Emacs packages that I use (as of November 2023)

My current Emacs configuration seems to have more or less settled down, so much like I do with Firefox, I want to write down my current third party packages and what I feel about them, so that I can come back to it later and see how things have changed over time. This is in no particular order except perhaps partly historical.

Currently I'm using Emacs 29.1 everywhere, which means I have some things built in (although I'm not using Emacs 29.1's tree-sitter stuff). I'm also not listing third party dependencies of these packages that I don't use directly (these are generally installed automatically through Emacs's package system).

My current third party packages are:

  • Magit for creating basically all of my Git commits. I mostly don't use Magit for other Git operations, but I consider it essential for easy and flexible Git commits (for example, selective commits). I'll sometimes start Emacs purely to make Git commits with Magit.
  • git-timemachine to let me step through historical versions of Git-controlled files in Emacs.

  • For Go programming I have go-mode, supplemented with golint and govet. Now that I look at the current state of those two latter packages, probably I should remove them and replace them with something else.

    (I'm out of date on the state of Go linters in general.)

  • lsp-mode and lsp-ui for programming language intelligence. My lsp-mode setup uses flycheck for checking instead of flymake, and company for 'as you type' autocompletion. I've done relatively extensive tuning of company's keybindings to make them less obnoxious to me. I've toyed enough with Eglot to be convinced that I don't want to switch to it.

    (Although I could use company outside of lsp-mode, I don't currently do so. I have an old entry on some of my company autocomplete customizations.)

  • flycheck-golangci-lint to integrate the thorough golangci-lint Go linter with flycheck (and thus through to my Go editing). This is an additional flycheck backend that I have to switch to if I want to use it; it's not automatically used by gopls and my lsp-mode environment.

  • diminish to turn down the noise level of Emacs' modeline. I configure and use it through use-package so I usually don't think about it.

  • backward-forward for easy, web-browser like jumping backward to where I was when I follow a reference to something in lsp-mode. I wrote an entry about jumping backward and forward.

  • which-key, which gives me a prompt of what my next options are in multi-key sequences; I find this very useful for things I don't use regularly enough to have memorized or wired into my fingers already.

  • vundo to give me an easy way to navigate backward through Emacs' sometimes unpredictable undo stack. I know that there are more elaborate packages, like undo-tree, but vundo is quite simple and meets my desires.

  • smartparens to make it less error prone to write and edit Lisp, and some other things (I have it turned on in Python mode as an experiment). Smartparens isn't perfect for Lisp (or Python), but it's broadly better than trying to do it by hand. I don't use any key bindings for it or any of its smart commands (or its strict mode), I just let it automatically insert closing things for me. Some of its rearrangement commands might make my life easier, but life is full of Emacs things to learn.

    (One area of Lisp where smartparens falls down is single quotes, which in my Lisp are most often not paired but instead used to quote symbols. So every time I write "'thing" in Emacs Lisp I have to remove the trailing quote afterward. I'll live with it, though.)

  • expand-region is a little package to expand the Emacs region out to cover increasingly big things. I use it partly for exactly that, but also partly as a way of seeing where, for example, Emacs considers the current Lisp s-expression or defun to end; if I expand the region to the entire s-expression, I can just look.

  • orderless, vertico, and marginalia to improve my minibuffer completion. I've tuned all of these significantly so that they work the way I like. See also understanding orderless, and also forcing minibuffer completion categories, which is important to me for the best use of vertico.

  • corfu to improve the completion at point experience into something more like what vertico does for minibuffer completion. I only use it on graphical displays (ie, in X).

    In general for completion, see my understanding of completion, which covers both minibuffer completion and completion at point.

  • consult to show previews of various sorts of minibuffer completions, along with additional supporting packages consult-lsp, consult-flycheck, and consult-flyspell.

  • embark, which is in theory a great way to do all sorts of things with a few keystrokes and in practice I mostly use as a handy way to do 'reflow this region' when writing email. I have embark-consult installed as well.

  • try, a handy way to try out an Emacs package without going through the effort to add it and then remove it again.

Things I'm not really using:

  • I have evil installed but I'm not using it so far, apart from occasional experimentation; it turns out to clash with my Emacs reflexes. Keeping it around doesn't hurt and maybe someday I'll want it for something.

  • fold-this seemed potentially useful and I put together some bindings for it, but in practice I don't seem to touch it. I was planning to use it in conjunction with expand-region (as a quick way of selecting a region to fold).

    Folding feels like something that might be useful for navigating files or seeing an overview of their structure if I can figure out how to use it. But I'm not currently convinced it's the best option for this for me, instead of things like consult-imenu.

I have yaml-mode and rustic installed, although I almost never edit YAML in Emacs and don't work on Rust at all. Now that I look at the state of things, possibly I should be using plain rust-mode (which I have installed as a dependency of rustic) instead of rustic.

Some of these packages are probably out of date or not ideal, since I set a number of them up some time ago.

(Most of these packages are installed from MELPA, which means I'm generally getting frequent updates on the ones under active development and more or less the latest development version. So far this hasn't been a problem.)

Sidebar: Things I tried and stepped back from

At one point I tried out origami (along with lsp-origami) enough to put together keybindings for it in my .emacs, but then I decided I didn't like it enough and commented the entire block out.

I experimented briefly with whole-line-or-region before discovering that it clashed with my Emacs reflexes (which expect traditional Emacs region behavior for C-w).

EmacsPackages-2023-11 written at 22:33:24

2023-11-11

Go modules and the domain expiry problem

Every programming language with aspirations of having a usable system for third party packages has some sort of a namespace problem. Today, Tony Arcieri posted something about Rust's package namespace issues, which caused me to think about Go's approach to the problem. The concise summary is that Go outsources the problem to other people by making package names be URLs. Filippo Valsorda noted that this doesn't solve the domain expiration problem, which is true.

The 'domain expiration problem' is that domains (and URLs on domains) go away and get taken over, for example because the domain registration expires (hence the name). Sometimes this happens despite the owner's best intentions; for example, a lot of .ga domains got removed earlier this year. In the Go context, this means that if you published a module with the official name of, for example, 'fred.ga/go/mypackage', and fred.ga goes away, you're now stuck and there's no good way to recover. Similar issues happen if you publish on a forge and your account goes away, gets banned, or whatever.

(Go has the 'go-import' HTML <meta> tag to let you publish one URL but have the source actually retrieved from another URL, but this only pushes the problem back one level. You can survive a forge account problem (or just change which forge you like) because you can just change where the go-import tag points, but you're still in a pickle if you lose control over the URL where you have the go-import tag.)
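Concretely, the <meta> tag in question (its name attribute is 'go-import') looks like the following; the domain continues the hypothetical fred.ga example and the repository URL is made up:

```html
<!-- Served from the page at https://fred.ga/go/mypackage.
     The content format is "import-prefix vcs repo-root". -->
<meta name="go-import"
      content="fred.ga/go/mypackage git https://github.com/fred/mypackage">
```

Changing the repo-root lets you move between forges, but only for as long as you control the page that serves this tag.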

The good news for the general Go ecology is that any new owner of your package's URL has limited scope for being malicious. The Go module checksum database will keep them from publishing a maliciously altered version of any current release, and in theory they can't publish a new version (with malicious code) and have it automatically picked up by current users, because existing users will stick with the current version until they specifically update (new users of the package are not so lucky). And the Go module proxy will probably keep the old versions available.

(In practice, a lot of projects use more or less automated 'dependabot' updates, so I suspect a malicious update with a tiny version number change would slide right in as long as it didn't break people's tests.)

However, that's where the good news ends because today, there's no good automated way for you to update your package or to get news out about its new name (and it has to have a new name, because names are URLs and you can't use the old URL, ie the old name). You're left to make posts in various places and hope people hear about it. If you can do one last version publication on the old URL somehow you can mark your old name (module) deprecated in go.mod, and someday you may be able to automatically forward people to a different name (ie URL) (via), but both of these require (temporary) access to the old URL (including through the cooperation of the new owner).
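The go.mod deprecation marker mentioned above is an ordinary comment immediately before the module directive; a sketch, reusing the hypothetical fred.ga module (the replacement path is equally invented):

```
// Deprecated: this module has moved to example.org/go/mypackage.
module fred.ga/go/mypackage

go 1.21
```

Publishing a version with this comment makes 'go get' and 'go list -m -u' warn users of the old path, which is why it still requires one last publication at the old URL.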

Let me be clear that this is a hard problem in general and no one has a good answer to it, especially since the flipside of being able to update or add notices about modules without control over their URL is that it opens up obvious possibilities for external stealing or compromise of modules. If I can somehow get the Go module proxy to put up so much as a 'this module is obsolete, use this one instead' notice for my module without control over the module's URL, someone else can too.

GoModulesAndDomainExpiry written at 23:44:07

2023-11-01

People do change what a particular version is of a Go module

I'll start with an illustration.

; cd /tmp
; git clone https://github.com/golangci/golangci-lint
; cd golangci-lint
; git checkout ab3c3cd6
; cd cmd/golangci-lint
; go build
[succeeds with no error]
; go clean -modcache
; GOPROXY=direct go build
[...]
verifying github.com/butuzov/ireturn@v0.2.1: checksum mismatch
        downloaded: h1:QXLHriOCzRI8VN9JPhfDcaaxg3TMFD46n1Pq6Wf5zEw=
        go.sum:     h1:w5Ks4tnfeFDZskGJ2x1GAkx5gaQV+kdU3NKNr3NEBzY=

SECURITY ERROR
This download does NOT match an earlier download recorded in go.sum.
The bits may have been replaced on the origin server, or an attacker may
have intercepted the download attempt.

For more information, see 'go help module-auth'.

(The particular Git commit is the current one; I'm specifying it because this whole situation will hopefully change in the future.)

Experienced Go developers know what is going on here; it's a variant of the half missing import. At some point the developer of the ireturn module released a v0.2.1, then changed their mind and re-released a different thing as v0.2.1. During the time in the middle (sort of), golangci-lint updated to 'ireturn@v0.2.1', saved the checksum in its go.sum, and caused the 'v0.2.1' module to be fetched through the (default) Go module proxy (possibly as part of running CI or dependabot tests), which cached it. Now anyone who fetches 'ireturn@v0.2.1' through the default Go module proxy gets one version, which is the version golangci-lint requires, and anyone who fetches the real version directly gets a different version, which the Go tooling refuses to accept.

(Or perhaps the first version of ireturn@v0.2.1 was cached in the Go module proxy before golangci-lint even noticed that it had been updated, and everything was done with and against that cached copy.)

You can say that this isn't supposed to happen (the Go Module Reference talks about how a 'version' is supposed to identify an immutable thing, emphasis mine, for example). Unfortunately we live in the real world, where it does, as we see here. Possibly the Go documentation doesn't write strongly enough that once you've released a given version you can never change what it is, even if you only released it for a day or even an hour. But even then people would likely keep on doing it.

(Note that the window in the middle can be very small. All you need is one person or automated system to fetch the first version through the Go module proxy in order to cause the proxy to freeze on that first version of the release. You might have published the first version for only a few minutes, but if it's the wrong few minutes, things get stuck. This is likely counterintuitive, since we seem to have a general feeling that we can fix mistakes if we act sufficiently quickly, hastily grabbing our mistake back.)

When this happens all of the results are bad. For example, the Git version of golangci-lint has been depending on a module you could only get from the cache of the Go module proxy for over ten days now, and probably no one has realized (and the Go module cache doesn't promise to cache all module versions forever). Also, the real version of v0.2.1 isn't actually being used by anyone who uses the Go module proxy; it may be released in its upstream repository, but on the module proxy it's hidden by the previous v0.2.1, and its developer may be none the wiser about this. I doubt any of the parties involved intended any of these effects, and I think that part of the issue is that these problems are hard to notice by default.

I strongly believe that one thing that would help this overall situation is if every Go project with CI periodically built itself directly against all dependency modules, bypassing both any Go module proxy and the local Go module cache. This would at least detect missing or changed dependencies (direct and indirect), and get people to resolve the situation one way or another. If you have a Go project or routinely (re)build Go things that you depend on, I suggest that you consider doing this periodically. Otherwise someday you may get an unpleasant surprise.
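In a CI job, that periodic check can be a short recipe along the following lines (run in the project's module directory; these are all standard Go tool invocations):

```
# Start from an empty module cache so nothing is satisfied locally.
go clean -modcache
# Fetch every dependency straight from its origin, bypassing the
# module proxy; a vanished or changed module version fails loudly here.
GOPROXY=direct go mod download
# Then make sure the tree still builds against what was fetched.
GOPROXY=direct go build ./...
```

Because go.sum is still consulted, a silently re-released version shows up as the same checksum-mismatch security error seen in the transcript above, rather than sliding by.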

(To be clear, there is no automatic solution possible for this. Go has the go.sum database of module checksums and module authentication in general for very good reasons and you never want to automatically override its view of things. One way or another the projects involved need to take manual steps to resolve the situation; here that might be falling back to the prior version of the 'ireturn' module, which I believe is consistent.)

GoPeopleRedoModuleVersions written at 23:14:58

2023-10-21

Understanding dynamic menubar menus in GNU Emacs

Suppose, not hypothetically, that you don't just want to add a new menu to Emacs' menu bar, but that you want to dynamically determine what's in this menu. You have two options, one simple to understand and use but involving magic and one complicated but with less magic. The simple option is an easy menu with a :filter. An easy-menu filter function has a simple calling convention and a simple usage. It's called with a list of the menu entries you initially specified in easy-menu-define (or the equivalent if you did it yourself), and it returns a list of what menu entries should be included, in exactly the same format. The easy-menu menu entry format is also quite powerful and expressive, letting you do things like bind menu entries to expressions, not just functions.

(To put the conclusion first, I suggest that you stick to easy-menu for dynamic menus unless you have a compelling reason otherwise.)

The more complex but less magical way is to use Emacs' standard menu system with extended menu items. To understand what we need to do, we need to understand a number of things about Emacs menus and the menu-bar. To start with, menus are actually keymaps, and keymaps themselves are just specially formatted lists. Also, an actual keymap only describes the mapping for a single character at a time. Although you can define multi-key sequences all at once for convenience, a sequence like 'C-x 5 f' is actually three entries in three keymaps; there's an entry for C-x in the global keymap, an entry for '5' in the C-x keymap, and an entry for 'f' in the 'C-x 5' keymap. If you define a multi-key sequence and the necessary intermediate keymaps don't yet exist, Emacs creates them for you. All of this is true for the menu-bar and for menus. When you define a 'key' binding for '[menu-bar my-example]' to add a new menu-bar entry, there's a '[menu-bar]' keymap that Emacs is inserting a 'my-example' entry into, which has your 'my-example' keymap and other information.

To implement a dynamic menu with standard Emacs menus, we generally need to use the :filter property of extended menu items. However, this is confusing to understand because it's documented in terms of an individual menu entry, not a menu:

This property provides a way to compute the menu item dynamically. The property value filter-fn should be a function of one argument; when it is called, its argument will be real-binding. The function should return the binding to use instead.

A menu on the menu-bar is a keymap, which is to say its 'real-binding' is a keymap. Here is a non-dynamic starting point:

(defvar test-menu-map (make-sparse-keymap "Test"))
(define-key-after test-menu-map [entry] '("Entry" . end-of-buffer))
(define-key-after test-menu-map [disabled]
  '(menu-item "Disabled" beginning-of-buffer :enable nil))

(define-key-after (current-global-map) [menu-bar test-menu]
  (cons "Test" test-menu-map))

(This sort of follows the example in Menu Bar. The functions I'm binding are random; I have to bind something, and this way I can see visible effects from a specific menu entry.)

Since the real-binding is a keymap, a :filter function will be passed the keymap and needs to return another keymap that will describe all of the menu entries. Since keymaps are a list, we can append additional menu entries to the keymap and return that (and I'll do this in my example). And to specify the :filter property, we need to set our menu-bar 'key' using the extended menu item format, instead of the simple one. Assuming that we have a my-generate-menu function, setting up the menu looks like this:

(define-key-after (current-global-map) [menu-bar test-menu]
  (list 'menu-item "Test" test-menu-map
        :filter 'my-generate-menu))

Now we get into a bit more work, because our my-generate-menu function must return a keymap that has entries in the internal keymap formats for menu entries, as covered in the format of keymaps, and this is not quite the format you give to define-key. If we dump our test-menu-map to see how the entries actually look, we will get this slightly transformed version:

(keymap "Test"
   (entry "Entry" . end-of-buffer)
   (disabled menu-item "Disabled" beginning-of-buffer :enable nil))

(The 'entry' and 'disabled' are the key name symbols we gave to define-key-after.)

The menu entries our filter function will add need to be in the same format. So a functional (although hard-coded) filter function looks like this:

(defun my-generate-menu (orig-binding)
  (append orig-binding
     '((new-one menu-item "New one" forward-word)
       (new-simple "Simple one" . backward-word)
       (new-disabled menu-item "Disabled one" next-line :enable nil))))

There are a variety of options for how your filtering function could work. If you want to make an entirely dynamic menu, you could probably have nothing in the initial test-menu-map keymap, entirely ignore the orig-binding argument to the filter function, and just create a new keymap and define menu items in it in your filter function (then return it). This would save you from getting the formatting details right for each type of keymap entry; define-key or define-key-after would worry about that for you.
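As a hypothetical sketch of that fully dynamic approach (the function and key names here are my own invention, not anything standard):

```elisp
;; Ignore the original binding entirely and build a fresh keymap each
;; time the menu is opened; define-key-after takes care of getting the
;; internal entry format right for us.
(defun my-fully-dynamic-menu (_orig-binding)
  (let ((map (make-sparse-keymap "Dynamic")))
    (define-key-after map [show-buffer]
      `(menu-item ,(format "Current buffer: %s" (buffer-name)) ignore))
    (define-key-after map [to-end]
      '(menu-item "Go to end" end-of-buffer))
    map))

;; The initial keymap is just a placeholder; the filter replaces it.
(define-key-after (current-global-map) [menu-bar dyn-test]
  (list 'menu-item "Dynamic" (make-sparse-keymap "Dynamic")
        :filter 'my-fully-dynamic-menu))
```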

(Or you could completely create your keymap by hand, since it's just a list and Emacs Lisp has plenty of options for creating lists.)

If you want to put things at the start of the menu while preserving the fixed entries at the end, things get trickier because the lists that are keymaps must start with the 'keymap' symbol. You'll need to add your entries after that symbol but before any existing bindings, or build your own keymap list. As we see above, appending entries is (somewhat) easier.
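A sketch of the prepending case, assuming the keymap has the shape we saw above, '(keymap "Test" ENTRY...)' (the 'my-' names are hypothetical):

```elisp
;; Splice new entries in after the leading 'keymap symbol and the
;; menu's name string, but before the existing entries.
(defun my-prepend-to-menu (orig-binding)
  ;; orig-binding looks like (keymap "Test" ENTRY...)
  (append (list (car orig-binding) (cadr orig-binding))
          '((new-first menu-item "New first entry" forward-word))
          (cddr orig-binding)))
```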

(Easy-menu's menu :filter is implemented using the standard Emacs menu item :filter, but it does various transformations in the process.)

Having gone through the entire exercise of working out how to do this with standard Emacs menu facilities, my considered opinion is that I'm going to stick with easy-menu for dynamic menus (and for non-dynamic ones too). Easy-menu has easier to use dynamic filtering and easier to use menu entries, and I'm not going to bet against its overall efficiency either. I think that easy-menu's ability to bind menu entries to expressions instead of just functions is especially useful for dynamic menus; in the single dynamic menu I built, I wanted to build a bunch of entries of the form 'do fixed thing to <this>'. In the standard Emacs menu facilities, I'd have been building a lot of lambdas. In easy-menu, easy-menu did it for me and I think with more efficiency.

(I wound up digging into this once I learned enough to understand that easy-menu's :filter must be converting menu items from the easy-menu format to the true keymap format, which made me wonder if directly using Emacs's menu item :filter property would be more efficient. Now hopefully I can stop poking into Emacs corners.)

PS: You (I) may someday find yourself wanting to use some of the keymap-* functions on menus and menu entries. These functions take key names in string form, not in the '[menu-bar test-menu]' form that define-key does. In string form, this is written as "<menu-bar> <test-menu>", because internally the menu and menu-bar names are treated as (pretend) function keys (and Emacs represents function keys as symbols, cf). You can see this by evaluating, for example, '(kbd "<menu-bar> <file>")'.
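For example (in Emacs 29, where the keymap-* functions exist):

```elisp
;; The string form and the vector form name the same 'keys':
(kbd "<menu-bar> <file>")                        ; => [menu-bar file]
;; keymap-lookup takes the string form where define-key takes vectors:
(keymapp (keymap-lookup global-map "<menu-bar> <file>"))  ; t, the File menu
```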

Emacs is sometimes weirdness all the way down, partly because it has a very long history.

Sidebar: A function that reports its (menu) invocation

If you're testing standard Emacs menus, where you can only bind a function (instead of an expression), you may want an interactive function that reports how it was invoked, so you can bind it to lots of menu entries and still get useful feedback. There are probably several ways to get this, but here is what I came up with:

(defun cks/report-path ()
  (interactive)
  (message "Invoked via: %s" (this-command-keys)))

(In my examples above I bound a random set of movement functions.)

EmacsDynamicMenubarMenus written at 23:00:40

2023-10-20

Changing the menu bar order of Emacs easy-menu menus

These days, Emacs has a menu bar, theoretically in both graphical and text modes (I turn it off in terminals). One of the things that you (I) can want to do as part of customizing things like MH-E is to add additional menu-bar menus with your own convenient entries. The easiest and most obvious way to define a new menu-bar menu in Emacs Lisp is with Easy Menu's easy-menu-define. Easy-menu-define is both easy to use and powerful, offering support for things like dynamic filtering and dynamic enabling of entire menus. However, for my purposes it has one limitation (I shouldn't call it a flaw), namely that easy-menu-define adds new menus to the front of the menu bar (either a mode specific menu bar or, worse, the global part of the menu bar). The cheerful advice is to define your easy menus in reverse order, but you can't really do this if you're extending an existing mode.

There are two ways around this; the bad way of Emacs crimes and the proper way, which has worked for me so far. Both ways start with the fact that menus are actually Emacs keymaps, especially including menus in the menu bar, which is itself tied up in keymaps; you add a menu to the menu bar by adding it to either the global keymap or the current major mode keymap under a special format of key names. The reason that easy-menu-define adds your menu to the front of the menu bar is that it winds up using define-key, and define-key adds the new key binding to the front of the keymap. If we want our new menu to be anywhere else in the menu bar, we need to get the easy-menu system to use define-key-after instead (either with or without an explicit thing to put our new menu after).

If we ignore defining a function that you can use to make a pop-up menu, what easy-menu-define does (more or less) is it creates a menu with easy-menu-create-menu, creates binding(s) from this menu with easy-menu-binding, and then sets the binding(s) into the keymaps you requested with define-key (making up a special 'key' name for the binding of the special form '[menu-bar <something>]', which the menu bar system will use to find all of the menu-bar menu entries). We can do this ourselves. First let's do the two easy-menu steps:

(defun cks/easy-menu-setup (menu-name menu-items)  
  (easy-menu-binding
   (easy-menu-create-menu menu-name menu-items)
   menu-name))

Here, menu-name is the user-friendly name of your menu, which with easy-menu-define you'd put as the first element of its menu argument, and menu-items are the elements of the menu, everything except the first element of easy-menu-define's menu argument. This function does everything short of defining the menu-bar 'key'.

(easy-menu-create-menu will sometimes return a keymap and sometimes return something else I don't fully understand, depending on whether you gave the menu any properties. easy-menu-binding handles everything.)

Provided with this function, we can define menus and put them into keymaps to make them appear, like so:

(define-key-after mh-folder-mode-map [menu-bar my-example]
 (cks/easy-menu-setup "Example"
  '(["First entry" (message "First")]
    ["Second entry" (message "Second")])))

(This is not proper Emacs Lisp indentation.)

This will put your new 'Example' menu at the end of the mode specific menus in MH-E's folder window. If you want, you can save the value returned from cks/easy-menu-setup in a let variable and use define-key-after to set it in multiple modes, for example to also set it in mh-show-mode-map (there are cautions here in the case of MH-E that are outside the scope of this entry, and also general cautions in that I'm not sure that reusing the same easy-menu-binding in multiple keymaps is correct, although it works for me).
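A sketch of that multi-mode reuse, using the cks/easy-menu-setup function from above (and with the same correctness caveats):

```elisp
;; Create the binding once, then install it in two MH-E mode maps.
;; Whether sharing one easy-menu-binding this way is fully correct is
;; an open question, but it has worked for me.
(let ((binding (cks/easy-menu-setup "Example"
                                    '(["First entry" (message "First")]
                                      ["Second entry" (message "Second")]))))
  (define-key-after mh-folder-mode-map [menu-bar my-example] binding)
  (define-key-after mh-show-mode-map [menu-bar my-example] binding))
```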

The normal easy-menu-define code will make up the special key name from the title text of your menu, which may not be what you want. Since we're doing this by hand, we can be different. Note that this may affect your ability to use other easy-menu functions to modify the menu later (for example, easy-menu-add-item). I haven't tested this.

The necessary disclaimer is that while this works for me so far, I'm not sure it's either completely correct or the best way to do this. And it would be nice if there were general functions to shuffle the order of menu bar entries.

PS: What definitely doesn't work, although you might innocently think that it should, is extracting a menu-bar entry's keymap with '(lookup-key map [menu-bar your-name])', using define-key to remove it from the keymap, and adding it back with define-key-after. This will appear to work for simple easy-menu-define menus, but won't for menus with things like a :filter; you appear to get the post-filtered version of the menu and then things obviously go wrong.

Sidebar: The Emacs crimes way

The Emacs crimes way is to use advice-add to temporarily and conditionally turn define-key into define-key-after, because easy-menu-define calls define-key only once (well, once per map). This allows us to keep using all of the features of easy-menu-define, and looks like this:

(defvar cks/define-key-to-after nil)
(defun cks/define-key-to-after (oldfun keymap key def &optional remove)
  (if (and cks/define-key-to-after (not remove))
      (define-key-after keymap key def)
    ;; funcall, not apply: apply would treat a non-nil remove
    ;; argument as a (malformed) argument list.
    (funcall oldfun keymap key def remove)))

(advice-add 'define-key :around 'cks/define-key-to-after)

(let ((cks/define-key-to-after t))
  (easy-menu-define ... ))

(advice-remove 'define-key 'cks/define-key-to-after)

My view is that this is definitely full bore Emacs crimes, but seasoned Emacs Lisp people may have different views.

A more elaborate version that allows you to optionally specify what to put the binding after (by setting a non-t value for cks/define-key-to-after) is left as an exercise to the reader.
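One possible shape of that elaborate version (a sketch, not a tested implementation; it relies on define-key-after putting the binding at the end when its AFTER argument is nil):

```elisp
;; A non-t value of cks/define-key-to-after names the existing menu
;; entry (a symbol) to insert the new binding after; t means 'at the
;; end', as before.
(defun cks/define-key-to-after (oldfun keymap key def &optional remove)
  (if (and cks/define-key-to-after (not remove))
      (define-key-after keymap key def
        (unless (eq cks/define-key-to-after t) cks/define-key-to-after))
    (funcall oldfun keymap key def remove)))
```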

Sidebar: What I think easy-menu-create-menu is returning

Since I stared at Lisp code and the output of '(pp ...)' for long enough, what I think easy-menu-create-menu is doing is that it returns either a keymap or a keymap and a set of properties. If it only has a keymap to return, it returns just the keymap. If it has two things to return, it returns an uninterned symbol that has the keymap attached as the 'function' value of the symbol and the menu properties attached as the 'menu-prop' property of the symbol. Easy-menu-binding detects if it's the second case and peels the two parts apart again, then reassembles them differently into something that can be passed to define-key to define a menu.

Easy-menu-create-menu uses extended menu item format, including especially for the top level item that represents your entire menu (and which may have, eg, your :filter on it).

(This entire sidebar may not make sense to future me, but at least I tried.)

EmacsEasyMenuAndMenubarOrder written at 23:22:56

2023-10-17

(Minibuffer) completion categories in GNU Emacs and forcing them

Minibuffer completion is the most common type of completion in Emacs and it's invoked in all sorts of situations and thus to complete all sorts of different things. As part of this, Emacs completion has the concept of a completion category that can be used to customize aspects of completion in both basic Emacs (eg, also) and in and for third party packages like vertico (eg) and orderless. My personal experience is that this customization can be very useful to make me happy with third party packages; the default vertico experience is not what I want in some types of frequently used completions.

(Vertico can customize things on a per-command basis, but this can get tedious if you have a bunch of commands that all complete the same sort of thing and you want to adjust in the same way. And you can't adjust Emacs completion styles on a per-command basis in stock Emacs.)

In Emacs Lisp code you may write, the most straightforward way to do minibuffer completion is using completing-read with a big list of all of your completion choices. Often this is the most useful form as well, partly because it allows extensions like orderless to act at their most powerful, with a full view of all possible completion candidates. Unfortunately, when you invoke completing-read this way, as far as I know there is no normal way to provide a completion category. You can only provide a completion category through programmed completion, where you provide a completion function instead of a collection of choices and one of the things your completion function does is return completion metadata, including the category.
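A sketch of what that programmed-completion route looks like when all you really have is a plain list (the wrapper function name is my own; complete-with-action is the standard helper for delegating the other completion actions):

```elisp
;; Wrap a plain collection in a completion function that also answers
;; the 'metadata action with our desired category.
(defun cks/completing-read-with-category (prompt collection category)
  (completing-read
   prompt
   (lambda (string pred action)
     (if (eq action 'metadata)
         `(metadata (category . ,category))
       (complete-with-action action collection string pred)))))
```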

If we want to force the completion category anyway, the way I've found to do this (researched from marginalia) is to hook into the internals of the completion functions with advice-add. Specifically we need to hook into completion-metadata-get, which is what completion uses to extract a particular metadata property from a (nominal) blob of metadata:

(defvar cks/completion-category nil "Forced completion category.")
(defun cks/completion-force-category (metadata prop)
  (if (and cks/completion-category (eq prop 'category))
      cks/completion-category))

(advice-add 'completion-metadata-get :before-until
            'cks/completion-force-category)

;; used this way:
(defun cks/some-completion (msg)
  (let ((cks/completion-category 'my-special-category))
    (completing-read msg list-of-stuff ...)))

(I'm not sure where I picked up '<name>/' symbol name prefixes as a way to avoid name conflicts, or if it's the accepted Emacs style these days.)

As is traditional, we dynamically set the value we want for our (forced) completion category to something and then invoke completing-read. The dynamically scoped value will pass through to our added advice and be returned as the category value (overriding any category that's already in the metadata, because that's easier to code).

Possibly there's already a standard Emacs Lisp way of providing or setting the category of a basic completing-read. If not, my personal view is that there should be (and maybe there will be someday). Completion categories are neat and useful, so it should be easy to use them. In the meantime, well, this approach works for me in Emacs 29.1.

Sometimes you may have a more sophisticated completion environment where there's already a special completion function and some existing elisp code that calls it, but the special completion function doesn't implement providing metadata to the completion system. In that case you can advice-add the special completion function, which is simpler and normally doesn't need a special variable:

(defun cks/mh-folder-complete-note (name predicate flag)
  (if (eq flag 'metadata)
      '(metadata (category . mh-e-folder))
    nil))

(advice-add 'mh-folder-completion-function :before-until
            'cks/mh-folder-complete-note)

(You don't need to explicitly return nil, but this particular bit of code was written before I had internalized some bits of standard Lisp behavior.)

PS: In real usage these functions should have docstrings and perhaps comments, but I've omitted them for space reasons.

PPS: Since I looked up the code, a possible alternate approach would be to advice-add the completion-metadata function, which is what completion calls to obtain the metadata from a completion function in programmed completion. Getting the format right is up to you; see the Lisp code for completion-metadata.

Sidebar: marginalia and its category overrides

In theory marginalia has its own system for setting completion categories based on various things, including the current command. 'Command' here is Emacs jargon for the interactive function that was directly invoked either through a key binding or through M-x (and specifically marginalia bases this on the value of this-command, which is probably obvious to experienced Emacs people). Unfortunately, I was unable to get this marginalia feature to affect the completion category as recognized by vertico, so I eventually resorted to brute force (which did work with vertico).

Also, this way I don't have to maintain and update a list of all of my commands that call my core completion function. I can just modify the core completion function itself and automatically cover any future use of it I add as I think of more possibilities.

EmacsCompletionForcingCategories written at 22:36:53

2023-10-09

My understanding of various sorts of completion in GNU Emacs

One of the things that happens when you (I) only touch your Emacs configuration every few years is that you forget exactly how things work, especially if you didn't fully understand them when you were copying directions from elsewhere when you set things up. Due to recent events I've been doing a lot of GNU Emacs stuff, which has involved both recovering old understanding and learning new things about completions. Before I forget it all again, I'm writing it down for my future use.

GNU Emacs broadly has two built-in forms of completion. The one people use routinely is minibuffer completion. Because it's so common, there are a variety of built-in and third party things to change and improve it, such as fido-vertical-mode and vertico to always show some of the completion targets and marginalia to add more information about them. Orderless technically affects more than minibuffer completion, but in practice it can be hard to use it outside of the minibuffer.

The second form is on-demand completion in buffers. The dominant form of this is completion at point ('point' is the Emacs term for where the cursor is), normally invoked through M-TAB, but standard Emacs also has, for example, dabbrev-expand (bound to M-/), which tries to complete things through another mechanism. The default completion at point behavior has some aspects that are like minibuffer completion, but it's more minimal. The corfu package augments M-TAB completion at point to always show (some of) the completion targets.

(Because dabbrev-expand is not doing its completion as 'completion at point' completion, corfu's UI doesn't appear for it. The whole thing is part of Abbrevs, also, and also completions in general. Defined abbrevs can be expanded as you type, instead of on demand.)

Emacs has an entire ecosystem for generating the completions for on-demand buffer completion at point. The data for this can come from all sorts of sources, depending on what's in the buffer. In particular, if you're editing something with a LSP server active (through eg lsp-mode or eglot (also), which is part of Emacs 29.1), then information from the language server will be used to provide completion at point data, so you can use M-TAB to complete things on demand, possibly with corfu providing a popup list and so on.

As far as I know, nothing in stock GNU Emacs provides general IDE-like 'as you type' autocompletion (and this is not normally provided by LSP modes). To get this, you need to use a package like company(-mode) (also). Company can draw from multiple completion sources, but in modern use it normally primarily draws from the same 'completion at point' information as M-TAB uses, which means that it draws completion information from a LSP mode if you have that active. You can use company autocompletion independently from a LSP mode; for example, you can enable it in Emacs Lisp buffers (where there's no LSP for elisp). Since company is doing its actual completion outside of the completion at point system, it has its own popup UI of completion targets and information about them, which is independent of standard completion at point enhancements like corfu. Company has a command to explicitly start a company completion, which is potentially useful to bind to eg M-/ so that you can restart a completion that you exited without having to delete and retype some characters. Or you can type M-TAB to use Emacs' regular completion at point UI for this (including, eg, corfu's UI), unless you rebound M-TAB to company's completion (which you might, to avoid confusion).

(You can use company without as you type autocompletion and instead bind company-complete to M-TAB and M-/, if you globally set 'company-begin-commands' to 'nil' (it also works as a buffer local variable). I believe that this will still use the company UI, not the standard completion at point or corfu UI. To use company completion, the buffer must be in company-mode, but you can disable as you type autocomplete with a buffer local value for 'company-begin-commands'.)
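A sketch of that on-demand-only setup (key choices are mine; whether you also rebind M-TAB is a matter of taste, as discussed above):

```elisp
;; Disable company's as-you-type popups globally; completion then only
;; happens when explicitly requested.
(setq company-begin-commands nil)
;; Bind explicit company completion to convenient keys.
(global-set-key (kbd "M-/") 'company-complete)
(global-set-key (kbd "M-TAB") 'company-complete)
;; The buffer must still be in company-mode for this to work.
```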

This gives us (me) three different completion environments with three different sets of completion customizations. There's minibuffer completions (vertico, marginalia, and orderless), Emacs native completion at point (corfu), and finally company as you type (auto)completion (well, that's the normal setup for company). Depending on your usage, you may normally use only one of the last two; for example, until recently I made basically no use of M-TAB completion at point.

To summarize the common situation in LSP modes for me, company-mode is providing as you type autocompletion stacked on top of Emacs' general completion at point infrastructure, with the LSP mode (and LSP server) providing the primary completion data. In non LSP modes, like Emacs Lisp or C, I could enable company-mode and it would most likely be drawing completion data from the language mode. Even if I don't enable company-mode, I can still access the same completions through standard M-TAB completion at point (either with or without a LSP).

My personal experience has been that programming languages with large, flat namespaces of identifiers with short names don't necessarily go very well with as you type autocompletion. There are generally so many options that you get prompted all of the time (which is at best distracting and at worst routinely obscures code that you want to see), and many of them aren't very useful (or are actively wrong). Possibly this would be improved by increasing the number of characters before company starts offering autocomplete options, but so far it's been simpler to only use company in modes where I already use LSP.

(For instance, with shell scripts and LSP-mode, company will sometimes offer you autocompletion that includes every program in your $PATH. This is technically correct but often not particularly useful. Unfortunately disabling company within specific language lsp-modes seems rather difficult and I haven't been successful so far.)

PS: My understanding is that the Emacs situation used to be much less unified, especially for company, which in the old days required its own specialized connections to programming language modes (eg company-go) and LSP modes (eg company-lsp). Modern Emacs has unified all of this around the completion at point infrastructure (often using the acronym 'capf', which is short for 'completion at point functions', which are backend functions that provide completion information; see eg). I'm not sure what Emacs version is 'modern' here, but probably you want to be using a recent Emacs anyway.

PPS: This can make it perfectly sensible to have a whole collection of third party packages (especially small, focused ones) that affect the different sorts of completion. If you count company, I'm currently up to five: vertico, marginalia, orderless, corfu, and company itself. There are undoubtedly more that I could add; suggestions are welcome.

EmacsUnderstandingCompletion written at 21:53:16

