Finding metrics that are missing labels in Prometheus (for alert metrics)
One of the things you can abuse metrics for in Prometheus is to
configure different alert levels, alert destinations, and so on for
different labels within the same metric, as I wrote about back in
my entry on using group_* vector matching for database lookups. The example in that entry used two metrics, our_zfs_avail_gb and our_zfs_minfree_gb,
the former showing the current available space and the latter
describing the alert levels and so on that we want. Once we're using
metrics this way, one of the interesting questions we could ask is
what filesystems don't have a space alert set. As it turns out, we
can answer this relatively easily.
The first step is to be precise about what we want. Here, we want
to know what 'fs' labels are missing from our_zfs_minfree_gb; an
fs label is missing if it's not present in our_zfs_minfree_gb
but is present in our_zfs_avail_gb. Since we're talking about
sets of labels, answering this requires some sort of set operation.
If our_zfs_minfree_gb only has unique values for the fs label
(ie, we only ever set one alert per filesystem), then this is
straightforward:
our_zfs_avail_gb UNLESS ON(fs) our_zfs_minfree_gb
The our_zfs_avail_gb metric generates our initial set of known
fs labels. Then we use UNLESS to subtract the set of all fs
labels that are present in our_zfs_minfree_gb. We have to use
'ON(fs)' because the only label we want to match on between the
two metrics is the fs label itself.
However, this only works if our_zfs_minfree_gb has no duplicate
fs labels. If it does (eg if different people can set their own
alerts for the same filesystem), we'd get a 'duplicate series' error
from this expression. The usual fix is to use a one to many match,
but those can't be combined with set operators such as 'unless'.
Instead we must get creative. Since all we care about is the labels
and not the values, we can use an aggregation to give us a single
series for each label on the right side of the expression:
our_zfs_avail_gb UNLESS ON(fs) count(our_zfs_minfree_gb) by (fs)
As a side effect of what they do, all aggregation operators condense
multiple instances of a label value this way. It's very convenient
if you just want one instance of it; if you care about the resulting
value being one that exists in your underlying metrics, you can use
min() or max() instead of count().
You can obviously invert this operation to determine 'phantom' alerts,
alerts that have
fs labels that don't exist in your underlying metric.
That expression is:
count(our_zfs_minfree_gb) by (fs) UNLESS ON(fs) our_zfs_avail_gb
(Here I'm assuming our_zfs_minfree_gb has duplicate fs labels;
if it doesn't, you get a simpler expression.)
Such phantom alerts might come about from typos, filesystems that haven't been created yet but you've pre-set alert levels for, or filesystems that have been removed since alert levels were set for them.
This general approach can be applied to any two metrics where some
label ought to be paired up across both. For instance, you could
cross-check that every node_uname_info metric is matched by one
or more custom per-host informational metrics that your own software
is supposed to generate and expose through the node exporter's
textfile collector.
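Following the pattern above, such a cross-check might look like the following sketch, where our_host_info is a hypothetical name for your custom per-host metric and I'm assuming instance is the label the two metrics share:

```promql
node_uname_info UNLESS ON(instance) count(our_host_info) by (instance)
```

This gives you the node_uname_info series for hosts that are missing your custom metric entirely.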
(This entry was sparked by a prometheus-users mailing list thread that caused me to work out the specifics of how to do this.)
The problem of 'triangular' Network Address Translation
In my entry on our use of bidirectional NAT and split horizon DNS, I mentioned that we couldn't apply our bidirectional NAT translation to all of our internal traffic in the way that we can for external traffic for two reasons, an obvious one and a subtle one. The obvious reason is our current network topology, which I'm going to discuss in a sidebar below. The more interesting subtle reason is the general problem of what I'm going to call triangular NAT.
Normally when you NAT something in a firewall or a gateway, you're in a situation where the traffic in both directions passes through you. This allows you to do a straightforward NAT implementation where you only rewrite one of the pair of IP addresses involved; either you rewrite the destination address from you to the internal IP and then send the traffic to the internal IP, or you rewrite the source address from the internal IP to you and then send the traffic to the external IP.
However, this straightforward implementation breaks down if the return traffic will not flow through you when it has its original source IP. The obvious case of this is if a client machine is trying to contact a NAT'd server that is actually on its own network. It will send its initial packet to the public IP of the NAT'd machine and this packet will hit your firewall, get its destination address rewritten, and then passed to the server. However, when it replies to the packet, the server will see a destination IP on its local network and just send it directly to the client machine. The client machine will then go 'who are you?', because it's expecting the reply to come from the server's nominal public IP, not its internal one.
(Asymmetric routing can also create this situation, for instance if the machine you're talking to has multiple interfaces and a route to you that doesn't go out the firewall-traversing one.)
In general the only way to handle triangular NAT situations is to force the return traffic to flow through your firewall by always rewriting both IP addresses. Unfortunately this has side effects, the most obvious one being that the server no longer gets the IP address of who it's really talking to; as far as it's concerned, all of the connections are coming from your firewall. This is often less than desirable.
(As an additional practical issue, not all NAT implementations are very enthusiastic about doing such two-sided rewriting.)
Sidebar: Our obvious problem is network topology
At the moment, our network topology basically has three layers; there is the outside world, our perimeter firewall, our public IP subnets with various servers and firewalls, and then our internal RFC 1918 'sandbox' subnets (behind those firewalls). Our mostly virtual BINAT subnet with the public IPs of BINAT machines basically hangs off the side of our public subnets. This creates two topology problems. The first topology problem is that there's no firewall to do NAT translation between our public subnets and the BINAT subnet. The larger topology problem is that if we just put a firewall in, we'd be creating a version of the triangular NAT problem because the firewall would have to basically be a virtual one that rewrote incoming traffic out the same interface it came in on.
To make internal BINAT work, we would have to actually add a network layer. The sandbox subnet firewalls would have to live on a separate subnet from all of our other servers, and there would have to be an additional firewall between that subnet and our other public subnets that did the NAT translation for most incoming traffic. This would impose additional network hops and bottlenecks on all internal traffic that wasn't BINAT'd (right now our firewalls deliberately live on the same subnet as our main servers).
Some notes on the structure of Go binaries (primarily for ELF)
I'll start with the background. I keep around a bunch of third party
programs written in Go, and one of the things that I do periodically
is rebuild them, possibly because I've updated some of them to
their latest versions. When doing this,
it's useful to have a way to report the package that a Go binary was
built from, ideally a fast way. I have traditionally used
binstale for this, but it's not
fast. Recently I tried out gobin,
which is fast and looked like it had great promise, except that I
discovered it didn't report about all of my binaries. My attempts to
fix that resulted in various adventures but only partial success.
All of the following is mostly for ELF
format binaries, which is the binary format used on most Unixes
(except MacOS). Much of the general information applies to other
binary formats that Go supports, but the specifics will be different.
For a general introduction to ELF, you can see eg here.
Also, all of the following assumes that you haven't stripped the
Go binaries, for example by building with the '-w' or '-s' linker
flags.
All Go programs have a .note.go.buildid ELF section that contains
the Go build ID.
If you read the ELF sections of a binary and it doesn't have that,
you can give up; either this isn't a Go binary or something deeply
weird is going on.
Programs built as Go modules contain an
embedded chunk of information about the modules used in building
them, including the main program; this can be printed with 'go
version -m <program>'. There is no official interface to extract
this information from other binaries (inside a program you can use
runtime/debug.ReadBuildInfo()), but it's
currently stored in the binary's data section as a chunk of plain
text. See version.go
for how Go itself finds and extracts this information, which is
probably going to be reasonably stable (so that newer versions of
Go can still run '
go version -m <program>' against programs built
with older versions of Go). If you can extract this information
from a binary, it's authoritative, and it should always be present
even if the binary has been stripped.
If you don't have module information (or don't want to copy
version.go's code in order to extract it), the only approach I know
to determine the package a binary was built from is to determine
the full file path of the source code where
main() is, and then
reverse engineer that to create a package name (and possibly a
module version). The general approach is:
- extract Go debug data from the binary and use debug/gosym to create a gosym.Table.
- look up the main.main function in the table to get its starting address, and then use Table.PCToLine() to get the file name for that starting address.
- convert the file name into a package name.
Binaries built from
$GOPATH will have file names of the form
$GOPATH/src/example.org/fred/cmd/barney/main.go. If you take the
directory name of this and take off the
$GOPATH/src part, you
have the package name this was built from. This includes module-aware
builds done in
$GOPATH. Binaries built directly from modules with
'go get example.org/fred/cmd/barney@latest' will have a file path
of the form $GOPATH/pkg/mod/example.org/fred@v1.2.3/cmd/barney/main.go
(with whatever the actual version is). To convert this to a module
name, you have to take off '$GOPATH/pkg/mod/'
and move the version to the end if it's not already there. For
binaries built outside some
$GOPATH, with either module-aware
builds or plain builds, you are unfortunately on your own; there
is no general way to turn their file names into package names.
(There are a number of hacks if the source is present on your local
system; for example, you can try to find out what module or VCS
repository it's part of if there's a
go.mod or VCS control directory
somewhere in its directory tree.)
However, to do this you must first extract the Go debug data from
your ELF binary. For ordinary unstripped Go binaries, this debugging
information is in the .gosymtab and .gopclntab ELF sections of
the binary, and can be read out with the debug/elf package.
Go binaries that use cgo do not have these Go ELF sections. As
mentioned in Building a better Go linker:
For “cgo” binaries, which may make arbitrary use of C libraries, the Go linker links all of the Go code into a single native object file and then invokes the system linker to produce the final binary.
This linkage obliterates .gosymtab and .gopclntab as separate
ELF sections. I believe that their data is still there in the final
binary, but I don't know how to extract them. The Go debugger Delve doesn't even try; instead, it
uses the general DWARF
.debug_line section (or its compressed version), which seems
to be more complicated to deal with. Delve has its DWARF code as
sub-packages, so perhaps you could reuse them to read and process the
DWARF debug line information to do the same thing (as far as I know
the file name information is present there too).
Since I have and use several third party cgo-based programs, this
is where I gave up. My hacked branch of the
which package can deal
with most things short of "cgo" binaries, but unfortunately that's
not enough to make it useful for me.
(Since I spent some time working through all of this, I want to write it down before I forget it.)
PS: I suspect that this situation will never improve for non-module builds, since the Go developers want everyone to move away from them. For Go module builds, there may someday be a relatively official and supported API for extracting module information from existing binaries, either in the official Go packages or in one of the golang.org/x/ additional packages.
Bidirectional NAT and split horizon DNS in our networking setup
Like many other places, we have far too many machines to give them all public IPs (or at least public IPv4 IPs), especially since they're spread across multiple groups and each group should get its own isolated subnet. Our solution is the traditional one; we use RFC 1918 IPv4 address space behind firewalls, give groups subnets within it (these days generally /16s), and put each group in what we call a sandbox. Outgoing traffic from each sandbox subnet is NAT'd so that it comes out from a gateway IP for that sandbox, or sometimes a small range of them.
However, sometimes people quite reasonably want to have some of their sandbox machines reachable from the outside world for various reasons, and also sometimes they need their machines to have unique and stable public IPs for outgoing traffic. To handle both of these cases, we use OpenBSD's support for bidirectional NAT. We have a 'BINAT subnet' in our public IP address space and each BINAT'd machine gets assigned an IP on it; as external traffic goes through our perimeter firewall, it does the necessary translation between internal addresses and external ones. Although all public BINAT IPs are on a single subnet, the internal IPs are scattered all over all of our sandbox subnets. All of this is pretty standard.
(The public BINAT subnet is mostly virtual, although not entirely so; for various peculiar reasons there are a few real machines on it.)
However, this leaves us with a DNS problem for internal machines (machines behind our perimeter firewall) and internal traffic to these BINAT'd machines. People and machines on our networks want to be able to talk to these machines using their public DNS names, but the way our networks are set up, they must use the internal IP addresses to do so; the public BINAT IP addresses don't work. Fortunately we already have a split-horizon DNS setup, because we long ago made the decision to have a private top level domain for all of our sandbox networks, so we use our existing DNS infrastructure to give BINAT'd machines different IP addresses in the internal and external views. The external view gives you the public IP, which works (only) if you come in through our perimeter firewall; the internal view gives you the internal RFC 1918 IP address, which works only inside our networks.
(In a world where new gTLDs are created like popcorn, having our own top level domain isn't necessarily a great idea, but we set this up many years before the profusion of gTLDs started. And I can hope that it will stop before someone decides to grab the one we use. Even if they do grab it, the available evidence suggests that we may not care if we can't resolve public names in it.)
Using split-horizon DNS this way does leave people (including us) with some additional problems. The first one is cached DNS answers, or in general not talking to the right DNS servers. If your machine moves between internal and external networks, it needs to somehow flush and re-resolve these names. Also, if you're on one of our internal networks and you do DNS queries to someone else's DNS server, you'll wind up with the public IPs and things won't work. This is a periodic source of problems for users, especially since one of the ways to move on or off our internal networks is to connect to our VPN or disconnect from it.
The other problem is that we need to have internal DNS for any public name that your BINAT'd machine has. This is no problem if you give your BINAT machine a name inside our subdomain, since we already run DNS for that, but if you go off to register your own domain for it (for instance, for a web site), things can get sticky, especially if you want your public DNS to be handled by someone else. We don't have any particularly great solutions for this, although there are decent ones that work in some situations.
(Also, you have to tell us what names your BINAT'd machine has. People don't always do this, probably partly because the need for it isn't necessarily obvious to them. We understand the implications of our BINAT system, but we can't expect that our users do.)
(There's both an obvious reason and a subtle reason why we can't apply BINAT translation to all internal traffic, but that's for another entry because the subtle reason is somewhat complicated.)
The mystery of why my Fedora 30 office workstation was booting fine
So the latest Fedora 30 updates (including a kernel update) build an initramfs that refuses to bring up software RAID devices, including the one that my root filesystem is on. Things do not go well afterwards.
Then I said:
Fedora's systemd, Dracut and kernel parameters setup have now silently changed to require either rd.md.uuid for your root filesystem or rd.auto. The same kernel command line booted previous kernels with previous initramfs's.
The first part of this is wrong, and that leads to the mystery.
In Fedora 29, my kernel command line was specifying both the root
filesystem device by name ('
root=/dev/md20') and the software
RAID arrays for the initramfs to bring up (as '
rd.md.uuid=...'). When I upgraded to Fedora 30
in mid-August, various things happened
and I wound up removing both of those from the kernel command line,
specifying the root filesystem device only by UUID ('root=UUID=...').
This kernel command line booted a series of Fedora 30 kernels, most
recently 5.2.11 on September 4th, right up until yesterday.
However, it shouldn't have. As the
dracut.cmdline manpage says,
the default since Dracut 024 has been to not auto-assemble software
RAID arrays in the absence of either rd.md.uuid or rd.auto.
And the initramfs for older kernels (at least 5.2.11) was theoretically
enforcing that; the journal for that September 4th boot contains a
line saying:
dracut-pre-trigger: rd.md=0: removing MD RAID activation
But then a few lines later, md/raid1:md20 is activated:
kernel: md/raid1:md20: active with 2 out of 2 mirrors
(The boot log for the new kernel for a failed boot also had the dracut-pre-trigger line, but obviously no mention of the RAID being activated.)
I unpacked the initramfs for both kernels and as far as I can tell
they're identical in terms of the kernel modules included and the
configuration files and scripts (there are differences in some
binaries, which is expected since systemd and some other things got
upgraded between September 4th and now). Nor has the kernel
configuration changed between the two kernels according to the
config-* files in /boot.
So by all evidence, the old kernel and initramfs should not auto-assemble my root filesystem's software RAID and thus shouldn't boot. But, they do. In fact they did yesterday, because when the new kernel failed to boot the first thing I did was boot with the old one. I just don't know why, and that's the mystery.
My fix for my boot issue is straightforward; I've updated my kernel
command line to have the '
rd.md.uuid=...' that it should have had
all along. This works fine.
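For illustration, the relevant parts of my kernel command line now look something like this, with made-up placeholder UUIDs standing in for my real ones:

```
root=UUID=de305d54-75b4-431b-adb2-eb6b9e546014 ro rd.md.uuid=1a2b3c4d:5e6f7a8b:9c0d1e2f:3a4b5c6d
```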
(My initial recovery from the boot failure was to use 'rd.auto',
but I've decided that I don't want to auto-assemble anything and
everything that the initramfs needs. I'll have the initramfs only
assemble the bare minimum, just in case. While my swap is also on
software RAID, I specifically decided to not assemble it in the
initramfs; I don't really need it until later.)
Making your own changes to things that use Go modules
Suppose, not hypothetically, that you have found a useful Go program but when you test it you discover that it has a bug that's a problem for you, and that after you dig into the bug you discover that the problem is actually in a separate package that the program uses. You would like to try to diagnose and fix the bug, at least for your own uses, which requires hacking around in that second package.
In a non-module environment, how you do this is relatively
straightforward, although not necessarily elegant. Since building
programs just uses what's found in $GOPATH/src, you can cd
directly into your local clone of the second package and start
hacking away. If you need to make a pull request, you can create a
branch, fork the repo on Github or whatever, add your new fork as
an additional remote, and then push your branch to it. If you didn't
want to contaminate your main
$GOPATH with your changes to the
upstream (since they'd be visible to everything you built that used
that package), you could work in a separate directory hierarchy and
use it as your $GOPATH when you were working on it.
If the program has been migrated to Go modules, things are not
quite as straightforward. You probably don't have a clone of the
second package in your
$GOPATH, and even if you do, any changes
to it will be ignored when you rebuild the program (if you do it
in a module-aware way). Instead, you make
local changes by using the '
replace' directive of the program's
go.mod, and in some ways it's better than the non-module approach.
First you need local clones of both packages. These clones can be
a direct clone of the upstream or they can be clones of Github (or
Gitlab or etc) forks that you've made. Then, in the program's module,
you want to change
go.mod to point the second package to your
local copy of its repo:
replace github.com/rjeczalik/which => /u/cks/src/scratch/which
You can edit this in directly (as I did when I was working on this)
or you can use '
go mod edit'.
If the second package has
not been migrated to Go modules, you need to create a go.mod in
your local clone (the Go documentation will tell you this if you
read all of it).
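As a sketch, a minimal go.mod for the local clone that claims the upstream's name might look like this (hypothetical contents; adjust the go directive to your Go version):

```
module github.com/rjeczalik/which

go 1.13
```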
Contrary to what I initially thought, this new
go.mod does not
need to have the module name of the package you're replacing, but
it will probably be most convenient if it does claim to be, eg,
github.com/rjeczalik/which, because this means that any commands
or tests it has that import the module will use your hacks, instead
of quietly building against the unchanged official version (again,
assuming that you build them in a module-aware way).
(You don't need a replace line in the second package's go.mod;
Go's module handling is smart enough to get this right.)
As an important note, as of Go 1.13 you must do 'go get' to
build and install commands from inside this source tree, even if
the tree is under $GOPATH. If it's under $GOPATH and you do 'go
get <blah>/cmd/gobin', Go does a non-module 'go get' even though the
directory tree has a go.mod file, and this will use the official
version of the second package, not your replacement. This is
documented but perhaps surprising.
When you're replacing with a local directory this way, you don't need to commit your changes in the VCS before building the program; in fact, I don't think you even need the directory tree to be a VCS repository. For better or worse, building the program will use the current state of your directory tree (well, both trees), whatever that is.
If you want to see what your module-based binaries were actually
built with in order to verify that they're actually using your
modified local version, the best tool for this is '
go version -m'.
This will show you something like:
go/bin/gobin go1.13
	path	github.com/rjeczalik/bin/cmd/gobin
	mod	github.com/rjeczalik/bin	(devel)
	dep	github.com/rjeczalik/which	v0.0.0-2014[...] => /u/cks/go/src/github.com/siebenmann/which
I believe that the '(devel)' appears if the binary was built directly
from inside a source tree, and the '=>' is showing a 'replace'
in action. If you build one of the second package's commands (from
inside its source tree), '
go version -m' doesn't report the
replacement, just that it's a '(devel)' of the module.
(Note that this output doesn't tell us anything about the version
of the second package that was actually used to build the binary,
except that it was the current state of the filesystem as of the
build. The 'v0.0.0-2014[...]' version stamp is for the original
version, not our replacement, and comes from the first package's
go.mod.)
PS: If '
go version -m' merely reports the 'go1.13' bit, you managed
to build the program in a non module-aware way.
Sidebar: Replacing with another repo instead of a directory tree
The syntax for this uses your alternate repository, and I believe it
must have some form of version identifier. This version identifier
can be a branch, or at least it can start out as a branch in your
go.mod, so it looks like this:
replace github.com/rjeczalik/which => github.com/siebenmann/which reliable-find
After you run '
go build' or the like, the
go command will quietly
rewrite this to refer to the specific current commit on that branch.
If you push up a new version of your changes, you need to re-edit
go.mod to say 'reliable-find' or 'master' or the like again.
Your upstream repository doesn't have to have a go.mod file,
unlike the case with a local directory tree. If it does have a
go.mod, I think that the claimed package name can be relatively
liberal (for instance, I think it can be the module that you're
replacing). However, some experimentation with sticking in random
upstreams suggests that you want the final component of the module
name to match (eg, '<something>/which' in my case).
Catching Control-C and a gotcha with shell scripts
Suppose, not entirely hypothetically, that you have some sort of
spiffy program that wants to use Control-C
as a key binding to get it to take some action. In Unix, there are
two ways of catching Control-C for this sort of thing. First, you
can put the terminal into raw mode, where Control-C
becomes just another character that you read from the terminal and
you can react to it in any way you like. This is very general but
it has various drawbacks, like you have to manage the terminal state
and you have to be actively reading from the terminal so you can
notice when the key is typed. The simpler alternative way of catching
Control-C is to set a signal handler for
SIGINT and then react
when it's invoked. With a signal handler, the kernel's standard
tty input handling does all of that hard
work for you and you just get the end result in the form of a
SIGINT signal. It's quite convenient and leaves you
with a lot less code and complexity in your spiffy Control-C catching
program.
Then some day you run your spiffy program from inside a shell script
(perhaps you wanted to add some locking), hit Control-C to signal your
program, and suddenly you have a mess (what sort of a mess depends
on whether or not your shell does job control). The problem is that
when you let the kernel handle Control-C by delivering a
signal, it doesn't just deliver it to your program; it delivers it
to the shell script and in fact any other programs that the shell
script is also running (such as a
flock command used to add
locking). The shell script and these other programs are not expecting
SIGINT signals and haven't set up anything special to
handle them, so they will get killed.
(Specifically, the kernel will send the
SIGINT to all processes
in the foreground process group.)
Since your shell was running the shell script as your command and the shell script exited, many shells will decide that your command has finished. This means they'll show you the shell prompt and start interacting with you again. This can leave your spiffy program and your shell fighting over terminal output and perhaps terminal input as well. Even if your shell and your spiffy program don't fight for input and write their output and shell prompt all over each other, generally things don't go well; for example, the rest of your shell script isn't getting run, because the shell script died.
Unfortunately there isn't a good general way around this problem.
If you can arrange it, the ideal is for the wrapper shell script
to wind up directly
exec'ing your spiffy program so there's nothing
SIGINT will be sent to (and kill). Failing that, you might
have to make the wrapper script trap and ignore
SIGINT while it's
running your program (and to make your program unconditionally
set a SIGINT signal handler, even if
SIGINT is ignored
when the program starts).
Speaking from painful personal experience, this is an easy issue
to overlook (and a mysterious one to diagnose). And of course
everything works when you test your spiffy program by running it
directly, because then the only process getting a
SIGINT is the
one that's prepared for it.
A safety note about using (or having) $GOPATH in Go 1.13
One of the things in the Go 1.13 release notes is a little note
about improved support for Go modules. This is worth quoting in
more or less full:
The GO111MODULE environment variable continues to default to auto,
but the auto setting now activates the module-aware mode of the go
command whenever the current working directory contains, or is below
a directory containing, a go.mod file — even if the current directory
is within GOPATH/src.
The important safety note is that this potentially creates a confusing situation, and also it may be easy for other people to misunderstand what this actually says in the same way that I did.
Suppose that there is a Go program that is part of a module,
example.org/fred/cmd/bar (with the module being example.org/fred).
If you do '
go get example.org/fred/cmd/bar', you're fetching and
building things in non-module mode, and you will wind up with a
$GOPATH/src/example.org/fred VCS clone, which will have a go.mod
file at its root, ie $GOPATH/src/example.org/fred/go.mod. Despite
the fact that there is a
go.mod file right there on disk, re-running
'go get example.org/fred/cmd/bar' while you're in (say) your home
directory will not do a module-aware build. This is because, as the
note says, module-aware builds only happen if your current directory
or its parents contain a
go.mod file, not just if there happens
to be a
go.mod file in the package (and module) tree being built.
So the only way to do a proper module aware build is to actually
be in the command's subdirectory:
cd $GOPATH/src/example.org/fred/cmd/bar
go get
(You can get very odd results if you cd to $GOPATH/src/example.org/fred
and then attempt to '
go get example.org/fred/cmd/bar'. The result
is sort of module-aware but weird.)
This makes it rather more awkward to build or rebuild Go programs
through scripts, especially if they involve various programs that introspect your existing
Go binaries. It's also easy to slip up and de-modularize a Go binary;
one absent-minded '
go get example.org/...' will do it.
In a way, Go modules don't exist on disk unless you're in their
directory tree. If that tree is inside
$GOPATH and you're not in
it, you have a plain Go package, not a module.
(If the directory tree is outside
$GOPATH, well, you're not doing
much with it without
cd'ing into it, at which point you have a module.)
The easiest way to see whether a binary was built module-aware or
not is '
goversion -m PROGRAM'. If
the program was built module-aware, you will get a list of all of
the modules involved. If it wasn't, you'll just get a report of
what Go version it was built with. Also, it turns out that you can
build a program with modules without it having a go.mod file:
GO111MODULE=on go get rsc.io/goversion@latest
The repository has tags but no
go.mod. This also works on
repositories with no tags at all. If the program uses outside
packages, they too can be non-modular, and '
goversion -m PROGRAM'
will (still) produce a report of what tags, dates, and hashes they
correspond to.
Update: in Go 1.13, '
go version -m PROGRAM' also reports the
module build information, with module hashes included as well.
This does mean that in theory you could switch over to building all
third party Go programs you use this way. If the program hasn't
converted to modules you get more or less the same results as today,
and if the program has converted, you get their hopefully stable
go.mod settings. You'd lose having a local copy of everything in
$GOPATH, though, which opens up some issues.
Jumping backward and forward in GNU Emacs
In my recent entry on writing Go with Emacs's lsp-mode, I noted that lsp-mode or more accurately lsp-ui has a 'peek' feature that winds up letting you jump to a definition or a reference of a thing, but I didn't know how to jump back to where you were before. The straightforward but limited answer to my question is that jumping back from a LSP peek is done with the M-, keybinding (which is surprisingly awkward to write about in text). This is not a special LSP key binding and function; instead it is a standard binding that runs xref-pop-marker-stack, which is part of GNU Emacs' standard xref package. This M-, binding is right next to the standard M-. and M-? xref bindings for jumping to definitions and references. It also works with go-mode's godef-jump function and its C-c C-j key binding.
(Lsp-ui doesn't set up any bindings for its 'peek' functions, but if you like what the 'peek' feature does in general you probably want to bind them to M-. and M-? in the lsp-ui-mode-map keybindings so that they take over from the xref versions. The xref versions still work in lsp-mode, it's just that they aren't as spiffy. This is convenient because it means that the standard xref binding 'C-x 4 .' can be used to immediately jump to a definition in another Emacs-level 'window'.)
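If you do want the peek versions on M-. and M-? this way, a minimal sketch of the bindings (the lsp-ui function names should be checked against your installed version; binding the keys directly, rather than remapping the xref commands, is what leaves 'C-x 4 .' running the plain xref version):

```elisp
;; In lsp-ui buffers, put the 'peek' commands on the standard xref
;; keys; 'C-x 4 .' still runs the ordinary xref jump.
(define-key lsp-ui-mode-map (kbd "M-.") #'lsp-ui-peek-find-definitions)
(define-key lsp-ui-mode-map (kbd "M-?") #'lsp-ui-peek-find-references)
```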
I call this the limited answer for a couple of reasons. First, this only works in one direction; once you've jumped back, there is no general way to go forward again. You get to remember yourself what you did to jump forward and then do it again, which is easy if you jumped to a definition but not so straightforward if you jumped to a reference. Second, this isn't a general feature; it's specific to the xref package and to things that deliberately go out of their way to hook into it, which includes lsp-ui and go-mode. Because Emacs is ultimately a big ball of mud, any particular 'jump to thing' operation from any particular package may or may not hook into the xref marker stack.
(A core Emacs concept is the mark, but core mark(s) are not directly tied to the xref marker stack. It's usually the case that things that use the xref marker stack will also push an entry onto the plain mark ring, but this is up to the whims of the package author. The plain mark ring is also context dependent on just what happened, with no universal 'jump back to where I was' operation. If you moved within a file you can return with C-u C-space, but if you moved to a different file you need to use C-x C-space instead. Using the wrong one gets bad results. M-, is universal in that it doesn't matter whether you moved within your current file or moved to another one, you always jump backward with the same key.)
The closest thing I've found in GNU Emacs to a browser style backwards and forwards navigation is a third party package called backward-forward (also gitlab). This specifically attempts to implement universal jumping in both directions, and it seems to work pretty well. Unfortunately its ring of navigation is global, not per (Emacs) window, but for my use this isn't fatal; I'm generally using Emacs within a single context anyway, rather than having several things at once the way I do in browsers.
Because I want browser style navigation, I've changed from the default backward-forward key bindings by removing its C-left and C-right bindings in favor of M-left and M-right (ie Alt-left and Alt-right, the standard browser key bindings for Back and Forward), and also added bindings for my mouse rocker buttons. How I have it set up so that it works on Fedora and Ubuntu 18.04 is as follows (using use-package, as everyone seems to these days):
(use-package backward-forward
  :demand
  :config
  (backward-forward-mode t)
  :bind (:map backward-forward-mode-map
              ("<C-left>" . nil)
              ("<C-right>" . nil)
              ("<M-left>" . backward-forward-previous-location)
              ("<M-right>" . backward-forward-next-location)
              ("<mouse-8>" . backward-forward-previous-location)
              ("<mouse-9>" . backward-forward-next-location)
         )
  )
:demand is necessary on Ubuntu 18.04 to get the key bindings to work. I don't know enough about Emacs to understand why.
PS: Normal Emacs and Lisp people would probably stack those stray )'s at the end of the last real line. One of my peculiarities in ELisp is that I don't; I would rather see a clear signal of where blocks end, rather than lose track of them in a stack of ')))'. Perhaps I will change this in time.
CentOS 7 and Python 3
Over on Twitter, I said:
Today I was unpleasantly reminded that CentOS 7 (still) doesn't ship with any version of Python 3 available. You have to add the EPEL repositories to get Python 3.6.
This came up because of a combination of two things. The first is
that we need to set up CentOS 7 to host a piece of commercial
software, because CentOS 7 is the most recent Linux release it
supports. The second is that an increasing number of our local
management tools are now in Python 3, and for various reasons this particular CentOS 7 machine needs to run them (or at least wants to) when our existing CentOS 7 machines haven't. The result was that when I set up various pieces of our standard environment on a newly installed CentOS 7 virtual machine, they failed to run because there was no Python 3.
At one level this is easily fixed. Adding the EPEL repositories is a straightforward 'yum install epel-release', and after that installing Python 3.6 is 'yum install python36'. You don't get a pip3 with this and I'm not sure how to change that, but for our purposes pip3 isn't necessary; we don't install packages system-wide through PIP under anything except exceptional circumstances.
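Since the failure mode here was our tools dying on a machine with no Python 3 at all, a minimal sketch of how a wrapper script could fail early and clearly instead (the check itself is portable shell; the yum commands in the message are the two from above):

```sh
# Fail early and clearly when there is no python3 on $PATH, the
# situation on a stock CentOS 7 machine.
have_python3() {
    command -v python3 >/dev/null 2>&1
}

if have_python3; then
    echo "python3: $(command -v python3)"
else
    echo "no python3; on CentOS 7: yum install epel-release && yum install python36" >&2
    exit 1
fi
```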
(The current exceptional circumstances is for Tensorflow on our GPU compute servers. These run Ubuntu 18.04, where pip3 is available more or less standard. If we had general-use CentOS 7 machines it would be an issue, because pip3 is necessary for personal installs of things like the Python LSP server.)
Even having Python 3.6 instead of 3.7 isn't particularly bad right now; our Ubuntu 16.04 machines have Python 3.5.2 and even our 18.04 ones only have 3.6.8. Even not considering CentOS 7, it will be years before we can safely move any of our code past 3.6.8, since some of our 18.04 machines will not be upgraded to 20.04 next year and will probably stay on 18.04 until early 2023 when support starts to run out. This is surprisingly close to the CentOS 7 likely end of life in mid 2024 (which is much closer than I thought before I started writing this entry), so it seems like CentOS 7 only having Python 3.6 is not going to hold our code back very much, if at all.
(Hopefully by 2023 either EPEL will have a more recent version of Python 3 available on CentOS 7 or this commercial software will finally support CentOS 8. I can't blame them for not supporting RHEL 8 just yet, since it's only been out for a relatively short length of time.)
PS: I don't know what the difference is between the repositories you get by doing it this way and the repositories you get from following the instructions in the EPEL wiki. The latter repos still don't seem to have Python 3.7, so I'm not worrying about it; I'm not very picky about the specific version of Python 3.6 I get, especially since our code has to run on 3.5 anyway.