Wandering Thoughts

2024-04-14

(Probably) forcing Git to never prompt for authentication

My major use of Git is to keep copies of the public repositories of various projects from various people. Every so often, one of the people involved gets sufficiently irritated with the life of being an open source maintainer and takes their project's repository private (or their current Git 'forge' host does it for other reasons). When this happens, on my next 'git pull', Git skids to a halt with:

; git pull
Username for 'https://gitlab.com':

This is not a useful authentication prompt for me. I have no special access to these repositories; if anonymous access doesn't work, there is nothing I can enter for a username and password that will improve the situation. What I want is for Git to fail with a pull error, the same way it would if the repository URL returned a 404 or the connection to the host timed out.

(Git prompts you here because sometimes people do deal with private repositories which they have special credentials for.)

As far as I know, Git unfortunately has no configuration option or command line option that is equivalent to OpenSSH's 'batch mode' for ssh, where it will never prompt you for password challenges and will instead just fail. The closest you can come is setting core.askPass to something that generates output (such as 'echo'), in which case Git will try to authenticate with that bogus information, fail, and complain much more verbosely, which is not the same thing (among other issues, it causes the Git host to see you as trying invalid login credentials, which may have consequences).
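
For illustration, that workaround looks like this on the command line (a sketch; the askPass program gets the prompt as its argument, so 'echo' emits the prompt text back as the 'answer', which is exactly the sort of bogus credential described above):

; git -c core.askPass=echo pull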

If you're running your 'git pull' invocations from a script, as I often am, you can have the script set 'GIT_TERMINAL_PROMPT=0' (and export it into the environment). According to the documentation, this causes Git to fail rather than prompting you for anything, including authentication. It seems somewhat dangerous to set this generally in my environment, since I have no idea what else Git might someday want to prompt me about (and obviously if you need to sometimes get prompted you can't set this). Apparently this is incomplete if you fetch Git repositories over SSH, but I don't do that for public repositories that I track.

(I found this environment variable along with a lot of other discussion in this serverfault question and its answers.)
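
As a concrete sketch, a minimal version of such a script might look like this (the repository layout under $HOME/repos is my invention):

#!/bin/sh
# Make Git fail instead of prompting for anything, authentication included.
GIT_TERMINAL_PROMPT=0
export GIT_TERMINAL_PROMPT

for repo in "$HOME"/repos/*; do
    (cd "$repo" && git pull -q) || echo "git pull failed in $repo" 1>&2
done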

Some environments that run git behind the scenes, such as the historical 'go get' behavior, default to disabling git prompts. If you use such an environment it may have already handled this for you.

GitNeverAuthPrompts written at 23:11:21

2024-04-08

Don't require people to change 'source code' to configure your programs

Often, programs have build time configuration settings for features they include, paths they use, and so on. Some of the time, people suggest that the way to handle these is not through systems like 'configure' scripts (whether produced by Autoconf or some other means) but instead by having people edit their settings into things such as your Makefiles or header files ('source code' in a broad sense). As someone who has spent a bunch of time and effort building other people's software over the years, my strong opinion is that you should not do this.

The core problem of this approach is not that you require people to know the syntax of Makefiles or config.h or whatever in order to configure your software, although that's a problem too. The core problem is you're having people modify files that you will also change, for example when you release a new version of your software that has new options that you want people to be able to change or configure. When that happens, you're making every person who upgrades your software deal with merging their settings into your changes. And merging changes is hard and prone to error, especially if people haven't kept good records of what they changed (which they often won't if your configuration instructions are 'edit these files').

One of the painful lessons about maintaining systems that we've learned over the years is that you really don't want to have two people changing the same file, including the software provider and you. This is the core insight behind extremely valuable modern runtime configuration features such as 'drop-in files' (where you add or change things by putting your own files into some directory, instead of everything trying to update a common file). When you tell people to configure your program by editing a header file or a Makefile or indeed any file that you provide, you're shoving them back into this painful past. Every new release, every update they pull from your VCS, it's all going to be a source of pain for them.

A system where people maintain (or can maintain) their build time configurations entirely outside of anything you ship is far easier for people to manage. It doesn't matter exactly how this is implemented and there are many options for relatively simple systems; you certainly don't need GNU Autoconf or even CMake.
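
One minimal sketch of such a system, in GNU Make (the name config.mk is my invention; '-include' silently skips files that don't exist):

# Pick up the builder's settings from a file we never ship or change.
-include config.mk

# Defaults, used only if config.mk (or the command line) didn't set them.
PREFIX ?= /usr/local
WITH_FEATURE_X ?= no

People can also override things for a single build with 'make PREFIX=/opt/thing', and nothing they put in their config.mk will ever conflict with a new release.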

The corollary to this is that if you absolutely insist on having people configure your software by editing files you ship, those files should be immutable by you. You should ship them in some empty state and promise never to change that, so that people building your software can copy their old versions from their old build of your software into your new release (or never get a merge conflict when they pull from your version control system repository). If your build system can't handle even this restriction, then you need to rethink it.

ConfigureNoSourceCodeChanges written at 22:16:54

2024-04-06

GNU Autoconf is not replaceable in any practical sense

In the wake of the XZ Utils backdoor, which involved GNU Autoconf, it's been somewhat popular to call for Autoconf to go away. Over on the Fediverse I said something about that:

Hot take: autoconf going away would be a significant net loss to OSS, perhaps as bad as the net loss of the Python 2 to Python 3 transition, and for much the same reason. There are a lot of projects out there that use autoconf/configure today and it works, and they would all have to do a bunch of work to wind up in exactly the same place ('a build system that works and has some switches and we can add our feature checks to').

(The build system can never supply all needed tests. Never.)

Autoconf can certainly be replaced in general, either by one of the existing and more modern configuration and build systems, such as CMake, or by something new. New projects today often opt for one of the existing alternative build systems and (I believe) often find them simpler. But what can't be replaced easily is autoconf's use in existing projects, especially projects that use autoconf in non-trivial ways.

You can probably convert most projects to alternate build systems. However, much of this work will have to be done by hand, by each project that is converted, and this work (and the time it takes) won't particularly move the project forward. That means you're asking (or demanding) that projects spend their limited time merely to wind up in the same place, with a working build system. Further, some projects will still wind up running a substantial amount of their own shell code as part of the build system in order to determine and do things that are specific to the project.

(Although it may be an extreme example, you can look at the autoconf pieces that OpenZFS has in its config/ subdirectory. Pretty much all of that work would have to be done in any build system that OpenZFS used, and generally it would have to be significantly transformed to fit.)

There likely would be incremental security-related improvements even for such projects. For example, I believe many modern build systems don't expect you to ship their generated files the way that autoconf sort of expects you to ship its generated configure script (and the associated infrastructure), which was one part of what let the XZ backdoor slip files into the generated tarballs that weren't in their repository. But this is not a particularly gigantic improvement, and as mentioned it requires projects to do work to get it, possibly a lot of work.

You also can't simplify autoconf by declaring some standard checks obsolete and dropping everything to do with them. It may indeed be the case that few autoconf based programs today are actually going to cope with, for example, there being no string.h header file (cf), but that doesn't mean you can remove mentioning it from the generated header files and so on, since existing projects require those mentions to work right. The most you could do would be to make the generated 'configure' scripts simply assume a standard list of features and put them in the output those scripts generate.
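
To make this concrete, a generated configure script emits (for example) a config.h that may '#define HAVE_STRING_H 1', and existing source code relies on that in the classic pattern:

#include "config.h"

#ifdef HAVE_STRING_H
#include <string.h>
#endif

If a slimmed-down autoconf quietly stopped defining HAVE_STRING_H, code like this would silently stop including string.h, which is why the mentions have to stay.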

(Of course it would be nice if projects using autoconf stopped making superstitious use of things like 'HAVE_STRING_H' and just assumed that standard headers are present. But projects generally have more important things to spend their limited time on than cleaning up header usage.)

PS: There's an entire additional discussion that we could have about whether 'supply chain security' issues such as Autoconf and release tarballs that can't be readily reproduced by third parties are even the project's problem in the first place.

AutoconfNotReplaceable written at 22:50:15

2024-04-03

GNU Emacs and the case of special space characters

One of the things I've had to wrestle with due to my move to reading my email with MH-E in GNU Emacs is that any number of Emacs modes involved in this like to be helpful by reformatting and annotating your email messages in various ways. Often it's not obvious to an outsider what mode (or code) is involved. For what I believe are historical reasons, a lot of MIME handling code has wound up in GNUS (also), which was originally a news reader; some of the code and variables have 'gnus' prefixes while others have 'mm' or 'mml' prefixes. In MH-E (and I believe most things that use Emacs' standard GNUS-based MIME handling), by default you will get nominally helpful things like message fontisizing and maybe highlighting of certain whitespace that the code thinks you might care about. I mostly don't want this, so I have been turning it off where I saw it and could identify the cause.

(As far as message fontisizing goes, sometimes I don't object to it but I very much object to the default behavior of hiding the characters that triggered the fontisizing. I don't want bits of message text hidden on me so that I have to reverse engineer the actual text from visual appearance changes that I may or may not notice and understand.)

Recently I was reading an email message and there was some white space in it that Emacs had given red underlines, causing me to get a bit irritated. People who are sufficiently familiar with GNU Emacs have already guessed the cause, and in fact the answer was right there in what I saw from Leah Neukirchen's suggestion of looking at (more or less) 'C-u C-x ='. What I was seeing was GNU Emacs' default handling of various special space characters.

(I was going to say that this was a non-breaking space, but it turns out not to be; instead it was U+2002, 'en space'. A true non-breaking space is U+00A0.)

As covered in How Text Is Displayed, Emacs normally displays these special characters and others with the (Emacs) nobreak-space face, which (on suitable displays) renders the character as red with a (red) underline. Since space variants have nothing visible to render, all you see is the red underline. As the documentation also covers, you can turn this off generally or for a buffer by setting nobreak-char-display to nil, which I definitely won't be doing generally but might do for MH-E mail buffers, since my environment generally maps special space characters to a plain space if I paste them into terminals and the like.
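
If I do decide to turn this off for MH-E, a minimal sketch is to do it from a mode hook (assuming mh-show-mode is where I mostly see message text):

;; Stop highlighting no-break and other special space characters,
;; but only in MH-E message display buffers.
(add-hook 'mh-show-mode-hook
          (lambda ()
            (setq-local nobreak-char-display nil)))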

(A full list of Emacs font faces is in Standard Faces.)

Zero-width spaces (should I ever encounter any in email or elsewhere) are apparently normally displayed using Glyphless Character Display's 'thin-space' method, along with other glyphless characters, and are Unicode U+200B. It's not clear to me if these will display with a red underline in my environment (see this emacs.stackexchange question and answers). Some testing suggests that zero width spaces may hide out without a visual marker (based on using 'C-x 8 RET' aka 'insert-char' to enter a zero-width space, a key binding which I also found out about through this exercise). At this point I am too lazy to figure out how to force zero-width spaces to be clearly visible.

PS: Other spaces known by insert-char include U+2003 (em space), U+2007 (figure space), U+2005 (four per em space), U+200A (hair space), U+3000 (ideographic space), U+205F (medium mathematical space), U+2008 (punctuation space), U+202F (narrow non-breaking space), and more. It's slightly terrifying. Most of the spaces render in the same way. I probably won't remember any of these Unicode numbers, but maybe I can remember C-u C-x = and that 'nobreak-space' as an Emacs face is an important marker.

PPS: Having gone through all of this, it's somewhat tempting to write some ELisp that will let me flip back and forth between displaying these characters in some clearly visible escaped form and displaying them 'normally' (showing as (marked) spaces and so on). That way I could normally see them very clearly, but make them unobtrusive if I had to deal with something that full of them in a harmless way. This is one of the temptations of GNU Emacs (or in general any highly programmable environment).
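
The simplest version of that idea, toggling just the red highlighting rather than a fully escaped display, is only a few lines (a sketch):

(defun my-toggle-nobreak-display ()
  "Toggle highlighting of no-break and special space characters here."
  (interactive)
  (setq-local nobreak-char-display (not nobreak-char-display)))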

EmacsSpecialSpaceCharacters written at 22:58:12

2024-03-20

When I reimplement one of my programs, I often wind up polishing it too

Today I discovered a weird limitation of some IP address lookup stuff on the Linux machines I use (a limitation that's apparently not universal). In response to this, I rewrote the little Python program that I had previously been using for looking up IP addresses as a Go program, because I was relatively confident I could get Go to work (although it turns out I couldn't use net.LookupAddr() and had to be slightly more complicated). I could have made the Go program a basically straight port of the Python one, but as I was writing it, I couldn't resist polishing off some of the rough edges and adding missing features (some of which the Python program could have had, and some which would have been awkward to add).

This isn't even the first time this particular program has been polished as part of re-doing it; it was one of the Python programs I added things to when I moved them to Python 3 and the argparse package. That was a lesser thing than the Go port and the polishing changes were smaller, but they were still there.

This 'reimplementation leads to polishing' thing is something I've experienced before. It seems that more often than not, if I'm re-doing something I'm going to make it better (or at least what I consider better), unless I'm specifically implementing something with the goal of being essentially an exact duplicate but in a faster environment (which happened once). It doesn't have to be a reimplementation in a different language, although that certainly helps; I've re-done Python programs and shell scripts and had it lead to polishing.

One trigger for polishing is writing new documentation and code comments. In a pattern that's probably familiar to many programmers, when I find myself about to document some limitation or code issue, I'll frequently get the urge to fix it instead. Or I'll write the documentation about the imperfection, have it quietly nibble at me, and then go back to the code so I can delete that bit of the documentation after all. But some of what drives this polishing is the sheer momentum of having the code open in my editor and already changing or writing it.

Why doesn't this happen when I write the program the first time? I think part of it is that I understand the problem and what I want to do better the second time around. When I'm putting together the initial quick utility, I have no experience with it and I don't necessarily know what's missing and what's awkward; I'm sort of building a 'minimum viable product' to deal with my immediate need (such as turning IP addresses into host names with validation of the result). When I come back to re-do or re-implement some or all of the program, I know both the problem and my needs better.
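
As a sketch of the general shape of the end result (not my actual program, and the specifics here are illustrative): force the pure-Go resolver, then validate each PTR name by resolving it forward again.

package main

import (
  "context"
  "fmt"
  "net"
  "os"
)

func main() {
  // Using net.Resolver directly instead of net.LookupAddr() lets us
  // force the pure-Go resolver (one guess at sidestepping platform
  // lookup limitations).
  res := &net.Resolver{PreferGo: true}
  ctx := context.Background()

  for _, ip := range os.Args[1:] {
    names, err := res.LookupAddr(ctx, ip)
    if err != nil {
      fmt.Fprintf(os.Stderr, "%s: %v\n", ip, err)
      continue
    }
    for _, name := range names {
      // Validate the PTR result: does the name resolve back to
      // the address we started from? (A plain string comparison;
      // real code would canonicalize the IP forms first.)
      addrs, _ := res.LookupHost(ctx, name)
      valid := false
      for _, a := range addrs {
        if a == ip {
          valid = true
          break
        }
      }
      fmt.Printf("%s: %s (validated: %v)\n", ip, name, valid)
    }
  }
}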

ReimplementationPolish written at 23:10:44

2024-03-08

A realization about shell pipeline steps on multi-core machines

Over on the Fediverse, I had a realization:

This is my face when I realize that on a big multi-core machine, I want to do 'sed ... | sed ... | sed ...' instead of the nominally more efficient 'sed -e ... -e ... -e ...' because sed is single-threaded and if I have several costly patterns, multiple seds will parallelize them across those multiple cores.

Even when writing on-the-fly shell pipelines, I've tended to reflexively use 'sed -e ... -e ...' when I had multiple separate sed transformations to do, instead of putting each transformation in its own 'sed' command. Similarly, I sometimes try to cleverly merge multi-command things into one command, although usually I don't try too hard. In a world where you have enough cores (well, CPUs), this isn't necessarily the right thing to do. Most commands are single threaded and will use only one CPU, but every command in a pipeline can run on a different CPU. So splitting up a single giant 'sed' into several may reduce a single-core bottleneck and speed things up.

(Giving sed multiple expressions is especially single threaded because sed specifically promises that they're processed in order, and sometimes this matters.)
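
Concretely, the difference is between these two forms (with made-up patterns):

# One sed: all three substitutions run in order on a single CPU.
sed -e 's/a/b/g' -e 's/c/d/g' -e 's/e/f/g' <input >output

# Three seds: each substitution can occupy its own CPU.
sed 's/a/b/g' <input | sed 's/c/d/g' | sed 's/e/f/g' >output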

Whether this actually matters may vary a lot. In my case, it only made a trivial difference in the end, partly because only one of my sed patterns was CPU-intensive (but that pattern alone made sed use all the CPU it could get and made it the bottleneck in the entire pipeline). In some cases adding more commands may add more in overhead than it saves from parallelism. There are no universal answers.

One of my lessons learned from this is that if I'm on a machine with plenty of cores and doing a one-time thing, it probably isn't worth my while to carefully optimize how many processes are being run as I evolve the pipeline. I might as well jam in more pipeline steps whenever and wherever they're convenient. If it's easy to move one step closer to the goal with one more pipeline step, do it. Even if it doesn't help, it probably won't hurt very much.

Another lesson learned is that I might want to look for single threaded choke points if I've got a long-running shell pipeline. These are generally relatively easy to spot; just run 'top' and look for what's using up all of one CPU (on Linux, this is 100% CPU time). Sometimes this will be as easy to split as 'sed' was, and other times I may need to be more creative (for example, if zcat is hitting CPU limits, maybe pigz can help a bit).

(If I have the fast disk space, possibly un-compressing the files in place in parallel will work. This comes up in system administration work more than you'd think, since we can want to search and process log files and they're often stored compressed.)
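
As a sketch of both tricks (the file names are made up):

# Swap a CPU-bound zcat for pigz. Decompression itself is still
# single-threaded in pigz, but reading, writing, and checksumming
# move to separate threads, which can help a bit.
pigz -dc logs.gz | grep -F 'ERROR' | sort | uniq -c

# With enough disk space, decompress files in parallel first,
# one gunzip per file with up to $(nproc) running at once.
printf '%s\0' *.gz | xargs -0 -n 1 -P "$(nproc)" gunzip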

ShellPipelineStepsAndCPUs written at 22:27:42

2024-02-26

How to make your GNU Emacs commands 'relevant' for M-X

Today I learned about the M-X command (well, key binding) (via), which "[queries the] user for a command relevant to the current mode, and then execute it". In other words it's like M-x but it restricts what commands it offers to relevant ones. What is 'relevant' here? To quote the docstring:

[...] This includes commands that have been marked as being specially designed for the current major mode (and enabled minor modes), as well as commands bound in the active local key maps.

If you're someone like me who has written some Lisp commands to customize your experience in a major mode like MH-E, you might wonder how you mark your personal Lisp commands as 'specially designed' for the relevant major mode.

In modern Emacs, the answer is that this is an extended part of '(interactive ...)', the normal Lisp form you use to mark your Lisp functions as commands (things which will be offered in M-x and can be run interactively). As mentioned in the Emacs Lisp manual section Using interactive, 'interactive' takes additional arguments to label what modes your command is 'specially designed' for; more discussion is in Specifying Modes For Commands. The basic usage is, say, '(interactive "P" mh-folder-mode)'.

If your commands already take arguments, life is simple and you can just put the modes on the end. But not all commands do (especially for quick little things you do for yourself). If you have just '(interactive)', the correct change is to make it '(interactive nil mh-folder-mode)'; a nil first argument is how you tell interactive that there is no argument.

(Don't make my initial mistake and assume that '(interactive "" mh-folder-mode)' will work. That produced a variety of undesirable results.)
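
Putting this together, a personal command marked for MH-E's folder mode looks like this (a toy example; mh-current-folder is MH-E's variable for the buffer's current folder):

(defun my-show-current-folder ()
  "Show the current MH folder's name (a toy example)."
  (interactive nil mh-folder-mode)
  (message "Current folder: %s" mh-current-folder))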

Is it useful to do this, assuming you have personal commands that are truly specific to a given mode (as I do for commands that operate on MH messages and the MH folder display)? My views so far are a decided maybe in my environment.

First, you don't need to do this if your commands have keybindings in your major mode, because M-X (execute-extended-command-for-buffer) will already offer any commands that have keybindings. Second, my assortment of packages already gives me quite a lot of selection power to narrow in on likely commands in plain M-x, provided that I've named them sensibly. The combination of vertico, marginalia, and orderless lets me search for commands by substrings, easily see a number of my options, and also see part of their descriptions. So if I know I want something to do with MH forwarding I can type 'M-x mh forw' and get, among other things, my function for forwarding in 'literal plaintext' format.

With that said, adding the mode to '(interactive)' isn't much work and it does sort of add some documentation about your intentions that your future self may find useful. And if you want a more minimal minibuffer completion experience, it may be more useful to have a good way to winnow down the selection. If you use M-X frequently and you have commands you want to be able to select in it in applicable modes without having them bound to keys, you really have no choice.

EmacsMetaXRelevantCommands written at 22:11:26

2024-02-24

The Go 'range over functions' proposal and user-written container types

In Go 1.22, the Go developers have made available a "range over function" experiment, as described in the Go Wiki's "Rangefunc Experiment". Recently I read a criticism of this, Richard Ulmer's Questioning Go's range-over-func Proposal (via). As I read Ulmer's article, it questions the utility of the proposed range-over-func feature on the grounds that it isn't a significant enough improvement for standard library functions like strings.Split (which is given as an example in the "more motivation" section of the wiki article).

I'm not unsympathetic to this criticism, especially when it concerns standard library functionality. If the Go developers want to extend various parts of the standard library to support streaming their results instead of providing the results all at once, then there may well be better, lower-impact ways of doing so, such as developing a standard API approach or set of approaches for this and then using this to add new APIs. However, I think that extending the standard library into streaming APIs is by far the less important side of the "range over func" proposal (although this is what the "more motivation" section of the wiki article devotes the most space to).

Right from the beginning, one of the criticisms of Go was that it had some privileged, complex builtin types that couldn't be built using normal Go facilities, such as maps. Generics have made it mostly possible to do equivalents of these (generic) types yourself at the language level (although the Go compiler still uniquely privileges maps and other builtin types at the implementation level). However, these complex builtin types still retain some important special privileges in the language, and one of them is that they were the only types over which you could write convenient 'range'-based for loops.

In Go today you can write, for example, a set type or a key/value type with some complex internal storage implementation and make it work even for user-provided element types (through generics). But people using your new container types cannot write 'for elem := range set' or 'for k, v := range kvstore'. The best you can give them is an explicit push or pull based iterator based on your type (in a push iterator, you provide a callback function that is given each value; in a pull iterator, you repeatedly call some function to obtain the next value). The "range over func" proposal bridges this divide, allowing non-builtin types to be ranged over almost as easily as builtin types. You would be able to write types that let people write 'for elem := range set.Forward()' or 'for k, v := range kvstore.Walk()'.
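
As a sketch of what this looks like under the experiment (Go 1.22 built with GOEXPERIMENT=rangefunc), here is a trivial generic set type whose Forward() returns a push-style iterator function that 'range' can drive:

package main

import "fmt"

type Set[E comparable] struct {
  m map[E]struct{}
}

func NewSet[E comparable](elems ...E) *Set[E] {
  s := &Set[E]{m: make(map[E]struct{})}
  for _, e := range elems {
    s.m[e] = struct{}{}
  }
  return s
}

// Forward returns a push-style iterator: 'range' calls it with a
// yield callback, and iteration stops when yield returns false.
func (s *Set[E]) Forward() func(yield func(E) bool) {
  return func(yield func(E) bool) {
    for e := range s.m {
      if !yield(e) {
        return
      }
    }
  }
}

func main() {
  set := NewSet(1, 2, 3)
  for elem := range set.Forward() {
    fmt.Println(elem)
  }
}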

This is an issue that can't really be solved without language support. You could define a standard API for iterators and iteration (and the 'iter' package covered in the wiki article sort of is that), but it would still be more code and somewhat awkward code for people using your types to write. People are significantly attracted to what is easy to program; the more difficult it is to iterate user types compared to builtin types, the less people will do it (and the more they will use builtin types even when they aren't a good fit). If Go wants to put user (generic) types on almost the same level (in the language) as builtin types, then I feel it needs some version of a "range over func" approach.

(Of course, you may feel that Go should not prioritize putting user types on almost the same level as builtin types.)

GoRangefuncAndUserContainers written at 22:30:08

2024-02-14

Understanding a recent optimization to Go's reflect.TypeFor

Go's reflect.TypeFor() is a generic function that returns the reflect.Type for its type argument. It was added in Go 1.22, and its initial implementation was quite simple but still valuable, because it encapsulated a complicated bit of reflect usage. Here is that implementation:

func TypeFor[T any]() Type {
  return TypeOf((*T)(nil)).Elem()
}

How this works is that it constructs a nil pointer value of the type 'pointer to T', gets the reflect.Type of that pointer, and then uses Type.Elem() to go from the pointer's Type to the Type for T itself. This requires constructing and using this 'pointer to T' type (and its reflect.Type) even though we only want the reflect.Type of T itself. All of this is necessary for reasons to do with interface types.

Recently, reflect.TypeFor() was optimized a bit, in CL 555597, "optimize TypeFor for non-interface types". The code for this optimization is a bit tricky and I had to stare at it for a while to understand what it was doing and how it worked. Here is the new version, which starts with the new optimization and ends with the old code:

func TypeFor[T any]() Type {
  var v T
  if t := TypeOf(v); t != nil {
    return t
  }
  return TypeOf((*T)(nil)).Elem()
}

What this does is optimize for the case where you're using TypeFor() on a non-interface type, for example 'reflect.TypeFor[int64]()' (although you're more likely to use this with more complex things like struct types). When T is a non-interface type, we don't need to construct a pointer to a value of the type; we can directly obtain the Type from reflect.TypeOf. But how do we tell whether or not T is an interface type? The answer turns out to be right there in the documentation for reflect.TypeOf:

[...] If [TypeOf's argument] is a nil interface value, TypeOf returns nil.

So what the new code does is construct a zero value of type T, pass it to TypeOf(), and check what it gets back. If type T is an interface type, its zero value is a nil interface and TypeOf() will return nil; otherwise, the return value is the reflect.Type of the non-interface type T.

The reason that reflect.TypeOf returns nil for a nil interface value is because it has to. In Go, nil is only sort of typed, so if a nil interface value is passed to TypeOf(), there is effectively no type information available for it; its old interface type is lost when it was converted to 'any', also known as the empty interface. So all TypeOf() can return for such a value is the nil result of 'this effectively has no useful type information'.

Incidentally, the TypeFor() code is also another illustration of how in Go, interfaces create a difference between two sorts of nils. Consider calling 'reflect.TypeFor[*os.File]()'. Since this is a pointer type, the zero value 'v' in TypeFor() is a nil pointer. But os.File isn't an interface type, so TypeOf() won't be passed a nil interface and can return a Type, even though the underlying value in the interface that TypeOf() receives is a nil pointer.
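
A small demonstration of all three paths (Go 1.22 or later):

package main

import (
  "fmt"
  "os"
  "reflect"
)

func main() {
  // Non-interface type: the zero value isn't a nil interface, so
  // the new fast path returns immediately.
  fmt.Println(reflect.TypeFor[int64]()) // int64

  // Interface type: the zero value is a nil interface, TypeOf()
  // returns nil, and we fall back to the pointer trick.
  fmt.Println(reflect.TypeFor[error]()) // error

  // Pointer type: the zero value is a nil *os.File but not a nil
  // interface, so TypeOf() still has type information to return.
  fmt.Println(reflect.TypeFor[*os.File]()) // *os.File
}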

GoReflectTypeForOptimization written at 23:12:03

2024-02-11

Go 1.22's go/types Alias type shows the challenge of API compatibility

Go famously promises backward compatibility to the first release of Go and pretty much delivers on that (although the tools used to build Go programs have changed). Thus, one may be a bit surprised to read the following about go/types in the Go 1.22 Release Notes:

The new Alias type represents type aliases. Previously, type aliases were not represented explicitly, so a reference to a type alias was equivalent to spelling out the aliased type, and the name of the alias was lost. [...]

Because Alias types may break existing type switches that do not know to check for them, this functionality is controlled by a GODEBUG field named gotypesalias. [...] Clients of go/types are urged to adjust their code as soon as possible to work with gotypesalias=1 to eliminate problems early.

(The bold emphasis is mine, while the italics are from the release notes. The current default is gotypesalias=0.)

A variety of things in go/types return a Type, which is an interface type that 'represents a type of Go'. Well, more specifically these things return values of type Type, and these values have various underlying concrete types. Some code using go/types and dealing with Type values can handle them purely as interfaces, but other code needs to specifically handle all of the particular types (such as Array and so on). Since Type is an interface, such code will use a type switch that is supposed to be exhaustive over all of the concrete types of Type interface values.

Now we can see the problem. When Go introduces a new concrete type that can be returned as a Type value, those previously exhaustive type switches stop being exhaustive; there's a new concrete type that they're not prepared to handle. This could cause various problems in actual code. And Go has no way of requiring type switches to be exhaustive, so such code would still build fine but malfunction at runtime.
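
Concretely, the sort of type switch involved looks like this (a sketch with most cases elided); under gotypesalias=1, a version written before Go 1.22 silently falls into the default case when handed an alias:

package typedesc

import "go/types"

// describe is a sketch of an 'exhaustive' type switch over the
// concrete implementations of types.Type.
func describe(t types.Type) string {
  switch t := t.(type) {
  case *types.Basic:
    return "basic " + t.Name()
  case *types.Pointer:
    return "pointer to " + describe(t.Elem())
  case *types.Named:
    return "named " + t.Obj().Name()
  // ... cases for Array, Slice, Map, Struct, and so on ...
  case *types.Alias:
    // New in Go 1.22; pre-existing switches have no such case.
    return "alias " + t.Obj().Name()
  default:
    return "unhandled kind of type"
  }
}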

Much like the last time we saw something like this, this change is arguably not an API break, at least in theory; Go never explicitly promised that there was a specific and limited list of go/types types that implemented Type, and so in theory Go is free to expand the list. However, as we can see from the release notes (and the current behavior of not generating these new Alias types by default), the Go authors recognize that this is in practice a compatibility break, one that they're explicitly urging people to be prepared for.

What this shows is that true long term backward compatibility is very hard, and it's especially hard in an area that is inherently evolving, like exposing information about an evolving language. Getting complete backward compatibility requires more or less everything about an exposed API to be frozen, and that generally requires the area to be extremely well understood (and often pushes towards exposing very minimal APIs, which has its own problems).

As a side note, I think that Go is handling this change quite well. They've added the type to go/types so that people can add it to their own code (which will make it require Go 1.22 or later), and also provided a way that people can test the code (by building with gotypesalias=1). At the same time no actual 'Alias' types will appear (by default) until some time in the future; I'd guess no earlier than Go 1.24, a year from now.

Go122TypesAliasAndCompatibility written at 21:31:08
