Wandering Thoughts archives

2020-09-07

URL query parameters and how laxness creates de facto requirements on the web

One of the ways that DWiki (the code behind Wandering Thoughts) is unusual is that it strictly validates the query parameters it receives on URLs, including on HTTP GET requests for ordinary pages. If an HTTP request has unexpected and unsupported query parameters, such a GET request will normally fail. When I made this decision it seemed like the cautious and conservative approach, but this caution has turned out to be a mistake on the modern web. In practice, all sorts of sites will generate versions of your URLs with all sorts of extra query parameters tacked on, give them to people, and expect them to work. If your website refuses to play along, (some) people won't get to see your content. On today's web, you need to accept (and then ignore) arbitrary query parameters on your URLs.

(Today's new query parameter is 's=NN', for various values of NN like '04' and '09'. I'm not sure what's generating these URLs, but it may be Slack.)

You might wonder how we got here, and that is a story of lax behavior (or, if you prefer, being liberal in what you accept). In the beginning, both Apache (for static web pages) and early web applications often ignored extra query parameters on URLs, at least on GET requests. I suspect that other early web servers also imitated Apache here, but I have less exposure to their behavior than Apache's. My guess is that this behavior wasn't deliberate, it was just the simplest way to implement both Apache and early web applications; you paid attention to what you cared about and didn't bother to explicitly check that nothing else was supplied.

When people noticed that this behavior was commonplace and widespread, they began using it. I believe that one of the early uses was for embedding 'where this link was shared' information for your own web analytics (cf), either based on your logs or using JavaScript embedded in the page. In the way of things, once this was common enough other people began helpfully tagging the links that were shared through them for you, which is why I began to see various 'utm_*' query parameters on inbound requests to Wandering Thoughts even though I never published such URLs. Web developers don't leave attractive nuisances alone for long, so soon enough people were sticking on extra query parameters to your URLs that were mostly for them and not so much for you. Facebook may have been one of the early pioneers here with their 'fbclid' parameter, but other websites have hopped on this particular train since then (as I saw recently with these 's=NN' parameters).

At this point, the practice of other websites and services adding random query parameters to your URLs as they pass through them is so widespread and common that accepting random query parameters is pretty much a practical requirement for any web content serving software that wants to see wide use and not be irritating to the people operating it. If, like DWiki, you stick to your guns and refuse to accept some or all of them, you will drop some amount of your incoming requests from real people, disappointing would-be readers.

This practical requirement for URL handling is not documented in any specification, and it's probably not in most 'best practices' documentation. People writing new web serving systems that are tempted to be strict and safe and cautious get to learn about it the hard way.
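The accepting-and-ignoring side is not much code in most environments. As an illustration, here is a minimal sketch in Go (the 'page' parameter and the port are made up for the example); the handler reads only the query parameters it knows about and never rejects a request just because extra ones are present:

package main

import (
    "fmt"
    "log"
    "net/http"
)

// handler looks only at the query parameters it knows about (here a
// made-up 'page' parameter) and silently ignores anything else, such
// as utm_* or fbclid parameters that other sites have tacked on.
func handler(w http.ResponseWriter, r *http.Request) {
    page := r.URL.Query().Get("page")
    fmt.Fprintf(w, "serving page %q\n", page)
}

func main() {
    http.HandleFunc("/", handler)
    log.Fatal(http.ListenAndServe(":8080", nil))
}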

In general, any laxness in actual implementations of a system can create a similar spiral of de facto requirements. Something that is permitted and is useful to people will be used, and then supporting that becomes a requirement. This is especially the case in a distributed system like the web, where any attempt to tighten the rules would only be initially supported by a minority of websites. These websites would be 'outvoted' by the vast majority of websites that allow the lax behavior and support it, because that's what happens when the vast majority work and the minority don't.

web/DeFactoQueryParameters written at 00:17:18; Add Comment

2020-09-06

Daniel J. Bernstein's IM2000 email proposal is not a good idea

A long time ago, Daniel J. Bernstein wrote a proposal for a new generation of Internet email he called IM2000, although it never went anywhere. Ever since then, a significant number of people have idealized it as the great white 'if only' hope of email (especially as the solution to spam), in much the same way that people idealized Sun's NeWS as the great 'if only' alternative to X11. Unfortunately, IM2000 is not actually a good idea.

The core of IM2000 is summarized by Bernstein as follows:

IM2000 is a project to design a new Internet mail infrastructure around the following concept: Mail storage is the sender's responsibility.

The first problem with this is that it doesn't remove the fundamental problem of email, which is (depending on how you phrase it) that email is an anonymous push protocol or that it lacks revocable authorization to send you things. In IM2000, random strangers on the Internet are still allowed to push to you, they just push less data than they currently do with (E)SMTP mail.

The idea that IM2000 will deal with spam rests on the idea that forcing senders to store mail is difficult for spammers. Even a decade ago this was a questionable assumption, but today it is clearly false. A great deal of serving capacity is yours for the asking (and someone's credit card) in AWS, GCP, Azure, OVH, and any number of other VPS and serverless computing places. In addition many spammers will have a relatively easy time with 'storing' their email, because their spam is already generated from templates and so in IM2000 could be generated on the fly whenever you asked for it from them. We now have a great deal of experience with web servers that generate dynamic content on demand and it's clear that they can run very efficiently and scale very well, provided that they're designed competently.

(I wrote about this a long time ago here, and things have gotten much easier for spammers since then.)

At the same time, IM2000 is catastrophic for your email privacy. People complain vociferously about 'tracking pixels' in HTML email that betray when you open and read the email from someone; well, IM2000 is one giant tracking pixel that reliably reports when and where you read that email message. IM2000 would also be a terrible email reading experience, because it's like a version of IMAP where message retrieval has random delays and sometimes fails entirely.

(As far as spam filtering your incoming IM2000 messages goes, IM2000 gives you far less up front information than you currently get with SMTP email. I wrote up this and other issues a long time ago in an entry about the technical problems of such schemes. Some of those problems are no longer really an issue more than a decade later, but some continue to be.)

At a broader 'technical choices have social impacts' level, IM2000 would create a very different experience than today's email systems if implemented faithfully, one where 'your' email was actually not yours but was mostly other people's because other people are storing it. Those other people can mostly retract individual messages by deleting them from their servers (you would still have the basic headers that are pushed to you), and they can wipe out large sections of your email by deleting entire accounts (and the sent messages associated with them), or even by going out of business or having a data loss incident. Imagine a world where an ISP getting out of the mail business means that all email that its customers have sent from their ISP email accounts over the years just goes away, from everyone's mailbox.

(If 'ISP' sounds abstract here, substitute 'Yahoo'. Or 'GMail'.)

In addition, in some potential realizations of IM2000, email would become mutable in practice (even if you weren't supposed to alter it in theory), because once again the sender is storing the message and is in a position to alter that stored copy. Expect that capability to be used sooner or later, just as people silently revise things posted on the web (including official statements, perhaps especially including them).

Some of these social effects can be partially avoided by storing your own local copies of IM2000 messages when you read them, but there are two issues. The first is pragmatic; the more you store your own copies and the earlier you make them, the more IM2000 is SMTP in a bad disguise. The second is social; in the IM2000 world the server holds the authoritative copy of the message, not you, so if you say the message says one thing (based on your local copy) and the server operator says it says something else (or doesn't exist), the server operator likely wins unless you have very strong evidence.

In general, I think that IM2000 or anything like it would create an 'email' experience that was far more like the web, complete with the experience of link rot and cool messages changing, than today's email (where for better or worse you keep your own full record of what you received, read and reread it at your leisure, and know that it's as immutable as you want it to be). And it would still have the problem that people can push stuff in front of you, unlike the web where you usually at least have to go looking for things.

tech/IM2000NotGoodIdea written at 00:39:01; Add Comment

2020-09-05

Some notes on what the CyberPower UPS 'Powerpanel' software reports to you

For reasons beyond the scope of this entry, I recently bought a reasonably nice UPS for home usage. Me being me, I then found a Prometheus metrics exporter for it, cyberpower_exporter (and see also Mike Shoup's blog post about it), and then tinkered with it. This exporter works by talking to the daemon provided by CyberPower's Powerpanel software, instead of talking directly to the UPS, so my first port of call was to dump the raw information the daemon was providing for my UPS.

(The Powerpanel software is available as a Fedora RPM that's not too obnoxious. Per the Arch Wiki page on CyberPower UPS, you can also use Network UPS Tools (NUT). I opted to take the simpler path that theoretically should just work.)

You get status information from Powerpanel by connecting to the Unix socket /var/pwrstatd.ipc (yes I know, it should be in /run) and sending ASCII 'STATUS' followed by two newlines. You can do this by hand with nc if you feel like it:

printf 'STATUS\n\n' | nc -U /var/pwrstatd.ipc

What you get back is something like this (this is my particular UPS model, yours may vary):

STATUS
state=0
model_name=CP1500PFCLCD
firmware_num=000000000000
battery_volt=24000
input_rating_volt=120000
output_rating_watt=900000
avr_supported=yes
online_type=no
diagnostic_result=1
diagnostic_date=2020/07/31 12:34:53
power_event_result=1
power_event_date=2020/07/31 12:33:59
power_event_during=21 sec.
battery_remainingtime=5160
battery_charging=no
battery_discharging=no
ac_present=yes
boost=no
utility_volt=121000
output_volt=121000
load=8000
battery_capacity=100

The 'volt' and 'watt' numbers need to be divided by 1000 to get the units you expect from their name. The 'load' is divided by 1000 to get a percentage (or by 100000 to get it in 0.0 to 1.0 form), and is expressed as a percentage of the output rating watts. The daemon doesn't report the current load in watts; instead you have to compute it for yourself. The battery remaining time is in seconds. The battery capacity is a percentage, but unlike load, it's expressed as a straight 0-100 number. The times are in your local timezone, not UTC, and I don't know how the UPS reports longer durations of power events (ones lasting minutes or even more than an hour).
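Putting the socket protocol and these conversions together, here's a minimal sketch in Go of reading the status and computing the load in watts. It assumes (as the nc example suggests) that pwrstatd stops sending after its reply; the read deadline is just a safety net for the sketch.

package main

import (
    "bufio"
    "fmt"
    "net"
    "strconv"
    "strings"
    "time"
)

func main() {
    // Connect to the pwrstatd Unix socket and send the STATUS request.
    conn, err := net.Dial("unix", "/var/pwrstatd.ipc")
    if err != nil {
        panic(err)
    }
    defer conn.Close()
    fmt.Fprintf(conn, "STATUS\n\n")

    // Read the key=value reply into a map. The deadline keeps us from
    // hanging forever if the daemon holds the connection open.
    conn.SetReadDeadline(time.Now().Add(2 * time.Second))
    status := make(map[string]string)
    scanner := bufio.NewScanner(conn)
    for scanner.Scan() {
        line := scanner.Text()
        if i := strings.IndexByte(line, '='); i > 0 {
            status[line[:i]] = line[i+1:]
        }
    }

    // Apply the unit conversions: volts and watts are scaled by 1000,
    // 'load' is a percentage scaled by 1000, and the load in watts has
    // to be computed from the load fraction and the output rating.
    ratingWatts, _ := strconv.ParseFloat(status["output_rating_watt"], 64)
    load, _ := strconv.ParseFloat(status["load"], 64)
    fmt.Printf("load: %.0f%%\n", load/1000)
    fmt.Printf("load: %.0f watts\n", (load/100000)*(ratingWatts/1000))
}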

I suspect that the state, power_event_result, and diagnostic_result fields can take on multiple values. Based on what the CyberPower pwrstat command reports for my UPS right now, these mean a normal state, that the last power event was a blackout (a total power loss), and that the last self-test passed.

(The blackout was because I unplugged the UPS from the wall socket to make sure everything worked, which is why it was so short.)

The reported load number is somewhat untrustworthy and definitely seems to be quantized by the UPS. It's possible to observe reported loads of '0' if my home machine environment is idle enough (with the display blanked). This isn't just an artifact of the Powerpanel software, either; when I looked at the UPS's actual front panel, it reported 0 load and 0 watts being used. The front panel also reports 'VA' figures, and they didn't go to zero at these '0 load' times. However, as far as I can tell VA figures aren't reported by the Powerpanel software, and may or may not be provided to the outside world by the UPS itself.

(The NUT page for a very similar model doesn't list any VA data.)

As a consequence, you can't really use the reported load value to see how much power your overall UPS-based setup is using over time; the UPS load will under-report at times of low usage and perhaps at other times. This was a bit disappointing, but then I didn't buy the UPS to (also) be a watt-meter with a USB readout that I could grab from the computer.

(The UPS connects to my desktop via USB and is visible as a USB device, but I haven't tried to dump its USB traffic to see the truly raw data. That's a little bit too much work for my current level of curiosity.)

linux/CyberPowerPowerpanelNotes written at 01:09:54; Add Comment

2020-09-04

In practice, cool URLs change (eventually)

The idea that "cool URLs don't change" has been an article of faith for a very long time. However, at this point we have more than 20 years of experience with the web, and anyone who's been around for a significant length of time can tell you that in practice, cool URLs change all of the time (and I don't mean just minor changes like preferring HTTPS over HTTP). Over a sufficient length of time, internal site page layouts change (sometimes because URL design is hard), people move domains or hosts within a domain, and sometimes cool URLs even go away and must be resurrected, sometimes by hand (through people re-publishing and re-hosting things) and sometimes through the Wayback Machine. This decay in cool URLs is so pervasive and well recognized that we have a term for it, link rot.

(Of course, you're a good person, and your cool URLs don't change. But this is the web and we all link to each other, so it's inevitable that some other people's cool URLs that you link to will suffer from link rot.)

Despite link rot being widely recognized as very real, I think that in many ways we're in denial about it. We keep pretending (both culturally and technically) that if we wish hard enough and try hard enough (and yell at people hard enough), all important URLs will be cool URLs that are unchanging forever. But this is not the case and is never going to be the case, and it's long past time that we admitted it and started dealing with it. Whether we like it or not, it is better to deal with the world of the web as it is.

Culturally, we recite "cool URLs don't change" a lot, which makes it hard to talk about how best to evolve URLs over time, how to preserve content that you no longer want to host, and other issues like that. I don't think anyone's written a best practices document for 'so you want to stop having a web site (but people have linked to it)', never mind what a company can do to be friendly for archiving when it goes out of business or shuts down a service. And that's just scratching the surface; there's a huge conversation to be had about the web over the long term once we admit out loud that nothing is forever around here.

(The Archive Team has opinions. But there are some hard issues here; there are people who have published words on the Internet, not under CC licenses, and then decided for their own reasons that they no longer want those words on the Internet despite the fact that other people like them, linked to them a lot, and so on.)

Technically, how we design our web systems and web environments often mostly ignores the possibility of future changes in either our own cool URLs or other people's. What this means in more tangible terms is really a matter for other entries, but if you look around you can probably come up with some ideas of your own. Just look for the pain points in your own web publishing environment if either your URLs or other people's URLs changed.

(One pain point and sign of problems is that it's a thing to spider your own site to find all of the external URLs so you can check if they're still alive. Another pain point is that it can be so hard to automatically tell if a link is still there, since not all dead links either fail entirely or result in HTTP error codes. Just ask people who have links pointing to what are now parked domains.)

web/CoolUrlsChange written at 00:41:56; Add Comment

2020-09-02

Why I want something like Procmail with a dedicated mail filtering language

A couple of years ago I wrote about discovering that procmail development is basically dead and wondering out loud what I might switch to. In some comments on that entry, Aristotle Pagaltzis suggested that in an environment (such as MH) with one message per file, well, let me quote:

[...], then you can write yourself one or more programs in your favourite language that kick the mail from there to wherever you want it to end up. The entirety of the job of such code is opening and reading files and then moving them, for which any language whatsoever will do, so the only concern is how far you want to library up your mail parsing.

My reply (in another comment on that entry) was that I wanted a system where I directly wrote mail filtering rules, as is the case in procmail, not a system where I wrote filtering rules in some general purpose programming language. But I never explained why I wanted a special purpose language for this.

My reason for this is that writing mail filtering in a special purpose language removes (or rather hides) all of the plumbing that is otherwise necessary. The result may have obscure syntax (procmail certainly does), but almost everything it says is about what mail filtering is happening, not the structure of getting it to happen (both at the large scale level of opening files, parsing them, moving them around, and at the small scale level of executing or otherwise matching rules). This makes it much easier to come back later to pull out 'what is this filtering' from the system; the configuration file you read is all about that. With a general purpose programming language, coming back in six months or a year requires essentially reverse engineering your entire program, because you have to find the filtering rules in the rest of the code (and understand how they're executed).
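To make that concrete, a procmail filtering rule looks something like this (a hypothetical recipe with a made-up list address and folder). The syntax is cryptic, but every line of it is about what mail gets filed where:

# file list mail into its own folder, using a lockfile (the second ':')
:0:
* ^List-Id:.*<mygroup\.lists\.example\.com>
mail/mygroup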

(In theory you can avoid some of this if you write good enough documentation for your personal filtering setup. In practice it's pretty unlikely that you will, or that this documentation will be well tested enough (because you need to test documentation). An open source mail filtering system with a dedicated filtering language is much more likely to have good documentation that lets you drop right into understanding your filtering rules again.)

This is a subtle advantage of DSLs (Domain Specific Languages) in general. In a good DSL, much like with wikitext, almost everything you write is real 'content' (here, real filtering rules), and very little of it is scaffolding. A general purpose language necessarily isn't that focused on your specific problem area, and so making it focus that way requires a bunch of scaffolding. At the extreme, you wind up building your own language that's implemented in the general purpose language.

(This may be literal, with a parser and everything, or it may be in the form of a set of stylized and standard function calls or method calls you make to embody your real work.)

sysadmin/WhyMailFilteringLanguage written at 23:46:24; Add Comment

2020-09-01

Even in Go, concurrency is still not easy (with an example)

Go is famous for making concurrency easy, through good language support for goroutines. Except what Go makes easy is only one level of concurrency, the nuts and bolts level of making your code do things concurrently and communicating back and forth through channels. Making it do the right things concurrently is still up to you, and unfortunately Go doesn't currently provide a lot of standard library support for correctly implemented standard concurrency patterns.

For example, one common need is for a limited amount of concurrency; you want to do several things at once, but only so many of them. At the moment this is up to you to implement on top of goroutines, channels, and things like the sync package. This is not as easy as it looks, and quite competent people can make mistakes here. As it happens, I have an example ready to hand today.

Gops is a convenient command to list (and diagnose) Go processes that are currently running on your system. Among other things, it'll tell you which version of Go they were compiled with, which is handy if you want to see if you have out of date binaries that should be rebuilt and redeployed. One of the things gops needs to do is look at all of the Go processes on your system, which it does concurrently. However, it doesn't want to look at too many processes at once, because that can cause problems with file descriptor limits. This is a classic case of limited concurrency.

Gops implements this at the moment with code in goprocess.FindAll() that looks like this, in somewhat sketched and reduced form:

func FindAll() []P {
   pss, err := ps.Processes()
   [...]
   found := make(chan P)
   limitCh := make(chan struct{}, concurrencyProcesses)

   for _, pr := range pss {
      limitCh <- struct{}{}
      pr := pr
      go func() {
         defer func() { <-limitCh }()
         [... get a P with some error checking ...]
         found <- P
      }()
   }
   [...]

   var results []P
   for p := range found {
      results = append(results, p)
   }
   return results
}

(In the real code there's a WaitGroup for coordination, and the found channel gets closed appropriately.)

How this works is clear, and is a standard pattern (covered in eg Go 101's Channel Use Cases). We use a buffered channel to provide a limited number of tokens; sending a value into the channel implicitly takes a token (and blocks if the token supply is exhausted), while receiving a value from it puts a token back in. We take a token before we start a new goroutine, and the goroutine releases the token when it's done.

Except that this code has a bug if there are too many processes to examine. Even when you know that there's a bug in this code, it may not be obvious what it is.

The bug is that the goroutines only receive from limitCh to release their token after sending their result to the unbuffered found channel, while the main code only starts receiving from found after running through the entire loop, and the main code takes the token in the loop and blocks if no tokens are available. So if you have too many processes to go through, you start N goroutines, they all block trying to write to found and don't receive from limitCh, and the main for loop blocks trying to send to limitCh and never reaches the point where it starts receiving from found.

At one level, this bug is a very fragile bug; it only exists because of multiple circumstances. If the goroutines took the token by sending to limitCh instead of the main for loop doing it, the bug would not exist; the main for loop would start them all, many would stop, and then it would go on to receive from found so that they could receive from limitCh and release their token so other goroutines would run. If the goroutines received from limitCh to release their token before sending to found, it wouldn't exist (but because of error handling, it's simpler and more reliable to do the receive in a defer). And if the entire for loop was in an additional goroutine, the main code would go on to receive from found and unblock completed goroutines to release their tokens, so the fact that the for loop was blocked waiting to send to limitCh wouldn't matter.
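As an illustration of the first of those fixes, here is a minimal self-contained sketch of the limited concurrency pattern with the token-taking moved into the goroutines (generic made-up work, not the actual gops code):

package main

import (
    "fmt"
    "sync"
)

func main() {
    work := make([]int, 1000)
    for i := range work {
        work[i] = i
    }

    const limit = 10
    found := make(chan int)
    limitCh := make(chan struct{}, limit)
    var wg sync.WaitGroup

    for _, w := range work {
        w := w
        wg.Add(1)
        go func() {
            defer wg.Done()
            // Each goroutine takes its own token, so the main loop
            // never blocks here and always reaches the receive loop
            // on found below.
            limitCh <- struct{}{}
            defer func() { <-limitCh }()
            found <- w * w
        }()
    }

    // Close found once all workers are done so the range loop ends.
    go func() {
        wg.Wait()
        close(found)
    }()

    var results []int
    for r := range found {
        results = append(results, r)
    }
    fmt.Println(len(results), "results")
}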

At another level, this shows how concurrency is not as easy as it looks in Go. All you need is one mistake and things skid to a halt, and all of the code involved can look good to a casual examination. Getting concurrency correct is simply hard for people (we can debate about why, but I think that it is very clear).

(I'm sure that the people who wrote and approved the change that added this concurrency limiting code to gops were good programmers. A tricky case still tripped them up, passing all of their scrutiny. Even when I knew that there was a concurrency problem in the code and where it was (because my gops was hanging all of a sudden, and Delve told me where everything was stuck), it still took me some time to see what the exact problem was.)

programming/GoConcurrencyStillNotEasy written at 23:57:46; Add Comment

2020-08-31

Why we won't like it if signing email is the solution to various email problems

Yesterday I wrote about my thesis that all forms of signing email are generally solving the wrong problem and said in passing that if signing email was actually a solution, we wouldn't like it in the long run. Today, let's talk about that.

As I sort of discussed yesterday, the issue with signing email as a solution is that on the Internet, identities normally can't be used to exclude people because people can always get a new one (eg, a new domain and new DKIM keys for it and so on). If signed email is going to solve problems, the requirement is that such new identities stop being useful. In other words, email providers would stop accepting email from new identities (or at least do something akin to that). If new identities don't get your email accepted, existing identities are suddenly important and can be used to revoke access.

(This revocation might be general or specific, where a user could say 'I don't want to see this place's email any more' and then the system uses the identity information to make that reliable.)

Let's be blunt: big email providers would love this. Google would be quite happy in a world where almost everyone used one of a few sources of email and Google could make deals or strongarm most or all of them. Such a world would significantly strengthen the current large incumbents and drive more business to their paid offerings. Even the current world where it's rather easier in practice to get your email delivered reliably if you're a Google Mail or Microsoft Office365 customer does that; a world where only a few identities had their email reliably accepted would make that far worse.

For the rest of us, that would be a pretty disastrous change. I won't say that the cure would be worse than the disease (people's opinions here vary), but it would likely create two relatively separate email worlds, with the remaining decentralized email network not really connected to the centralized one of 'only known identities accepted here' email. If running your own mail server infrastructure meant not talking to GMail, a lot of people and organizations would drop out of doing it and the remaining ones would likely have ideological reasons for continuing to do so.

(A far out version of this would be for it to lead to multiple federated email networks, as clusters of email systems that interact with each other but don't accept much email from the outside world effectively close their borders much as the big providers did. If this sounds strange, well, there are multiple IRC networks and even the Fediverse is splintering in practice as not everyone talks to everyone else. And there are plenty of messaging systems that don't interconnect with each other at all.)

PS: There are lesser versions of this, where large email providers don't outright stop showing 'outside' email to people but they do downgrade and segregate it. And of course that happens to some degree today through opaque anti-spam and anti-junk systems; if Hotmail dislikes your email but not enough to reject it outright, probably a lot of people there aren't going to see it.

tech/SignedEmailSolutionImpact written at 22:18:53; Add Comment

2020-08-30

All forms of signing email are generally solving the wrong problem (a thesis)

Modern email is full of forms of signed email. Personally signed email is the old fashioned approach (and wrong), but modern email on the Internet is laced with things like DKIM, which have the sending system sign it to identify at least who sent it. Unfortunately, the more I think about it, the more I feel that signed email is generally solving the wrong problem (and if it's solving the right one, we won't like that solution in the long run).

A while ago I wrote about why email often isn't as good as modern protocols, which is because it's what I described as an anonymous push protocol. An anonymous push protocol necessarily enables spam since it allows anyone to send you things. Describing email as 'anonymous push' makes it sound like the anonymity is the problem, which would make various forms of signing the solution (including DKIM). But this isn't really what you care about with email and requiring email to carry some strong identification doesn't solve the problem, as we've found out with all of the spam email that has perfectly good DKIM signatures for some random new domain.

(This is a version of the two sides of identity. On the Internet people can trivially have multiple identities, so while an identity is useful to only let selected people in, it's not useful to keep someone out.)

I think that what you really care about with modern communication protocols is revocable authorization. With a pull protocol, you have this directly; you tacitly revoke authorization by stopping pulling from the place you no longer like. With a push protocol, you can still require authorization that you grant, which lets you revoke that granted authorization if you wish. The closest email comes to this is having lots of customized email addresses and carefully using a different one for each service (which Apple has recently automated for iOS people).

Obviously, requiring authorization to push things to you has a fundamental conflict with any system that's designed to let arbitrary strangers contact you without prearrangement (which is the fundamental problem of spam). Modern protocols seem to deal with this in two ways (even with revocable authorization); they have some form of gatekeeping (in the form of accounts or access), and then they evolve to provide settings that let you stop or minimize the ability of arbitrary strangers to contact you (for example, Twitter's settings around who can send you Direct Messages).

(The modern user experience of things like Twitter has also evolved to somewhat minimize the impact of strangers trying to contact you; for example, the Twitter website separates new DMs from strangers from DMs from people you've already interacted with. It's possible that email clients could learn some lessons from this, for example by splitting your inbox into 'people and places you've interacted with before' and 'new contacts from strange people'. This would make DKIM signatures and other email source identification useful, apart from the bit where senders today feel free to keep changing where they're sending from.)

PS: In this view, actions like blocking or muting people on Twitter (or the social network of your choice) is a form of revoking their tacit authorization to push things to you.

tech/SignedEmailWrongProblem written at 22:55:33; Add Comment

2020-08-29

An interesting mistake with Go's context package that I (sort of) made

Today, Dave Cheney did another Go pop quiz on Twitter, where he asked whether the following code printed -6, 0, '<nil>', or panicked:

package main
import (
    "context"
    "fmt"
)

func f(ctx context.Context) {
    context.WithValue(ctx, "foo", -6)
}

func main() {
    ctx := context.TODO()
    f(ctx)
    fmt.Println(ctx.Value("foo"))
}

I didn't answer this correctly because I focused my attention on the wrong thing.

What I focused on was the use of the "foo" string as the context key, partly because of my experience with languages like Python. To start with, the context package's documentation says:

The provided key must be comparable and should not be of type string or any other built-in type to avoid collisions between packages using context. Users of WithValue should define their own types for keys. [...]

A traditional problem in languages like Python is that two strings may compare the same without actually being the same thing, and some code really wants you to present it with the exact same thing. However, the context package doesn't require that you present it with the exact same key, just a key where the interface value of the key will compare the same.

(Because context compares interface values, both the value and the type must match; it's not enough for both values to have the same underlying concrete type, say string, and to compare as identical. This is why defining your own string type is a reliable way around collisions between packages.)

So after I worked through all of this, I confidently answered that this code printed -6. The "foo" string that the value is set with is not necessarily the same "foo" string that it's retrieved with, but that doesn't matter. However, this is not the problem with the code. The actual problem is that context.WithValue() returns a new context with the value set, it doesn't change the context it's called on. Dave Cheney's code is written as if .WithValue() mutates the current context, as f() ignores that new context that .WithValue() provides and returns nothing to main(). Since the original context in main() is what .Value() is called on, it has no "foo" key and the result is actually '<nil>'.
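A minimal corrected sketch (also using its own key type, as the documentation suggests) makes the difference visible; f has to hand the derived context back for the value to be seen:

package main

import (
    "context"
    "fmt"
)

type ctxKey string

func f(ctx context.Context) context.Context {
    // WithValue returns a new derived context; it does not (and
    // cannot) modify the context it's called on.
    return context.WithValue(ctx, ctxKey("foo"), -6)
}

func main() {
    ctx := context.TODO()
    ctx = f(ctx)
    fmt.Println(ctx.Value(ctxKey("foo"))) // prints -6
}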

This problem with the code is actually a quite interesting mistake, because as far as I can tell right now none of the usual Go style checkers detect it. This code passes 'go vet', it produces no complaints from errcheck because we're not ignoring an error return value, and tools like golangci-lint only complain about the use of the built-in type string as the key in .WithValue(). Nothing seems to notice that we're ignoring the critical return value from .WithValue(), which turns it into more or less a no-op.

(Now that Dave Cheney has brought this to the surface, I suspect that someone will contribute a check for it to staticcheck, which already detects the 'using a built-in type as a key' issue.)

programming/GoContextValueMistake written at 23:24:14; Add Comment

2020-08-28

My divergence from 'proper' Vim by not using and exploring features

I've read a reasonable number of Vim tutorials and introductions by now, and one of the things that stands out is how some of what I do differs from what seems to be considered 'proper' Vim. The simple way to put it is that I use less of Vim's features than the tutorials often introduce. One of the best examples is something that I do all of the time, which is reflowing paragraphs.

The official proper Vim way to reflow paragraphs (based on tutorials I've read) is gq{motion}. Often the most flexible version is gqip or gqap (where 'ip' or 'ap' select the paragraph you're in). Assuming that various things are set correctly, this will magically reflow your paragraph, much as M-q does in Emacs (a command I'm accustomed to using there).

However, for various reasons I don't use this; instead I rely on the general purpose hammer of '!' and the (relatively) standard Unix fmt command. My conditioned reflex sequence of commands for formatting the paragraph I'm writing is 'ESC { !}fmt }', and in general I'll use '!}fmt' more or less reflexively.

At one level this is somewhere between a curiosity and a deliberate choice not to learn all of Vim and try to Vim golf everything in sight (a choice that I've written about before). At another level this is kind of a weakness. As an example, in writing this entry I discovered not just that the gq command could be made to use fmt, but also discovered or re-discovered the ip and ap motion modifiers, which might be useful periodically, including in my usual paragraph reflowing.

Or perhaps not, because now that I experiment with it, using ip instead of moving to the start of the paragraph causes the cursor to jump up to the start after the paragraph is reflowed. Using an explicit { command means that I'm (relatively) conscious that I'm actively moving before I reflow, instead of having the cursor jump. If Vim was Emacs, I probably wouldn't mind, but since Vim is Vim I think I may prefer the explicitness of my current approach.

(And on character golfing, using ip or ap saves no characters in this situation. To really golf, I would need to switch to gq.)
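(For the record, the knob that apparently makes gq use fmt is 'formatprg', with something like the following in a .vimrc; it's only consulted when 'formatexpr' is empty.)

" have gq pipe text through the external fmt command
set formatprg=fmt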

As before, I probably shouldn't be surprised. Vim's sets of commands and motions are now really quite large, and people generally pick and choose what they use out of large sets like that. I suspect that plenty of Vim users use only various subsets of them, subsets that would strike other Vim users as annoyingly inefficient or old-fashioned.

unix/VimNotUsingFeatures written at 23:59:37; Add Comment

