2018-01-24
What the Linux rcu_nocbs kernel argument does (and my Ryzen issues again)
It turns out that my Linux Ryzen kernel hangs appear to be a known issue (reported against Ubuntu, Fedora, and the upstream kernel); fortunately, people have found magic incantations that appear to work around it. Part of the magic is a pair of kernel command line arguments, usually cited as:
rcu_nocbs=0-N processor.max_cstate=1
(where N is the number of CPUs you have minus one.)
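For example, an 8-core, 16-thread Ryzen shows up to Linux as 16 CPUs, so on such a machine this works out to:

rcu_nocbs=0-15 processor.max_cstate=1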
Magic incantations that I don't understand bug me, especially when they seem to be essential to keeping my system from hanging, so I had to go digging.
What processor.max_cstate does is relatively straightforward. As briefly mentioned in kernel-parameters.txt, it limits the C-states (the processor idle power states) that Linux will allow processors to go into. Limiting the CPU to C1 at most doesn't allow for much idling and power saving; it might be safe to go as far as C5, since the usual additional advice is to disable C6 in the BIOS (if your BIOS supports doing this). On the other hand, I don't know if Ryzens do anything between C1 and C6.
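If you want to see what idle states your kernel is actually exposing, they show up in sysfs. Here's a minimal C sketch (assuming the usual /sys/devices/system/cpu/.../cpuidle layout) that lists the idle states for CPU 0; booting with processor.max_cstate=1 should visibly shorten this list:

#include <stdio.h>

int main(void) {
    char path[128], name[64];
    /* Idle states are numbered state0, state1, ... with no gaps,
       so stop at the first one that doesn't exist. */
    for (int i = 0; ; i++) {
        snprintf(path, sizeof(path),
                 "/sys/devices/system/cpu/cpu0/cpuidle/state%d/name", i);
        FILE *f = fopen(path, "r");
        if (!f)
            break;
        if (fgets(name, sizeof(name), f))
            printf("state%d: %s", i, name);   /* name ends with a newline */
        fclose(f);
    }
    return 0;
}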
The rcu_nocbs parameter is more involved (and mysterious). To more or less understand it, we need to start with Read-Copy-Update (RCU; see also Wikipedia). To simplify, RCU handles updates to shared data structures by setting up a new version of the data structure, changing a master location to point to it instead of the old version, and then waiting for everyone to have passed a synchronization point where they're guaranteed to be using the new version instead of the old one. At that point you know the old version is unused and you can free it.
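To make that concrete, here's a minimal kernel-style sketch of the pattern. The struct and variable names are made up for illustration, but the RCU primitives are the real kernel API:

#include <linux/rcupdate.h>
#include <linux/slab.h>

struct cfg {
    int value;
};

static struct cfg __rcu *gbl_cfg;

/* Readers are cheap and never block the updater. */
int read_cfg_value(void)
{
    struct cfg *c;
    int v;

    rcu_read_lock();
    c = rcu_dereference(gbl_cfg);
    v = c ? c->value : -1;
    rcu_read_unlock();
    return v;
}

/* The updater publishes a new version, waits for all pre-existing
   readers to pass the synchronization point, then frees the old one.
   (Assume the caller holds whatever lock serializes updaters.) */
void update_cfg_value(int value)
{
    struct cfg *newc, *oldc;

    newc = kmalloc(sizeof(*newc), GFP_KERNEL);
    if (!newc)
        return;
    newc->value = value;

    oldc = rcu_dereference_protected(gbl_cfg, 1);
    rcu_assign_pointer(gbl_cfg, newc);  /* switch the master location */
    synchronize_rcu();                  /* wait out all current readers */
    kfree(oldc);                        /* now provably unused */
}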
The Linux kernel's main RCU code handles the RCU algorithm for you, but it doesn't know how to free up your data structures. For that it relies on RCU callbacks that you give it; when RCU determines that the old version of your data structure can be disposed of, it will invoke your callback to do this. Normally, RCU callbacks are invoked in interrupt context as part of software interrupt (softirq) handling. Various people didn't like this because softirqs preempt whatever happens to be running whenever an appropriate interrupt comes in, so an alternate approach was developed: have these potentially quite time-consuming RCU callbacks handled by regularly scheduled kernel threads instead. This is said to 'offload' RCU callbacks to these threads. Each offloaded CPU gets its own set of RCU offload kernel threads, but these kernel threads can run on any CPU, not just the CPU they're offloading.
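In code, callbacks change only the last step of the sketch above: instead of blocking in synchronize_rcu(), the updater hands RCU a callback with call_rcu(), and it's these callbacks that normally run in softirq context (or, when offloaded, in kernel threads). Again with made-up names:

#include <linux/rcupdate.h>
#include <linux/slab.h>

struct cfg {
    int value;
    struct rcu_head rcu;    /* storage that call_rcu() needs */
};

static struct cfg __rcu *gbl_cfg;

/* This is the RCU callback: invoked once the grace period ends. */
static void free_old_cfg(struct rcu_head *head)
{
    kfree(container_of(head, struct cfg, rcu));
}

void update_cfg_value_async(int value)
{
    struct cfg *newc, *oldc;

    newc = kmalloc(sizeof(*newc), GFP_KERNEL);
    if (!newc)
        return;
    newc->value = value;

    oldc = rcu_dereference_protected(gbl_cfg, 1);
    rcu_assign_pointer(gbl_cfg, newc);
    if (oldc)
        call_rcu(&oldc->rcu, free_old_cfg);  /* no waiting; RCU calls us back */
}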
This is what rcu_nocbs controls; it's a list of the CPUs in your system that should have their RCU callbacks offloaded to threads. Normally, people use it to fence off a few CPUs from the random interruptions of softirq RCU callbacks.
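For illustration, this fencing-off usage usually combines rcu_nocbs with the kernel's other CPU isolation parameters, something like the following (for a 16-CPU machine where you want CPUs 12 through 15 left alone):

isolcpus=12-15 nohz_full=12-15 rcu_nocbs=12-15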
However, the rcu_nocbs=0-N setting we're using specifies all CPUs, so it shifts all RCU callbacks from softirq context during interrupt handling (on whichever specific CPU is involved) to kernel threads (on any CPU). As far as I can see, this has two potentially significant effects, given that Matt Dillon of DragonFly BSD has reported an issue with IRETQ that completely stops a CPU under some circumstances.
First, our Ryzen CPUs will spend less time in interrupt handling, possibly much less time, which may narrow any timing window required to hit Matt Dillon's issue. Second, RCU callback processing will continue even if a CPU stops responding to IPIs (inter-processor interrupts), although I expect that a CPU not responding to IPIs is going to cause the Linux kernel various other sorts of heartburn.
(Unfortunately, Matt Dillon's issue doesn't correspond well with the observed symptoms, where Ryzens hang under Linux not while busy but while idle. My kernel stack backtraces do suggest that at least one CPU is spinning waiting for its IPI to other CPUs to be fully acknowledged, though, so perhaps there is a related problem. Perhaps there are even several problems.)
Some early notes on using uMatrix to replace NoScript in Firefox 56
Although people have been suggesting uMatrix (Github) to me for a while, it took Aristotle Pagaltzis' plug for it as his preferred JavaScript blocker in comments on yesterday's entry to push me into giving it a serious look. First, I took it for a spin in the Firefox on my laptop and then, somewhat impulsively, I decided to try switching from NoScript (and my cookie blocking extension) to it on my home machine. Having spent a couple of hours with it, I'm sold so far and will be switching my office Firefox over to it tomorrow.
I have three main reasons for this rapid enthusiasm. First, uMatrix gives me more fine-grained control over where and when JavaScript is enabled, because I can say 'JavaScript for X is only enabled when I'm on site Y'. Second, uMatrix's cookie blocking and cookie handling work, which means that I finally have a reliably working cookie-handling addon; my old one has been unsupported and only partially functional for years. However, the single largest reason I'm so enthused is that my Firefox appears to use significantly less memory with uMatrix. Firefox is definitely using significantly less memory on startup (once it loads all of my current set of windows), and I think it hasn't been leaking memory as fast.
(Some of this may be because of configuration differences in what JavaScript I'm allowing to run, but if so that's because uMatrix lets me do fine-grained things like only run YouTube's JS on YT itself, not on random sites that want to embed YT videos.)
Since I'm using uMatrix to replace NoScript and a cookie blocking extension, I have it set to default-deny JavaScript and cookies, with permanent exemptions for only a few specific sites. Figuring out how to set this up and to configure exemptions took a bit of reading of help material and experimentation, but once I got in tune with how uMatrix's UI works, working with it has been relatively problem-free. Setting up permissions on the BBC website was a bit tricky because I got myself into a redirect loop with an incomplete set of JavaScript allowances, but I was able to get things set in the end. A useful trick to learn was that I could make changes persistent in the uMatrix preferences dialog (in 'My rules', you can pick 'commit'); this is handy when enabling a rule immediately redirects you off a site.
(Since I'm also using uBlock Origin, I turned off all of uMatrix's site blocklists as just being duplication.)
uMatrix's JavaScript blocking is more powerful than NoScript's because in uMatrix you normally scope permissions by the site you're visiting, whereas NoScript only gives you global ones. In NoScript, if you enable JavaScript for YouTube, it's enabled on every site; in uMatrix I can say 'enable YouTube's JavaScript only when I'm visiting YouTube' (and this is the default way you'll set it up). This has made me much more willing to permanently enable various bits of third party JS on specific websites.
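To give a concrete picture, here's roughly what such scoped permissions look like in uMatrix's 'My rules' listing. uMatrix rules are 'source destination request-type action' lines; the particular hostnames YouTube needs are my guesses for illustration, not something taken from my actual ruleset:

* * * block
* * css allow
* * image allow
youtube.com youtube.com * allow
youtube.com ytimg.com * allow
youtube.com googlevideo.com media allow

The first three lines are the default-deny baseline (block everything, but let CSS and images through everywhere); the last three apply only when the page I'm on is youtube.com.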
(This wide use of scoped permissions does make it harder to get a relatively global overview of what your permissions are. Of course part of this is that you're probably going to have more rules than you would have had in NoScript; I know that I do.)
When I switched over to uMatrix, I ran into my old Twitter no-JS endless redirect problem. In NoScript, this was fixed by a magic option to ignore META redirections in <noscript> elements. uMatrix has no such magic option, so I wound up turning off its 'spoof <noscript> tags' setting. This turns out to have useful side effects (for example, Stack Overflow's obnoxious red 'we're better with JS' banner is in a <noscript> element). However, some things don't work with this off, such as external links on Tumblr sites. Fortunately uMatrix lets you enable this on a per-site basis.
(I believe that uMatrix also lets you disable <noscript> spoofing on a per-site basis, so in theory I could leave it enabled everywhere and just disable it on Twitter. But since the side effects of disabling it globally seem to be mostly positive, I'm sticking with my default-off for now.)
So far using uMatrix's UI has been reasonably non-annoying. There are some fiddly bits and I'm probably not using it in the best way possible, but I can get things done and it's not too painful. I don't expect to need to use it very often, especially once I've gotten things all tuned up; if a site wants JavaScript, mostly I feed it to another browser.
(So far the most annoying bit of the transition is that my Google search settings got damaged, again. If you do things just right, Google will give you cookies so that you get a nice, functional search experience even without JS and will stick to google.com; however, the settings are quite fragile, fall off easily, and there's no obvious way to fully set yourself back to them.)