Wandering Thoughts

2020-03-17

A problem I'm having with my HiDPI display, remote X, and (X) cursors

When I set up my HiDPI display on my home Linux machine, I had to do some wrestling with general DPI and scaling settings but after that most everything just worked and I didn't think about it. Due to world and local events, I spent a chunk of today getting set up for an extended period of working from home, including getting my work exmh configured to display properly over remote X on my HiDPI home display.

(Exmh is one of the things that I really miss when I don't have X across the network, and my current DSL link is actually fast enough to make it useful for reading email. If I'm going to be working from home for an extended period of time, I need a good email environment so it was worth the effort to see if exmh could run decently over my DSL link.)

This worked in general (with a few mistakes along the way), but after using exmh for a while I realized that the (X) mouse cursors that I was seeing when my mouse was over the exmh windows were unusually and suspiciously small, as if they hadn't been scaled up to HiDPI levels. At first I thought that this was a TCL/TK issue, but when I looked at the mouse cursors I was seeing in other programs run over the remote X connection (such as sam and even xterm), I saw the same issue. My local xterm windows have a mouse cursor that's the right size (roughly the size of a capital letter in the xterm), but an xterm on our Ubuntu machines run over remote X has one that's half the correct size. The same is true of the cursors in exmh, GNU Emacs, and sam.

(In the process of writing this entry, I checked my office Fedora machine and to my surprise, these programs all work correctly there over a remote X connection.)

X mouse cursors are a very old thing and in the way of X they've gone through a number of evolutions over the years (and then things like GUI toolkits and theming added extra layers of fun). The result is relatively opaque and underdocumented, especially if what you care about is basic X stuff like xterm and TCL/TK (for natural reasons, most people focus on writing about full scale desktops like GNOME and KDE). I found a variety of things on the Internet, some of which didn't work for me and some of which aren't feasible because the remote machines are multi-user ones and not everyone doing remote X to them has a HiDPI display (I won't when we go back to work, for example).

These days, there are apparently cursor themes, as discussed a bit in the Gentoo wiki and this article (and see also). Some basic X programs in some environments pay attention to this, through both X resources settings and environment variables (per the Arch wiki), but on our Ubuntu machines the various X programs seem to ignore the environment variables (although this stackoverflow answer talks about them). On Fedora the $XCURSOR_SIZE environment variable and so on does work.

Our Ubuntu machines have the libxcursor shared library installed (as 'libXcursor.so') and a running xterm uses it, but they don't seem to have any X cursor files installed (we don't have the xcursor-themes package present, for example). This may mean that our Ubuntu machines are forced to fall back to some very old X protocol mechanism that only has one cursor size, that being the tiny non-HiDPI one. My Fedora machines do appear to have cursor themes installed in stuff under /usr/share/icons, and it looks like if I copy the right one ('Adwaita') to our Ubuntu machine and set $XCURSOR_PATH and $XCURSOR_THEME, my exmh, xterm, and so on work right.

(I think that setting these environment variables in general is harmless for non-HiDPI sessions, because I believe that the X cursor library magically picks the right size based on your display DPI.)
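As an illustration of the workaround, here is a minimal Python sketch of the sort of check I wound up doing by hand: see whether a given cursor theme is actually findable along a search path, and if so print the environment settings to use. The theme name, the default search path, and the assumption that Xcursor themes keep their files in a '<dir>/<theme>/cursors' subdirectory are specific to my situation, so treat this as a sketch rather than anything definitive.

    #!/usr/bin/env python3
    # Sketch: check whether an Xcursor theme can be found along a search
    # path and print the environment variables to set if it can. The
    # default theme and path here are assumptions, not gospel.
    import os
    import sys

    def find_cursor_theme(theme, search_path):
        # Xcursor themes keep their cursor files in <dir>/<theme>/cursors.
        for d in search_path.split(':'):
            cursors = os.path.join(os.path.expanduser(d), theme, 'cursors')
            if os.path.isdir(cursors):
                return cursors
        return None

    if __name__ == '__main__':
        theme = sys.argv[1] if len(sys.argv) > 1 else 'Adwaita'
        path = os.environ.get('XCURSOR_PATH',
                              os.path.expanduser('~/.icons') + ':/usr/share/icons')
        found = find_cursor_theme(theme, path)
        if found:
            print('found %s in %s' % (theme, found))
            print('export XCURSOR_PATH=%s' % path)
            print('export XCURSOR_THEME=%s' % theme)
            print('export XCURSOR_SIZE=48   # whatever size suits your display')
        else:
            print('no %s cursor theme found along %s' % (theme, path))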

I suspect that this is a sign that our Ubuntu machines don't really have all of the X related packages that they should have in order to make modern X programs happy in modern X environments (which definitely include HiDPI screens). I'm not sure what additional packages we need, though, which means that I have a new project. In the meantime, writing this entry has gotten me to do enough research to find a workaround for now.

HiDPIRemoteXSmallCursors written at 01:33:34

2020-03-09

What makes our Ubuntu updates driver program complicated

In response to yesterday's entry on how we sort of automate Ubuntu package updates, which involves a complicated driver program (written in Python) to control a bunch of ssh's to our machines, a commentator asked the perfectly sensible and obvious question:

Is there a reason this couldn’t be a bash script that invokes pdsh?

Ultimately the complexity of our driver program is caused by how the Ubuntu package update process is flawed. We might still have a Python program instead of a shell script if the process worked better, but it would at least be a simpler Python program.

There are a number of complicated things that our driver program does (and my list here is somewhat different from my list in my reply comment). The lesser one is that it parses the output of apt-get to determine what packages would be updated or nominally did get updated on machines during an update run. This parsing could theoretically be done in an awk script, but in Python we can take advantage of better data structures to make it clearer and gather more complex data. The obvious thing we do with this complex data is aggregate it by groups of machines that will all apply the same set of package updates; usually this drastically reduces the output to something that's much easier to follow.

(One of the other things we do with this complex data is look for signs of mis-configurations in what Ubuntu packages are held, because sometimes either something goes wrong or a machine was not quite set up correctly. If we spot things like a Samba server package update that would be applied, we print a big warning. This has saved us from awkward problems several times. After the driver's initial scan has finished, we can exclude machines from updates, or we can bail out and hold the packages properly on the machine, then restart the whole process.)
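To make the aggregation idea concrete, here is a hedged sketch of the general shape of it (not our actual code). Assume that parsing apt-get's output has already produced a set of pending package updates per host; grouping hosts by identical update sets is then a small dictionary inversion. The host and package names here are made up.

    # Sketch: group hosts by the identical set of pending updates they
    # would apply. 'pending' stands in for whatever parsing apt-get's
    # output produced; the hosts and packages are invented examples.
    from collections import defaultdict

    pending = {
        'apps1': {'libssl1.1', 'openssl'},
        'apps2': {'libssl1.1', 'openssl'},
        'web1':  {'libssl1.1', 'openssl', 'apache2'},
    }

    groups = defaultdict(list)
    for host, pkgs in pending.items():
        groups[frozenset(pkgs)].append(host)

    for pkgs, hosts in sorted(groups.items(), key=lambda kv: sorted(kv[1])):
        print('%s: %s' % (' '.join(sorted(hosts)), ' '.join(sorted(pkgs))))

With dozens of machines that mostly apply the same updates, this sort of grouping collapses the report down to a handful of lines.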

After the initial scan for updates is done, the update driver enters a command loop where it asks what to do next. Typically we tell it to apply updates to everything, but you can also tell it to do a specific machine first, to exclude some machines from what will be updated, and to do a number of other things. Or you can quit out immediately if you don't actually want to apply updates (perhaps you were just checking what updates were pending). The command loop ends when the update driver thinks it has nothing left to do because all still-eligible machines have had updates applied; at this point the updates driver writes out its final summary and so on.

The most complicated portion of the program and the process is actually applying the updates on each system. When we were basically doing 'ssh host apt-get -y upgrade' in an earlier version of our update automation, we found that it would periodically stall on some host and then we would have a problem; sometimes apt-get wanted to ask us a question, and sometimes it just ran into issues. So our current approach is to run the updates in what 'ssh -t' and apt-get think is an interactive environment, capture all of their output without spewing it over our terminal, and then if things seem to go wrong allow us to step into the session to answer questions, sort things out, or just see where things stalled. Mechanically we use the third party Python pexpect module, which I had some learning experiences with (although I see that the module has been updated since then).

(The driver's current way of detecting problems is if an update produces no output for a sufficiently long time. We can also immediately step in if we want to.)
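As an illustration of the mechanics (and definitely not our actual code), here is a minimal pexpect sketch of running one host's update, logging all of the output, and handing the session over to a human if it goes quiet for too long. The host name, timeout, and log file name are placeholders, and real code needs far more error handling.

    # Sketch: run a remote update under pexpect, capture its output to a
    # log file instead of the terminal, and step in interactively if it
    # produces no output for the timeout period. Placeholders throughout.
    import pexpect

    host = 'somehost'
    cmd = 'ssh -t %s apt-get -y upgrade' % host

    with open('%s.log' % host, 'w') as logf:
        child = pexpect.spawn(cmd, encoding='utf-8', timeout=120)
        child.logfile_read = logf       # capture output without displaying it
        while True:
            which = child.expect([pexpect.EOF, pexpect.TIMEOUT])
            if which == 0:
                break                   # ssh finished, one way or another
            # No output for 'timeout' seconds; assume it needs a human.
            print('%s: update seems stalled, stepping in (^] to detach)' % host)
            child.interact()
        child.close()
        if child.exitstatus != 0:
            print('%s: exited with status %s' % (host, child.exitstatus))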

In theory apt-get and dpkg have settings that should let the update process automatically pick the default answer for any question a package update wants to ask us. In practice, we don't trust the default answer to always be sensible on package upgrades, although we do try to tell dpkg to always pick our own local version of configuration files to cut down on the questions we get asked.

Because Ubuntu package updates and apt-get operations are slow, we want to be able to run package updates in parallel, although we don't always do so. This adds extra complications to stepping into apt-get sessions, as you might expect, and there's a certain amount of code to coordinate all of this. Also, if one session has to be stepped into, we don't want to automatically continue on to do other (serial) updates, in case this is a systemic issue with this set of updates that we want to deal with before we proceed. Similarly if one update session fails outright (with ssh returning an error code), the driver pauses and waits for further directions.

(The entire reason the driver exists is so that we don't have to do updates one by one with manual attention. If a particular package update turns out to require manual attention, we will often either hold the package to block the update until we can figure things out, or directly update the affected machines by hand. If we have to interact with an 'apt-get upgrade', running it directly on the machine instead of through the driver is better.)

The updates driver also has a second mode that is used to update held packages. In this mode, we run 'apt-get install <...>' for the specific packages we want to update, instead of the usual 'apt-get upgrade', and the update driver's command loop now has commands for selecting what package or packages should be updated (we don't necessarily want to update all held packages on a machine). This is typically used for things like kernel updates, where we want to mass update all of our machines. Updates of per-machine held packages (like the Samba server) are often done by just logging in to the machine and doing the process by hand (we often want to monitor daemon logs and so on anyway).

(There are also some ancillary modes of operation, like a dry run mode and a mode to just report on what held packages have pending updates. Additional features let us control which machines it operates on, including trying to update machines that aren't in our normal list of machines to update.)

PS: Probably the updates driver has too many features. Certainly it has features that we don't really use, and some that I'd forgotten about until I re-read its full help text. It's one of those programs where my enthusiasm may have gotten away from me when I wrote it.

UpdatesDriverComplexity written at 02:30:09

2020-03-08

How we sort of automate updating system packages across our Ubuntu machines

Every place with more than a handful of Unix systems has to figure out a system for keeping them up to date, because doing it entirely by hand is too time consuming and error prone. We're no exception, so we've wound up steadily evolving our processes into a decently functional but somewhat complicated setup for doing this to our Ubuntu machines.

The first piece is a cron job that uses apt-show-versions and a state file to detect new updates for a machine and send us email listing them. In practice we don't actually read these email messages; instead, we use their presence in the morning as a sign that we should go do updates. This cron job is automatically set up on all of our machines by our standard Ubuntu install.

(Things are not quite to the point where Ubuntu has updates every day, and anyway it's useful to have a little reminder to push us to do updates.)
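As a sketch of the idea (not our actual cron job), the core logic amounts to 'ask apt-show-versions what is upgradeable, compare with last time, and print anything new so that cron mails it to us'. I'm assuming 'apt-show-versions -u' for listing upgradeable packages here, and the state file location is a placeholder.

    # Sketch of a daily 'are there new pending updates?' check; anything
    # it prints gets mailed to us by cron. The state file path is a
    # placeholder and 'apt-show-versions -u' is assumed to list
    # upgradeable packages.
    import os
    import subprocess

    STATEFILE = '/var/local/pending-updates'

    res = subprocess.run(['apt-show-versions', '-u'],
                         capture_output=True, text=True, check=True)
    current = set(res.stdout.splitlines())

    try:
        with open(STATEFILE) as f:
            previous = set(f.read().splitlines())
    except FileNotFoundError:
        previous = set()

    new = current - previous
    if new:
        print('New pending updates on %s:' % os.uname().nodename)
        for line in sorted(new):
            print('  ' + line)

    with open(STATEFILE, 'w') as f:
        for line in sorted(current):
            f.write(line + '\n')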

The second piece is that we have a central list of our current Ubuntu systems. To make sure that the list doesn't miss any active machines, our daily update check cron job also looks to see if the system it's running on is in the list; if it's not, it emails us a warning about that (in addition to any email it may send about the system having updates). The warning is important because this central list is used to determine what Ubuntu machines we'll try to apply updates on.

Finally, we have the setup for actually applying the updates on demand, which started out as a relatively simple Python program that automated some ssh commands and then grew much more complicated as we ran into issues and problems. Its basic operation is to ssh off to all of the machines on that central list, get a list of the pending updates through apt-get, then let you choose to go ahead with updating some or all of the machines (which is done with another round of ssh sessions that run apt-get). The output from all of the update sessions is captured and logged to a file, and at the end we get a compact summary of what groups of packages got updated on what groups of machines.

I call our system sort of automated because it's not completely hands off. Human action is required to run the central update program at all and then actively tell it to go ahead with whatever it's detected. If we're not around or if we forget, no updates get applied. However, we don't need to do anything on a per-machine basis, and unless something goes wrong the interaction we need to do with the program takes only a few seconds at the start.

(We strongly prefer not applying updates truly automatically; we like to supervise the process and make final decisions, just in case.)

Not all packages are updated through this system, at least routinely. A few need special manual procedures, and a number of core packages that could theoretically be updated automatically are normally 'held' (in dpkg and apt terminology) so they'll be skipped by normal package updates. We don't apply kernel updates until shortly before we're about to reboot the machine, for example, for various reasons.

Our central update driver is unfortunately a complicated program. Apt, dpkg, and the Debian package format don't make it easy to do a good job of automatically applying updates, especially in unusual situations, and so the update driver has grown more and more features and warts to try to deal with all of that. Sadly, this means that creating your own equivalent version isn't a simple or short job (and ours is quite specific to our environment).

UbuntuOurUpdateSystem written at 03:39:14

2020-03-07

Linux's iowait statistic and multi-CPU machines

Yesterday I wrote about how multi-CPU machines quietly complicate the standard definition of iowait, because you can have some but not all CPUs idle while you have processes waiting on IO. The system is not totally idle, which is what the normal Linux definition of iowait is about, but some CPUs are idle and implicitly waiting for IO to finish. Linux complicates its life because iowait is considered to be a per-CPU statistic, like user, nice, system, idle, irq, softirq, and the other per-CPU times reported in /proc/stat (see proc(5)).

As it turns out, this per-CPU iowait figure is genuine, in one sense; it is computed separately for each CPU and CPUs may report significantly different numbers for it. How modern versions of the Linux kernel keep track of iowait involves something between brute force and hand-waving. Each task (a process or thread) is associated with a CPU while it is running. When a task goes to sleep to wait for IO, it increases a count of how many tasks are waiting for IO 'on' that CPU, called nr_iowait. Then if nr_iowait is greater than zero and the CPU is idle, the idle time is charged to iowait for that CPU instead of to 'idle'.

(You can see this in the code in account_idle_time() in kernel/sched/cputime.c.)

The problem with this is that a task waiting on IO is not really attached to any particular CPU. When it wakes up, the kernel will try to run it on its 'current' CPU (ie the last CPU it ran on, the CPU whose run queue it's in), but if that CPU is busy and another CPU is free, the now-awake task will be scheduled on that CPU. There is nothing that particularly guarantees that tasks waiting for IO are evenly distributed across all CPUs, or are parked on idle CPUs; as far as I know, you might have five tasks all waiting for IO on one CPU that's also busy running a sixth task, while five other CPUs are all idle. In this situation, the Linux kernel will happily say that one CPU is 100% user and five CPUs are 100% idle and there's no iowait going on at all.

(As far as I can see, the per-CPU number of tasks waiting for IO is not reported at all. A global number of tasks in iowait is reported as procs_blocked in /proc/stat, but that doesn't tell you how they're distributed across your CPUs. Also, it's an instantaneous number instead of some sort of accounting of this over time.)
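For what it's worth, both of these numbers are easy to pull straight out of /proc/stat yourself. Here's a small sketch that prints each CPU's iowait time (the fifth numeric field on the per-CPU lines, in clock ticks) along with the instantaneous procs_blocked count; as covered above, the per-CPU split should be taken with a large grain of salt.

    # Print each CPU's accumulated iowait (in clock ticks) and the
    # instantaneous number of tasks blocked on IO, from /proc/stat.
    with open('/proc/stat') as f:
        for line in f:
            fields = line.split()
            if fields[0].startswith('cpu') and fields[0] != 'cpu':
                print('%s: iowait %s ticks' % (fields[0], fields[5]))
            elif fields[0] == 'procs_blocked':
                print('tasks currently blocked on IO: %s' % fields[1])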

There's a nice big comment about this in kernel/sched/core.c (just above nr_iowait(), if you have to find it because the source has shifted). The comment summarizes the situation this way, emphasis mine:

This means, that when looking globally, the current IO-wait accounting on SMP is a lower bound, by reason of under accounting.

(It also says in somewhat more words that looking at the iowait for individual CPUs is nonsensical.)

Programs that report per-CPU iowait numbers on Linux are in some sense not incorrect; they're faithfully reporting what the kernel is telling them. The information they present is misleading, though, and in an ideal world their documentation would tell you that per-CPU iowait is not meaningful and should be ignored unless you know what you're doing.

PS: It's possible that /proc/pressure/io can provide useful information here, if you have a sufficiently modern kernel. Unfortunately the normal Ubuntu 18.04 server kernel is not sufficiently modern.

LinuxMultiCPUIowait written at 01:11:12

2020-02-25

The basics of /etc/mailcap on Ubuntu (and Debian)

One of the things that is an issue for any GUI desktop and for many general programs is keeping track of what program should be used to view or otherwise handle a particular sort of file, like JPEGs, PDFs, or .docx files. On Ubuntu and Debian systems, this is handled in part through the magic file /etc/mailcap, which contains a bunch of mappings from MIME types to what should handle them, with various trimmings. You can also have a personal version of this file in your home directory as ~/.mailcap.

In the old days when we didn't know any better, installing and removing programs probably edited /etc/mailcap directly. These days the file is automatically generated from various sources, including from individual snippet files that are stored in /usr/lib/mime/packages. Various programs drop files in this directory during package installation and then update-mime is magically run to rebuild /etc/mailcap. One should not confuse /usr/lib/mime/packages with /usr/share/mime/packages; the latter has XML files that are used by the separate XDG MIME application specification.

(As the update-mime manpage covers, it also uses the MimeType= information found in .desktop files in /usr/share/applications.)

As far as I know, the update-mime manpage is the sole good source for information about the format of these little snippets in /usr/lib/mime/packages and the eventual format of mailcap entries. The format is arcane, with many options and quite a lot of complex handling, and there is no central software package for querying the mailcap data; for historical reasons, everyone rolls their own, with things like the Python mailcap module. Update-mime and other parts of this come from the mime-support package.
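As an example of rolling your own, a minimal use of the Python mailcap module looks something like this (the MIME type and filename are just examples). Note that findmatch() will run any 'test' clauses in candidate entries to decide whether they apply in your current environment.

    # Look up a viewer command for a MIME type with the Python mailcap
    # module: parse the mailcap files it knows about, then find the first
    # matching 'view' entry and substitute in the filename.
    import mailcap

    caps = mailcap.getcaps()
    command, entry = mailcap.findmatch(caps, 'image/png', key='view',
                                       filename='/tmp/example.png')
    # command is None if nothing matched.
    print('command:', command)
    print('entry:', entry)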

(For fun, there are multiple generations of mailcap standards. We start with RFC 1524 from 1993, and then extend from there. On Ubuntu systems, the mailcap manpage doesn't document all of the directives that update-mime does, for example.)

A single MIME type may have multiple mailcap entries once all of the dust settles (plus the possibility of wildcard entries as well as specific ones, for example a 'text/*' entry as well as a 'text/x-tex' one). For example, on our Ubuntu login servers, there are no less than 7 /etc/mailcap entries for text/x-tex, and 13 for image/png and image/jpeg. In theory people using /etc/mailcap are supposed to narrow down these entries based on whether or not they can be used in your current environment (some only work in an X session, for example) and their listed priorities. In practice the mailcap parsing code you're using probably doesn't support the full range of complexity on a current Ubuntu or Debian system, partly because features have been added to the format over time, and it may simply pick either the first or the last mailcap entry that matches.

The Freedesktop aka XDG specifications have their own set of MIME association tracking standards, in the Shared MIME database specification and the MIME applications associations specification. These are used by, among other things, the xdg-utils collection of programs, which is how at least some GUI programs decide to handle files. I believe that these tools don't look at /etc/mailcap at all, but they do use MIME type information from .desktop files in /usr/share/applications and the XML files in /usr/share/mime/packages. They might even interpret it in the same way that update-mime does. The XDG tools and MIME associations all assume that you're using a GUI; they have no support for separate handling of a text mode environment.

Any particular GUI program might rely on the XDG tools, use mailcap, or perhaps both, trying XDG and then falling back on mailcap (parsed with either its own code or some library). A text mode program must use mailcap. I'm not sure how self-contained environments like Firefox and Thunderbird work, much less Chrome.

(See also the Arch Wiki page on default applications.)

UbuntuMailcapBasics written at 00:20:38

2020-02-21

An appreciation for Cinnamon's workspace flipping keyboard shortcuts

When I first started using Cinnamon (which was back here for serious use), I thought of it just as the closest good thing I could get to the old Gnome 2 environment (as Gnome 3 is not for me). Over time, I've come to appreciate Cinnamon for itself, and even propagate aspects of Cinnamon back into my desktop fvwm setup (such as keyboard control over window position and size). One of the little Cinnamon aspects I now appreciate is its slick and convenient keyboard handling of what I would call virtual screens and Cinnamon calls workspaces.

As basic table stakes, Cinnamon organizes workspaces sequentially and lets you move left and right through them with Ctrl + Alt + Right (or Left) Arrow. By default it has four workspaces, which is enough for most sensible people (I'm not sensible on my desktop). Where Cinnamon gets slick is that it has an additional set of keyboard shortcuts for moving to another workspace with the current window, Ctrl + Alt + Shift + Left (or Right). It turns out that it's extremely common for me to open a new window on my laptop's relatively constrained screen, then decide that things are now too cramped and busy in this workspace and I want to move. The Cinnamon keyboard shortcuts make that a rapid and fluid operation, and I can keep moving the window along to further workspaces by just hitting Right (or Left) again while still holding down the other keys.

(As I've experienced many times before, having this as an easy and rapid operation encourages me to use it; I shuffle windows around this way on my laptop much more than I do on my desktops, where moving windows between virtual screens is a more involved process that generally requires multiple steps.)

Every so often I've thought about trying to create a version of this keyboard shortcut in fvwm, but so far I haven't seen a good way to do it. Although fvwm has functions and supports some logic operations, implementing this actually runs into a variety of challenges in fvwm's model of windows and of what the current window can be. I'm pretty sure that if I looked at the actual Cinnamon code for this, it would turn out to be much more complicated than you'd expect from such a simple-sounding thing.

(I already have a keyboard shortcut for just moving to a different virtual screen; the tricky bit in fvwm is taking the current window along with me when (and only when) it's appropriate to do so given the state of the current window. I suppose the easy way to implement this is to assume that if I hit the 'take the window with me' shortcut, I've already determined that what fvwm considers the current window should be moved to the target virtual screen and my fvwm function can just ignore all of the possible weird cases.)

CinnamonWorkspaceFlipLike written at 00:23:06

2020-02-17

The uncertainty of an elevated load average on our Linux IMAP server

We have an IMAP server, using Dovecot on Ubuntu 18.04 and with all of its mail storage on our NFS fileservers. Because of historical decisions (cf), we've periodically had real performance issues with it; these issues have been mitigated partly through various hacks and partly through migrating the IMAP server and our NFS fileservers from 1G Ethernet to 10G (our IMAP server routinely reads very large mailboxes, and the faster that happens the better). However, the whole experience has left me with a twitch about problem indicators for our IMAP server, especially now that we have a Prometheus metrics system that can feed me lots of graphs to worry about.

For a while after we fixed up most everything (and with our old OmniOS fileservers), the IMAP server was routinely running at a load average of under 1. Since then its routine workday load average has drifted upward, so that a load average of 2 is not unusual and it's routine for it to be over 1. However, there are no obvious problems the way there used to be; 'top' doesn't show constantly busy IMAP processes, for example; indicators such as the percentage of time the system spends in iowait (which on Linux includes waiting for NFS IO) are consistently low; and our IMAP stats monitoring doesn't show any clear slow commands the way it used to. To the extent that I have IMAP performance monitoring, it only shows slow performance for looking at our test account's INBOX, not really for other mailboxes.

(All user INBOXes are in our NFS /var/mail filesystem and some of them are very large, so it's a really hot spot and is kind of expected to be slower than other filesystems; there's only really so much we can do about it. Unfortunately we don't currently have Prometheus metrics from our NFS fileservers, so I can't easily tell if there's some obvious performance hotspot on that fileserver.)

All of this leaves me with two closely related mysteries. First, does this elevated load average actually matter? This might be the sign of some real IMAP performance problem that we should be trying to deal with, or it could be essentially harmless. Second, what is causing the load average to be high? Maybe we frequently have blocked processes that are waiting on IO or something else, or processes that run in micro-bursts of CPU usage.

(eBPF based tracing might be able to tell us something about all of this, but eBPF tools are not really usable on Ubuntu 18.04 out of the box.)

Probably I should invest in developing some more IMAP performance measurements and also consider doing some measurements of the underlying NFS client disk IO, at least for simple operations like reading a file from a filesystem. We might not wind up with any more useful information than we already have, but at least I'd feel like I was doing something.
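A crude version of the 'how long does reading a file take' measurement is easy to sketch. Something like the following, run periodically against a test file on the relevant NFS filesystem, would at least give us a trend line; the path is a placeholder, and repeated reads will mostly measure the client-side cache unless you account for that somehow.

    # Crude probe: time how long it takes to read a file end to end.
    # The path is a placeholder, and repeated runs will hit the NFS
    # client's cache unless the file changes or caches are dropped.
    import time

    TESTFILE = '/var/mail/.nfs-read-probe'

    start = time.monotonic()
    nbytes = 0
    with open(TESTFILE, 'rb') as f:
        while True:
            chunk = f.read(1024 * 1024)
            if not chunk:
                break
            nbytes += len(chunk)
    elapsed = time.monotonic() - start
    print('read %d bytes in %.3f seconds' % (nbytes, elapsed))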

LoadAverageIMAPImpactQuestion written at 22:22:22

The case of mysterious load average spikes on our Linux login server

We have a Linux login server that is our primary server basically by default; it's the first one in numbering and the server that a convenient alias points to, so most people wind up using it. Naturally we monitor its OS level metrics as part of our Prometheus setup, and as part of that a graph of its load average (along with all our other interesting servers) appears on our overview Grafana dashboard. For basically as long as we've been doing this, we've noticed that this server experiences periodic and fairly drastic short term load average spikes for no clear reason.

A typical spike will take the 1-minute load average from 0.26 or so (the typical load average for it) up to 6.5 or 7 in a matter of seconds, and then immediately start dropping back down. There often seems to be some correlation with other metrics, such as user and system CPU time usage, but not necessarily a high one. We capture ps and top output periodically for reasons beyond the scope of this entry, and these captures have never shown anything in particular even when they capture the high load average itself. The spikes happen at all times, day or night and weekday or weekend, and don't seem to come in any regular pattern (such as every five minutes).

The obvious theory for what is going on is that there are a bunch of processes that have some sort of periodic wakeup where they do a very brief amount of work, and they've wound up more or less in sync with each other. When the periodic wakeup triggers, a whole bunch of processes become ready to run and so spike the load average up, but once they do run they don't do very much so the log-jam clears almost immediately (and the load average immediately drops). Since it seems to be correlated with the number of logins, this may be something in systemd's per-login process infrastructure. Since all of these logins happen over SSH, it could also partly be because we've set a ClientAliveInterval in our sshd_config so sshd likely wakes up periodically for some connections; however, I'm not clear how that would wind up in sync for a significant number of people.

I don't know how we'd go about tracking down the source of this without a lot of work, and I'm not sure there's any point in doing that work. The load spikes don't seem to be doing any harm, and I suspect there's nothing we could really do about the causes even if we identified them. I rather expect that having a lot of logins on a single Linux machine is now not a case that people care about very much.

LoadAverageMultiuserSpikes written at 01:19:38

2020-02-09

I'm likely giving up on trying to read Fedora package update information

Perhaps unlike most people, I apply updates to my Fedora machines through the command line, first with yum and now with dnf. As part of that, I have for a long time made a habit of trying to read the information that Fedora theoretically publishes about every package update with 'dnf updateinfo info', just in case there was a surprise lurking in there for some particular package (this has sometimes exposed issues, such as when I discovered that Fedora maintains separate package databases for each user). Sadly, I'm sort of in the process of giving up on doing that.

The overall cause is that it's clear that Fedora does not really care about this update information being accurate, usable, and accessible. This relative indifference has led to a number of specific issues with both the average contents of update information and the process of reading it that make the whole experience both annoying and not very useful. In practice, running 'dnf updateinfo info' may not tell me about some of the actual updates that are pending, always dumps out information about updates that aren't pending for me (sometimes covering ones that have already been applied, for example for some kernel updates), and part of the time the update information itself isn't very useful and has 'fill this in' notes and so on. The result is verbose but lacking in useful information and frustrating to pick through.

The result is that 'dnf updateinfo info' has been getting less and less readable and less useful for some time. These days I skim it at best, instead of trying to read it thoroughly, and anyway there isn't much that I can do if I see something that makes me wonder. I can get most of the value from just looking at the package list in 'dnf check-update', and if I really care about update information for a specific package I see there I'm probably better off doing 'dnf updateinfo info <package>'. But still, it's hard to let go of this; part of me feels that reading update information is part of being a responsible sysadmin (for my own personal machines).
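The 'look at the pending package list, then only ask for details on packages I care about' approach can be scripted, for what it's worth. Here's a hedged sketch; the list of interesting packages is obviously mine to fill in, and it leans on 'dnf check-update' printing one pending package per line.

    # Sketch: list pending updates with 'dnf check-update', then show
    # updateinfo only for packages I actually care about. check-update
    # exits with status 100 when updates are available, so its exit
    # status isn't treated as an error here.
    import subprocess

    INTERESTING = {'kernel', 'openssl', 'firefox'}   # my list, adjust to taste

    res = subprocess.run(['dnf', '-q', 'check-update'],
                         capture_output=True, text=True)
    pending = set()
    for line in res.stdout.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            # the first field is 'name.arch'; strip the architecture
            pending.add(fields[0].rsplit('.', 1)[0])

    for pkg in sorted(pending & INTERESTING):
        subprocess.run(['dnf', 'updateinfo', 'info', pkg])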

Some of these issues are long-standing ones. It's pretty clear that the updateinfo (sub)command is not a high priority in DNF as far as bug fixes and improvements go, for example. I also suspect that some of the extra packages I see listed in 'dnf updateinfo info' are due to DNF modularity (also), and I'm seeing updateinfo for (potential) updates from modules that either I don't have enabled or that 'dnf update' and friends are silently choosing to not use for whatever reasons. Alternatively, they are base updates that are overridden by DNF modules I have enabled; it's not clear.

(Now that I look at 'dnf module list --enabled', it seems that I have several modules enabled that are relevant to packages that updateinfo always natters about. One update that updateinfo talks about is for a different stream (libgit2 0.28, while I have the libgit2 0.27 module enabled), but others appear to be for versions that I should be updating to if things were working properly. Unfortunately I don't know how to coax DNF to show me what module streams installed packages come from, or what it's ignoring in the main Fedora updates repo because it's preferring a module version instead.)

FedoraNotReadingUpdateinfo written at 23:24:37

2020-01-25

A network interface losing and regaining signal can have additional effects (in Linux)

My office at work features a dearth of electrical sockets and as a result a profusion of power bars and other means of powering a whole bunch of things from one socket. The other day I needed to reorganize some of the mess, and as part of that I wound up briefly unplugging the power supply for my 8-port Ethernet switch that my office workstation is plugged into. Naturally this meant that the network interface lost signal for a bit (twice, because I wound up shuffling the power connection twice). Nothing on my desktop really noticed, including all of the remote X stuff I do, so I didn't think more about it. However, when I got home, parts of my WireGuard tunnel didn't work. I eventually fixed the problem by restarting the work end of my WireGuard setup, which does a number of things, including turning on IP(v4) forwarding on my workstation's main network interface.

I already knew that deleting and then recreating an interface entirely can have various additional effects (as happens periodically when my PPPoE DSL connection goes away and comes back). However, this is a useful reminder to me that simply unplugging a machine from the network and then plugging it in can have some effects too. Unfortunately I'm not sure what the complete list of effects is, which is somewhat of a problem. Clearly it includes resetting IP forwarding, but there may be other things.

(All of this also depends on your system's networking setup. For instance, NetworkManager will deconfigure an interface that goes down, while I believe that without it, the interface's IP address remains set and so on.)

I'm not sure if there's any good way to fix this so that these settings are automatically re-applied when an interface comes up again. Based on this Stackexchange question and answer, the kernel doesn't emit a udev event on a change in network link status (it does emit a netlink event, which is probably how NetworkManager notices these things). Nor is there any sign in the networkd documentation that it supports doing something on link status changes.

(Possibly I need to set 'IgnoreCarrierLoss=true' in my networkd settings for this interface.)

My unfortunate conclusion here is that if you have a complex networking setup and you lose link carrier on one interface, the simplest way to restore everything may be to reboot the machine. If this is not a good option, you probably should experiment in advance to figure out what you need to do and perhaps how to automate it.

(Another option is to work out what things are cleared or changed in your environment when a network interface loses carrier and then avoid using them. If I turned on IP forwarding globally and then relied on a firewall to block undesired forwarding, my life would probably be simpler.)
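One way to do that advance experimenting is to snapshot the per-interface settings before and after pulling the cable and then diff the results. Here's a small sketch of the snapshot half; the interface name is a placeholder, and /proc/sys/net/ipv4/conf/<interface> is only one of the places relevant state lives.

    # Snapshot the per-interface IPv4 sysctl settings so that runs from
    # before and after a carrier loss can be diffed. The interface name
    # is a placeholder and this covers only one corner of network state.
    import os
    import sys

    iface = sys.argv[1] if len(sys.argv) > 1 else 'eth0'
    confdir = '/proc/sys/net/ipv4/conf/%s' % iface

    for name in sorted(os.listdir(confdir)):
        try:
            with open(os.path.join(confdir, name)) as f:
                value = f.read().strip()
        except OSError:
            value = '<unreadable>'
        print('%s = %s' % (name, value))

Run it once while everything is working, save the output, bounce the interface, run it again, and 'diff' will show which of these settings (if any) got reset.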

InterfaceCarrierLossHasEffects written at 00:24:59
