Wandering Thoughts

2020-10-25

Remotely upgrading my office workstation to Fedora 32 worked fine

One of the things I've been worried about during the current long stretch of working from home has been upgrading my office workstation to Fedora 32. Upgrading my home machine is straightforward (at one level) because I'm here right in front of it and I can probably sort out anything that goes wrong (although that's complicated by my home machine being my Internet gateway). I'm not in front of my office workstation, so either I was going to have to make a trip in to the office or I would have to take the scary approach of a completely remote upgrade and reboot.

Well, this Friday I did the remote upgrade and it worked smoothly. As usual, I did it through a live upgrade with DNF, which let me monitor all of the package updates to watch for alarming messages (there weren't any), check that DKMS had properly rebuilt my ZFS on Linux kernel modules before reboot, and so on. The post-upgrade reboot was smooth, judging from the fact that my office machine was back on the air in about 45 seconds.
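
For the record, the live upgrade itself is the usual 'dnf distro-sync' approach. In rough outline (and not an exact transcript of what I typed), it goes something like this:

dnf upgrade --refresh
dnf --releasever=32 distro-sync
# before rebooting, check that DKMS has rebuilt the ZFS modules for the new kernel
dkms status

The dkms status check is the important part here, since a missing ZFS module would make a remote reboot much less pleasant.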

At one level this is what I had been expecting. My office machine rebooted fine on its own after a power glitch (cf), and then I got daring enough to do a remote kernel upgrade. I also got an encouraging success report on Twitter. At another level I didn't quite believe that it would work smoothly until it did, and I still feel happy and glad that it did.

(I'd like to say that it's been a fairly long time since a Fedora upgrade went wrong on me, but in fact it was only last year, followed by another problem on a kernel upgrade. So in one sense I really did sort of get lucky, although I likely would have found that sort of problem on my home machine, which I upgraded first.)

Having it really work is reassuring, because it seems very likely that we'll be working from home for enough longer that I'll get to do it again. Well, beyond the little fact that Fedora 33 comes out in a few days.

(I never immediately upgrade to newly released Fedora versions; other people can find the problems for me. But I'm now much more likely to upgrade to 33 in early December or something.)

Fedora32RemoteUpgrade written at 00:54:06

2020-10-21

Keeping VMware Workstation VMs running when I quit from VMware

I recently tweeted:

VMware over 'ssh -X' is surprisingly chatty even when iconified, to the tune of 200 KB/sec inbound. Maybe I need to investigate some remote desktop thing I can detach from and reattach to.

(I'm now keeping some VMs running full time.)

(The primary reason I'm keeping a VM up all the time right now is that it's our only Ubuntu 20.04 test machine and we need experience with that. As far as running an X GUI program like VMware over a DSL link goes, it works generally okay, although it's not great.)

To be specific, this is VMware Workstation Pro, which I still use because it's the best option for my specific needs (especially without rebuilding my office workstation's network configuration). I regularly need or at least want the full VMware GUI and the power it gives me, but if I keep it running (even without a running VM and iconified), it consistently uses up about 200 KB/sec of incoming ('down') bandwidth on my DSL link.

(Using nethogs confirmed that the process responsible for the bandwidth was my 'ssh -X'.)
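
If you want to check this sort of thing yourself, something like the following works; the interface name here is just an example, so substitute whatever your machine actually uses:

nethogs -d 5 enp0s31f6

nethogs shows per-process bandwidth, which is how the 'ssh -X' process showed up as the culprit.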

If you quit from the VMware Workstation GUI with running VMs, it will give you the option to keep them running in the 'background'. When I first tried this, they would abruptly be stopped later, so I thought my only option was to keep the GUI running. Just recently I discovered that I could keep them running if I quit out of the VMware GUI but didn't actually log out of my 'ssh -X' session. Having to keep my session logged in this way is a bit irritating (and has some limits) but at least it stops the bandwidth usage.

(Nohup'ing the vmware command when I ran it from the 'ssh -X' session kept the VM running after I logged out, but then abruptly shut it down when I logged back in and ran vmware again. This is not a desirable property.)
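
(In theory VMware Workstation's vmrun command can start a VM entirely without the GUI, with something like 'vmrun -T ws start /path/to/vm.vmx nogui'. Whether VMs started that way survive the 'ssh -X' session ending is something I'd have to test before trusting it, given the nohup experience.)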

But that's only half of the potential advantages of some sort of remote desktop thing. The other half is that it might let me deal with VMware rendering the consoles (or displays) of virtual machines at one pixel for one pixel resolution even though I'm using a HiDPI display. This is a reasonable decision for VMware to make, but it does have the effect of making both Linux text consoles and any GUI stuff rather small. I can see things and work with them if I have to, but it would be more comfortable if they were bigger.

VMware supports connecting to VM consoles with VNC, as I found out on Twitter, and some Linux clients support scaling up VNC sessions; specifically I've seen this work with Remmina. But it would be nicer if I ran all of the VMware GUI inside some remote desktop session where I could just scale all of it up on my side. Scaling up on my local desktop would also use a lot less bandwidth than running 'ssh -X' and having the VMware GUI scale up everything on the remote machine and push it over the link.
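
For reference, the per-VM VNC console is, as far as I've seen, enabled through a few lines in the VM's .vmx file; the port and password here are made-up examples, and each VM needs its own port:

RemoteDisplay.vnc.enabled = "TRUE"
RemoteDisplay.vnc.port = "5901"
RemoteDisplay.vnc.password = "somepassword"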

(I'm not sure I like the remote VMware console experience with Remmina, but my impressions of it right now are strongly coloured by how my freshly installed Fedora 32 test VM doesn't seem to want to usefully run Cinnamon. When I log in, either in the VNC client or in the VMware GUI, all I get is a black screen. Cinnamon claims to be running without problems but nothing is being displayed.)

PS: Although it's officially 'VMware', that has always looked wrong to me and I reflexively capitalize it as 'VMWare'. Sometimes I correct that in Wandering Thoughts entries, and sometimes not.

VMwareKeepRunning written at 23:52:11

A mystery uncovered by Fedora 32 changing my default font

I upgraded my home machine to Fedora 32 earlier this week. Everything went smoothly, except that afterward various web pages looked just a little bit different in my Firefox. After poking around and finding a handy Firefox font related thing, I discovered that Fedora 32 had changed what the standard system 'serif', 'sans-serif' and 'monospace' font names mapped to from DejaVu (in Fedora 31) to Bitstream Vera (in Fedora 32). The two fonts are very close (DejaVu is a modification of Bitstream Vera), but apparently just far enough apart that I can tell the difference.

In theory this could all be controlled through Fontconfig. In practice my simple brute force fix was to remove the Bitstream Vera fonts on the grounds that I didn't need them and they aren't there on a standard Fedora 32 install. But afterward I started poking around the Fontconfig files in /etc/fonts/conf.d to see if I could spot why the change happened, and wound up more confused than before.
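
The brute force removal and the check afterward are nothing special; the package glob is from memory, so check what 'rpm -qa' actually calls the fonts on your system:

dnf remove 'bitstream-vera*'
fc-match serif
fc-match sans-serif
fc-match monospace

After the removal, the generic names map back to the DejaVu fonts.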

My office workstation is still on Fedora 31 and it is definitely using the DejaVu fonts (you can tell with 'fc-match serif' and so on). However, it has Bitstream Vera fonts installed, and both my Fedora 31 and 32 machines have an /etc/fonts/conf.d/60-latin.conf file that appears to specify that the Bitstream Vera fonts should be preferred over the DejaVu versions:

<description>Set preferable fonts for Latin</description>
  <alias>
    <family>serif</family>
      <prefer>
        <family>Bitstream Vera Serif</family>
        <family>DejaVu Serif</family>
        <family>Times New Roman</family>
        <family>Thorndale AMT</family>
[...]

Setting the magic $FC_DEBUG environment variable on my Fedora 31 machine (documented in fonts.conf) did not provide any enlightenment; it may provide debugging, but it doesn't make fc-match explain its decisions. Fedora 31 and 32 have essentially the same version of Fontconfig (2.13.92-3.fc31 and 2.13.92-9.fc32 respectively), so that isn't a smoking gun either.
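
(One thing that does show more than the first choice is 'fc-match -s', which prints fontconfig's sorted list of candidates:

fc-match -s serif | head -5

In theory the top of that list tells you whether Bitstream Vera or DejaVu is winning for a given generic name, although it still doesn't explain why Fedora 31 and 32 differ.)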

So, as has happened before, I started with a surprising change and have wound up with a mystery of why my Fedora 31 machines worked (and work) the way they did. As far as I can tell, the system fonts should have mapped to Bitstream Vera even on Fedora 31.

(I'm also left with the puzzle about why Fedora even seems to prefer Bitstream Vera over DejaVu, given that DejaVu was created to improve on Bitstream Vera and long ago Fedora switched to them. Bitstream Vera isn't even installed on normal from-scratch Fedora installs and hasn't been for some time, which may be part of why this bit of configuration has lingered on.)

Fedora32DefaultFontChange written at 00:50:50

2020-10-18

We need to start getting some experience with using Ubuntu 20.04

Under normal circumstances, we would have a decent number of machines running Ubuntu 20.04 by now, probably including our login servers. But the situation is not normal, because ongoing world and local events still have us working from home, making it not so simple to install and deploy a new physical server with a new version of Ubuntu. However, it really looks like this is the new normal so we should start dealing with it.

It may or may not make sense to spend limited in-office time upgrading perfectly good 18.04 machines to 20.04 (I speculated about this back here in early August), although I suspect we're going to wind up doing some of it. I think it does make sense to install completely new machines on Ubuntu 20.04 so that they have a longer lifetime, and we're certainly going to have some of those. We have what I believe is a working 20.04 install system, but what we don't currently have are any continuously running 20.04 machines, especially ones that normal people can use, explore, and see what's broken or odd. In the past, actually operating new versions of Ubuntu has frequently turned up surprises, so the sooner we start doing that the better.

The obvious thing to do is to build a few 20.04 test servers. We're likely going to run Apache on some 20.04 machines, so one test server should have an Apache install. Another one should be a general login server, which would let us look into how various programs that people use behave on 20.04. We should also build a third server that's completely expendable and we can experiment with rebooting and other things that may blow up. All of these have to be built on physical hardware, since we don't currently have any virtualization environment (and anyway we'd be running most 20.04 machines on physical hardware).

(Running on actual physical hardware has periodically turned up practical problems. Since it's now eight years after that issue, perhaps we should experiment with no longer supplying that kernel parameter in 20.04.)

PS: An expendable test server is where it would be very nice to have some way to roll back the root filesystem to an initial state. This can apparently be done through LVM, which Ubuntu does support for the root filesystem, and I may experiment with it following eg the Arch wiki.
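
As I understand the LVM approach, you take a snapshot of the root LV while the machine is in the state you want to come back to and later merge the snapshot back to undo everything since. With the stock Ubuntu installer's volume names (which I haven't verified), that would be roughly:

# while the root filesystem is in its pristine state
lvcreate --snapshot --size 20G --name root-pristine ubuntu-vg/ubuntu-lv

# later, to roll back; the merge takes effect the next time the LV is
# activated, so in practice you reboot afterward
lvconvert --merge ubuntu-vg/root-pristine

The snapshot just has to be large enough to hold whatever changes on the root filesystem while you're experimenting.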

(This is one of the entries that I write partly to motivate myself to start working on something. We've been mostly ignoring Ubuntu 20.04 so far and it would be very easy to keep on doing so.)

Ubuntu2004GettingExperience written at 00:37:57

2020-10-06

Linux distributions have sensible reasons to prefer periodic releases

In an aside in my entry on why Fedora version upgrades are a pain for me, I said that there are sensible reasons for distributions to do periodic releases instead of 'rolling releases'. Today I want to talk about some of them.

To start with, pretty much every major Linux distribution practices some kind of rolling development (I don't count Red Hat Enterprise here). No one does a release and then goes away to do other things for a year or three before they start putting together the next one; instead, everyone has some kind of rolling, in-development collection of packages that are updated on an ongoing basis. As upstream updates come out they generally get packaged and pushed into this rolling collection, and after a while some version of the collection starts to get turned into the next release. However, not all upstream releases and package changes are equal. Some are more disruptive and fundamental than others, either because they're significant changes to fundamental building blocks (such as the C compiler or standard system libraries) or because they're big changes to user-visible things (a new Gnome or KDE release). There are also significant changes to the composition of the distribution, such as 'we're switching from Upstart to systemd' or 'we're mostly removing Python 2'.

Doing these big transitions only periodically, in new releases, provides Linux distributions some advantages. To start with, you have fewer system states to worry about and to debug; either all your transitions have been applied (in the form of a release upgrade) or none of them have. You don't have to build tooling and do testing for a whole series of successive transitions through various system states ('we have systemd but with Python 2 still around'), and can go for a simpler shift ('systemd and it only has to support Python 3 bindings'). Some of the time you can build specific tooling that runs outside of your normal package updates in order to make these transitions (if you only officially support release to release upgrades through special tools, instead of through package updates).

(I suspect it's also easier to document the changes for people upgrading, and easier for people upgrading to read through and stay on top of.)

A second advantage is that releases provide a place to say that all work on something must be completed and has been. For example, all packages have been rebuilt with your new C compiler, so that you know everything works, and everything is linked to only the latest libraries you provide. You probably don't want to rebuild all packages with the new C compiler the moment it lands in your rolling development version, but you do want all packages to get rebuilt eventually. A release provides a natural 'eventually' point in a way that rolling updates don't have.

Since a release in preparation becomes more and more frozen over time, it's easier to test and then to be sure that problems have been fixed. Constant changes don't necessarily lead to constant bugs, but they do lead to constantly out of date testing results. This is especially relevant for testing the distribution as a whole, such as installing it from scratch, and any testing that has to be done by hand, since people aren't going to do that continually (unless you pay them, and it's expensive).

Similarly, releases provide a natural point to make (and force) decisions about whether something is stable enough or ready enough to be shipped to normal people, while also allowing enough time for problems to be fixed and things to be stabilized before you have to make that decision. This probably simplifies the overall package ecology, because you don't get as many situations where version X+1 is in the 'test' rolling version and other packages are starting to depend on it, but it can't be pushed to the stable rolling version and so it's blocking other package updates that are themselves stable. If you decide that version X+1 is not ready to be shipped as part of the next release, it's pulled from the in-preparation package set and all other packages have to (re-)build against the old version. And your package management system will automatically tell you about all of them through dependencies.

On a political level, this means that your distribution doesn't have to deal with constant arguments over whether X or Y is ready. All of those arguments can be deferred for much of the release cycle, because they're only really important when the package set is stabilizing before release. Similarly, you're not completely committed to significant transitions until the release is freezing; before then, you can at least theoretically change your mind.

(In practice you've probably already landed a lot of preparatory packaging, tooling, and documentation work, and you may have abandoned old packages that you didn't expect to ship in the new release. It's not like there's a magic switch to throw that would restore full Python 2 support just like that in, say, Ubuntu 20.04.)

DistributionsWhyReleases written at 00:01:18

2020-09-13

Rolling distribution releases versus periodic releases are a tradeoff

In reaction to my entry on the work involved for me in upgrading Fedora, Ben Cotton wrote a useful entry, What do “rolling release” and “stable” mean in the context of operating systems?. In the entry, Ben Cotton sort of mentioned something in passing that I want to emphasize, which is that the choice between a rolling release and a periodic release is a tradeoff, not an option where there is a clear right answer.

In the Linux world, fundamentally things change because the upstreams of our software change stuff around. Firefox drops support for old style XUL based addons (to people's pain); Gnome moves from Gnome 2 to Gnome 3 (an interface change that I objected to very strongly); Upstart loses out to systemd; Python 2 stops being supported; and so on. As people using a distribution, we cannot avoid these changes for long, and attempting to do so gives you 'zombie' distributions. So the question is when we get these changes inflicted on us and how large they are.

In a rolling release distribution, you get unpredictably spaced changes of unpredictable size but generally not a lot of change at once. Your experience is likely going to be a relatively constant small drumbeat of changes, with periodic bigger ones. Partly this is because large projects don't all change things at the same time (or even do releases at the same time), and partly this is because the distribution itself is not going to want to try to shove too many big changes in at once even if several upstreams all do big releases in close succession.

In a periodic release distribution, you get large blocks of change at predictable points (when a new release is made and you upgrade), but not a lot of change at other times. When you upgrade you may need to do a lot of adjustment at once, but other than that you can sit back. In addition, if something changes in your environment it may be hard to figure out what piece of software caused the change and what you can do to fix it, because so many things changed at the same time.

(In a rolling release distribution, you can often attribute a change in your environment to a specific update of only a few things that you just did.)

Neither of these choices of when and how to absorb changes is 'right'; they are a tradeoff. Some people will prefer one side of the tradeoff, and other people will prefer the other. Neither is wrong (or right), because it is a preference, and people can even change their views of what they want over time or in different circumstances.

(Although you might think that I come down firmly on the side of rolling releases for my desktops, I'm actually not sure that I would in practice. I may put off Fedora releases a lot because of how much I have to do at once, but at the same time I would probably get very irritated if I was frequently having to fiddle with some aspect of my custom non-desktop. It's a nice thing that I got everything working at the start of Fedora 31 and haven't had to touch it since.)

RollingVsReleasesNoWinner written at 00:16:33

2020-09-12

Some notes on what Fedora's DNF logs and where

In comments on my entry on why Fedora release upgrades are complicated and painful for me, Ben Cotton wound up asking me to describe my desired experience for DNF's output during a release upgrade. This caused me to go out and look at what DNF actually logs today (as opposed to its console output), so here are some notes. The disclaimers are that this is on my Fedora systems, which I think are reasonably stock but may not be, and that I couldn't find any of this documented in a quick skim of the DNF manpages, so I'm probably wrong about parts.

DNF logs to /var/log in three separate files, dnf.log, dnf.rpm.log, and dnf.librepo.log. Of these, dnf.librepo.log appears to be the least interesting, as all my version has is information about what metadata and packages have been downloaded and some debugging information if checksums don't match.

The dnf.log file contains copies of the same information about what package updates will be done and were done as dnf itself prints interactively. It also contains a bunch of debug information about DNF's cache manipulation, usage of deltarpm, the dates of repository metadata, and other things of less interest (at least to me). It looks like it's possible to reconstruct most or all of your DNF command lines from the information here, which could be useful under some circumstances.

Finally, dnf.rpm.log has the really interesting stuff. This seems to be a verbose log of the RPM level activity that DNF does (or did during an upgrade or package install). This includes the actual packages upgraded and removed, verbose information about .rpmnew and .rpmsave files being created and manipulated (which is normally printed by RPM itself), and what seems to be a copy of most output from RPM package scripts, including output that doesn't seem to normally get printed to the terminal by DNF. This is a gold mine if you want to go back through an upgrade to look for RPM package messages that you didn't spot at the time, although you'll have to pick through a lot of debugging output.

(I initially thought that dnf.rpm.log contained all output, but at least during Fedora release upgrades it appears to miss some things that are printed to the terminal, based on my notes and script captures.)

When DNF (or perhaps RPM via DNF) reports upgrades interactively, these go in two stages; the new version of the package is upgraded (ie installed), which will run its %post scriptlet, and then later there is a cleanup of the old version of the package, which will run its %postun scriptlet (if any). dnf.rpm.log doesn't appear to use the same terminology when it logs the second phase. The upgrade phase appears as 'SUBDEBUG Upgrade: ...', but the cleanup phase seems to be reported as 'SUBDEBUG Upgraded: ...'. If you remove something, for example because an old kernel is being removed when you install a new one, it's reported as 'SUBDEBUG Erase:'. When a new package is installed (including a new kernel), it is reported as 'SUBDEBUG Installed:', instead of the 'Install:' that you might expect for symmetry with upgrades.
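
One handy side effect of this is that you can pull a compact summary of what an upgrade did at the RPM level out of the log with something like:

grep -E 'SUBDEBUG (Installed|Upgrade|Upgraded|Erase):' /var/log/dnf.rpm.log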

(I don't know how downgrades or obsoletes are reported; I haven't dug through my DNF logs that much.)

Unlike interactive DNF, dnf.rpm.log doesn't record the mere fact that scriptlets have been run. If they're run and don't produce any output that the log captures, they're invisible. This is probably not a problem for logging purposes; interactively, it's mostly useful as a hint to why DNF seems to be sitting around not doing anything.

None of these logs are a complete replacement for capturing a DNF session with script, as far as I can tell (although some of my information here is effectively from the Fedora 30 version of DNF, not the Fedora 31 or 32 ones). However they're at least a useful supplement, and skimming them is faster than using 'less -r ...' on a script capture of a DNF session.

DNFLogsWhatWhere written at 00:15:33

2020-09-07

Why Fedora version upgrades are complicated and painful for me

It's September and I still haven't upgraded any of my machines to Fedora 32 (which came out at the end of April). If I delay too much longer, I might run into Fedora 33 coming out and Fedora 31 dropping out of my upgrade path, so I really need to start getting moving on this. But, much like why updating my Fedora kernels is complicated, my Fedora version updates are a drag: complex, time-consuming, and periodically painful. So I keep not getting around to it.

(In a normal year, I would have spent a slow afternoon at work upgrading the work machine, in an environment where having it not work is not completely disruptive, and then upgraded the home machine. That's not today's environment; now I'm at home, and my home desktop is also my DSL gateway.)

Normal people have sensible, straightforward Fedora upgrade processes. They start the upgrade in one of the official methods, go away for an hour or three, and it all works. Because my machines run such an unusual and custom set of environments, I don't trust this process and I also don't want to be without either my home desktop or my work desktop for several hours. So the first complication of my upgrades is that I do live upgrades using dnf, and during them I watch dnf's output to see if there are signs of problems with package updates. I can do other things during this, but that's more than an hour where I am basically babysitting the machine while distracting myself every so often. This is a time sink and not a terribly pleasant way to spend my time, but it's probably the least of the upgrade's pain. Doing upgrades in an unofficial way on an unusual system configuration also raises the risk that something will break during them, and I can never completely test this in advance (for example).

(I capture all the dnf output by using script so that I can also look at it later, but there's no good way that I know of to scan through the result the way I could with a more straightforward log file. Something like less will show me the raw output, complete with progress bars being rendered and so on. And my terminal windows only have so much backscroll.)
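
(One crude approach is to flatten the progress bar rendering by turning the carriage returns into newlines and then searching the result, something like:

tr '\r' '\n' < typescript | grep -iE 'warning|error|fail' | less

with whatever file you told script to write to. It's not pretty, but at least it's greppable.)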

The next big complication is that I use ZFS on Linux for my home directory and other critical things, and it's of course not integrated with Fedora. This means there's always a risk of something going wrong with my ZFS setup during a Fedora version upgrade. To deal with this, I 'preflight' my Fedora version upgrades extensively on virtual machines (which helps deal with the 'will they work in general' issue). This takes its own set of time and preparation work, and is its own kind of little slog.

Finally, upgrading Fedora sometimes creates problems in my custom desktop environment (or non-environment) that I'll have to then sort out (for example). These range from somewhat modest, such as font rendering issues, to significant, such as sound not working any more. In extreme cases, my desktop environment won't start at all and I get to spend some time sorting things out. This means that I can only start the upgrade on a day when I feel that I have that kind of time left in the day, and I have to be up to dealing with various kinds of irritation about my environment exploding.

There really isn't anything that can be done about all of this, and it's really all a pain that I've set myself up for through my own machine setup choices. So some time, I just have to say that I'm spending this afternoon (or this day) on the work, and get it done (and I'm hoping that writing this entry will help push me forward on it).

(Sometimes I wonder if tracking Fedora Rawhide would make my life easier by spreading this time and effort out over a longer time, instead of concentrating it all in a few days. But Rawhide's potential for serious bugs discourages me. What I really want is a rolling release of 'stable' Fedora, with no big bangs of major releases, but this will probably never exist. There are sensible reasons for distributions to like the idea of major releases, but that's for another entry.)

FedoraUpgradeDrag written at 23:50:37

2020-09-05

Some notes on what the CyberPower UPS 'Powerpanel' software reports to you

For reasons beyond the scope of this entry, I recently bought a reasonably nice UPS for home usage. Me being me, I then found a Prometheus metrics exporter for it, cyberpower_exporter (and see also Mike Shoup's blog post about it), and then tinkered with it. This exporter works by talking to the daemon provided by CyberPower's Powerpanel software, instead of talking directly to the UPS, so my first port of call was to dump the raw information the daemon was providing for my UPS.

(The Powerpanel software is available as a Fedora RPM that's not too obnoxious. Per the Arch Wiki page on CyberPower UPS, you can also use Network UPS Tools (NUT). I opted to take the simpler path that theoretically should just work.)

You get status information from Powerpanel by connecting to the Unix socket /var/pwrstatd.ipc (yes I know, it should be in /run) and sending ASCII 'STATUS' followed by two newlines. You can do this by hand with nc if you feel like it:

printf 'STATUS\n\n' | nc -U /var/pwrstatd.ipc

What you get back is something like this (this is my particular UPS model, yours may vary):

STATUS
state=0
model_name=CP1500PFCLCD
firmware_num=000000000000
battery_volt=24000
input_rating_volt=120000
output_rating_watt=900000
avr_supported=yes
online_type=no
diagnostic_result=1
diagnostic_date=2020/07/31 12:34:53
power_event_result=1
power_event_date=2020/07/31 12:33:59
power_event_during=21 sec.
battery_remainingtime=5160
battery_charging=no
battery_discharging=no
ac_present=yes
boost=no
utility_volt=121000
output_volt=121000
load=8000
battery_capacity=100

The 'volt' and 'watt' numbers need to be divided by 1000 to get the units you expect from their name. The 'load' is divided by 1000 to get a percentage (or by 100000 to get it in 0.0 to 1.0 form), and is a percentage of the output rating watts. The daemon doesn't report the current load in watts; instead you have to compute it for yourself. The battery remaining time is in seconds. The battery capacity is a percentage, but unlike load, it's expressed as a straight 0-100 number. The times are in your local timezone, not UTC, and I don't know how the UPS reports longer durations of power events (in the minutes or even more than an hour).
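
To put numbers on that using the dump above: a load of 8000 is 8%, and 8% of the 900 watt output rating is about 72 watts. If you want to do the arithmetic on the fly, a bit of awk over the same STATUS output will do it:

printf 'STATUS\n\n' | nc -U /var/pwrstatd.ipc |
  awk -F= '/^load=/ {l=$2} /^output_rating_watt=/ {w=$2}
           END {printf "load %.1f%%, about %.0f watts\n", l/1000, (l/100000)*(w/1000)}'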

I suspect that the state, power_event_result, and diagnostic_result fields can take on multiple values. Based on what the CyberPower pwrstat command reports for my UPS right now, these mean a normal state, that the last power event was a blackout (a total power loss), and that the last self-test passed.

(The blackout was because I unplugged the UPS from the wall socket to make sure everything worked, which is why it was so short.)

The reported load number is somewhat untrustworthy and definitely seems to be quantized by the UPS. It's possible to observe reported loads of '0' if my home machine environment is idle enough (with the display blanked). This isn't just an artifact of the Powerpanel software, either; when I looked at the UPS's actual front panel, it reported 0 load and 0 watts being used. The front panel also reports 'VA' figures, and they didn't go to zero at these '0 load' times. However, as far as I can tell VA figures aren't reported by the Powerpanel software, and may or may not be provided to the outside world by the UPS itself.

(The NUT page for a very similar model doesn't list any VA data.)

As a consequence, you can't really use the reported load value to see how much power your overall UPS-based setup is using over time; the UPS load will under-report at times of low usage and perhaps at other times. This was a bit disappointing, but then I didn't buy the UPS to (also) be a watt-meter with a USB readout that I could grab from the computer.

(The UPS connects to my desktop via USB and is visible as a USB device, but I haven't tried to dump its USB traffic to see the truly raw data. That's a little bit too much work for my current level of curiosity.)

CyberPowerPowerpanelNotes written at 01:09:54

2020-08-18

The Prometheus host agent can disturb Linux CPU frequency measurements

Recently I read CPU frequency scaling metrics from the node exporter, which talks about how to look at the Prometheus metrics that the Prometheus host agent gathers and exposes to Prometheus. Naturally this got me to look at the frequencies that my own little Prometheus setup on my home machine had gathered, which gave me a surprise.

Like a lot of desktops, my home machine is idle almost all of the time, and I can see that reflected in a lot of the statistics that the Prometheus host agent gathers. But Prometheus reported that my CPU frequency was hovering at very high values, often around 4 GHz (and checking confirmed that these were what the host agent was reporting). Since this didn't match my expectations, I looked at the direct information in /sys:

: hawklords.cs ; cd /sys/devices/system/cpu/cpufreq
: hawklords.cs ; cat policy?/scaling_cur_freq policy??/scaling_cur_freq
800079
800210
800106
800045
800091
800032
800162
800644
801214
800175
800060
800026

That is, my CPUs are sitting at around 800 MHz, which is actually the minimum frequency (scaling_min_freq is 800000). That's what I see almost all of the time when my desktop is idle, with brief exceptions.
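
You can see the effect directly by scraping the host agent yourself, since asking it for metrics kicks off a full collection run. Assuming the stock port and the cpufreq collector's current metric name (both of which you should check against your setup), that's:

curl -s http://localhost:9100/metrics | grep '^node_cpu_scaling_frequency_hertz'

The frequencies reported there should show the same jump, even though cat'ing /sys a moment later drops back to 800 MHz.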

My only theory for what's going on with the Prometheus host agent is that this is happening because the host agent is a Go program and is quite parallelized and concurrent. When Prometheus or you ask the host agent for metrics, it immediately goes out to gather them from all of its collectors in parallel, which is likely to make many or all of your CPUs busy and thus push up their frequencies. Apparently my overall system (Linux, the CPU, and whatever BIOS magic is going on) is so good at this that the speed rises fast enough for the host agent to observe it, and then drops again almost immediately once the host agent is done. I suspect that the Prometheus daemon itself also contributes to the CPU usage (since it's receiving the data from the host agent), but I expect that the host agent's multi-CPU usage is the big factor.

(The choice of CPU frequency governor likely affects this; my home machine is currently on 'powersave', which is what my Fedora 31 environment defaults to. The CPU frequency driver is intel_pstate.)

This unfortunately rather reduces the usefulness of the host agent's CPU frequency information on Linux. You can probably use it to look at big exceptions (such as CPUs, cores, or sockets that are persistently out of step with what they should be), but it's clearly not a reliable guide to the normal state of your systems.

PS: I see similar but less drastic effects on my office machine, which has an AMD Ryzen instead of an Intel CPU. Direct examination in /sys suggests that it idles around 1.8 GHz, but the host agent sees it around 2.7 to 2.9 GHz when idle, with spikes to higher.

PPS: The host agent does sometimes observe low frequencies; it's reported 800 MHz frequencies on each core on my home machine at some point over the past week. It even appears to have seen 800 MHz on all cores at some point.

PrometheusVsCPUFrequency written at 23:26:44
