My new Linux home machine for 2018
Back in the fall I planned out a new home machine and then various things happened, especially Meltdown and Spectre (which I feel made it not a great time to get a new CPU) but also my natural inertia. This inertia sort of persisted despite a near miss scare, but in the end I wound up losing confidence in my current (now old) home machine and just wanting to get things over with, so I bit the bullet and got a new home machine (and then I wound up with questions on its graphics).
(There will be better CPUs in the future that probably get real performance boosts from hardware fixes for Meltdown and Spectre, but there are always better CPUs in the future. And on my home machine, I'm willing to run with the kernel mitigations turned off if I feel strongly enough about it.)
The parts list for my new machine is substantially the same as my initial plans, but I made a few changes and indulgences. Here's the parts list for my future reference, if nothing else:
- Intel Core i7-8700K
- This is the big change from my initial plans,
where I'd picked the i7-8700. It's an indulgence, but I felt like
it and using an overclocking capable CPU did eliminate most of
my concerns over high-speed DDR4 memory.
I'm not overclocking the 8700K because, well, I don't. After I
didn't run into any TDP issues with my work machine
and its 95W TDP Ryzen, I decided that I didn't have any concerns
with the 8700K's 95W TDP either.
(My concerns about the TDP difference between the 8700K and the 8700 turn out to be overblown, but that's going to be another entry.)
- Asus PRIME Z370-A motherboard
- I saw no reason to change my initial
choice (and it turns out Phoronix was quite positive about it
which I only found out about later). The RGB LEDs are kind of
amusing and they even show a little bit through some of the air
vents on the case (especially in the dark).
- 2x 16GB G.Skill Ripjaws V DDR4-3000 CL15 RAM
- I can't remember if
the G.Skill DDR4-2666 modules were out of stock at the exact point
I was putting in my order, but they certainly had been earlier
(when I assembled the parts list at the vendor I was going to
use), and in the end I decided I was willing to pay the slightly
higher cost of DDR4-3000 RAM just as an indulgence.
(Looking at the vendor I used, the current price difference is $4 Canadian.)
I opted not to try to go faster than DDR4-3000 because the RAM modules I could easily see stopped having flat CL timings above that speed; they would be eg CL 15-18-18-18 instead of a flat CL15. There were some faster modules with flat CL timings, but they were much more expensive. Since I think I care more about latency than slight bandwidth increases, I decided to stick with the inexpensive option that gave me flat latency.
- Noctua NH-U12 CPU cooler
- I quite liked the Noctua cooler in
my work machine, so I just got the general
version of it. I've been quite happy on my work machine not just
with how well the Noctua cooler works, but also how quiet it is
with the CPU under heavy load (and in fact it seems quite difficult
to push the CPU hard enough that the Noctua's fan has to run very
fast, which of course helps keep any noise down).
In short: real (and large-ish) CPU coolers are a lot better than the old stock Intel CPU coolers. I suppose this is not surprising.
(The Noctua is not inexpensive, but I'm willing to pay for quality and long term reliability.)
- EVGA SuperNova G3 550W PSU
- As with my work machine, the
commentary on my original plans pushed me to
getting a better PSU. Since I liked this in the work machine, I
just got it again for the home one. It's an amusing degree of
overkill for the actual measured power draw of either machine,
but I don't really care about that.
(For my own future reference: always use the PSU screws that come with the PSU, not the PSU screws that may come with the case.)
- Fractal Design Define R5 case
- I got this in white this time
around (the work version is black). The story about the colour
shift is that we recently built a new Ryzen based machine for one
of my co-workers and basically duplicated my work machine for
the parts list, except his case is white because he didn't care
and it was cheaper and in stock at the time. When
his case came in and I got a chance to see it in person, I decided
that maybe I liked the white version better than the black one
and in the end my vacillation settled on the white one.
(When I pulled the trigger on buying the hardware, the two case colours were the same price (and both in stock). So it goes.)
- LG GH24NSC0 DVD/CD Writer
- I gave in to an emotional temptation
beyond the scope of this entry and got an optical drive, despite
what I'd written earlier. But
I haven't actually installed it in the case; it's apparently
enough that I could if I wanted to.
(It helps that powering the drive would be somewhat awkward, because the PSU doesn't come with SATA modular cables that are long enough or have enough plugs on them.)
I'm running the RAM at full speed by activating its XMP profile in the BIOS, which was a pleasant one-click option (without that it came up at a listed 2133 MHz). There didn't seem to be a one-click option for 'run this at 2666 MHz with the best timings it supports', so I didn't bother trying to set that up by hand. The result appears stable and the BIOS at least claims that the CPU is still running at its official normal speed rating.
(Apparently it's common for Intel motherboards to default to running DDR4 RAM at 2133 MHz for maximum compatibility or something.)
In general I'm quite happy with this new machine. It's not perfect, but then nothing seems to be under Linux; everything works, it's pretty nice, and it's surprisingly fast. Honestly, it feels good simply to have a modern machine, especially one where I can't clearly hear the difference between an idle machine and one under CPU load.
(There's still a little bit of noise increase under full load, but it's pretty subtle and I have to really pay attention in a quiet environment to hear it. On my old machine, the stock Intel CPU cooler spinning up created a clearly audible difference in noise that I could hear over, eg, my typing. The old machine might have been improved by redoing the thermal paste, but my old work machine, with the redone paste, still had the same sort of audibility under load.)
After my experiences with UEFI on my work machine, I didn't try to switch from BIOS booting to UEFI on this one either (contrary to earlier plans for this). The two machines probably have somewhat different BIOSes (although their GUIs look very similar), but I didn't feel like taking the chance and I've wound up feeling that BIOS boot is better if you're using a mirrored root filesystem (which I am), fundamentally due to an unfortunate Linux (or Fedora) GRUB2 configuration choice.
PS: Given my experiences with Ryzen on my new work machine, I wound up with absolutely no interest in going with a Ryzen home machine. This turned out to be a good choice in general, but that's another entry.
(The short version is on Twitter.)
The problem with Linux's 'predictable network interface names'
I find modern Linux auto-generated Ethernet device names to be a big pain, because they're such long jumbles. enp0s31f6? enp1s0f0? Please give me a break (and something short).
The fundamental problem with these 'predictable network interface
names' is that they aren't. By that I mean that if you tell me that a
system has, say, a motherboard 1G Ethernet port and a 10G-T Ethernet
port on a PCIE card, I can't predict what those interfaces will be
called unless I happen to have an exactly identical second machine
that I can check. If I want to configure the machine's networking
or ssh in to run a status check command on the 10G-T interface, I'm
pretty much out of luck; I'm going to have to first run
some command to see what this machine has decided to call things.
(Yes, even for the motherboard network port, which may or may not
show up as eno1 depending on the vagaries of life and your
specific configuration. The enp0s31f6 from my tweet is such a
motherboard port, and we have other machines where the motherboard
port has wound up with a different name entirely.)
The other problem with these names is that they're relatively long jumbles that are different from each other at various random positions (not just at the end). Names like this are hard to tell apart, hard to tell to people, and invite errors when you're working with them (because such errors won't stand out in the jumble). This might be tolerable if we were getting predictability in exchange for that jumble, but we aren't. If all we're going to get is stability, it would be nice to have names that are easier to deal with.
(We aren't even entirely getting stability, since PCI slot numbering isn't stable and that's what these names are based on.)
PS: There is a benefit to this naming scheme, which is that
identical hardware will have identical names and you can freely
transplant system disks (or system images) around between such
hardware. If I have a fleet of truly identical machines (down to
the PCIE cards being in the same physical slots), I know that
enp1s0f0 on one machine is
enp1s0f0 on every machine.
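To make the jumble slightly less opaque, here's a small sketch of decoding the common enpXsYfZ form of these names (an illustration only; the real udev scheme has many more variants, including the eno and ens forms):

```python
import re

def decode_pci_ifname(name):
    """Decode a udev 'enp<bus>s<slot>[f<function>]' interface name
    into its PCI bus, slot (device), and optional function numbers.
    Simplified sketch: the real naming scheme has more forms."""
    m = re.fullmatch(r'enp(\d+)s(\d+)(?:f(\d+))?', name)
    if m is None:
        return None
    bus, slot, func = m.groups()
    return {'bus': int(bus), 'slot': int(slot),
            'function': int(func) if func is not None else None}

# enp0s31f6 is PCI bus 0, slot (device) 31, function 6.
print(decode_pci_ifname('enp0s31f6'))
```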
(Over the years, these device names have been implemented with somewhat different names by a number of different Linux components. These days they come from udev, which is now developed as part of systemd in case you wish to throw the usual stones. I'm not sure if udev considers the specific naming scheme to be stable, considering the official documentation points you to the source code.)
Sidebar: What this scheme does give us
Given identical and unchanging hardware (and BIOS), we get names that are consistent from boot to boot, from machine to machine, and are 'stateless' in that they don't depend on the past history of the Linux install running on the machine (your five year old install that's been moved between three machines sees the same names as a from-scratch install made yesterday).
My uncertainties around X drivers for modern Intel integrated graphics
When I switched from my old office hardware to my new office machine, I got surprised by how well the X server coped with not actually having a hardware driver for my graphics card. That experience left me jumpy about making sure that any new hardware I used was actually being driven properly by X, instead of X falling back to something that was just good enough that I didn't know I should be getting better. All of which is the lead up to me switching my home machine over to new modern Intel-based hardware, using Intel's generally well regarded onboard graphics. Everything looked good, but then I decided to find out if it actually was good, and now I'm confused.
(The specific hardware is only slightly different from my original plans.)
Let's start with the first issue, which is how to determine what X
modules you're actually using. People who are expert in reading the
X log files can probably work this out from the logs but it left
me confused and puzzled, so I've now resorted to brute force. X
server modules are shared libraries (well, shared objects), so the
X server has to map them into its address space in order to use
them. If we assume that it unmaps modules it's not using, we can
use lsof to determine what modules it currently has mapped. On my
machine, this reports that the driver module it has loaded is
libglamoregl.so (along with various Mesa things). Staring at the
X logs then led me to see:
modeset(0): [DRI2] Setup complete
modeset(0): [DRI2] DRI driver: i965
modeset(0): [DRI2] VDPAU driver: va_gl
[...]
AIGLX: Loaded and initialized i965
That seems pretty definite; I'm using the modesetting driver, not the Intel driver. This raises another question, which is whether or not this is a good thing. Although I initially thought there might be problems, a bunch of research in the process of writing this entry suggests that using the modesetting driver is the right answer (eg the Arch wiki entry on Intel graphics, which led me to the announcement that Fedora was switching over to modesetting). In fact now that I look back at my earlier entry, a commentator even talked about this switch.
(Before I found this information, I tried forcing X to use the Intel driver. This sort of worked (with complaints about an unrecognized chipset), but Chrome didn't like life so I gave up pretty much immediately. This might be fixed in the latest git tree of the driver, but if modesetting X works and is preferred, my motivation for building the driver from source is low.)
Unfortunately this leaves me with questions and concerns that I
don't have answers to. The first issue is that I don't know how
much GPU accelerated OpenGL I have active in this configuration.
Since I can run some OpenGL stress tests without seeing the CPU
load go up very much, I'm probably not using much software rendering
and it's mostly hardware. Certainly
xdriinfo reports that I have
DRI through an i965 driver, and
glxinfo seems to agree.
(glxinfo reports in several sections that it's using
'Mesa DRI Intel(R) HD Graphics (Coffeelake 3x8 GT2) (0x3e92)'. The
0x3e92 matches the PCI ID of the integrated graphics controller.)
The second issue is video playback, which for reasons beyond the
scope of this entry is one of my interests. One of the ways that I
noticed problems the first time around was that
vdpauinfo was completely unhappy. This time around it's only
partially unhappy, and it says it's using the 'OpenGL/VAAPI backend
for VDPAU'. VAAPI is an
Intel project, and it has its own information command,
vainfo. vainfo is not happy with my system; if I can read
its messages correctly, it's unable to initialize the hardware
driver it wants to use. Of course I don't know whether this even
matters for basic video playback, even of 1080p content, or if
common Linux video players use VAAPI (or VDPAU).
(Part of the issue may be that I have a bit of a Frankenstein mess of packages, with some VAAPI related packages coming from RPM Fusion, including the libva Intel driver package itself. Possibly I should trim down those packages somewhat.)
xprt: data for NFS mounts in
/proc/self/mountstats is per-fileserver, not per-mount
A while back I wrote about all of the handy NFS statistics that
Linux gives you in mountstats for all of your NFS
mounts, including the
xprt: NFS RPC information.
For TCP mounts, this includes the local port and at the time I said:
port: The local port used for this particular NFS mount. Probably not particularly useful, since on our NFS clients all NFS mounts from the same fileserver use the same port (and thus the same underlying TCP connection).
I then blithely talked about all of the remaining statistics as if they were specific to the particular NFS mount that the line was for. This turns out to be wrong, and the port number is in fact vital. I can demonstrate how vital by a little exercise:
$ fgrep xprt: /proc/self/mountstats | sort | uniq -c | sort -nr
    105 xprt: tcp 903 1 1 0 62 97817460 97785284 11122 2101962256388 0 574 10700678 55890249
     82 xprt: tcp 1005 1 1 0 0 48538448 48536496 1788 48292655827 0 810 26226830 53362451
[...]
It's not a coincidence that we have 105 NFS filesystems mounted
from one fileserver and 82 from another. It turns out that at least
with TCP based NFS mounts, all NFS mounts from the same fileserver
will normally share the same RPC xprt transport, and it is the
xprt transport's statistics that are being reported here. As a
result, all of that
xprt: NFS RPC information is for all NFS
RPC traffic to the entire fileserver, not just the NFS RPC traffic
for this specific mount.
(For TCP mounts, the combination of the local port plus the
mountaddr= IP address will identify which xprt transport a given
NFS mount is using. On our systems all NFS mounts from a given
fileserver use the same port
and thus the same xprt transport, but this may not always be the
case. Also, each different fileserver is using a different local
port, but again I'm not sure this is guaranteed.)
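As an illustration of identifying shared transports this way, here's a sketch that groups mounts by local port plus mountaddr, using a made-up mountstats excerpt (the helper and the abbreviated field layout are my own, so verify against your actual mountstats output):

```python
from collections import defaultdict

SAMPLE = """\
device fs1:/h/100 mounted on /h/100 with fstype nfs statvers=1.1
 opts: rw,vers=3,mountaddr=10.0.0.1
 xprt: tcp 903 1 1 0 62 97817460 97785284 11122 0 0 574 10700678 55890249
device fs1:/h/101 mounted on /h/101 with fstype nfs statvers=1.1
 opts: rw,vers=3,mountaddr=10.0.0.1
 xprt: tcp 903 1 1 0 62 97817465 97785290 11122 0 0 574 10700680 55890251
device fs2:/h/200 mounted on /h/200 with fstype nfs statvers=1.1
 opts: rw,vers=3,mountaddr=10.0.0.2
 xprt: tcp 1005 1 1 0 0 48538448 48536496 1788 0 0 810 26226830 53362451
"""

def group_by_transport(text):
    """Group NFS mounts by (mountaddr, local port), i.e. by the
    RPC xprt transport they share.  Simplified parsing that
    assumes TCP mounts and this abbreviated line layout."""
    groups = defaultdict(list)
    mount = addr = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith('device '):
            mount = line.split()[4]       # the mount point
        elif line.startswith('opts:'):
            for opt in line.split()[1].split(','):
                if opt.startswith('mountaddr='):
                    addr = opt.split('=', 1)[1]
        elif line.startswith('xprt: tcp'):
            port = int(line.split()[2])   # the local port
            groups[(addr, port)].append(mount)
    return dict(groups)
```

Mounts that land in the same group are reporting statistics for the same underlying transport, which is exactly the sharing described above.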
If the system is sufficiently busy doing NFS (and has enough NFS
mounts), it's possible to see slightly different
xprt: values for
different mounts from a given fileserver that are using the same
xprt transport. This isn't a true difference; it's just an artifact
of the fact that the information for
mountstats isn't being
gathered all at once. If things update sufficiently frequently and
fast, an early mount will report slightly older
xprt: values than
a later mount.
If you want to get a global view of RPC to a given fileserver, this
is potentially convenient. If you want to get a per-mount view,
it's inconvenient. For instance, to get the total number of NFS
requests sent by this mount or the total bytes sent and received
by it, you can't just look at the
xprt: stats; instead you'll
need to add up the counts from the per-operation statistics. Much of the information you want can be
found by summing up per-operation stats this way, but I haven't
checked to see if all of it can be.
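That per-operation summing might look like the following sketch (the field layout, with the ops count first and bytes sent and received in the fourth and fifth positions, is what I believe statvers=1.1 mountstats uses, but treat it as an assumption to check against your kernel):

```python
def sum_ops(op_lines):
    """Sum request counts and bytes over one mount's per-op
    statistics lines (eg 'READ: 120 120 0 19200 524288 ...').
    Assumed layout: ops, transmissions, timeouts, bytes sent,
    bytes received, then timing fields."""
    total_ops = bytes_sent = bytes_recv = 0
    for line in op_lines:
        _, rest = line.split(':', 1)
        fields = [int(f) for f in rest.split()]
        total_ops += fields[0]
        bytes_sent += fields[3]
        bytes_recv += fields[4]
    return total_ops, bytes_sent, bytes_recv

# Made-up sample lines in the assumed per-op format.
sample = [
    "READ: 120 120 0 19200 524288 10 400 450",
    "WRITE: 30 30 0 262144 4800 5 900 950",
]
```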
There are probably clever things that can be done by combining
and contrasting the
xprt global stats and the per-mount stats
you can calculate. I haven't tried to wrangle those metrics yet, though.
PS: The way that I found this is that the current version of
nfsiostat does its sorting for
-s based on the
xprt: statistics, which
gave us results that were sufficiently drastically off that it was
obvious something was wrong.
What I think I want out of a hypothetical
nfsiotop for Linux
I wish there was a version of Linux's nfsiostat that worked gracefully when you have several hundred NFS mounts across multiple NFS fileservers.
(I'm going to have to write one, aren't I.)
Linux exposes a very large array of per-filesystem NFS client
statistics in /proc/self/mountstats (see here)
and there are some programs that digest this data and report it,
such as nfsiostat(8). Nfsiostat
generally works decently to give you useful information, but it's
very much not designed for systems with, for example, over 250 NFS
mounts. Unfortunately that describes us, and we would rather like
to have a tool which tells us what the NFS filesystem hotspots are
on a given NFS client if and when it's clearly spending a lot of
time waiting for NFS IO.
As suggested by the name, a hypothetical
nfsiotop would have to
only report on the top N filesystems, which raises the question of
how you sort NFS filesystems here. Modern versions of nfsiostat
sort by operations per second, which is a start, but I think that one
should also be able to sort by total read and write volume and
probably also by write volume alone. Other likely interesting things
to sort on are the average response time and the current number of
operations outstanding. An ideal tool would also be able to aggregate
things into per fileserver statistics.
(All of this suggests that the real answer is that you should be able to sort on any field that the program can display, including some synthetic ones.)
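A minimal version of that sort-on-any-field idea, over hypothetical per-mount stat records (the field names here are made up for illustration, not the actual mountstats-derived fields):

```python
def top_n(mounts, field, n=10):
    """Return the top n per-mount stat records, sorted descending
    on a field name.  Records are plain dicts; missing fields
    sort as zero."""
    return sorted(mounts, key=lambda m: m.get(field, 0), reverse=True)[:n]

# Hypothetical per-mount records; field names are illustrative.
mounts = [
    {'mount': '/h/100', 'ops_per_sec': 500, 'write_bytes': 10},
    {'mount': '/h/101', 'ops_per_sec': 20, 'write_bytes': 9000},
    {'mount': '/h/102', 'ops_per_sec': 300, 'write_bytes': 50},
]
# Sorting by operations and by write volume surface different mounts.
```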
As my aside in the tweet suggests, I suspect that I'm going to have
to write this myself, and probably mostly from scratch. While
nfsiostat is written in Python and so is probably reasonably
straightforward for me to modify, I suspect that it has too many
things I'd want to change. I don't want little tweaks for things
like its output, I want wholesale restructuring. Hopefully I can
reuse its code to parse the
mountstats file, since that seems
reasonably tedious to write from scratch. On the other hand, the
current nfsiostat Python code seems amenable to a quick gut job to
prototype the output that I'd want.
(Mind you, prototypes tend to drift into use. But that's not necessarily a bad thing.)
PS: I've also run across kofemann/nfstop, which has some interesting features such as a per-UID breakdown, but it works by capturing NFS network traffic and that's not the kind of thing I want to have to use on a busy machine, especially at 10G.
PPS: I'd love to find out that a plausible nfsiotop already exists, but I haven't been able to turn one up in Internet searches so far.
In Fedora, your initramfs contains a copy of your sysctl settings
It all started when I discovered that my office workstation had
wound up with its maximum PID value set to a very large number (as
mentioned in passing in this entry). I managed
to track this down to a sysctl.d
file from Fedora's
ceph-osd RPM package, which I had installed
for reasons that are not entirely clear to me. That was straightforward.
So I removed the package, along with all of the other ceph packages,
and rebooted for other reasons. To my surprise, this didn't
change the setting;
I still had a
kernel.pid_max value of 4194304. A bunch of head
scratching ensued, including extreme measures like downloading and
checking the Fedora systemd source. In the end,
the culprit turned out to be my initramfs.
In Fedora, dracut copies sysctl.d files into your initramfs when it builds one (generally when you install a kernel update), and there's nothing that forces an update or rebuild of your initramfs when something modifies what sysctl.d files the system has or what they contain. Normally this is relatively harmless; you will have sysctl settings applied in the initramfs and then reapplied when sysctl runs a second time as the system is booting from your root filesystem. If you added new sysctl.d files or settings, they won't be in the initramfs but they'll get set the second time around. If you changed sysctl settings, the initramfs versions of the sysctl.d files will set the old values but then your updated settings will get set the second time around. But if you removed settings, nothing can fix that up; the old initramfs version of your sysctl.d file will apply the setting, and nothing will override it later.
(In Fedora 27's Dracut, this is done by a core systemd related Dracut module in /usr/lib/dracut/modules.d, 00systemd/module-setup.sh.)
It's my view that this behavior is dangerous. As this incident and others have demonstrated, any time that normal system files get copied into initramfs, you have the chance that the live versions will get out of sync with the versions in initramfs and then you can have explosions. The direct consequence of this is that you should strive to put as little in initramfs as possible, in order to minimize the chances of problems and confusion. Putting a frozen copy of sysctl.d files into the initramfs is not doing this. If there are sysctl settings that have to be applied in order to boot the system, they should be in a separate, clearly marked area and only that area should go in the initramfs.
(However, our Ubuntu 16.04 machines don't have sysctl.d files in their initramfs, so this behavior isn't universal and probably isn't required by either systemd or booting in general.)
Since that's not likely to happen any time soon, I guess I'm just going to have to remember to rebuild my initramfs any time I remove a sysctl setting. More broadly, I should probably adopt a habit of preemptively rebuilding my initramfs any time something inexplicable is going on, because that might be where the problem is. Or at least I should check what the initramfs contains, just in case Fedora's dracut setup has decided to capture something.
(It's my opinion that another sign that this is a bad idea in general is there's no obvious package to file a bug against. Who is at fault? As far as I know there's no mechanism in RPM to trigger an action when files in a certain path are added, removed, or modified, and anyway you don't necessarily want to rebuild an initramfs by surprise.)
PS: For extra fun you actually have multiple initramfses; you have one per installed kernel. Normally this doesn't matter because you're only using the latest kernel and thus the latest initramfs, but if you have to boot an earlier kernel for some reason the files captured in its initramfs may be even more out of date than you expect.
The lie in Ubuntu source packages (and probably Debian ones as well)
One of the things that pisses me off about the Debian and Ubuntu source package format is that people clearly do not actually use it to build packages; they use other tools. You can tell because of how things are broken.
(I may have been hasty in tarring Debian with this particular brush but it definitely applies to Ubuntu.)
Several years ago I wrote about one problem with how Debian builds from source packages, which is that it doesn't have a distinction between the package's source tree and the tree that the package is built in and as a result building the package can contaminate the source tree. This is not just a theoretical concern; it's happened to us. In fact it's now happened with both the Ubuntu 14.04 version of the package and then the Ubuntu 16.04 version, which was contaminated in a different way this time.
This problem is not difficult to find or notice. All you have to
do is run
debuild twice in the package's source tree and the
second one will error out. People who are developing and testing
package changes should be doing this all the time, as they build
and test scratch versions of their package to make sure that it
actually has what they want, passes package lint checks, and so on.
Ubuntu didn't find this issue, or if they found it they didn't care
enough to fix it. The conclusion is inescapable; the source package and
all of the documentation that tells you to use
debuild on it is a
lie. The nominal source package may contain the source code that went
into the binary package (although I'm not sure you can be sure of that),
but it's not necessarily an honest representation of how the package is
actually built by the people who work on it, and as a result building
the source package with debuild may or may not reproduce the binary package you
got from Ubuntu. Certainly you can't reliably use the source package to
develop new versions of the binary package; one way or another, you will
have to use some sort of hack workaround.
(RPM based distributions should not feel too smug here, because they have their own package building issues and documentation problems.)
I don't build many Ubuntu packages. That I've stumbled over two packages out of the few that I've tried to rebuild and they're broken in two different ways strongly suggests to me that this is pretty common. I could be unlucky (or lucky), but I think it's more likely that I'm getting a reasonably representative random sample.
PS: If Ubuntu and/or Debian care about this, the solution is obvious, although it will slow things down somewhat. As always, if you really care about something you must test it and if you don't bother to test it when it's demonstrably a problem, you probably don't actually care about it. This is not a difficult test to automate.
(I've been told that debuild is not what people should be using to
build or rebuild packages these days, and that various people have
more modern alternatives.)
Getting chrony to not try to use IPv6 time sources on Fedora
Ever since I switched over to chrony,
one of the quiet little irritations of its setup on my office
workstation has been that it tried to use IPv6 time sources
alongside the IPv4 ones. It got these time sources from the default
Fedora pool I'd left it using alongside our local time sources
(because I'm the kind of person who thinks the more time sources
the merrier), and at one level looking up IPv6 addresses as well
as IPv4 addresses is perfectly sensible. At another level, though,
it wasn't, because my office workstation has no IPv6 connectivity
and even no IPv6 configuration. All of those IPv6 time sources that
chrony was trying to talk to were completely unreachable and would
never work. At a minimum they were clutter in 'chronyc sources'
output, but probably they were also keeping chrony from picking up
some additional IPv4 sources.
I started out by reading the
chrony.conf manpage, on
the assumption that that would be where you configured this.
When I found nothing, I unwisely gave up and grumbled to myself,
eventually saying something on Twitter. This
caused @rt2800pci1 to suggest using systemd restrictions so
chronyd couldn't even use IPv6. This had some interesting
results. On the one hand,
chronyd definitely couldn't use IPv6
and it said as much:
chronyd: Could not open IPv6 command socket : Address family not supported by protocol
On the other hand, this didn't stop
chronyd from trying to use
IPv6 addresses as time sources:
chronyd: Source 2620:10a:800f::14 replaced with 2620:10a:800f::11
(I don't know why my office workstation has such high PIDs at the moment. Something odd is clearly going on.)
However, this failure caused me to actually read the chronyd
manpage, where I finally noticed the
-4 command line option, which tells chrony
to only use IPv4 addresses for everything. On Fedora, you can
configure what options are given to chronyd in
/etc/sysconfig/chronyd, which is automatically used by the standard
Fedora systemd service for chronyd. A quick addition and chrony
restart, and now it's not trying to use IPv6 and I'm happy.
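For my own future reference, the change amounts to a one-liner along these lines (a sketch of the /etc/sysconfig/chronyd contents; the OPTIONS variable name is what I believe Fedora's chronyd.service reads, but check your own service file):

```shell
# /etc/sysconfig/chronyd (Fedora): extra command line options for
# chronyd.  -4 restricts chronyd to IPv4 only.
OPTIONS="-4"
```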
There are a number of lessons here. One of them is my perpetual one,
which is that I should read the manual pages more often (and make
sure I read all of them). There was no reason to stop with just the
chrony.conf manpage; I simply assumed that not using IPv6 would be
configured there if it was configurable at all. I was wrong, and I
could have had my annoyance fixed quite a while ago if I'd looked harder.
Another one, on the flipside, is that completely disabling IPv6 doesn't necessarily stop modern programs from trying to use it. Perhaps this is a bug on chrony's part, but I suspect that its authors will be uninterested in fixing it. It's likely becoming a de facto standard that Linux systems have IPv6 enabled, even if they don't have it configured and can't reach anything with it. Someday we're going to see daemons that bind themselves only to the IPv6 localhost, not the IPv4 one.