Some notes on systemd-resolved, the systemd DNS resolver
My office workstation's upgrade to Fedora 27 resulted in a little incident with NetworkManager, which I complained about on Twitter; the resulting Twitter conversation brought systemd-resolved to my attention. My initial views weren't all that positive (because I'm biased here; systemd's recent inventions have often not been good things) but I didn't fully understand its state on my systems, so I wound up doing some digging. I'm still not too enthused, but I've wound up less grumpy than I was before and I'm not going to be forcefully blocking systemd-resolved from running at all just yet.
Systemd-resolved is systemd's DNS resolver. It has three interfaces:
- A DBUS API that's exposed at /org/freedesktop/resolve1. I don't know
how many things use this API (or at least try to use it).
- A local caching DNS resolver at 127.0.0.53 (IPv4 only) that clients
can query to specifically talk to systemd-resolved, even if you have
another local caching DNS server at 127.0.0.1.
- An NSS module that hooks into getaddrinfo() and friends, which would send all normal hostname lookups off to systemd-resolved. Importantly, this is sanely implemented; if the module isn't enabled in
/etc/nsswitch.conf, systemd-resolved is not involved in normal hostname resolution.
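As a quick way to see whether the NSS path is in play on a given machine, you can look for 'resolve' in the hosts: line of /etc/nsswitch.conf. This sketch checks a sample line so it's self-contained; on a real system you'd grep the file itself:

```shell
# Check whether 'resolve' appears in the hosts: line of nsswitch.conf.
# The sample line here is illustrative; on a real system, use:
#   grep '^hosts:' /etc/nsswitch.conf
hosts_line='hosts: files dns myhostname'
case "$hosts_line" in
  *resolve*) echo "systemd-resolved is in the lookup path" ;;
  *)         echo "systemd-resolved is not in the lookup path" ;;
esac
```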
All of my Fedora machines have systemd-resolved installed as part
of systemd, but none of them appear to have the NSS module
enabled, so none of them are using systemd-resolved as part of
normal hostname resolution. They do appear to enable the DBus service
(as far as I can sort out the chain of DBus stuff that leads to
unit activation). The systemd-resolved daemon itself is not normally
running, and there doesn't seem to be any systemd socket stuff that
would activate it if you sent a DNS query to port 53 on 127.0.0.53,
so on my Fedora machines it appears the only way it will ever start
is if something makes an explicit DBus query.
However, once activated resolved has some behaviors that I don't think I'm fond of (apart from the security bugs and the regular bugs). I'm especially not enthused about its default use of LLMNR, which will normally see it broadcasting certain DNS queries out on all of my active interfaces. I consider LLMNR somewhere between useless and an active risk of various things, depending on what sort of network I'm connected to.
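(For what it's worth, LLMNR can be turned off globally in /etc/systemd/resolved.conf; LLMNR= is a documented option there. A minimal sketch:)

```ini
[Resolve]
LLMNR=no
```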
Resolved will make queries to DNS servers in parallel if you have
more than one of them available through various paths, but here I
think it's a reasonable approach to handling DNS resolution in the
face of things like VPNs, which otherwise sort of requires awkward
hand configuration. It's unfortunate that this
behavior can harm people who know what they're doing and who want
behavior like their local DNS resolver (or
resolv.conf) to always
override the DNS resolver settings they're getting from some random
network they've connected to.
Since resolved doesn't actually shove itself in the way of anyone who didn't actively ask for it (via DBus or querying 127.0.0.53), I currently feel it's unobjectionable enough to leave unmasked and thus potentially activated via DBus. Assuming that I'm understanding and using journalctl correctly, it never seems to have been activated on either of my primary Fedora machines (and they have journalctl logs that go a long way back).
My upgrade to Fedora 27, Secure Boot, and a mistake made somewhere
I'm usually slow about updating to new versions of Fedora; I like to let other people find the problems and then it's generally a hassle in various ways, so I keep putting it off. This week I decided that I'd been sitting on the Fedora 27 upgrade for long enough (or too long), and today it was the turn of my work laptop. It didn't entirely go well, but after the dust settled I think it's due to an innocent looking mistake I made and my specific laptop configuration.
This is a new laptop, a Dell XPS 13, and this is the first Fedora
upgrade I've done on it (I installed Fedora 26 when we got it in
mid-August). As I usually do, I did the Fedora 26 to 27 upgrade
with the officially unsupported method of a live upgrade with
the package manager, based on the traditional documentation for it,
which I've been doing on multiple machines for many years. After I
finished the upgrade process, I rebooted and the laptop failed to
come up in Linux; instead it booted into the Windows 10 installation
that I have on the other half of its drive. My Linux install (now
with Fedora 27) was intact, but it wouldn't boot at all.
I will start with the summary. If your system boots using UEFI,
you almost certainly shouldn't ever run
grub2-install. Various
portions of the Fedora wiki (like the Fedora page on Grub2) will tell you this pretty
loudly, but the 'upgrade with package manager' page
still says to use
grub2-install without any qualifications, and that's
what I did during my Fedora 27 upgrade.
What caused my issue is that I have Secure Boot
enabled on my laptop, and at some point during the upgrade my Fedora
UEFI boot entry wound up pointing to the EFI image
EFI/fedora/grubx64.efi, which isn't correctly signed and so won't
boot under Secure Boot. The XPS UEFI firmware doesn't report any
error message when this happens; instead it silently goes on to the
next UEFI boot entry (if there is one), which in my case was the Windows one.
In order to boot my laptop with Secure Boot enabled, the UEFI boot
entry for Fedora 27 needs to point to
EFI/fedora/shimx64.efi. This shim loader is signed and passes the UEFI
firmware's Secure Boot verification, and once it starts it hands
things off to
grubx64.efi for regular GRUB2 UEFI booting.
(If I disabled Secure Boot, I could use the
grubx64.efi boot entry. Otherwise, only the
shimx64.efi entry worked.)
At this point I don't know what my Fedora 26 UEFI boot entry looked
like, but I suspect that it pointed to the Fedora 26 version of the
shim (which appears to be called
EFI/fedora/shim.efi). My best
guess for what happened during my Fedora 27 upgrade is that when I
ran grub2-install at the end, one of the things it did was
run efibootmgr and reset
where the 'fedora' UEFI boot entry pointed. I don't remember seeing
any message reporting this, but I didn't run
grub2-install with any flag to make it verbose, and the code to run efibootmgr
does appear to be in the Grub2 source.
(And changing the UEFI boot entry is sort of reasonable. After all,
I told Grub2 to install itself, and that logically includes making
the UEFI boot entry point to it, just as
grub2-install on a
non-UEFI system will update the MBR boot record to point to itself.)
PS: I consider all of this a valuable learning experience, since I got to shoot myself in the foot and learn a bunch of things about UEFI on a machine I could live without. I'm planning to set up my future desktops as pure UEFI machines, and making this mistake on one of them would have been much more painful. For that matter, simply knowing how to set up UEFI boot entries is going to come in handy when I migrate my current disks over to the new machines.
(I'm up in the air about whether or not I'll use Secure Boot on the desktops. If they come that way, well, maybe.)
Sidebar: How I fixed this
In theory you can boot a Fedora 27 live image from a USB stick and
fiddle around with
efibootmgr. In practice, I went in to the
laptop's UEFI 'BIOS' interface and told it to add another UEFI boot
entry, because this had a reasonably simple and obvious interface.
The resulting entry is a bit different from what I think
efibootmgr would make, but it works (as well it should, since it was set up by
the very thing that's interpreting it).
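For the record, doing it with efibootmgr would look something like the following. The disk, partition number, and label here are assumptions for illustration, and this sketch just prints the command instead of running it (the real thing needs root and live UEFI variables):

```shell
# Print (rather than execute) an efibootmgr invocation that creates a
# UEFI boot entry pointing at the signed shim, which then loads
# grubx64.efi. -d/-p select the disk and the ESP partition, -L is the
# entry label, and -l is the loader path. The disk and partition here
# are made up.
printf '%s\n' "efibootmgr -c -d /dev/nvme0n1 -p 1 -L Fedora -l \EFI\fedora\shimx64.efi"
```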
(In the course of this experience I was not pleased to discover that the Dell XPS 13's UEFI interface will let you delete UEFI boot entries with immediate effect and no confirmation or saving needed. Click the wrong button at the wrong time, and your entry is irretrievably gone on the spot.)
My new Linux office workstation for fall 2017
My past two generations of office Linux desktops have been identical to my home machines, and when I wrote up my planned new home machine I expected that to be the case for my next work machine as well (we have some spare money and my work machine is six years old, so replacing it was always in the plans). It turns out that this is not going to be the case this time around; to my surprise and for reasons beyond the scope of this entry, my next office machine is going to be AMD Ryzen based.
The definitive parts list for this machine is as follows. Much of it is based on my planned new home machine, but obviously the switch from Intel to AMD required some other changes, some of which are irritating ones.
- AMD Ryzen 1800X
- Even though we're not going to overclock it, this
is still the best Ryzen CPU. I figure that I can live with the 95W
TDP and the cooling it requires, since that's what my current
desktop has (and this time I'm getting a better CPU cooler than the
stock Intel one, so it should run both cooler and quieter).
- ASUS Prime X370-Pro motherboard
- We recently got another
Ryzen-based machine with this motherboard and it seems fine (as
a CPU/GPU compute server). The motherboard has a decent assortment
of SATA ports, USB, and so on, and really there's not much to say
about it. I also looked at the slightly less expensive X370-A,
but the X370-Pro has more than enough improvements to strongly
prefer it (including two more SATA ports and onboard Intel-based
networking instead of Realtek-based).
It does come with built in colourful LED lighting, which looks a bit odd in the machine in our server room. I'll live with it.
(This motherboard is mostly an improvement on the Intel version since it has more SATA ports, although I believe it has one less M.2 NVME port. But with two x16 PCIE slots, you can fix that with an add-on card.)
- 2x16 GB DDR4-2400 Kingston ECC ValueRAM
- Two DIMMs is what you
want on Ryzens today. We're using
ECC RAM basically because we can; it's available and is only a
bit more expensive than non-ECC RAM, runs fast enough, and is
supported to at least some degree by the motherboard. We don't know
if it will correct any errors, but probably it will.
(You can't get single-rank 16GB DIMMs, so that this ECC RAM is double-rank is not a drawback.)
The RAM speed issues with Ryzen are one of the irritations of building this machine around an AMD CPU instead of an Intel one. It may never be upgraded to 64 GB RAM over its lifetime (which will probably be at least five years).
- Noctua NH-U12-SE-AM4 CPU cooler
- We need some cooler for the
Ryzen 1800X (since it doesn't come with one). These are well
reviewed as both effective and quiet, and the first Ryzen machine
we got has a Noctua cooler as well (although a different one).
- Gigabyte Radeon RX 550 2GB video card
- That I need a graphics
card is one of the irritations of Ryzens. Needing a discrete
graphics card means an AMD/ATI card right now, and I wanted one
with a reasonably modern graphics architecture (and I needed one
with at least two digital video outputs, since I have dual
monitors). I sort of threw darts here, but reviewers seem to say
that this card should be quiet under normal use.
As a Linux user I don't normally stress my graphics, but I expect to have to run Wayland by the end of the lifetime of this machine and I suspect that it will want something better than a vintage 2011 chipset. A modern Intel integrated GPU would likely have been fine, but Ryzens don't have integrated graphics so I have to go with a separate card.
(The Prime X370-Pro has onboard HDMI and DisplayPort connectors, but a footnote in the specifications notes that they only do anything if you have an Athlon CPU with integrated graphics. This disappointed me when I read it carefully, because at first I thought I was going to get to skip a separate video card.)
- EVGA SuperNOVA G3 550W PSU
- Commentary on my planned home
machine pushed me to a better PSU than I
initially put in that machine's parts list. Going to 550W
buys me some margin for increased power needs for things
like a more powerful GPU, if I ever need it.
(There are vaguely plausible reasons I might want to temporarily put in a GPU capable of running things like CUDA or Tensorflow. Some day we may need to know more about them than we currently do, since our researchers are increasingly interested in GPU computing.)
- Fractal Design Define R5 case
- All of the reasons I originally
had for my home machine apply just as much for
my work machine. I'm actively looking forward to having enough
drive bays (and SATA ports) to temporarily throw hard drives into
my case for testing purposes.
- LG GH24NSC0 DVD/CD Writer
- This is an indulgence, but it's an inexpensive one, I do actually burn DVDs at work every so often, and the motherboard has 8 SATA ports so I can actually connect this up all the time.
Unlike my still-theoretical new home machine (which is now unlikely to materialize before the start of next year at the earliest), the parts for my new office machine have all been ordered, so this is final. We're going to assemble it ourselves (by which I mean that I'm going to, possibly with some assistance from my co-workers if I run into problems).
On the bright side of not doing anything about a new home machine, now I'm going to get experience with a bunch of the parts I was planning to use in it (and with assembling a modern PC). If I decide I dislike the case or whatever for some reason, well, now I can look for another one.
(However, there's not much chance that I'll change my mind on using an Intel CPU in my new home machine even if this AMD-based one goes well. The 1800X is a more expensive CPU, although not as much so as I was expecting, and then there's the need for a GPU and the whole issues with memory and so on. Plus I remain more interested in single-thread CPU performance in my home usage. Still, I could wind up surprising myself here, especially if ECC turns out to be genuinely useful. Genuinely useful ECC would be a bit disturbing, of course, since that implies that I'd be seeing single-bit RAM errors far more than I think I should be.)
We're broadly switching to synchronizing time with systemd's timesyncd
Every so often, simply writing an entry causes me to take a closer look
at something I hadn't paid much attention to before. I recently wrote
a series of entries on my switch from ntpd to chrony on my desktops and why we don't run NTP daemons but instead
synchronize time through a cron entry.
Our hourly crontab script for time synchronization dates back to at
least 2008 and perhaps as early as 2006 and our first Ubuntu 6.06
installs; we've been carrying it forward ever since without thinking
about it very much. In particular, we carried it forward into our
standard 16.04 installs. When we did this,
we didn't really pay attention to the fact that 16.04 is different
here, because 16.04 is systemd based and includes systemd's timesyncd
time synchronization system. Ubuntu installed and activated
systemd-timesyncd (with a stock setup that got time from
ntp.ubuntu.com), we installed our hourly crontab script, and nothing
exploded so we didn't really pay attention to any of this.
When I wrote my entries, they caused me to start actually noticing
systemd-timesyncd and paying some attention to it, which included
noticing that it was actually running and synchronizing the time
on our servers (which kind of invalidates my casual claim here that our servers were typically less than
a millisecond out in an hour, since that was based on
its reports and I was assuming that there was no other time synchronization
going on). Coincidentally, one of my co-workers had also had timesyncd
come to his attention recently for reasons outside of the scope of
this entry. With timesyncd temporarily in our awareness, my
co-workers and I talked over the whole issue and decided that doing
time synchronization the official 16.04 systemd way made the most
sense.
(Part of it is that we're likely to run into this issue on all future Linuxes we deal with, because systemd is everywhere. CentOS 7 appears to be just a bit too old to have timesyncd, but a future CentOS 8 very likely will, and of course Ubuntu 18.04 will and so on. We could fight city hall, but at a certain point it's less effort to go with the flow.)
In other words, we're switching over to officially using systemd-timesyncd. We were passively using it before without really realizing it since we didn't disable timesyncd, but now we're actively configuring it to use our local time servers instead of Ubuntu's and we're disabling and removing our hourly cron job. I guess we're now running NTP daemons on all our servers after all; not because we need them for any of the reasons I listed, but just because it's the easiest way.
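Pointing timesyncd at local servers is a small change in /etc/systemd/timesyncd.conf; NTP= is the relevant option (the server names here are placeholders):

```ini
[Time]
NTP=ntp1.example.com ntp2.example.com
# FallbackNTP= can keep a public pool as a backstop if desired
```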
(At the moment we're also using
/etc/default/ntpdate (from the Ubuntu
ntpdate package) to force an initial synchronization at boot time,
or technically when the interface comes up. We'll probably keep doing
this unless timesyncd picks up good explicit support for initially
force-setting the system time; when our machines boot and get on the
network, we want them to immediately jump their time to whatever we
currently think it is.)
One way of capturing debugging state information in a systemd-based system
Suppose, not entirely hypothetically, that you have a systemd
.service unit running something where the something (whatever it
is) is mysteriously failing to start or run properly. In the most
frustrating version of this, you can run the operation just fine
after the system finishes booting and you can log in, but it fails
during boot and you can't see why. In this situation you often want
to gather information about the boot-time state of the system just
before your daemon or program is started and fails; you might need
to know things like what devices are available, the state of network
interfaces and routes, what filesystems have been mounted, what
other things are already running, and so on.
All of this information can be gathered by a shell script, but the
slightly tricky bit is figuring out how to get it to run. I've taken
two approaches here. The first one is to simply write a new
.service unit along these lines:

[Unit]
Description=Debug stuff
After=<whatever>
Before=<whatever else>

[Service]
Type=oneshot
RemainAfterExit=True
ExecStart=/root/gather-info

[Install]
WantedBy=multi-user.target
Here the actual information gathering script is
/root/gather-info; I typically have it write its data into a file in
/root as well. I like
/root as a handy dumping ground that's on the root filesystem
but not conceptually owned by the package manager in the way that
/bin and so on are; I can throw things in there without
worrying that I'm causing (much) future problems.
(If you use an
ExecStop= instead of
ExecStart= you can gather
the same sort of information at shutdown.)
However, if you're interested in the state basically right before
.service runs, the better approach is to modify that
.service to add an extra
ExecStartPre= line. In order to make
sure I know what's going on, my approach is to copy the entire
.service file to
/etc/systemd/system (if necessary) and then
edit it. As an example, suppose that your ZFS on Linux setup is
failing to import pools on boot because the
zfs-import-cache.service unit is failing.
Here I'd modify that
.service like this:

ExecStartPre=/root/gather-info
ExecStart=/sbin/zpool import -c /etc/zfs/zpool.cache -aN
Unfortunately I don't think you can do this without copying the
.service file, or at least I wouldn't want to trust it
any other way.
Possibly there's a better way to do this in the systemd world, but
I've been sort of frustrated by how difficult it is to do various
things here. For example, it would be nice if systemd would easily
give you the names of systemd units that ran or failed, instead of
Description= texts. More than once I've had to resort to
'grep -rl <whatever> /usr/lib/systemd/system' in an attempt to
find a unit file so I could see what it actually did.
Sidebar: My usual general format for information-gathering scripts
I tend to write them like this:
#!/bin/sh
(
  date
  [... various commands ...]
  echo
) >>/root/somefile.txt
The things I've found important are the date stamp at the start, that I'm appending to the file instead of overwriting it, and the blank line at the end for some more visual separation. Appending instead of overwriting can really save things if for some reason I have to reboot twice instead of once, because it means information from the first reboot is still there.
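Fleshed out with some of the state mentioned earlier (network interfaces, mounted filesystems, running processes), such a script might look like this sketch; the output path and the exact command list are illustrative, not what I actually use:

```shell
#!/bin/sh
# Illustrative state-gathering script: append a timestamped snapshot of
# system state. The entry uses a file in /root; /tmp is used here only
# so the sketch can run unprivileged.
OUT=/tmp/boot-state.txt
(
  date
  echo '--- network ---'
  ip addr 2>/dev/null || ifconfig -a 2>/dev/null
  echo '--- mounts ---'
  mount
  echo '--- processes ---'
  ps axu 2>/dev/null || ps -ef
  echo
) >>"$OUT"
```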
Getting some information about the NUMA memory hierarchy of your server
If you have more than one CPU socket in a server, it almost certainly has non-uniform memory access, where some memory is 'closer' (faster to access) to some CPUs than others. You can also have NUMA even in single socket machines, depending on how things are implemented internally. This raises the question of how you can find out information about the NUMA memory hierarchy of your machines, because sometimes it matters.
The simple way of finding out how many NUMA zones you have is
lscpu, in the '
NUMA nodeN ...' lines; this will
also tell you what logical CPUs are in which NUMA zones. Typical
output from a machine with many NUMA zones is:
NUMA node0 CPU(s):   0-7
NUMA node1 CPU(s):   8-15
NUMA node2 CPU(s):   16-23
NUMA node3 CPU(s):   24-31
NUMA node4 CPU(s):   32-39
NUMA node5 CPU(s):   40-47
NUMA node6 CPU(s):   48-55
NUMA node7 CPU(s):   56-63
CPU numbers need not be contiguous. Another one of our machines reports:
NUMA node0 CPU(s):   0-7,16-23
NUMA node1 CPU(s):   8-15,24-31
This generally means that you have some hyperthreading in action.
You can check this by looking at '
lscpu -e' output, which here
reports that CPU 0 and CPU 16 are on the same node, socket, and
core.
Another way to get this information turns out to be 'numactl --hardware'.
This not only reports nodes and the CPUs attached to them, it also
reports the total memory attached to each node, the free memory for
each node, and the big piece of information, 'node distances', which
tell you how relatively costly it is to get to one node's memory
from another NUMA node. This comes out in a nice table form, so let
me show you:
node distances:
node   0   1   2   3   4   5   6   7
  0:  10  14  23  23  27  27  27  27
  1:  14  10  23  23  27  27  27  27
  2:  23  23  10  14  27  27  27  27
  3:  23  23  14  10  27  27  27  27
  4:  27  27  27  27  10  14  23  23
  5:  27  27  27  27  14  10  23  23
  6:  27  27  27  27  23  23  10  14
  7:  27  27  27  27  23  23  14  10
And here's the same information for the server with only two NUMA zones:
node distances:
node   0   1
  0:  10  21
  1:  21  10
The second server has a simple setup that creates a simple NUMA hierarchy; it's a two-socket server using Intel Xeon E5-2680 CPUs. The first server is eight Xeon X6550 CPUs (apparently we turned hyperthreading off on it), organized in two physically separate blocks of four CPUs. Within the same block, a CPU has one close sibling (relative cost 14) and two further away CPUs (cost 23). All cross-block access is fairly costly but uniformly so, with a relative cost of 27 for access to each NUMA node's memory.
(Note that you can have multiple NUMA zones within the same socket, and reported relative costs that aren't socket dependent. We have one server with two Opteron CPUs and four NUMA nodes, two for each socket. The reported cross-node relative cost is a uniform 20.)
The master source for this information appears to be in
/sys/devices/system/node. The nodeN/distance
file there gives essentially one row of the node distances, while
nodeN/meminfo has per-node memory usage information that's basically
a per-node version of
/proc/meminfo. There's also nodeN/vmstat,
which is per-node VM system statistics.
For a given process, you can see some information about which nodes
it has allocated memory on by looking at
/proc/<pid>/numa_maps. Part of the information
will be reported as '
N0=65 N1=28', which means that this process
has 65 pages from node 0 and 28 from node 1.
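As a toy illustration of pulling those per-node counts out of a numa_maps-style line (the sample line here is made up):

```shell
# Parse the N<node>=<pages> fields out of a numa_maps-style line.
# The line itself is a fabricated sample for illustration.
line='7f2c4000 default anon=93 dirty=93 N0=65 N1=28'
for field in $line; do
  case "$field" in
    N[0-9]*=*)
      node=${field%%=*}    # e.g. N0
      pages=${field#*=}    # e.g. 65
      echo "node ${node#N}: $pages pages"
      ;;
  esac
done
# prints:
# node 0: 65 pages
# node 1: 28 pages
```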
A massive amount of global memory state information is available
in /proc/zoneinfo, and a breakdown of free page information is
in /proc/buddyinfo; for more discussion of what that means, see
my entry on how the Linux kernel divides up your RAM.
There's also /proc/pagetypeinfo for yet
more NUMA node related information.
(As far as I know, the 'node distances' are only meaningful as relative numbers and don't mean anything in absolute terms. As such I interpret the '10' that's used for a node's own memory as basically '1.0 multiplied by ten'. Presumably it's not 100 because you don't need that much precision in differences.)
A systemd mistake with a script-based service unit I recently made
That sure was a bunch of debugging because I forgot that my systemd .service file that runs scripts needed RemainAfterExit=True.
(... or it'd apparently run the ExecStop script right after the ExecStart script, which doesn't work too well.)
Let's be specific here. This was the systemd
.service unit to
bring up my WireGuard tunnel on my work
machine, which I set up to run a 'startup' script (via ExecStart=).
Because I had a 'stop' script sitting around, I also set the unit's
ExecStop= to point to that; the 'stop' script takes the device
down and so on.
The startup script worked when I ran it by hand, but when I set up
the .service unit to start WireGuard on boot, it didn't. Specifically,
journalctl reported no errors, but the WireGuard tunnel
network device and its associated routes just weren't there when
the system finished booting. At first I thought the script was
failing in a way that the systemd journal wasn't capturing, so I
stuck a bunch of debugging in (capturing all output from the script
in a file, and then running with '
set -x', and finally dumping
out various pieces of network state after the script had finished).
All of this debugging convinced me that the WireGuard tunnel was
being created during boot but then getting destroyed by the time
booting finished. I flailed around for a while theorizing that this
service or that service was destroying the WireGuard device when
it was starting (and altering my
.service to start after a steadily
increasing number of other things), but nothing fixed the issue.
Then, while I was staring at my
.service file, the penny dropped
and I actually read what was in front of my eyes:
[Service]
WorkingDirectory=/var/local/wireguard
ExecStart=/var/local/wireguard/startup
ExecStop=/var/local/wireguard/stop
Environment=LANG=C
My .service file had started out life as one that I'd copied
from another .service file of mine. However, that
was for a daemon, where the
ExecStart= was a process that was
sticking around. I was running a script, and the script was exiting,
which meant that as far as systemd was concerned the service was
going down and it should immediately run the
ExecStop script. My
'stop' script deleted the WireGuard tunnel network device, which
explained why I found the device missing after booting had finished.
The journalctl output won't tell you this; it reports only that
the service started, and doesn't mention that it stopped again and
the ExecStop script was run. If I'd looked at 'systemctl
status ...' and paid attention, I'd at least have had a clue because
systemd would have told me that it thought that the service was
'inactive (dead)' instead of running. If I'd had both scripts
explicitly log that they were running, I would have seen in the
logs that my 'stop' script was being executed for some reason; I
probably should add this.
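In systemd terms, the standard way to handle a run-once script with a separate stop action is a oneshot service that remains active after the script exits, so ExecStop only runs when the service is genuinely stopped. A sketch of the relevant [Service] settings, using the same paths as my unit:

```ini
[Service]
Type=oneshot
RemainAfterExit=yes
WorkingDirectory=/var/local/wireguard
ExecStart=/var/local/wireguard/startup
ExecStop=/var/local/wireguard/stop
```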
This has been a pretty useful learning experience. I know, that probably sounds weird, but my view is that I'd rather make these mistakes and learn these lessons in a non-urgent, non-production situation instead of stubbing my toes on them in production and possibly under stressful conditions.
My new Linux machine for fall 2017 (planned)
My current home machine is about six years old now, and for a while I've been slowly planning a new PC. At this point my parts list is basically finalized and all that remains is the hard part, which is ordering things and perhaps assembling them. Who knows if I'll get around to doing that this year (although with the Christmas rush approaching fast, I'd better do that soon if I want to get everything before next year starts).
Because my office workstation is about as old as my home machine (and we have money), I'm probably going to try to update it to something very like this build as well.
After staring at a bunch of specifications of various things and trying to sort through reviews and commentary, this is my current parts list:
- Intel Core i7-8700
- I've decided that this time around I want
to get a relatively high end CPU.
I considered the i7-8700K, but I'm not going to overclock, the
i7-8700 has a 30 W lower TDP, and it's apparently only about .1
GHz slower in most situations, according to sources like the
frequency charts here
Also, the i7-8700 is noticeably cheaper and probably more readily available.
I'm not considering AMD Ryzens at the moment for a number of reasons beyond the scope of this entry. The TDP for the higher end Ryzens is certainly part of it; the Ryzen 7 1700 is the first 65W TDP Ryzen, and its performance seems clearly below the i7-8700 in most respects.
- Asus PRIME Z370-A motherboard
- I know that picking a motherboard
is close to throwing darts, but Asus is my default motherboard
vendor and the Prime Z370-A has almost everything I want and very
little that I don't. Since I want onboard DisplayPort 1.2, my choice of motherboards is
more restricted than it looks, especially in these early days of
Z370-based motherboards. I'd like to get more than six SATA ports
and more than one USB-C USB 3.1 gen 2 port, but I'll take what I can get. I
can always add an expansion card later.
Because I want to be able to use the same build for my work machine, one of the additional constraints is that the motherboard has to be able to drive at least two displays at 1920x1200 @60Hz from onboard connectors. The Prime Z370-A will do this, and I consider it a feature that its specification page explicitly mentions that it supports up to 3 displays at once.
- 2x 16GB DDR4-2666MHz CL15 RAM
- Since I'm not overclocking, there's
no point in going with RAM that's clocked any faster (and it
looks like you can't get 16GB 2666 MHz CL14 modules). With RAM
prices still depressingly high, I'll save adding yet more memory
for a hypothetical midlife upgrade.
Also, it's not like I'm going to do much with even 32 GB of RAM other
than feed it to ZFS's disk cache.
For a work build, I would like 64 GB but I can live with 32 GB. Sadly adding that extra 32 GB is quite costly, as RAM prices remain stubbornly and annoyingly high.
- A CPU cooler, probably a Cryorig H7
- I know that the i7-8700 comes
with a stock Intel CPU cooler, but I want a better one so that the
machine runs cooler. Possibly this is overkill, but then I've had
long-term CPU cooling issues at work
and I expect this machine to run for five or six years (or more) too.
- Fractal Design Define R5 case
- My case requirements are set by
wanting a not too big mid-tower case with at least two bays that can
take SSDs and four bays that can take 3.5" drives (and I'm fine if the
'SSD' bays are 3.5" bays). The Define R5 gets decent reviews. Much
like the motherboard, I'm sort of throwing darts here.
- EVGA BQ 500W power supply
- Once again I'm basically throwing darts with very little grounds for picking one option over another. 500 watts is overkill for this PC, even if I add a graphics card later, but I like having some headroom and it looks like decently rated lower wattage power supplies aren't that much cheaper. A well regarded alternative is the Corsair CX 450M, which is 50 watts less but has a five year warranty instead of a three year one.
Although it's tempting to shove an optical drive in the machine as well (and they're cheap), I'm going to try to resist the temptation. My excuse for putting an optical drive in the case would be that I wouldn't have six drives most of the time, so I'd usually have a SATA port spare for the optical drive.
I'll be moving all of my existing disks over from my current home machine (both the hard drives and the SSDs). A potential addition of or upgrade to NVME drives is another contemplated midlife upgrade.
This parts list is significantly more expensive than my 2011 machine. Without looking at detailed pricing information from 2011, my impression is that the CPU costs substantially more and the RAM costs a chunk more; it's possible that RAM prices per GB basically haven't moved since 2011 (although the RAM itself has gotten faster). Perhaps 2011 was essentially a minimum in PC costs and things have been going up since.
(To be fair, I'm almost certainly paying a premium for wanting a latest generation CPU and motherboard only a month or two after they've been introduced. And the Z370 chipset is intended to be the high-end chipset for this CPU series, with lower-end ones to be introduced later.)
Some early notes on WireGuard
WireGuard is a new(ish) secure IP tunnel system, currently only for Linux. Yesterday I wrote about why I've switched over to it; today is for some early notes on things about it that I've run into, especially in ways it's different from my previous IKE IPSec plus GRE setup.
For the most part, my WireGuard configuration is basically their simple example configuration, but with a single peer. The important bit I had to get my head around is the AllowedIPs setting, which controls which traffic is allowed to flow inside the secure tunnel. My home machine may receive traffic to its 'inside' IP from anywhere, so it must have an AllowedIPs of 0.0.0.0/0. My work machine, as my WireGuard touchdown point, should only see traffic from my home machine, and that traffic should only be coming from my home machine's inside IP; it has an AllowedIPs of just that IP address.
(I did specify Endpoint on my work machine, which I think means that my work machine, the 'server', can initiate the initial connection handshake when it has packets to send to my home machine and my home machine hasn't already got things going.)
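To make this concrete, here's a sketch of what the two ends' configuration files might look like. The keys, hostnames, port number, and the 192.0.2.10 inside address are all made-up placeholders for illustration, not my actual setup:

```ini
# Home machine's WireGuard configuration (sketch):
[Interface]
PrivateKey = <home machine's private key>
ListenPort = 51820

[Peer]
PublicKey = <work machine's public key>
Endpoint = work.example.org:51820
# Traffic to my inside IP can come from anywhere:
AllowedIPs = 0.0.0.0/0

# Work machine's WireGuard configuration (sketch):
[Interface]
PrivateKey = <work machine's private key>
ListenPort = 51820

[Peer]
PublicKey = <home machine's public key>
Endpoint = home.example.org:51820
# Only the home machine's inside IP may emerge from the tunnel:
AllowedIPs = 192.0.2.10/32
```

(These are two separate files, one per machine; I've put them in one block here for compactness.)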
Unlike IKE (and GRE), WireGuard itself has no way to restrict where traffic from a particular peer is allowed to originate; peers are authenticated (and restricted) purely by their public key, and this public key will be accepted from any IP address that can talk to you. In fact, WireGuard will happily update its idea of where a peer is if you send it appropriate traffic. If you want this sort of IP-based access restriction, you will have to add it yourself by putting both ends of the WireGuard tunnel on fixed UDP port numbers and then using iptables (or nftables) to restrict who can send IP packets to them.
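As a sketch of what that restriction might look like on the work end (the port number and the home machine's public IP here are placeholders I'm inventing, and your distribution's firewall setup may want this expressed differently):

```
# Only the home machine's public IP may talk to our WireGuard port.
iptables -A INPUT -p udp --dport 51820 -s 198.51.100.7 -j ACCEPT
iptables -A INPUT -p udp --dport 51820 -j DROP
```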
(WireGuard packets are UDP, so an attacker who's managed to get a copy of your keys could forge the IP origin on traffic they send. However, an active connection requires an initial handshake to negotiate symmetric keys, so the attacker can't get anywhere just with the ability to send packets but not receive replies.)
Unlike IKE (again), WireGuard has no user-visible concept of a connection being 'up' (with encryption successfully negotiated with the remote end) or 'down'; a WireGuard network device is always up, although it may or may not pass traffic. This means that you don't have a chance to run scripts when the connection comes up or goes down, for example to establish or withdraw routes through the device. In the past I was tearing down my GRE tunnel on IPSec failure, which had security implications, but with WireGuard the tunnel and its routes stay up all the time and I'll have to manually tear it down at home if the other end breaks and I need things to still mostly work. This is more secure even if it's potentially less convenient.
(If I cared enough I could set up connection monitoring that automatically tore down the routes if the work end of the tunnel couldn't be pinged for long enough.)
WireGuard lets you set the firewall mark (fwmark) for outgoing encrypted packets, which turned out to be necessary for me for solving what I'll call the recursive VPN problem, where your remote VPN touchdown point is itself on a subnet that you want to route over the VPN. In fact my case is extra-tricky, because I want non-WireGuard IP traffic to my VPN touchdown address to flow over the WireGuard tunnel. What I did was set a fwmark in WireGuard and then used policy-based routing to force traffic with that mark to bypass the tunnel:
ip route add default dev ppp0 table 11
[...]
# Force WireGuard marked packets out ppp0, no matter what.
ip rule add fwmark 0x5151 iif lo priority 4999 table 11
(The fwmark value is arbitrary.)
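For completeness, the fwmark itself can be set in the WireGuard configuration file; this is a sketch with a placeholder private key:

```ini
[Interface]
PrivateKey = <private key>
# Mark outgoing encrypted packets so policy routing can match them:
FwMark = 0x5151
```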
The fwmark stuff is especially important (and useful) because the current WireGuard software is missing the ability to bind outgoing packets to a specific IP address on a multi-address host. As far as I can see, outgoing packets may wind up being sent from whatever IP address WireGuard finds convenient, instead of the IP alias that you've designated as the VPN touchdown. WireGuard on the other end will then explicitly update its idea of the peer address, even if it was initially configured with another one. I may be missing something here, and I should ask the WireGuard people about this; they might accept it as a feature request (or a bug). I'm not sure if you can fix it with policy-based routing cleverness, but you might be able to.
The best way to understand WireGuard configuration files is to think of them as interface-specific configuration files; I sort of missed this initially. Since you apply them with 'wg setconf <interface> <file>', they can only include a single interface's parameters. Somewhat inconveniently, they include secret information (your private key) and so must be unreadable. Similarly, it's a bit inconvenient that checking connection status with wg show requires root privileges, although you can work around that with sudo.
Why I've switched from GRE-over-IPSec to using WireGuard
I have a long standing IPSec IKE and point to point GRE tunnel that gives my home machine an inside IP address at work. This has worked reasonably well for years, but recently I discovered that its bandwidth had collapsed. Some subsequent staring at network packet captures suggested that I was now seeing dropped or drastically delayed ACKs, and perhaps reordering and packet drops in general. This smelled a lot like the kind of bug that was not going to be fun to report and probably wasn't going to get fixed any time soon. I could work around it for the moment, but its presence was irritating and inconvenient, and I considered it a warning sign for IPSec plus GRE in general.
(Anything that has catastrophically bad performance that persists for some time is clearly not being used by very many other people, or if it is it's clear that the kernel developers just don't care.)
WireGuard is a new(ish) secure IP tunnel system, initially only on Linux. Its web pages talk about VPNs because that's what almost everyone uses secure tunnels for, but it's really a general secure transport for IP. I'd been hearing good things about it for a while, but I hadn't really checked it out. Yesterday I wound up reading some stuff that was both very positive on WireGuard and suggested that it was going to wind up an official part of Linux. Given my IPSec+GRE problem, this was enough to push me to actively reading its webpages, which were enough to sell me on its straightforward model of operation and convince me that I could easily implement my current tunnel setup with WireGuard. Because I'm sometimes a creature of sudden impulses, today I went ahead and switched over from my IPSec+GRE setup to a WireGuard-based one (and tweeted about it once I got the setup working).
I switched to get something that gave me my full DSL bandwidth instead of only a pathetic fraction of it, and WireGuard delivers this. It works and nothing's blown up so far. Installing WireGuard on Fedora 26 was straightforward, and configuring it was fairly easy once I read the manpage a couple of times (by that I mean 'it could be better but I've seen worse'). I definitely like how simple the peer setup is; it's a bunch simpler (and better documented) than the IKE equivalent.
(Bear in mind that I'm a sysadmin and I'm perfectly comfortable writing scripts and systemd .service files, both of which I had to do to set my WireGuard configuration up. Of course, I'd had to do most of the same to set up IKE IPSec back when I did that.)
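To give a rough idea of what's involved, a minimal .service file for bringing the tunnel up might look something like this sketch; the interface name, the inside address, and the file paths are all invented for illustration:

```ini
[Unit]
Description=WireGuard tunnel to work
After=network-online.target
Wants=network-online.target

[Service]
Type=oneshot
RemainAfterExit=yes
# Create the interface, load keys and peers, and bring it up.
ExecStart=/sbin/ip link add wg0 type wireguard
ExecStart=/usr/bin/wg setconf wg0 /etc/wireguard/wg0.conf
ExecStart=/sbin/ip addr add 192.0.2.10/32 dev wg0
ExecStart=/sbin/ip link set wg0 up
ExecStop=/sbin/ip link del wg0

[Install]
WantedBy=multi-user.target
```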
As a whole, my WireGuard setup is simpler and involves less magic than the IKE plus GRE one. WireGuard puts the encryption directly into the tunnel device; unlike with GRE it's not possible to have either an unencrypted tunnel or IKE IPSec but no operating tunnel. Apart from how the tunnel is created and secured, the rest of my setup is the same, which is a large part of what made it so easy to switch over.
While less magic and a simpler, easier to understand configuration is nice, I probably wouldn't have bothered to switch if my old setup had been working correctly. It was the constant drip of irritation from having to be careful any time I wanted to move a big file between home and work (or even just look at a big work web page) that got to me. Well, that and the thought of what would be involved in trying to report my problem to Fedora (and probably eventually the upstream kernel). Switching to a different technology for my secure tunnel needs made the whole problem go away, which is the easy way out.
(I have some early notes on using and dealing with WireGuard, but that's going to be another entry.)