Wandering Thoughts

2017-12-12

Some notes on systemd-resolved, the systemd DNS resolver

My office workstation's upgrade to Fedora 27 resulted in a little incident with NetworkManager, which I complained about on Twitter; the resulting Twitter conversation brought systemd-resolved to my attention. My initial views weren't all that positive (because I'm biased here; systemd's recent inventions have often not been good things) but I didn't fully understand its state on my systems, so I wound up doing some digging. I'm still not too enthused, but I've wound up less grumpy than I was before and I'm not going to be forcefully blocking systemd-resolved from running at all just yet.

Systemd-resolved is systemd's DNS resolver. It has three interfaces:

  • A DBus API that's exposed at /org/freedesktop/resolve1. I don't know how many things use this API (or at least try to use it).

  • A local caching DNS resolver at 127.0.0.53 (IPv4 only) that clients can query to specifically talk to systemd-resolved, even if you have another local caching DNS server at 127.0.0.1.

  • glibc's getaddrinfo() and friends, which would send all normal hostname lookups off to systemd-resolved. Importantly, this is sanely implemented as an NSS module. If you don't have resolve in your hosts: line in /etc/nsswitch.conf, systemd-resolved is not involved in normal hostname resolution.
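
For illustration, here's a sketch of what the difference looks like in the hosts: line of /etc/nsswitch.conf (the other modules listed here are just typical ones and will vary from distribution to distribution):

# systemd-resolved not involved in normal hostname resolution:
hosts: files mdns4_minimal [NOTFOUND=return] dns myhostname

# systemd-resolved consulted for normal hostname resolution:
hosts: files mdns4_minimal [NOTFOUND=return] resolve [!UNAVAIL=return] dns myhostname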

All of my Fedora machines have systemd-resolved installed as part of systemd but none of them appear to have the NSS resolve module enabled, so none of them are using systemd-resolved as part of normal hostname resolution. They do appear to enable the DBus service (as far as I can sort out the chain of DBus stuff that leads to unit activation). The systemd-resolved daemon itself is not normally running, and there doesn't seem to be any systemd socket stuff that would activate it if you sent a DNS query to port 53 on 127.0.0.53, so on my Fedora machines it appears the only way it will ever start is if something makes an explicit DBus query.

However, once activated resolved has some behaviors that I don't think I'm fond of (apart from the security bugs and the regular bugs). I'm especially not enthused about its default use of LLMNR, which will normally see it broadcasting certain DNS queries out on all of my active interfaces. I consider LLMNR somewhere between useless and an active risk of various things, depending on what sort of network I'm connected to.

Resolved will make queries to DNS servers in parallel if you have more than one of them available through various paths. Here I think that's a reasonable approach to handling DNS resolution in the face of things like VPNs, which otherwise sort of requires awkward hand configuration. It's unfortunate that this behavior can harm people who know what they're doing and who want, say, their local DNS resolver (or their resolv.conf settings) to always override the DNS resolver settings they're getting from some random network's DHCP.

Since resolved doesn't actually shove itself in the way of anyone who didn't actively ask for it (via DBus or querying 127.0.0.53), I currently feel it's unobjectionable enough to leave unmasked and thus potentially activated via DBus. Assuming that I'm understanding and using journalctl correctly, it never seems to have been activated on either of my primary Fedora machines (and they have journalctl logs that go a long way back).
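
If you want to do the same sort of check yourself, a minimal sketch of the things I looked at is:

grep '^hosts:' /etc/nsswitch.conf
systemctl status systemd-resolved.service
journalctl -u systemd-resolved.service

(The grep shows whether the resolve NSS module is in play, the systemctl status shows whether the daemon is currently running or merely available for DBus activation, and the journalctl output shows whether it's ever been started.)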

SystemdResolvedNotes written at 19:47:22

2017-12-06

My upgrade to Fedora 27, Secure Boot, and a mistake made somewhere

I'm usually slow about updating to new versions of Fedora; I like to let other people find the problems and then it's generally a hassle in various ways, so I keep putting it off. This week I decided that I'd been sitting on the Fedora 27 upgrade for long enough (or too long), and today it was the turn of my work laptop. It didn't entirely go well, but after the dust settled I think it's due to an innocent looking mistake I made and my specific laptop configuration.

This is a new laptop, a Dell XPS 13, and this is the first Fedora upgrade I've done on it (I installed Fedora 26 when we got it in mid-August). As I usually do, I did the Fedora 26 to 27 upgrade with the officially unsupported method of a live upgrade with dnf based on the traditional documentation for it, which I've been doing on multiple machines for many years. After I finished the upgrade process, I rebooted and the laptop failed to come up in Linux; instead it booted into the Windows 10 installation that I have on the other half of its drive. My Linux install (now with Fedora 27) was intact, but it wouldn't boot at all.

I will start with the summary. If your system boots using UEFI, you almost certainly shouldn't ever run grub2-install. Some portions of the Fedora wiki (like the Fedora page on Grub2) will tell you this pretty loudly, but the 'upgrade with package manager' page still says to use grub2-install without any qualifications, and that's what I did during my Fedora 27 upgrade.

What caused my issue is that I have Secure Boot enabled on my laptop, and at some point during the upgrade my Fedora UEFI boot entry wound up pointing to the EFI image EFI/fedora/grubx64.efi, which isn't correctly signed and so won't boot under Secure Boot. The XPS UEFI firmware doesn't report any error message when this happens; instead it silently goes on to the next UEFI boot entry (if there is one), which in my case was Windows' standard entry. In order to boot my laptop with Secure Boot enabled, the UEFI boot entry for Fedora 27 needs to point to EFI/fedora/shimx64.efi instead of grubx64.efi. This shim loader is signed and passes the UEFI firmware's Secure Boot verification, and once it starts it hands things off to grubx64.efi for regular GRUB2 UEFI booting.

(If I disabled Secure Boot, I could use the grubx64.efi UEFI boot entry. Otherwise, only the shimx64.efi entry worked.)

At this point I don't know what my Fedora 26 UEFI boot entry looked like, but I suspect that it pointed to the Fedora 26 version of the shim (which appears to be called EFI/fedora/shim.efi). My best guess for what happened during my Fedora 27 upgrade is that when I did the grub2-install at the end, one of the things it did was run efibootmgr and reset where the 'fedora' UEFI boot entry pointed. I don't remember seeing any message reporting this, but I didn't run grub2-install with any flag to make it verbose and the code to run efibootmgr appears to be in the Grub2 source.

(And changing the UEFI boot entry is sort of reasonable. After all, I told Grub2 to install itself, and that logically includes making the UEFI boot entry point to it, just as grub2-install on a non-UEFI system will update the MBR boot record to point to itself.)

PS: I consider all of this a valuable learning experience, since I got to shoot myself in the foot and learn a bunch of things about UEFI on a machine I could live without. I'm planning to set up my future desktops as pure UEFI machines, and making this mistake on one of them would have been much more painful. For that matter, simply knowing how to set up UEFI boot entries is going to come in handy when I migrate my current disks over to the new machines.

(I'm up in the air about whether or not I'll use Secure Boot on the desktops. If they come that way, well, maybe.)

Sidebar: How I fixed this

In theory you can boot a Fedora 27 live image from a USB stick and fiddle around with efibootmgr. In practice, I went into the laptop's UEFI 'BIOS' interface and told it to add another UEFI boot entry, because this had a reasonably simple and obvious interface. The resulting entry is a bit different from what I think efibootmgr would make, but it works (as well it should, since it was set up by the very thing that's interpreting it).
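
For the record, here's a sketch of roughly what the efibootmgr version would look like, assuming (hypothetically) that the EFI system partition is the first partition of /dev/nvme0n1; adjust the disk, partition, and label to your own situation:

efibootmgr -v
efibootmgr --create --disk /dev/nvme0n1 --part 1 --label 'fedora' --loader '\EFI\fedora\shimx64.efi'

The first command lists the existing UEFI boot entries (and the boot order) so you can see what you're dealing with; the second creates a new entry that points at the signed shim.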

(In the course of this experience I was not pleased to discover that the Dell XPS 13's UEFI interface will let you delete UEFI boot entries with immediate effect and no confirmation or saving needed. Click the wrong button at the wrong time, and your entry is irretrievably gone on the spot.)

Fedora27SecureBootMistake written at 23:59:38

2017-12-03

My new Linux office workstation for fall 2017

My past two generations of office Linux desktops have been identical to my home machines, and when I wrote up my planned new home machine I expected that to be the case for my next work machine as well (we have some spare money and my work machine is six years old, so replacing it was always in the plans). It turns out that this is not going to be the case this time around; to my surprise and for reasons beyond the scope of this entry, my next office machine is going to be AMD Ryzen based.

(It turns out that I was wrong on how long it's been since I used AMD CPUs. My current desktop is Intel, but my previous 2006-era desktop was AMD based.)

The definitive parts list for this machine is as follows. Much of it is based on my planned new home machine, but obviously the switch from Intel to AMD required some other changes, some of which are irritating ones.

AMD Ryzen 1800X
Even though we're not going to overclock it, this is still the best Ryzen CPU. I figure that I can live with the 95W TDP and the cooling it requires, since that's what my current desktop has (and this time I'm getting a better CPU cooler than the stock Intel one, so it should run both cooler and quieter).

ASUS Prime X370-Pro motherboard
We recently got another Ryzen-based machine with this motherboard and it seems fine (as a CPU/GPU compute server). The motherboard has a decent assortment of SATA ports, USB, and so on, and really there's not much to say about it. I also looked at the slightly less expensive X370-A, but the X370-Pro has more than enough improvements to strongly prefer it (including two more SATA ports and onboard Intel-based networking instead of Realtek-based).

It does come with built in colourful LED lighting, which looks a bit odd in the machine in our server room. I'll live with it.

(This motherboard is mostly an improvement on the Intel version since it has more SATA ports, although I believe it has one less M.2 NVME port. But with two x16 PCIE slots, you can fix that with an add-on card.)

2x16 GB DDR4-2400 Kingston ECC ValueRAM
Two DIMMs is what you want on Ryzens today. We're using ECC RAM basically because we can; it's available, is only a bit more expensive than non-ECC RAM, runs fast enough, and is supported to at least some degree by the motherboard. We don't know for sure that it will actually correct errors, but it probably will.

(You can't get single-rank 16GB DIMMs, so that this ECC RAM is double-rank is not a drawback.)

The RAM speed issues with Ryzen are one of the irritations of building this machine around an AMD CPU instead of an Intel one. This machine may never be upgraded to 64 GB of RAM over its lifetime (which will probably be at least five years).

Noctua NH-U12-SE-AM4 CPU cooler
We need some cooler for the Ryzen 1800X (since it doesn't come with one). These are well reviewed as both effective and quiet, and the first Ryzen machine we got has a Noctua cooler as well (although a different one).

Gigabyte Radeon RX 550 2GB video card
That I need a graphics card is one of the irritations of Ryzens. Needing a discrete graphics card means an AMD/ATI card right now, and I wanted one with a reasonably modern graphics architecture (and I needed one with at least two digital video outputs, since I have dual monitors). I sort of threw darts here, but reviewers seem to say that this card should be quiet under normal use.

As a Linux user I don't normally stress my graphics, but I expect to have to run Wayland by the end of the lifetime of this machine and I suspect that it will want something better than a vintage 2011 chipset. A modern Intel integrated GPU would likely have been fine, but Ryzens don't have integrated graphics so I have to go with a separate card.

(The Prime X370-Pro has onboard HDMI and DisplayPort connectors, but a footnote in the specifications notes that they only do anything if you have an Athlon CPU with integrated graphics. This disappointed me when I read it carefully, because at first I thought I was going to get to skip a separate video card.)

EVGA SuperNOVA G3 550W PSU
Commentary on my planned home machine pushed me to a better PSU than I initially put in that machine's parts list. Going to 550W buys me some margin for increased power needs for things like a more powerful GPU, if I ever need it.

(There are vaguely plausible reasons I might want to temporarily put in a GPU capable of running things like CUDA or Tensorflow. Some day we may need to know more about them than we currently do, since our researchers are increasingly interested in GPU computing.)

Fractal Design Define R5 case
All of the reasons I originally had for my home machine apply just as much for my work machine. I'm actively looking forward to having enough drive bays (and SATA ports) to temporarily throw hard drives into my case for testing purposes.

LG GH24NSC0 DVD/CD Writer
This is an indulgence, but it's an inexpensive one, I do actually burn DVDs at work every so often, and the motherboard has 8 SATA ports so I can actually connect this up all the time.

Unlike my still-theoretical new home machine (which is now unlikely to materialize before the start of next year at the earliest), the parts for my new office machine have all been ordered, so this is final. We're going to assemble it ourselves (by which I mean that I'm going to, possibly with some assistance from my co-workers if I run into problems).

On the bright side of not doing anything about a new home machine, now I'm going to get experience with a bunch of the parts I was planning to use in it (and with assembling a modern PC). If I decide I dislike the case or whatever for some reason, well, now I can look for another one.

(However, there's not much chance that I'll change my mind on using an Intel CPU in my new home machine even if this AMD-based one goes well. The 1800X is a more expensive CPU, although not as much so as I was expecting, and then there's the need for a GPU and the whole issues with memory and so on. Plus I remain more interested in single-thread CPU performance in my home usage. Still, I could wind up surprising myself here, especially if ECC turns out to be genuinely useful. Genuinely useful ECC would be a bit disturbing, of course, since that implies that I'd be seeing single-bit RAM errors far more than I think I should be.)

WorkMachine2017 written at 01:13:27

2017-11-30

We're broadly switching to synchronizing time with systemd's timesyncd

Every so often, simply writing an entry causes me to take a closer look at something I hadn't paid much attention to before. I recently wrote a series of entries on my switch from ntpd to chrony on my desktops and why we don't run NTP daemons but instead synchronize time through a cron entry. Our hourly crontab script for time synchronization dates back to at least 2008 and perhaps as early as 2006 and our first Ubuntu 6.06 installs; we've been carrying it forward ever since without thinking about it very much. In particular, we carried it forward into our standard 16.04 installs. When we did this, we didn't really pay attention to the fact that 16.04 is different here, because 16.04 is systemd based and includes systemd's timesyncd time synchronization system. Ubuntu installed and activated systemd-timesyncd (with a stock setup that got time from ntp.ubuntu.com), we installed our hourly crontab script, and nothing exploded so we didn't really pay attention to any of this.

When I wrote my entries, they caused me to start actually noticing systemd-timesyncd and paying some attention to it, which included noticing that it was actually running and synchronizing the time on our servers (which kind of invalidates my casual claim here that our servers were typically less than a millisecond out in an hour, since that was based on ntpdate's reports and I was assuming that there was no other time synchronization going on). Coincidentally, one of my co-workers had also had timesyncd come to his attention recently for reasons outside of the scope of this entry. With timesyncd temporarily in our awareness, my co-workers and I talked over the whole issue and decided that doing time synchronization the official 16.04 systemd way made the most sense.

(Part of it is that we're likely to run into this issue on all future Linuxes we deal with, because systemd is everywhere. CentOS 7 appears to be just a bit too old to have timesyncd, but a future CentOS 8 very likely will, and of course Ubuntu 18.04 will and so on. We could fight city hall, but at a certain point it's less effort to go with the flow.)

In other words, we're switching over to officially using systemd-timesyncd. We were passively using it before without really realizing it since we didn't disable timesyncd, but now we're actively configuring it to use our local time servers instead of Ubuntu's, and we're disabling and removing our hourly cron job. I guess we're now running NTP daemons on all our servers after all; not because we need them for any of the reasons I listed, but just because it's the easiest way.
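
The actual configuration change is small. A sketch of what goes into /etc/systemd/timesyncd.conf looks like this, with placeholder server names instead of our real ones:

[Time]
NTP=ntp1.example.com ntp2.example.com
FallbackNTP=ntp.ubuntu.com

After changing it, 'systemctl restart systemd-timesyncd' picks up the new servers and 'timedatectl status' will show you whether the system considers itself synchronized.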

(At the moment we're also using /etc/default/ntpdate (from the Ubuntu ntpdate package) to force an initial synchronization at boot time, or technically when the interface comes up. We'll probably keep doing this unless timesyncd picks up good explicit support for initially force-setting the system time; when our machines boot and get on the network, we want them to immediately jump their time to whatever we currently think it is.)

SwitchingToTimesyncd written at 21:37:12

2017-11-26

One way of capturing debugging state information in a systemd-based system

Suppose, not entirely hypothetically, that you have a systemd .service unit running something where the something (whatever it is) is mysteriously failing to start or run properly. In the most frustrating version of this, you can run the operation just fine after the system finishes booting and you can log in, but it fails during boot and you can't see why. In this situation you often want to gather information about the boot-time state of the system just before your daemon or program is started and fails; you might need to know things like what devices are available, the state of network interfaces and routes, what filesystems have been mounted, what other things are already running, and so on.

All of this information can be gathered by a shell script, but the slightly tricky bit is figuring out how to get it to run. I've taken two approaches here. The first one is to simply write a new .service file:

[Unit]
Description=Debug stuff
After=<whatever>
Before=<whatever else>

[Service]
Type=oneshot
RemainAfterExit=True
ExecStart=/root/gather-info

[Install]
WantedBy=multi-user.target

Here the actual information gathering script is /root/gather-info. I typically have it write its data into a file in /root as well. I use /root as a handy dumping ground that's on the root filesystem but not conceptually owned by the package manager in the way that /etc, /bin, and so on are; I can throw things in there without worrying that I'm causing (much) future problems.

(If you use an ExecStop= instead of ExecStart= you can gather the same sort of information at shutdown.)

However, if you're interested in the state basically right before some other .service runs, the better approach is to modify that .service to add an extra ExecStartPre= line. In order to make sure I know what's going on, my approach is to copy the entire .service file to /etc/systemd/system (if necessary) and then edit it. As an example, suppose that your ZFS on Linux setup is failing to import pools on boot because the zfs-import-cache.service unit is failing.

Here I'd modify the .service like this:

[...]

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStartPre=/root/gather-info
ExecStartPre=/sbin/modprobe zfs
ExecStart=/sbin/zpool import -c /etc/zfs/zpool.cache -aN

[...]

Unfortunately I don't think you can do this without copying the whole .service file, or at least I wouldn't want to trust it any other way.

Possibly there's a better way to do this in the systemd world, but I've been sort of frustrated by how difficult it is to do various things here. For example, it would be nice if systemd would easily give you the names of systemd units that ran or failed, instead of their Description= texts. More than once I've had to resort to 'grep -rl <whatever> /usr/lib/systemd/system' in an attempt to find a unit file so I could see what it actually did.

Sidebar: My usual general format for information-gathering scripts

I tend to write them like this:

#!/bin/sh
( date;
  [... various commands ...]
  echo
) >>/root/somefile.txt

The things I've found important are the date stamp at the start, that I'm appending to the file instead of overwriting it, and the blank line at the end for some more visual separation. Appending instead of overwriting can really save things if for some reason I have to reboot twice instead of once, because it means information from the first reboot is still there.
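
As a concrete but entirely illustrative version for the boot-time networking case from earlier, the script might run something like this (substitute whatever commands capture the state you care about):

#!/bin/sh
( date;
  ip addr list;
  ip route list;
  findmnt;
  systemctl list-units --state=running;
  echo
) >>/root/bootstate.txt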

SystemdCapturingBootState written at 02:09:14

2017-11-19

Getting some information about the NUMA memory hierarchy of your server

If you have more than one CPU socket in a server, it almost certainly has non-uniform memory access, where some memory is 'closer' (faster to access) to some CPUs than others. You can also have NUMA even in single socket machines, depending on how things are implemented internally. This raises the question of how you can find out information about the NUMA memory hierarchy of your machines, because sometimes it matters.

The simple way of finding out how many NUMA zones you have is probably lscpu, in the 'NUMA nodeN ..' section; this will also tell you what logical CPUs are in what NUMA zones. Typical output from a machine with a lot of NUMA zones looks like this:

NUMA node0 CPU(s):     0-7
NUMA node1 CPU(s):     8-15
NUMA node2 CPU(s):     16-23
NUMA node3 CPU(s):     24-31
NUMA node4 CPU(s):     32-39
NUMA node5 CPU(s):     40-47
NUMA node6 CPU(s):     48-55
NUMA node7 CPU(s):     56-63

CPU numbers need not be contiguous. Another one of our machines reports:

NUMA node0 CPU(s):     0-7,16-23
NUMA node1 CPU(s):     8-15,24-31

This generally means that you have some hyperthreading in action. You can check this by looking at 'lscpu -e' output, which here reports that CPU 0 and CPU 16 are on the same node, socket, and core.

Another way to get this information turns out to be 'numactl -H'. This not only reports nodes and the CPUs attached to them, it also reports the total memory attached to each node, the free memory for each node, and the big piece of information, 'node distances', which tell you how relatively costly it is to get to one node's memory from another NUMA node. This comes out in a nice table form, so let me show you:

node distances:
node   0   1   2   3   4   5   6   7 
  0:  10  14  23  23  27  27  27  27 
  1:  14  10  23  23  27  27  27  27 
  2:  23  23  10  14  27  27  27  27 
  3:  23  23  14  10  27  27  27  27 
  4:  27  27  27  27  10  14  23  23 
  5:  27  27  27  27  14  10  23  23 
  6:  27  27  27  27  23  23  10  14 
  7:  27  27  27  27  23  23  14  10 

And here's the same information for the server with only two NUMA zones:

node distances:
node   0   1 
  0:  10  21 
  1:  21  10 

The second server has a simple setup that creates a simple NUMA hierarchy; it's a two-socket server using Intel Xeon E5-2680 CPUs. The first server has eight Xeon X6550 CPUs (apparently we turned hyperthreading off on it), organized in two physically separate blocks of four CPUs. Within the same block, a CPU has one close sibling (relative cost 14) and two further away CPUs (cost 23). All cross-block access is fairly costly but uniformly so, with a relative cost of 27 for access to each NUMA node's memory.

(Note that you can have multiple NUMA zones within the same socket, and reported relative costs that aren't socket dependent. We have one server with two Opteron CPUs and four NUMA nodes, two for each socket. The reported cross-node relative cost is a uniform 20.)

The master source for this information appears to be in /sys, specifically under /sys/devices/system/node. The nodeN/distance file there gives essentially one row of the node distances, while nodeN/meminfo has per-node memory usage information that's basically a per-node version of /proc/meminfo. There's also nodeN/vmstat, which is per-node VM system statistics.
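
As a quick and purely illustrative sketch of poking around in there:

cd /sys/devices/system/node
for n in node[0-9]*; do
  echo "$n distance: $(cat $n/distance)"
done
grep MemFree node*/meminfo

This prints each node's row of the distance table and then how much free memory each node currently has.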

For a given process, you can see some information about which nodes it has allocated memory on by looking at /proc/<pid>/numa_maps. Part of the information will be reported as 'N0=65 N1=28', which means that this process has 65 pages from node 0 and 28 from node 1.

A massive amount of global memory state information is available in /proc/zoneinfo, and a breakdown of free page information is in /proc/buddyinfo; for more discussion of what that means, see my entry on how the Linux kernel divides up your RAM. There's also /proc/pagetypeinfo for yet more NUMA node related information.

(As far as I know, the 'node distances' are only meaningful as relative numbers and don't mean anything in absolute terms. As such I interpret the '10' that's used for a node's own memory as basically '1.0 multiplied by ten'. Presumably it's not 100 because you don't need that much precision in differences.)

NUMAMemoryInfo written at 02:21:09

2017-11-10

A systemd mistake with a script-based service unit I recently made

I tweeted:

That sure was a bunch of debugging because I forgot that my systemd .service file that runs scripts needed

Type=oneshot
RemainAfterExit=True

(... or it'd apparently run the ExecStop script right after the ExecStart script, which doesn't work too well.)

Let's be specific here. This was the systemd .service unit to bring up my WireGuard tunnel on my work machine, which I set up to run a 'startup' script (via ExecStart=). Because I had a 'stop' script sitting around, I also set the unit's ExecStop= to point to that; the 'stop' script takes the device down and so on.

The startup script worked when I ran it by hand, but when I set up the .service unit to start WireGuard on boot, it didn't. Specifically, although journalctl reported no errors, the WireGuard tunnel network device and its associated routes just weren't there when the system finished booting. At first I thought the script was failing in a way that the systemd journal wasn't capturing, so I stuck a bunch of debugging in (capturing all output from the script in a file, and then running with 'set -x', and finally dumping out various pieces of network state after the script had finished).

All of this debugging convinced me that the WireGuard tunnel was being created during boot but then getting destroyed by the time booting finished. I flailed around for a while theorizing that this service or that service was destroying the WireGuard device when it was starting (and altering my .service to start after a steadily increasing number of other things), but nothing fixed the issue. Then, while I was staring at my .service file, the penny dropped and I actually read what was in front of my eyes:

[Service]
WorkingDirectory=/var/local/wireguard
ExecStart=/var/local/wireguard/startup
ExecStop=/var/local/wireguard/stop
Environment=LANG=C

This .service file had started out life as one that I'd copied from another .service file of mine. However, that .service file was for a daemon, where the ExecStart= was a process that was sticking around. I was running a script, and the script was exiting, which meant that as far as systemd was concerned the service was going down and it should immediately run the ExecStop script. My 'stop' script deleted the WireGuard tunnel network device, which explained why I found the device missing after booting had finished.

The journalctl output won't tell you this; it only reports that the service started and doesn't mention that it stopped again or that the ExecStop script was run. If I'd looked at 'systemctl status ...' and paid attention, I'd at least have had a clue, because systemd would have told me that it thought the service was 'inactive (dead)' instead of running. If I'd had both scripts explicitly log that they were running, I would have seen in the logs that my 'stop' script was being executed for some reason; I probably should add this.
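
For completeness, the repaired [Service] section is just the original one plus the two lines from my tweet:

[Service]
Type=oneshot
RemainAfterExit=True
WorkingDirectory=/var/local/wireguard
ExecStart=/var/local/wireguard/startup
ExecStop=/var/local/wireguard/stop
Environment=LANG=C

With Type=oneshot and RemainAfterExit=True, systemd considers the service active once the startup script exits successfully, and only runs the ExecStop script when the service is actually stopped, for example at shutdown.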

This has been a pretty useful learning experience. I know, that probably sounds weird, but my view is that I'd rather make these mistakes and learn these lessons in a non-urgent, non-production situation instead of stubbing my toes on them in production and possibly under stressful conditions.

SystemdScriptServiceFumble written at 01:44:04

2017-11-07

My new Linux machine for fall 2017 (planned)

My current home machine is about six years old now, and for a while I've been slowly planning a new PC. At this point my parts list is basically finalized and all that remains is the hard part, which is ordering things and perhaps assembling them. Who knows if I'll get around to doing that this year (although with the Christmas rush approaching fast, I'd better do that soon if I want to get everything before next year starts).

Because my office workstation is about as old as my home machine (and we have money), I'm probably going to try to update it to something very like this build as well.

After staring at a bunch of specifications of various things and trying to sort through reviews and commentary, this is my current parts list:

Intel Core i7-8700
I've decided that this time around I want to get a relatively high end CPU. I considered the i7-8700K, but I'm not going to overclock, the i7-8700 has a 30 W lower TDP, and it's apparently only about .1 GHz slower in most situations, according to sources like the frequency charts here versus here. Also, the i7-8700's noticeably cheaper and probably more readily available.

I'm not considering AMD Ryzens at the moment for a number of reasons beyond the scope of this entry. The TDP for the higher end Ryzens is certainly part of it; the Ryzen 7 1700 is the first 65W TDP Ryzen, and its performance seems clearly below the i7-8700 in most respects.

Asus PRIME Z370-A motherboard
I know that picking a motherboard is close to throwing darts, but Asus is my default motherboard vendor and the Prime Z370-A has almost everything I want and very little that I don't. Since I want onboard DisplayPort 1.2, my choice of motherboards is more restricted than it looks, especially in these early days of Z370-based motherboards. I'd like to get more than six SATA ports and more than one USB-C USB 3.1 gen 2 port, but I'll take what I can get. I can always add an expansion card later.

Because I want to be able to use the same build for my work machine, one of the additional constraints is that the motherboard has to be able to drive at least two displays at 1920x1200 @60Hz from onboard connectors. The Prime Z370-A will do this, and I consider it a feature that its specification page explicitly mentions that it supports up to 3 displays at once.

2x 16GB DDR4-2666MHz CL15 RAM
Since I'm not overclocking, there's no point in going with RAM that's clocked any faster (and it looks like you can't get 16GB 2666 MHz CL14 modules). With RAM prices still depressingly high, I'll save adding yet more memory for a hypothetical midlife upgrade. Also, it's not like I'm going to do much with even 32 GB of RAM other than feed it to ZFS's disk cache.

For a work build, I would like 64 GB but I can live with 32 GB. Sadly adding that extra 32 GB is quite costly, as RAM prices remain stubbornly and annoyingly high.

A CPU cooler, probably a Cryorig H7
I know that the i7-8700 comes with a stock Intel CPU cooler, but I want a better one so that the machine runs cooler. Possibly this is overkill, but then I've had long-term CPU cooling issues at work and I expect this machine to run for five or six years (or more) too.

Fractal Design Define R5 case
My case requirements are set by wanting a not too big mid-tower case with at least two bays that can take SSDs and four bays that can take 3.5" drives (and I'm fine if the 'SSD' bays are 3.5" bays). The Define R5 gets decent reviews. Much like the motherboard, I'm sort of throwing darts here.

EVGA BQ 500W power supply
Once again I'm basically throwing darts with very little grounds for picking one option over another. 500 watts is overkill for this PC, even if I add a graphics card later, but I like having some headroom and it looks like decently rated lower wattage power supplies aren't that much cheaper. A well regarded alternative is the Corsair CX 450M, which is 50 watts less but has a five year warranty instead of a three year one.

Although it's tempting to shove an optical drive in the machine as well (and they're cheap), I'm going to try to resist the temptation. My excuse for putting an optical drive in the case would be that I wouldn't have six drives most of the time, so I'd usually have a SATA port spare for the optical drive.

I'll be moving all of my existing disks over from my current home machine (both the hard drives and the SSDs). A potential addition of or upgrade to NVME drives is another contemplated midlife upgrade.

This parts list is significantly more expensive than my 2011 machine. Without looking at detailed pricing information from 2011, my impression is that the CPU costs substantially more and the RAM costs a chunk more; it's possible that RAM prices per GB basically haven't moved since 2011 (although the RAM itself has gotten faster). Perhaps 2011 was essentially a minimum in PC costs and things have been going up since.

(To be fair, I'm almost certainly paying a premium for wanting a latest generation CPU and motherboard only a month or two after they've been introduced. And the Z370 chipset is intended to be the high-end chipset for this CPU series, with lower-end ones to be introduced later.)

HomeMachine2017 written at 01:07:04

2017-11-05

Some early notes on WireGuard

WireGuard is a new(ish) secure IP tunnel system, currently only for Linux. Yesterday I wrote about why I've switched over to it; today is for some early notes on things about it that I've run into, especially in ways it's different from my previous IKE IPSec plus GRE setup.

For the most part, my WireGuard configuration is basically their simple example configuration, but with a single peer. The important bit I had to get my head around is the AllowedIPs setting, which controls which traffic is allowed to flow inside the secure tunnel. My home machine may receive traffic to its 'inside' IP from anywhere, so it must have an AllowedIPs of 0.0.0.0/0. My work machine, as my WireGuard touchdown point, should only see traffic from my home machine and that traffic should only be coming from my home machine's inside IP; it has an AllowedIPs of just that IP address.
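
To make that asymmetry concrete, here's a sketch of what the two configuration files might look like; every key, hostname, port, and IP address here is a placeholder rather than my real setup:

# home machine
[Interface]
PrivateKey = <home private key>
ListenPort = 51820

[Peer]
PublicKey = <work public key>
Endpoint = work-touchdown.example.org:51820
AllowedIPs = 0.0.0.0/0

# work machine (the touchdown point)
[Interface]
PrivateKey = <work private key>
ListenPort = 51820

[Peer]
PublicKey = <home public key>
Endpoint = my-home-dsl.example.org:51820
# the home machine's 'inside' IP:
AllowedIPs = 198.51.100.10/32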

(I did specify Endpoint on my work machine, which I think means that my work machine, the 'server', can initiate the initial connection handshake if necessary, that is, if it has packets to send to my home machine and my home machine hasn't already got things going.)

Unlike IKE (and GRE), WireGuard itself has no way to restrict where traffic from a particular peer is allowed to originate; peers are authenticated (and restricted) purely by their public key, and this public key will be accepted from any IP address that can talk to you. In fact, WireGuard will happily update its idea of where a peer is if you send it appropriate traffic. If you want this sort of IP-based access restriction, you will have to add it yourself by putting both ends of the WireGuard tunnel on fixed UDP port numbers and then using iptables (or nftables) to restrict who can send IP packets to them.
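
A minimal sketch of that kind of restriction with iptables, assuming your end listens on UDP port 51820 and the peer's fixed public IP is 203.0.113.40 (both placeholders):

iptables -A INPUT -p udp --dport 51820 -s 203.0.113.40 -j ACCEPT
iptables -A INPUT -p udp --dport 51820 -j DROP

(You'd want equivalent ip6tables rules as well if the outer tunnel traffic can arrive over IPv6.)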

(WireGuard packets are UDP, so an attacker who's managed to get a copy of your keys could forge the IP origin on traffic they send. However, an active connection requires an initial handshake to negotiate symmetric keys, so the attacker can't get anywhere just with the ability to send packets but not receive replies.)

Unlike IKE (again), WireGuard has no user-visible concept of a connection being 'up' (with encryption successfully negotiated with the remote end) or 'down'; a WireGuard network device is always up, although it may or may not pass traffic. This means that you don't have a chance to run scripts when the connection comes up or goes down, for example to establish or withdraw routes through the device. In the past I was tearing down my GRE tunnel on IPSec failure, which had security implications, but with WireGuard the tunnel and its routes stay up all the time and I'll have to manually tear it down at home if the other end breaks and I need things to still mostly work. This is more secure even if it's potentially less convenient.

(If I cared enough I could set up connection monitoring that automatically tore down the routes if the work end of the tunnel couldn't be pinged for long enough.)

WireGuard lets you set the firewall mark (fwmark) for outgoing encrypted packets, which turned out to be necessary for me for solving what I'll call the recursive VPN problem, where your remote VPN touchdown point is itself on a subnet that you want to route over the VPN. In fact my case is extra-tricky, because I want non-WireGuard IP traffic to my VPN touchdown address to flow over the WireGuard tunnel. What I did was set a fwmark in WireGuard and then used policy-based routing to force traffic with that mark to bypass the tunnel:

ip route add default dev ppp0 table 11
[...]

# Force Wireguard marked packets out ppp0, no matter what.
ip rule add fwmark 0x5151 iif lo priority 4999 table 11

(The fwmark value is arbitrary.)
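
For reference, as I understand it the mark itself is set on WireGuard's side, either on the fly or in the interface configuration; wg0 here stands in for whatever your WireGuard interface is called:

# on the fly:
wg set wg0 fwmark 0x5151

# or in the configuration file:
[Interface]
FwMark = 0x5151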

This is much less magic than the IPSec equivalent, and as a result I have more confidence that it won't suffer from occasional bugs.

The fwmark stuff is especially important (and useful) because the current WireGuard software is missing the ability to bind outgoing packets to a specific IP address on a multi-address host. As far as I can see, outgoing packets may someday be sent out from whatever IP address WireGuard finds convenient, instead of the IP alias that you've designated as the VPN touchdown. WireGuard on the other end will then explicitly update its idea of the peer address, even if it was initially configured with another one. I may be missing something here, and I should ask the WireGuard people about this; they might accept it as a feature request (or a bug). I'm not sure if you can fix it with policy based routing cleverness, but you might be able to.

The best way to understand WireGuard configuration files is to think of them as interface-specific configuration files; I sort of missed this initially. Since you apply them with 'wg setconf <interface> <file>', they can only include a single interface's parameters. Somewhat inconveniently, they include secret information (your private key) and so must be kept unreadable by other users. Similarly, it's a bit inconvenient that checking connection status with wg show requires root privileges, although you can work around that with sudo.

WireGuardEarlyNotes written at 02:24:37

2017-11-04

Why I've switched from GRE-over-IPSec to using WireGuard

I have a long standing IPSec IKE and point to point GRE tunnel that gives my home machine an inside IP address at work. This has worked reasonably well for years, but recently I discovered that its bandwidth had collapsed. Some subsequent staring at network packet captures suggested that I was now seeing dropped or drastically delayed ACKs, and perhaps reordering and packet drops in general. This smelled a lot like the kind of bug that was not going to be fun to report and probably wasn't going to get fixed any time soon. I could work around it for the moment, but its presence was irritating and inconvenient, and I considered it a warning sign for IPSec plus GRE in general.

(Anything that has catastrophically bad performance that persists for some time is clearly not being used by very many other people, or if it is it's clear that the kernel developers just don't care.)

WireGuard is a new(ish) secure IP tunnel system, initially only on Linux. Its web pages talk about VPNs because that's what almost everyone uses secure tunnels for, but it's really a general secure transport for IP. I'd been hearing good things about it for a while, but I hadn't really checked it out. Yesterday I wound up reading some stuff that was both very positive on WireGuard and suggested that it was going to wind up an official part of Linux. Given my IPSec+GRE problem, this was enough to push me to actively reading its webpages, which were enough to sell me on its straightforward model of operation and convince me that I could easily implement my current tunnel setup with WireGuard. Because I'm sometimes a creature of sudden impulses, today I went ahead and switched over from my IPSec+GRE setup to a WireGuard-based one (and tweeted about it once I got the setup working).

I switched to get something that gave me my full DSL bandwidth instead of only a pathetic fraction of it, and WireGuard delivers this. It works and nothing's blown up so far. Installing WireGuard on Fedora 26 was straightforward, and configuring it was fairly easy once I read the manpage a couple of times (by that I mean 'it could be better but I've seen worse'). I definitely like how simple the peer setup is; it's a bunch simpler (and better documented) than the IKE equivalent.

(Bear in mind that I'm a sysadmin and I'm perfectly comfortable writing scripts and systemd .service files, both of which I had to do to set my WireGuard configuration up. Of course, I'd had to do most of the same to set up IKE IPSec back when I did that.)

As a whole, my WireGuard setup is simpler and involves less magic than the IKE plus GRE one. WireGuard puts the encryption directly into the tunnel device; unlike with GRE it's not possible to have either an unencrypted tunnel or IKE IPSec but no operating tunnel. Apart from how the tunnel is created and secured, the rest of my setup is the same, which is a large part of what made it so easy to switch over.

While less magic and a simpler, easier to understand configuration is nice, I probably wouldn't have bothered to switch if my old setup had been working correctly. It was the constant drip irritation of having to be careful any time I wanted to move a big file between home and work (or even just look at a big work web page) that got to me. Well, that and the thought of what would be involved in trying to report my problem to Fedora (and probably eventually the upstream kernel). Switching to a different technology for my secure tunnel needs made the whole problem go away, which is the easy way out.

(I have some early notes on using and dealing with WireGuard, but that's going to be another entry.)

WireGuardWhyISwitched written at 02:01:37
