Stopping udev from renaming your VLAN interfaces to bad names
Back in early December I wrote about Why udev may be trying to rename your VLAN interfaces to bad names, where modern versions of udev tried to rename VLAN devices from the arbitrary names you give them to the base name of the network device they're on. Since the base name is already taken, this fails.
There turns out to be a simple cause and workaround for this, at least in my configuration, courtesy of Zbigniew Jędrzejewski-Szmek. In Fedora, all I need to do is add 'NamePolicy=keep' to the [Link] section of my .link file, which makes my .link file be:
[Match]
MACAddress=60:45:cb:a0:e8:dd

[Link]
Description=Onboard port
MACAddressPolicy=persistent
Name=em0
# Stop VLAN renaming
NamePolicy=keep
Setting 'NamePolicy=keep' doesn't keep the actual network device from being renamed from the kernel's original name for it to 'em0', but it makes udev leave the VLAN devices alone. In turn this means udev and systemd consider them to have been successfully created, so you get the usual systemd sys-subsystem-net-devices-*.device units for them, showing up as fully up.
In a way, 'NamePolicy=keep' in a .link file is an indirect way for me to tell apart real network hardware from created virtual devices that share the same MAC, or at least ones created through networkd. As covered in the systemd.netdev manpage, giving a name to your virtual device is mandatory (Name= is a required field), so I think such devices will always be considered to already have a name by udev.
(This was a change in systemd-241, apparently. It changes the semantics of existing .link files in a way that's subtly not backward compatible, but such is the systemd way.)
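As I understand it, the 'keep' policy decides whether a device already has a userspace-given name by looking at the kernel's per-interface name_assign_type attribute in sysfs. This would explain why networkd-created VLANs, whose names are supplied when they're created, are left alone. Here's a minimal Python sketch, assuming that reading of systemd's behavior, that reports this attribute for each interface. The helper names are my own.

```python
import os

# Values of /sys/class/net/<iface>/name_assign_type, from the kernel's
# include/uapi/linux/netdevice.h.
NAME_ASSIGN_TYPES = {
    0: "NET_NAME_UNKNOWN",
    1: "NET_NAME_ENUM",         # enumerated by the kernel, e.g. eth0
    2: "NET_NAME_PREDICTABLE",  # predictably named by the kernel
    3: "NET_NAME_USER",         # named by userspace at creation time
    4: "NET_NAME_RENAMED",      # renamed by userspace after creation
}


def keep_policy_applies(assign_type):
    """Assumption: NamePolicy=keep leaves USER and RENAMED names alone."""
    return assign_type in (3, 4)


def read_assign_type(iface, sysfs="/sys/class/net"):
    """Return the interface's name_assign_type as an int, or None.

    The kernel refuses to report NET_NAME_UNKNOWN (the read fails with
    EINVAL), so an unreadable attribute is treated as unknown.
    """
    path = os.path.join(sysfs, iface, "name_assign_type")
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def report(sysfs="/sys/class/net"):
    """Print each interface's name assignment type."""
    for iface in sorted(os.listdir(sysfs)):
        t = read_assign_type(iface, sysfs)
        label = NAME_ASSIGN_TYPES.get(t, "unknown")
        print(f"{iface}: {label} (keep applies: {keep_policy_applies(t)})")
```

On a Linux machine, calling report() lists every interface. The interesting question is whether your VLANs come out as NET_NAME_USER.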
However, I suspect that things might be different if I didn't use 'biosdevname=0' in my kernel command line parameters. These days this is implemented in udev, so allowing udev to rename your network devices from the kernel-assigned names to the consistent network naming scheme may be considered a rename for the purposes of 'NamePolicy=keep'. That would leave me with the same problem of telling real hardware apart from virtual hardware that I had in the original entry.
However, for actual matching against physical hardware, I suspect that you can also generally use a Property= match on selected udev properties (as suggested by Alex Xu in the comments on the original entry). For instance, most people's network devices are on PCI busses, so you can match on a property that only PCI-attached devices have.
There is a whole variety of properties that real network hardware has and VLANs don't (based on 'udevadm info' output), although I don't know about other types of virtual network devices. It does seem pretty safe that no virtual network device will claim to be on a PCI bus, though.
(I haven't tested the Property= approach, since 'NamePolicy=keep' is sufficient in my case.)
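Checking a candidate property on your own interfaces before putting it in a .link file is easy to script. This is a minimal sketch that assumes ID_BUS=pci is the property you'd pick. PCI network hardware normally has it in 'udevadm info' output, but check your own machines, and like the Property= idea itself this is untested as a .link match. It parses the 'E: KEY=VALUE' property lines that 'udevadm info' prints. The helper names are my own.

```python
import subprocess


def parse_udev_properties(text):
    """Collect the 'E: KEY=VALUE' property lines from 'udevadm info' output."""
    props = {}
    for line in text.splitlines():
        if line.startswith("E: "):
            key, _, value = line[3:].partition("=")
            props[key] = value
    return props


def udev_properties(iface):
    """Run 'udevadm info' on a network interface and return its properties."""
    out = subprocess.run(
        ["udevadm", "info", f"/sys/class/net/{iface}"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_udev_properties(out)


def looks_like_pci_hardware(props):
    # Assumption: this is what a 'Property=ID_BUS=pci' match would test.
    return props.get("ID_BUS") == "pci"
```

For example, compare looks_like_pci_hardware(udev_properties("em0")) with the same call for a VLAN such as em-net5. That shows whether the property actually separates the two on a given machine.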
Fedora 31 has decided to allow (and have) giant process IDs (PIDs)
Every new process and thread on Linux gets a new PID (short for process ID). PIDs are normally assigned sequentially until they hit some maximum value and roll over. The traditional maximum PID value on Unixes has been some number related to a 16-bit integer, either signed or unsigned, and Linux is no exception; the kernel default is generally still 32768 (which is 2^15 exactly, and so not quite authentic to a signed 16-bit int).
(You can find the current limit in /proc/sys/kernel/pid_max; it may have been raised from the kernel default through sysctls.)
A few years ago I discovered that a Fedora package had raised this limit on me, which I was able to see because it turned out that my Fedora machines routinely go through a lot of PIDs. I reverted this by removing the package, for various reasons, including that I don't really like gigantic process IDs (they bulk up the output of ps and other similar tools). Then recently I updated to Fedora 31, and not too long afterward noticed that I was getting giant process IDs again (as I write this, a new shell on one machine gets PID 4,085,915).
This turns out to be a deliberate choice in modern versions of systemd, instead of another stray package deciding it knows best. In Fedora 31 (with systemd 243), /usr/lib/sysctl.d/50-pid-max.conf says:
# Bump the numeric PID range to its maximum of 2^22
# (from the in-kernel default of 2^16), to make PID
# collisions less likely.
kernel.pid_max = 4194304
(Since the PID that new processes get is so close to the maximum, I suspect that I have actually rolled over even this large range a couple of times in the 21 days that this machine has been up since the last time I got around to a kernel update.)
Given that this is a new official systemd thing, I'm going to let it be and live with gigantic PIDs. It's not really worth fighting systemd; it generally doesn't end well for me.
(Hopefully there aren't any programs on the system that assume PIDs are small and always fit into five-character fields in ASCII. Or at least no programs that will fail when this assumption is incorrect, as opposed to producing ugly output.)
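The arithmetic behind the five-character worry is simple. The largest PID the kernel will hand out is one less than pid_max. The kernel default of 32768 therefore tops out at 32767 (five digits), while systemd's 4194304 (2^22) tops out at 4194303 (seven digits). Here is a small Python sketch of that check, reading the live value from /proc. The function names are mine.

```python
def read_pid_max(path="/proc/sys/kernel/pid_max"):
    """Return the current pid_max; PIDs wrap around before reaching it."""
    with open(path) as f:
        return int(f.read().strip())


def max_pid_width(pid_max):
    """Number of characters needed to print the largest possible PID."""
    return len(str(pid_max - 1))


def fits_in_field(pid_max, width=5):
    """Would every possible PID fit in a fixed-width field of this size?"""
    return max_pid_width(pid_max) <= width
```

With systemd's setting in place, fits_in_field(read_pid_max()) is False.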
eBPF based tools are still a work in progress on common Linuxes
These days, it seems that a lot of people are talking about and praising eBPF and tools like BCC and bpftrace, and you can read about places such as Facebook and Netflix routinely using dozens of eBPF programs on systems in production. All of this is true, by and large; if you have the expertise and are in the right environment, eBPF can do great things. Unfortunately, though, a number of things get in the way of more ordinary sysadmins being able to use eBPF and these eBPF tools for powerful things.
The first problem is that these tools (and especially good versions of these tools) are fairly recent, which means that they're not necessarily packaged in your Linux distribution. For instance, Ubuntu 18.04 doesn't package bpftrace or anything other than a pretty old version of BCC. You can add third party repositories to your Ubuntu system to (try to) fix this, but that comes with various sorts of maintenance problems and anyway a fair number of nice eBPF features also require somewhat modern kernels. Ubuntu LTS's standard server kernel doesn't necessarily qualify. The practical result is that eBPF is off the table for us until 20.04 or later, unless we have a serious enough problem that we get desperate.
(Certainly we're very unlikely to try to use eBPF on 18.04 for the kinds of routine monitoring and so on that Facebook, Netflix, and so on use it for.)
Even on distributions with recent packages, such as Fedora, you can run into issues where people working in the eBPF world assume you're in a very current environment. The Cloudflare ebpf_exporter (also) is a great way to get things like local disk latency histograms into Prometheus, but the current code base assumes you're using a version of BCC that was released only in October. That's a bit recent, even for Fedora.
(The ebpf_exporter does have pre-built release binaries available, so that's something.)
Then there's the fact that sometimes all of this is held together with unreliable glue, because it's not really designed to work together. Fedora has just updated Fedora 31 to a 5.4.x kernel, and now all BCC programs (including the examples) fail to compile, with a stream of "error: expected '(' after 'asm'" reports for various bits of the 5.4 kernel headers. Based on some Internet reading, this is apparently a sign of clang attempting to interpret inline assembly that was written for gcc (which is what the Linux kernel is compiled with). Probably this will get fixed at some point, but for now Fedora people get to choose either 5.4 or BCC but not both.
(bpftrace still works on the Fedora 5.4 kernel, at least in light testing.)
Finally, there's the general problem (shared with DTrace on Solaris and Illumos) that a fair number of the things you might be interested in require hooking directly into kernel code, and the Linux kernel's code famously changes all the time. My impression is that the kernel is slowly getting more stable tracepoints for eBPF to use, but also that a lot of the time you're still directly attaching eBPF hooks to kernel functions.
In time, all of this will settle down. Both eBPF and the eBPF tools will stabilize, current enough versions of everything will be in all common Linux distributions, even the long term support versions, and the kernel will have stable tracepoints and so on that cover most of what you need. But that's not really the state of things today, and it probably won't be for at least a few years to come (and don't even ask about Red Hat Enterprise Linux 7 and 8, which will be around for years in some places).
(This more or less elaborates on a tweet of mine.)
Why udev may be trying to rename your VLAN interfaces to bad names
When I updated my office workstation to Fedora 30 back in August, I ran into a little issue:
It has been '0' days since systemd/udev blew up my networking. Fedora 30 systemd/udev attempts to rename VLAN devices to the interface's base name and fails spectacularly, causing the sys-subsystem*.device units to not be present. We hope you didn't depend on them! (I did.)
I filed this as Fedora bug #1741678, and just today I got a clue so that now I think I know why this happens.
The symptom of this problem is that during boot, your system will log things like:
systemd-udevd: em-net5: Failed to rename network interface 4 from 'em-net5' to 'em0': Device or resource busy
As you might guess from the name I've given it here, em-net5 is a VLAN on em0. The name 'em0' itself is one that I assigned, because I don't like the network names that systemd-udevd would assign if left on its own (they are what I would call ugly, or at least tangled and long). The failure here prevents systemd from creating the sys-subsystem-net-devices-em-net5.device unit that it normally would (and then this had further consequences because of systemd's lack of good support for networks being ready).
em0 gets its name from this .link file:

[Match]
MACAddress=60:45:cb:a0:e8:dd

[Link]
Description=Onboard port
MACAddressPolicy=persistent
Name=em0
Based on what 'udevadm test' reports, it appears that when udevd is configuring the em-net5 VLAN, it (still) matches this .link file for the underlying device and applies things from it. My guess is that this is happening because VLANs and their underlying physical interfaces normally share MACs, and so the VLAN's MAC matches the MAC here.
This appears to be a behavior change in the version of udev shipped in Fedora 30. Before Fedora 30, systemd-udevd and networkd did not match VLAN MACs against .link files; from Fedora 30 onward, they appear to do so. To stop this, presumably you need to limit your .link files to matching only physical interfaces, not VLANs, but unfortunately this seems difficult to do. The systemd.link manpage documents a 'Type=' match, but while VLANs have a type that can be used for this, native interfaces do not appear to have one (and there doesn't seem to be a way to negate the match). There are various hacks that could be committed here, but all of them are somewhat unpleasant to me. For example, I could specify the kernel driver, but if the kernel's opinion of what driver to use for this hardware changes, I am up a creek again.
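The kernel itself makes the physical versus virtual distinction visible. Interfaces without a backing hardware device, including VLANs, live under /sys/devices/virtual/net, while real hardware sits under its bus (for example a PCI path). As far as I know this can't be used directly in a .link [Match] section. It is still a handy check when working out which interfaces a .link file should and shouldn't have matched. Here's a minimal Python sketch; the helper names are my own.

```python
import os


def is_virtual_path(resolved):
    """True if a resolved /sys/class/net path lies under /sys/devices/virtual."""
    return "/devices/virtual/" in resolved


def is_virtual_netdev(iface, sysfs="/sys/class/net"):
    # /sys/class/net/<iface> is a symlink into /sys/devices; resolve it first.
    return is_virtual_path(os.path.realpath(os.path.join(sysfs, iface)))
```

For example, is_virtual_netdev("em-net5") should be True for a VLAN and False for the physical em0.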
My new Linux office workstation disk partitioning for the end of 2019
I've just had the rare opportunity to replace all of my office machine's disks at once, without having to carry over any of the previous generation the way I've usually had to. As part of replacing everything I got the chance to redo the partitioning and setup of all of my disks, again all at once without the need to integrate a mix of the future and the past. For various reasons, I want to write down the partitioning and filesystem setup I decided on.
My office machine's new set of disks are a pair of 500 GB NVMe drives and a pair of 2 TB SATA SSDs. I'm using GPT partitioning on all four drives for various reasons. All four drives start with my standard two little partitions, a 256 MB EFI System Partition (ESP, gdisk code EF00) and a 1 MB BIOS boot partition (gdisk code EF02). I don't currently use either of them (my past attempt to switch from MBR booting to UEFI was a failure), but they're cheap insurance for the future. Similarly, putting these partitions on all four drives instead of just my 'system' drives is more cheap insurance.
(Writing this down has made me realize that I didn't format the ESPs. Although I don't use UEFI for booting, I have in the past put updated BIOS firmware images there in order to update the BIOS.)
The two NVMe drives are my 'system' drives. They have three additional partitions:

- a 70 GB partition used for a Linux software RAID mirror of the root filesystem (including /var, since I put all of the system into one filesystem);
- a 1 GB partition that is a Linux software RAID mirror swap partition;
- the remaining 394.5 GB as a mirrored ZFS pool.

That ZFS pool holds filesystems that I want to be as fast as possible and that I can be confident won't grow to be too large. Right now that's my home directory filesystem and the filesystem that holds source code (where I build Firefox, Go, and ZFS on Linux, for example).
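The 394.5 GB figure only works out if the partition sizes are really binary GiB, as partitioning tools usually report them. Here's the arithmetic as a short Python sketch. It assumes the drive has the usual 976,773,168 512-byte sectors of a '500 GB' drive; that is an assumption, and 'blockdev --getsize64' will tell you the real size. It also ignores the few MiB that GPT tables and partition alignment take.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def remaining_gib(total_bytes, partition_bytes):
    """Space left after the given partitions, in GiB (GPT overhead ignored)."""
    return (total_bytes - sum(partition_bytes)) / GIB


# Assumption: 976,773,168 sectors of 512 bytes, about 500.1 GB or 465.8 GiB.
drive_bytes = 976_773_168 * 512
partitions = [
    256 * MIB,  # EFI System Partition
    1 * MIB,    # BIOS boot partition
    70 * GIB,   # root filesystem software RAID mirror
    1 * GIB,    # swap software RAID mirror
]
print(f"{remaining_gib(drive_bytes, partitions):.1f} GiB left for the ZFS pool")
```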
The two SATA SSDs are my 'data' drives, holding various larger but less important things. They have two 70 GB partitions that are Linux software RAID mirrors, and the remaining space is in a single partition for another mirrored ZFS pool. One of the two 70 GB partitions is so that I can make backup copies of my root filesystem before upgrading Fedora (if I bother to do so). The other is essentially an 'overflow' filesystem for some data that I want on an ext4 filesystem instead of in a ZFS pool. This includes a backup copy of all recent versions of ZFS on Linux that I've installed on my machine, so that if I update and the very latest version has a problem, I can immediately reinstall a previous one. The ZFS pool on the SSDs contains larger and generally less important things like my VMware virtual machine images and the ISOs I use to install them, and archived data.
Both ZFS pools are set up following my historical ZFS on Linux practice, where they use the /dev/disk/by-id names for my disks instead of the sdX and nvme... names. Both pools are actually relatively old; I didn't create new pools for this and migrate my data, but instead just attached new mirrors to the old pools and then detached the old drives (more or less). The root filesystem was similarly migrated from my old SSDs by attaching and removing software RAID mirrors. The other Linux software RAID filesystems are newly made and were copied over with dump and restore (and the new software RAID arrays were added to /etc/mdadm.conf more or less by hand).
(Since I just looked it up, the ZFS pool on the SATA SSDs was created in August of 2014, originally on HDs, and the pool on the NVMe drives was created in January of 2016, originally on my first pair of (smaller) SSDs.)
Following my old guide to RAID superblock formats, I continued to use the version 1.0 format for everything except the new swap partition, where I used the version 1.2 format. By this point using 1.0 is probably superstition; if I have serious problems (for example), I'm likely to just boot from a Fedora USB live image instead of trying anything more complicated.
All of this feels very straightforward and predictable by now. I've moved away from complex partitioning schemes over time, and almost all of the complexity left is simply that I have two different sets of disks with different characteristics, and I care more about some filesystems being fast than others. I would like all of my filesystems to be on NVMe drives, but I'm not likely to have NVMe drives that big for years to come.
(The most tangled bit is the 70 GB software RAID array reserved for a backup copy of my root filesystem during major upgrades, but in practice it's been quite a while since I bothered to use it. Still, having it available is cheap insurance in case I decide I want to do that someday during an especially risky Fedora upgrade.)
Splitting a mirrored ZFS pool in ZFS on Linux
Suppose, not hypothetically, that you're replacing a pair of old disks with a pair of new disks in a ZFS pool that uses mirrors. If you're a cautious person and you worry about issues like infant mortality in your new drives, you don't necessarily want to immediately switch from the old disks to the new ones; you want to run them in parallel for at least a bit of time. ZFS makes this very easy, since it supports up to four way mirrors and you can just attach devices to add extra mirrors (and then detach devices later). Eventually it will come time to stop using the old disks, and at this point you have a choice of what to do.
The straightforward thing is to drop the old disks out of the ZFS mirror vdev with 'zpool detach', which cleanly removes them (and they won't come back later, unlike with Linux software RAID). However, this is a little bit wasteful, in a sense. Those old disks have a perfectly good backup copy of your ZFS pool on them, but when you detach them you lose any real possibility of using that copy. Perhaps you would like to keep that data as an actual backup copy, just in case. Modern versions of ZFS can do this by splitting the pool with 'zpool split'. To quote the manpage here:
Splits devices off pool creating newpool. All vdevs in pool must be mirrors and the pool must not be in the process of resilvering. At the time of the split, newpool will be a replica of pool. [...]
In theory the manpage's description suggests that you can split a four-way mirror vdev in half, pulling off two devices at once in a single 'zpool split' operation. In practice it appears that the current 0.8.x version of ZFS on Linux can only split off a single device from each mirror vdev. This meant that I needed to split my pool in a multi-step operation.
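Let's start with a pool, maindata, with four disks in a single mirror vdev: the two old disks, oldA and oldB, and two new disks (including newD). We want to split maindata so that there is a new pool, maindata-hds, with oldA and oldB. First, we split one old device out of the pool: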
zpool split -R /mnt maindata maindata-hds oldA
Normally the just-split-off new pool is not imported (as far as I know), and certainly you don't want it imported if your filesystems have explicit 'mountpoint' settings, because then filesystems from the original and the split-off pool will fight over who gets to be mounted there. However, you can't add devices to exported pools and we need to add oldB, so we have to import the new pool in an altroot, which is what the '-R /mnt' does. I use /mnt here out of tradition, but you can use any convenient empty directory.
With the pool split off, we need to detach oldB from the regular pool and attach it to oldA in the new pool to make the new pool actually be mirrored:
zpool detach maindata oldB
zpool attach maindata-hds oldA oldB
This will then resilver the new maindata-hds pool on to oldB (even though oldB already has an almost exact copy). Once the resilver is done, you can export the pool:
zpool export maindata-hds
You now have your mirrored backup copy sitting around with relatively little work on your part.
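The order matters here: split, detach, attach, wait for the resilver, then export. Because of that, it can help to have the sequence written down. This is a minimal Python sketch that only prints the commands from this entry for a given pool and pair of devices, rather than running them, on the theory that you want to look at each step before doing it. The function name is my own.

```python
import shlex


def split_two_way(pool, newpool, first, second, altroot="/mnt"):
    """Return the zpool commands to split a two-device mirror off pool.

    ZFS on Linux 0.8.x appears to split only one device per mirror vdev,
    so the second device is detached from pool and re-attached to newpool.
    """
    return [
        # Split off 'first' and import the new pool under an altroot.
        ["zpool", "split", "-R", altroot, pool, newpool, first],
        ["zpool", "detach", pool, second],
        # Mirror 'second' onto 'first' in the new pool; this resilvers.
        ["zpool", "attach", newpool, first, second],
        # Only run this once 'zpool status' shows the resilver is done.
        ["zpool", "export", newpool],
    ]


for cmd in split_two_way("maindata", "maindata-hds", "oldA", "oldB"):
    print(shlex.join(cmd))
```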
All of this appears to have worked completely fine for me. I scrubbed the maindata pool before splitting it, just in case, but I don't think I bothered to scrub the new maindata-hds pool after the resilver. It's only an emergency backup pool anyway (and it gets less and less useful over time, since there are more divergences between it and the live pool).
PS: I don't know if you can make snapshots, split a pool, and then do incremental ZFS sends from filesystems in one copy of the pool to the other to keep your backup copy more or less up to date. I wouldn't be surprised if it worked, but I also wouldn't be surprised if it didn't.
Linux kernel Security Modules (LSMs) need their own errno
Over on Twitter, I said something I've said before:
Once again, here I am hating how Linux introduced additional kernel security modules without also adding an errno for 'the loadable security module denied permissions'.
Lack of a LSM errno significantly complicates debugging problems, especially if you don't normally use LSMs.
Naturally there's a sysadmin story here, but let's start with the background (even if you probably know it).
SELinux and Ubuntu's AppArmor are examples of Linux Security Modules. Each of them adds additional permission checks that you must pass, over and above the normal Unix permissions. However, when they reject your access, they don't actually tell you this specifically. Instead you get the generic Unix error of EACCES, 'permission denied', which is normally what you get if, say, the file is unreadable to your UID for some reason.
We have an internal primary master DNS server for our DNS zones (a so-called 'stealth master'), which runs Ubuntu instead of OpenBSD for various reasons. We have the winter holiday break coming up, and we've had problems with this machine coming back up cleanly in the past. So last week seemed like a good time to reboot it under controlled circumstances, to make sure that at least that worked. When I did that, named (aka Bind) refused to start with a 'permission denied' error (aka EACCES) when it tried to read its named.conf configuration file. For reasons beyond the scope of this entry, this file lives on our central administrative NFS filesystem, and when you throw NFS into the picture various things can go wrong with access permissions. So I spent some time looking at file and directory permissions, NFS mount state, and so on, until I remembered something my co-worker had mentioned in passing.
Ubuntu defaults to installing and using AppArmor, but we don't
like it and we turn it off almost everywhere (we can't avoid it for
MySQL, although we can make it harmless).
That morning we had applied the pending Ubuntu package updates, as one does, and one of the packages that got updated was the AppArmor package. It turns out that in our environment, when an AppArmor package update is applied, AppArmor gets re-enabled (but I think not started immediately). So when I rebooted our primary DNS master, AppArmor started up. AppArmor has a profile for Bind that only allows for a configuration file in the standard place, not where we put our completely different and customized one. So when Bind tried to read our named.conf, the AppArmor LSM said 'no'. But that 'no' was surfaced only as an EACCES error, and so I went chasing down the rabbit hole of all of the normal causes of permission denied errors.
People who deal with LSMs all of the time will probably be familiar with this issue and will immediately move to the theory that any unfamiliar and mysterious permission denials are potentially the LSM in action. But we don't use LSMs normally, so every time one enables itself and gets in our way, we have to learn all about this all over again. Troubleshooting would be much easier if the LSM actually told us that it was responsible, by having its own errno value for 'LSM permission denied'. Then we'd know right away what was going on.
(If Linux kernel people are worried about some combination of security concerns and backward compatibility, I would be happy if they made this extra errno value an opt-in thing that you had to turn on with a sysctl. We would promptly enable it for all of our servers.)
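Until something like that exists, the best I can suggest is a habit: when a permission error makes no sense, check which LSMs are active before going down the rabbit hole. Here's a minimal Python sketch of that. It assumes a kernel that lists active LSMs in /sys/kernel/security/lsm, which needs securityfs mounted, and the helper names are mine. For AppArmor specifically, denials also show up in the kernel log as apparmor="DENIED" messages.

```python
import errno
import os

# Access-control LSMs worth suspecting (my own choice of list).
RESTRICTIVE_LSMS = {"selinux", "apparmor", "smack", "tomoyo"}


def parse_lsm_list(data):
    """Split the comma-separated contents of /sys/kernel/security/lsm."""
    return [name for name in data.strip().split(",") if name]


def active_lsms(path="/sys/kernel/security/lsm"):
    """Return the active LSMs, or [] if the list can't be read."""
    try:
        with open(path) as f:
            return parse_lsm_list(f.read())
    except OSError:
        return []


def explain_open(path):
    """Try to open path; on a permission error, name the errno and any LSMs."""
    try:
        with open(path, "rb"):
            return "ok"
    except PermissionError as e:
        name = errno.errorcode.get(e.errno, str(e.errno))
        suspects = [m for m in active_lsms() if m in RESTRICTIVE_LSMS]
        msg = f"{name}: {os.strerror(e.errno)}"
        if suspects:
            msg += f" (active LSMs that could be responsible: {', '.join(suspects)})"
        return msg
```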
PS: Even if we didn't have our
named.conf on an NFS filesystem,
we probably wouldn't want to overwrite the standard version with
our own. It's usually cleaner to build your own completely separate
configuration file and configuration area, so that you don't have to
worry about package updates doing anything to your setup.
Working out which of your NVMe drives is in what slot under Linux
One of the perennial problems with machines that have multiple drives is figuring out which physical drive is sda, which is sdb, and so on. The mirror image problem is arranging things so that the drive you want to be the boot drive actually is the first drive. In sanely made server hardware this is generally relatively easy, but with desktops you can run into all sorts of problems, such as how desktop motherboards can wire things up oddly. In some situations, NVMe drives make this easier than SATA drives do. NVMe drives are PCIe devices, so they have distinct PCIe bus addresses and possibly distinct places in the PCIe bus topology.
First off, I will admit something. The gold standard for doing this reliably under all circumstances is to record the serial numbers of your NVMe drives before you put them into your system, and then use 'smartctl -i /dev/nvme0n1' to find each drive from its serial number. It's always possible for a motherboard with multiple M.2 slots to do perverse things with its wiring and PCIe bus layout, so that what it labels as the first and perhaps best M.2 slot is actually the second NVMe drive as Linux sees it. But I think that generally it's pretty likely that the first M.2 slot will be earlier in PCIe enumeration than the second one (if there is a second one). And if you have only one M.2 slot on the motherboard and are using a PCIe to NVMe adapter card for your second NVMe drive, the PCIe bus topology of the two NVMe drives is almost certain to be visibly different.
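Matching drives to recorded serial numbers is easy to script. This is a minimal Python sketch that relies on smartctl's 'Serial Number:' line in its '-i' output, which it prints for NVMe drives as well as SATA ones. smartctl generally needs root. The sketch doesn't treat a nonzero exit as fatal, because smartctl uses its exit status as a bitmask that includes warnings. The function names are mine.

```python
import glob
import subprocess


def parse_serial(smartctl_output):
    """Return the value of the 'Serial Number:' line, or None."""
    for line in smartctl_output.splitlines():
        if line.startswith("Serial Number:"):
            return line.split(":", 1)[1].strip()
    return None


def nvme_serials():
    """Map each /dev/nvmeXn1 device to its serial number (needs root)."""
    serials = {}
    for dev in sorted(glob.glob("/dev/nvme[0-9]*n1")):
        # No check=True: smartctl's exit status is a bitmask that can be
        # nonzero for warnings even when the output is fine.
        out = subprocess.run(["smartctl", "-i", dev],
                             capture_output=True, text=True).stdout
        serials[dev] = parse_serial(out)
    return serials
```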
All of this raises the question of how you get the PCIe bus address of a particular NVMe drive. We can do this by using the fact that Linux makes your PCIe devices and topology visible in sysfs. Specifically, every NVMe device appears as a symlink in /sys/block that gives you the path to its PCIe node (and in fact the full topology). So on my office machine in its current NVMe setup, I have:
; readlink nvme0n1
../devices/pci0000:00/0000:00:03.2/0000:0b:00.0/[...]
; readlink nvme1n1
../devices/pci0000:00/0000:00:01.1/0000:01:00.0/[...]
This order on my machine gives me a surprise, because the two NVMe drives are not in the order I expected. In fact they're apparently not in the order that the kernel initially detected them in, as a look at the kernel messages shows:
nvme nvme0: pci function 0000:01:00.0
nvme nvme1: pci function 0000:0b:00.0
This is the enumeration order I expected, with the motherboard M.2 slot at 01:00.0 detected before the adapter card at 0b:00.0 (for more on my current PCIe topology, see this entry). Indeed, the original order appears to be preserved in some path components in sysfs. Perhaps the kernel assigned the actual nvmeXn1 names backward, or perhaps udev renamed my disks for reasons known only to itself.
(But at least now I know which drive to pull if I have trouble with nvme1n1. On the other hand, I'm now doubting the latency numbers that I previously took as a sign that the NVMe drive on the adapter card was slower than the one in the M.2 slot, because I assumed that nvme1n1 was the adapter card drive.)
Once you have the PCIe bus address of an NVMe drive, you can look for additional clues as to what physical M.2 slot or PCIe slot that drive is in, beyond just how it fits into your PCIe bus topology. For example, some motherboards (including my home machine's) may wind up running the 'second' M.2 slot at x2 instead of x4 under some circumstances. If you can find one NVMe drive running at x2 instead of x4, you have a strong clue as to which is which (assuming that your NVMe drives are x4 drives). You can also have a PCIe slot be forced to x2 for other reasons, such as motherboards where some slots share lanes and bandwidth. I believe that the primary M.2 slot on most motherboards always gets x4 and is never downgraded (except perhaps if you ask the BIOS to do so).
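Both halves of this can be scripted from sysfs. The following minimal Python sketch first takes the PCIe addresses out of a /sys/block symlink like the ones shown earlier; the last address is the NVMe drive itself and the earlier ones are bridges. It then reads the current and maximum link widths from the drive's PCIe node. current_link_width and max_link_width are standard sysfs attributes for PCIe devices, but the helper names are mine.

```python
import os
import re

# A full PCIe address: domain:bus:device.function, e.g. 0000:0b:00.0
PCI_ADDR = re.compile(r"[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]")


def pci_path(link_target):
    """PCIe addresses along a /sys/block symlink target, root first."""
    return [p for p in link_target.split("/") if PCI_ADDR.fullmatch(p)]


def block_pci_path(dev):
    """PCIe addresses for a block device such as 'nvme0n1'."""
    return pci_path(os.readlink(os.path.join("/sys/block", dev)))


def link_width(pci_addr, sysfs="/sys/bus/pci/devices"):
    """Return (current, maximum) link width, e.g. (2, 4), or None."""
    widths = []
    for attr in ("current_link_width", "max_link_width"):
        try:
            with open(os.path.join(sysfs, pci_addr, attr)) as f:
                widths.append(int(f.read().strip()))
        except (OSError, ValueError):
            return None
    return tuple(widths)
```

For example, link_width(block_pci_path("nvme0n1")[-1]) gives the link width for nvme0n1. A drive reporting (2, 4) is running at x2 even though it can do x4.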
You can also get the same PCIe bus address information (and then a lot more) through udevadm, as noted by a commentator on yesterday's entry. Running 'udevadm info /sys/block/nvme0n1' will give you all of the information that udev keeps. This doesn't seem to include any explicit information on whether the device was renamed, but it does include the kernel's assigned minor number. On my machine, nvme0n1 has minor number 1 while nvme1n1 has minor number 0, which suggests that nvme1n1 was assigned first.
(It would be nice if udev would log somewhere when it renames a device.)
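The minor numbers that udevadm reports are also available directly from the device nodes, via stat(). Here's a minimal Python sketch, assuming your NVMe namespaces are named /dev/nvmeXn1. It lists them in order of minor number, which is presumably the order in which they were created. The function names are mine.

```python
import glob
import os


def nvme_devnums(pattern="/dev/nvme[0-9]*n1"):
    """Map each matching device node to its (major, minor) numbers."""
    result = {}
    for dev in sorted(glob.glob(pattern)):
        rdev = os.stat(dev).st_rdev
        result[dev] = (os.major(rdev), os.minor(rdev))
    return result


def by_minor(devnums):
    """Device names sorted by minor number (presumably assigned first, first)."""
    return sorted(devnums, key=lambda dev: devnums[dev][1])
```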
PS: Looking at the PCIe bus addresses associated with SATA drives usually doesn't help, because most of the time all of your SATA drives are attached to the same PCIe device.
Linux makes your PCIe topology visible in sysfs (/sys)
Getting some NVMe drives for my office machine has been an ongoing education into many areas of PCIe. This has included how to see your PCIe topology with lspci and how PCIe bus addresses and topology relate to each other. Today I coincidentally discovered that there is another way to look into your system's PCIe topology. It turns out that the Linux kernel materializes it as a directory hierarchy in the sysfs filesystem that is usually mounted on /sys.
Generally, the root of the PCI(e) bus hierarchy is going to be found in /sys/devices/pci0000:00. Given an understanding of PCIe addresses, we can see that 0000:00 is the usual domain and starting PCIe bus number. In this directory are a whole bunch of subdirectories, named after the full PCIe bus address of each device, so you get directories named things like '0000:00:03.2'. If you take off the leading '0000:', this corresponds to what 'lspci -v' will report as device 00:03.2. For PCIe devices that act as bridges, there will be subdirectories for the PCIe devices behind the bridge, named with the full PCIe address of those devices. So in my office machine's current PCIe topology, there is a '0000:0b:00.0' subdirectory in the '0000:00:03.2' directory. This is my second NVMe drive, behind the 00:03.2 PCIe bridge. (Similarly, behind 0000:00:03.1 is my Radeon graphics card, which actually has two exposed PCIe functions; 0000:0a:00.0 is the video side, and 0000:0a:00.1 is 'HDMI/DP Audio'.)
There are a number of ways to use this /sys information, some of which are for future entries. The most obvious use is to confirm your understanding of the topology and implied PCIe bus addresses that 'lspci -tv' reports. If the /sys directory hierarchy matches your understanding of the output, you have it right. If it doesn't, something is going on.
The other use is a brute force way of finding out what the topology of a particular final PCIe device is, by simply finding it in the hierarchy with 'find /sys/devices/pci0000:00 -name ..', where the name is its full bus address (with the 0000: on the front). So, for example, if we know we have an Ethernet device at 06:00.0, we can find where it is in the topology with:
; cd /sys/devices/pci0000:00
; find . -type d -name 0000:06:00.0 -print
./0000:00:01.3/0000:02:00.2/0000:03:03.0/0000:06:00.0
Using '-type d' avoids having to filter out some symlinks to the PCIe node that show up in various contexts.
This shows us the path through the PCIe topology from the root, through 00:01.3, then 02:00.2, then finally 03:03.0. This complex path is because this is a device hanging off the AMD X370 chipset instead of off of the CPU, although not all chipset attached PCIe devices will have such a long topology.
Until I looked at the lspci manpage more carefully, I was going to say that this was the easiest way to go from a PCIe bus address to the full path to the device with all of the PCIe bus addresses involved. However, it turns out that in sufficiently modern versions of lspci, 'lspci -PP' will report the same information in a shorter and more readable way:
; lspci -PP -s 06:00.0
00:01.3/02:00.2/03:03.0/06:00.0 Ethernet controller: Intel Corporation 82572EI Gigabit Ethernet Controller (Copper) (rev 06)
Unfortunately the version of lspci on our Ubuntu 18.04 machines is not sufficiently modern; on those machines, a find remains the easiest way. You can do it from the output of either 'lspci -tv' or 'lspci -v', as described in an earlier entry, but you have to do some manual work to reconstruct all of the PCIe bus addresses involved.
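On machines with an older lspci, the find approach is also easy to wrap up so that it prints the same compact form that 'lspci -PP' does. This is a minimal Python sketch. It walks /sys/devices/pci0000:00 and skips symlinks, which plays the same role as find's '-type d'. It also assumes the usual 0000 PCI domain when shortening addresses. The function names are my own.

```python
import os


def find_pci_device(addr, root="/sys/devices/pci0000:00"):
    """Return addr's directory path relative to root, or None.

    os.walk() lists symlinks to directories in dirnames without descending
    into them, so symlinks are skipped explicitly (like 'find -type d').
    """
    for dirpath, dirnames, _ in os.walk(root):
        candidate = os.path.join(dirpath, addr)
        if addr in dirnames and not os.path.islink(candidate):
            return os.path.relpath(candidate, root)
    return None


def lspci_pp_style(relpath):
    """Turn '0000:00:01.3/.../0000:06:00.0' into '00:01.3/.../06:00.0'."""
    return "/".join(part.split(":", 1)[1] for part in relpath.split("/"))
```

For the Ethernet device above, lspci_pp_style(find_pci_device("0000:06:00.0")) should give the same path that 'lspci -PP' printed.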
Fedora is not a good choice if long term stability and usability are a high priority
Every so often I hear about people running servers or infrastructure that they care about on Fedora, and my eyebrows almost always go up. I like Fedora and run it by choice on all of my desktops and my work laptop, but I'm the sole user on these machines, I know what I'm getting into, and I'm willing to deal with the periodic disruptions that Fedora delivers. Fedora is a good Linux distribution on the whole, but it is what I would call a 'forward looking' distribution; it is one that is not that much interested in maintaining backward compatibility if that conflicts with making the correct choice for right now, the choice that you'd make if you were starting from scratch. The result is that every so often, Fedora will unapologetically kick something out from underneath long term users and you get to fix your setup to deal with the new state of affairs.
All of this sounds very theoretical, so let me make it quite concrete with my tweet:
Today I learned that in Fedora 31, /usr/bin/python is Python 3 instead of Python 2. I hope Ubuntu doesn't do that, because if it does our users are going to kill us.
I learned this because I've recently upgraded my work laptop to Fedora 31, and on it I run a number of Python based programs that started with '#!/usr/bin/python'. Before the upgrade, that was Python 2 and all of those programs worked. After the upgrade, as I found out today, that was Python 3 and many of the programs didn't work. (Fedora provides no way to control this behavior as far as I can tell. /usr/bin/python is not controlled through Fedora's alternatives system; instead it's a symlink that's directly supplied by a package, and there's no version of the package that provides a symlink that goes to Python 2.)
This is fine for me. It's my own machine, I know what changed on
it recently, I don't have to support a mixed base of older and newer
Fedora machines, and I'm willing to put the pieces back together.
At work, we've been running a Linux environment for fifteen years or so now, and we have somewhere around a thousand users. We have to run a mixed base of distribution versions, and some of those users will have programs that start with '#!/usr/bin/python'. Possibly they've even forgotten about some of these programs, because they've been running quietly for so long. This sort of change would cause huge problems for them and thus for us.
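If you're in our situation, the obvious first step is finding the scripts that would be affected. Here is a minimal Python sketch of that. It treats both '#!/usr/bin/python' and '#!/usr/bin/env python' as affected; the latter depends on which 'python' comes first on $PATH. It only looks at the first line of regular files. The function names are mine, and a real scan of user home directories would need more care about permissions and performance.

```python
import os


def is_unversioned_python(first_line):
    """True for shebangs that run plain 'python' rather than python2/python3."""
    if not first_line.startswith(b"#!"):
        return False
    parts = first_line[2:].split()
    if not parts:
        return False
    if parts[0] == b"/usr/bin/python":
        return True
    # '#!/usr/bin/env python' resolves 'python' through $PATH.
    return parts[0].endswith(b"/env") and len(parts) > 1 and parts[1] == b"python"


def find_unversioned_scripts(top):
    """Yield paths of regular files under top that have such a shebang."""
    for dirpath, _, filenames in os.walk(top):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.isfile(path):  # skip FIFOs, sockets, dangling links
                continue
            try:
                with open(path, "rb") as f:
                    first = f.readline(256)
            except OSError:
                continue
            if is_unversioned_python(first):
                yield path
```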
Fedora's decision here is not wrong, for Fedora, but it is a very Fedora decision. If you were building a distribution from scratch today, with no history behind it at all, pointing /usr/bin/python at Python 3 would be a perfectly rational and good choice. Making that decision in a distribution with history is choosing one set of priorities over another. It prioritizes the 'correct' and modern choice over not breaking existing setups and not making people using your distribution do extra work.
I think it's useful to have Linux distributions that prioritize this way, and I don't mind it in the distribution that I use. But I know what I'm getting into when I choose Fedora, and it's not for everyone.