Understanding plain Linux NVMe device names (in /dev and kernel messages)
On Linux, plain disk names for most modern disk devices are in the form of
/dev/sda for the whole disk and
/dev/sda3 for the third
partition (regardless of whether the disk is partitioned through modern
GPT or old MBR). When I got NVMe SSDs for my office workstation, one of my many discoveries about them is
that Linux gives them different and more oddly formed names. Since I
had many other NVMe related issues on my mind at the time, I didn't look
into the odd names; I just accepted them and moved on. But now I want
to actually understand how Linux's NVMe device names are formed and
what they mean, and it turns out to be relatively simple.
Let's start with the actual names. On Linux, NVMe devices have three
levels of names. On my office workstation, there is
/dev/nvme0 for the first NVMe device, then
/dev/nvme0n1, and then a series of
/dev/nvme0n1p<X> devices for each partition. Unusually, nvme0
is a character device, not a block device. Kernel messages will talk
about both 'nvme0' and 'nvme0n1':

nvme nvme0: pci function 0000:01:00.0
nvme nvme0: 15/0/0 default/read/poll queues
nvme0n1: p1 p2 p3 p4 p5
(I don't know yet what names will appear in kernel messages about IO errors.)
If I want to partition the disk, install GRUB bootblocks, or the like,
I want to use the 'nvme0n1' name. Querying certain sorts of NVMe
information is done using 'nvme0'. I can apparently use either name
for querying SMART information.
Numbering NVMe SSDs instead of giving them letters and naming
partitions with '
p<X>' instead of plain numbers are both sensible
changes from the somewhat arcane
sd... naming scheme. The unusual
thing is the '
n1' in the middle. This is present because of a
NVMe feature called "namespaces",
which allows you (or someone) to divide up a NVMe SSD into multiple
separate ranges of logical block addresses that are isolated from
each other. Namespaces are numbered starting from one, and I think
that most NVMe drives have only one, hence '
nvme0n1' as the base
name for my first NVMe SSD's disk devices.
(This is also likely part of why '
nvme0' is a character device instead
of a block device. Although I haven't checked the NVMe specification,
I suspect that you can't read or write blocks from a NVMe SSD without
specifying the namespace.)
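The three levels mean that a full device name can be decomposed mechanically. As a small illustration (the regex and function here are my own sketch, not taken from any kernel or nvme-cli code):

```python
import re

# Decompose a Linux NVMe block device name into its three levels:
# controller number, namespace number, and optional partition number.
# This regex is my own sketch, not from any official tool.
NVME_NAME = re.compile(r"^nvme(\d+)n(\d+)(?:p(\d+))?$")

def parse_nvme_name(name):
    m = NVME_NAME.match(name)
    if m is None:
        raise ValueError("not an NVMe block device name: " + name)
    ctrl, ns, part = m.groups()
    return {
        "controller": int(ctrl),   # the 'nvme0' character device level
        "namespace": int(ns),      # namespaces are numbered from one
        "partition": int(part) if part else None,
    }
```

So 'nvme0n1p3' is partition 3 of namespace 1 on controller 0, and plain 'nvme0n1' has no partition component.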
The Arch wiki page on NVMe drives has a nice
overview of all sorts of things you can find out about your NVMe drives
with the 'nvme' command. Based on the Arch
nvme manpage, it has a lot of sub-commands.
For my expected uses, I suspect that I will never change or manipulate NVMe namespaces on my NVMe drives. I'll just leave them in their default state, as shipped by the company making them. Probably all consumer NVMe SSDs will come with only a single namespace by default, so that people can use the entire drive's official capacity as a single filesystem or partition without having to do strange things.
I should probably learn command-line NetworkManager usage
I'm generally not a fan of NetworkManager on the machines I deal with, but I do wind up dealing with it at the command line level every so often, most recently for setting up a WireGuard client on my work laptop. There was a time when it felt that NetworkManager was the inevitable future of networking on Linux even on servers, but fortunately systemd-networkd has mostly made that go away. Still, systemd-networkd has limitations and isn't as comprehensive as NetworkManager; NetworkManager is the face of networking on a lot of Linux configurations, and someday I may be forced to deal with NetworkManager on a regular basis.
(Fedora keeps threatening to remove the ifup scripts
that drive my DSL PPPoE link, and systemd-networkd doesn't currently
have support for PPPoE.)
All of this leaves me feeling that not really knowing even the basics of NetworkManager general concepts and command line usage is a gap in my practical Linux knowledge that matters, and that I should fix. Well, to put it bluntly, it feels like I'm burying my head in the sand. Even if I never really use it, learning the basics of NetworkManager command line usage would give me an informed opinion, instead of my current mostly uninformed one.
The low impact approach to learning NetworkManager command line usage would be to explore it on my work laptop, which already uses NetworkManager. I normally use the Cinnamon Network Manager GUI (which is not nm-applet, it turns out), but I could switch to doing my network manipulation through the command line, and also read and try to understand all of the configured connection parameters.
The high impact approach would be to try to set up a version of my
home desktop's DSL PPPoE connection in
NetworkManager. Many years ago I configured a version of my DSL
connection on my laptop, so in theory I could
cross-check my NetworkManager flailing against that version (although
I should first make sure it still works). As a side benefit, this would
leave me prepared for when Fedora carries through its threat to remove
ifup and my current DSL PPPoE setup immediately stops working.
(I've written this partly in the hopes of motivating myself into doing some NetworkManager learning, even if I don't manage much.)
It's nice when programs switch to being launched from systemd user units
I recently upgraded my home machine from Fedora 33 to Fedora 34. One of the changes in Fedora 34 is that the audio system switched from PulseAudio to PipeWire (the Fedora change proposal, an article on the switch). Part of this switch is that you need to run different daemons in your user session. For normal people, this is transparently handled by whichever standard desktop environment they're using. Unfortunately I use a completely custom desktop, so I have to sort this out myself (this is one way Fedora upgrades are complicated for me). Except this time I didn't need to do anything; PipeWire just worked after the switch.
One significant reason for this is that PipeWire arranges to be
started in your user session not through old mechanisms like
/etc/xdg/autostart but through a systemd
user unit (actually
two, one for the daemon and one for the socket). Systemd user units
are independent of your desktop and get started automatically, which
means that they just work even in non-standard desktop environments
(well, so far).
(As covered in the Arch Wiki, there are some things you need to do in an X session.)
One of the things that's quietly making my life easier in my custom desktop environment is that more things are switching to being started through systemd user units instead of the various other methods. It's probably a bit more work for some of the programs involved (since they can't assume direct access to your display any more and so on), but it's handy for me, so I'm glad that they're investing in the change.
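For illustration, a minimal systemd user unit looks like the following. The name and path here are hypothetical, not PipeWire's actual units (which ship with the distribution and are more involved, including a matching socket unit):

```ini
# ~/.config/systemd/user/mydaemon.service -- a hypothetical example.
[Unit]
Description=Example per-session daemon

[Service]
ExecStart=/usr/bin/mydaemon
Restart=on-failure

[Install]
WantedBy=default.target
```

Units like this get enabled with 'systemctl --user enable' and then start with your session regardless of which desktop (if any) you run.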
PS: It turns out that the basic PulseAudio daemon was also being
set up through systemd user units on Fedora 33. But PulseAudio
did want special setup under X, with an
/etc/xdg/autostart file that ran
/usr/bin/start-pulseaudio-x11. It's possible that
PipeWire is less integrated with the X server than PulseAudio is.
See the PulseAudio X11 modules.
PPS: Apparently I now need to find a replacement for running '
amixer -q set Master ...' to control my volume from the keyboard. This apparently still works for some people
but not for me; for now '
pactl' does, and it may be the more or
less official tool for doing this with PipeWire for the moment, even
though it's from PulseAudio.
Setting up a WireGuard client with NetworkManager (using nmcli)
For reasons beyond the scope of this entry, I've been building a VPN server that will support WireGuard (along with OpenVPN and L2TP). A server needs a client, so I spent part of today setting up my work laptop as a WireGuard client in a 'VPN' configuration, under NetworkManager because that's what my laptop uses. I was hoping to do this through the Cinnamon GUIs for NetworkManager, but unfortunately while NetworkManager itself has supported WireGuard for some time, this support hasn't propagated into GUIs such as the GNOME Control Center (cf) or the NetworkManager applet that Cinnamon uses.
I'm already quite familiar with WireGuard in general, so I found
that the easiest way to start was to set up a basic WireGuard
configuration file for the connection in /etc/wireguard/wg0.conf,
including both the main configuration (with the laptop's key and
my local port) and a
[Peer] section for the server. Since I'm
using WireGuard here in a VPN configuration, instead of to reach
just some internal IPs, I set
AllowedIPs to 0.0.0.0/0. After writing
wg0.conf, I then imported it into NetworkManager:

nmcli connection import type wireguard file /etc/wireguard/wg0.conf
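For illustration, the sort of minimal wg0.conf I mean looks like this; the keys, endpoint, and port here are all placeholders:

```ini
[Interface]
PrivateKey = <the laptop's private key>
ListenPort = 51820

[Peer]
PublicKey = <the server's public key>
Endpoint = <the server>:51820
AllowedIPs = 0.0.0.0/0
```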
(For what can go in the configuration file, start with
wg-quick(8). I suspect
that NetworkManager doesn't support some of the more advanced keys.
I stuck to the basics. The import process definitely ignores the
various script settings supported by wg-quick; see
nm_vpn_wireguard_import() in nm-vpn-helpers.c.)
Imported connections are apparently set to auto-connect, which isn't what I wanted, plus there were some other things to adjust (following the guide of Thomas Haller's WireGuard in NetworkManager):
nmcli con modify wg0 \
    autoconnect no \
    ipv4.method manual \
    ipv4.address 172.29.50.10/24 \
    ipv4.dns <...>
At this point you might be tempted to set
ipv4.gateway, and indeed
that's what I did the first time around. It turns out that this is
a mistake, because these days NetworkManager will do the right thing
based on the 'accept everything'
AllowedIPs I set, right down to
setting up policy based routing with a fwmark so that encrypted
traffic to the WireGuard VPN server doesn't try to go over WireGuard.
If you set
ipv4.gateway as well, you wind up with two default
routes and then your encrypted WireGuard traffic may try to go over
your WireGuard connection again, which doesn't work.
(See the description of 'ip4-auto-default-route' in the WireGuard
in NetworkManager entry.)
The full index of available NetworkManager settings in various
sections is currently here; the
ones most useful to me are probably the ipv4 and wireguard ones.
Getting DNS to work correctly requires a little extra step, or at
least did for me. While the
wg0 connection is active, I want all
of my DNS queries to go to our internal resolving DNS server and
also to have a search path of our university subdomain. This
apparently requires explicitly including '
~' in the NetworkManager
DNS search path:
nmcli con modify wg0 \
    ipv4.dns-search "cs.toronto.edu,~"
You (I) can see a lot of settings for the WireGuard setup with
'nmcli connection show wg0', including active ones, but this seems
to omit NetworkManager's view of the WireGuard peers. To see that,
I needed to look directly at the configuration file that NetworkManager
wrote for the connection.
I'm someday going to need to edit this directly to modify the
WireGuard VPN server's endpoint from my test machine to the production
one.
(The NetworkManager RFE for configuring WireGuard peers in nmcli
is issue #358.)
With no GUI support for WireGuard connections, I have to bring this
WireGuard VPN up and down with '
nmcli con up wg0' and 'nmcli con
down wg0'. Once I have the new VPN server in production, I'll be
writing little scripts to do this for me. Hopefully this will be
improved some day, so that the NetworkManager applet allows you to
activate and deactivate WireGuard connections and shows you that
one is active.
If I wanted a limited VPN that only sent traffic to our internal
networks over my WireGuard link, I would configure the server's
AllowedIPs to the list of networks and then I believe that
NetworkManager would automatically set up routes for them. However,
I don't know how to make this work (in NetworkManager) if the
WireGuard VPN server itself was on one of the subnets I wanted to
reach over WireGuard. For my laptop, routing all traffic to work
over WireGuard is no worse than using our OpenVPN or L2TP VPN
servers, which also do the same thing by default.
(On my home desktop, I use hand built fwmark-based policy rules to deal with my WireGuard endpoint being on a subnet I want to normally reach over WireGuard. NetworkManager will build the equivalents for me when I'm routing 0.0.0.0/0 over the WireGuard link, but I believe not in other situations.)
Some ways to get (or not get) information about system memory ranges on Linux
I recently learned about
lsmem, which is
described as "list[ing] the ranges of available memory [...]". The
source I learned it from was curious why
lsmem on a modern 64-bit
machine didn't list all of the low 4 GB as a single block (they
were exploring kernel memory zones, where the
low 4 GB of RAM are still a special 'DMA32' zone). To start with,
I'll show typical
lsmem default output from a machine with 32 GB of RAM:

; lsmem
RANGE                                  SIZE  STATE REMOVABLE  BLOCK
0x0000000000000000-0x00000000dfffffff  3.5G online       yes   0-27
0x0000000100000000-0x000000081fffffff 28.5G online       yes 32-259

Memory block size:       128M
Total online memory:      32G
Total offline memory:      0B
Lsmem is reporting information from /sys/devices/system/memory.
Both the sysfs hierarchy and lsmem itself apparently come originally
from the IBM S390x architecture. Today this sysfs hierarchy
apparently only exists for memory hotplug, and there
are some signs that kernel developers aren't fond of it.
(Update: I'm wrong about where the sysfs memory hierarchy comes from; see this tweet from Dave Hansen.)
On the machines I've looked at, the hole reported by lsmem is
authentic, in that
/sys/devices/system/memory also doesn't have
any nodes for that range (on the machine above, for blocks 28, 29,
30, and 31). The specific gap varies from machine to machine.
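With a 128 MiB block size, translating between lsmem's block numbers and physical addresses is simple arithmetic. A sketch (my own, not from lsmem itself):

```python
BLOCK_SIZE = 128 * 2**20  # 128 MiB, from 'Memory block size' in lsmem

def block_range(block):
    """Return the (start, end) physical addresses a memory block covers."""
    start = block * BLOCK_SIZE
    return (start, start + BLOCK_SIZE - 1)

def addr_to_block(addr):
    """Return the memory block number covering a physical address."""
    return addr // BLOCK_SIZE

# Blocks 0-27 end at 0xdfffffff, the top of the 3.5G 'low' range;
# the missing blocks 28-31 are exactly the 0xe0000000-0xffffffff hole,
# and block 32 starts at 0x100000000 (4 GiB).
```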
However, all of the information from
lsmem may well be a
simplification of a more complex reality.
The kernel also exposes physical memory range information through
/proc/iomem (on modern kernels you'll probably have to read this
as root to get real address ranges). This has a much more complicated
view of actual RAM, one with many more holes than what
/sys/devices/system/memory shows. This is especially the case in
the low 4G of memory, where for example the system above reports a
whole series of chunks of reserved memory, PCI bus address space,
ACPI tables and storage, and more. The high memory range is simpler,
but still not quite the same:
100000000-81f37ffff : System RAM
81f380000-81fffffff : RAM buffer
The information from
/proc/iomem has a lot of information about
PCI(e) windows and other things, so you may want to narrow down
what you look at. On the system above,
/proc/iomem has 107 lines
but only nine of them are for 'System RAM', and all but one of them
are in the physical memory address range that
lsmem lumps into
the 'low' 3.5 GB:
00001000-0009d3ff : System RAM
00100000-09e0ffff : System RAM
0a000000-0a1fffff : System RAM
0a20b000-0affffff : System RAM
0b020000-d17bafff : System RAM
d17da000-da66ffff : System RAM
da7e5000-da8eefff : System RAM
dbac7000-ddffffff : System RAM
(I don't have the energy to work out how much actual RAM this represents.)
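For anyone with the energy, totaling the 'System RAM' ranges is a small parsing exercise. A sketch (remember that on modern kernels /proc/iomem has to be read as root to show real addresses):

```python
def total_system_ram(iomem_text):
    """Sum the sizes of all 'System RAM' ranges in /proc/iomem style text."""
    total = 0
    for line in iomem_text.splitlines():
        rng, _, name = line.partition(" : ")
        if name.strip() != "System RAM":
            continue
        start, _, end = rng.strip().partition("-")
        # Ranges are inclusive on both ends, hence the +1.
        total += int(end, 16) - int(start, 16) + 1
    return total

# Usage (as root): total_system_ram(open("/proc/iomem").read())
```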
Another view of physical memory range information is the kernel's report of the BIOS 'e820' memory map, printed during boot. On the system above, this says that the top of memory is actually 0x81f37ffff:
BIOS-e820: [mem 0x0000000100000000-0x000000081f37ffff] usable
I don't know if the Linux kernel exposes this information anywhere else.
You can also find various other things about physical memory ranges
in the kernel's boot messages, but I don't know enough to analyze them.
What's clear is that in general, a modern x86 machine's physical memory ranges are quite complicated. There are historical bits and pieces, ACPI and other data that is in RAM but must be preserved, PCI(e) windows, and other things.
(I assume that there is low level chipset magic to direct reads and writes for RAM to the appropriate bits of RAM, including remapping parts of the DIMMs around so that they can be more or less fully used.)
Understanding something about udev's normal network device names on Linux
For a long time, systemd's version of udev has attempted to give
network interfaces what the systemd people call predictable or
stable names. The current naming scheme is more or less documented,
with an older version in their Predictable Network Interface Names
wiki page. To understand how the naming scheme is applied in
practice by default, you also need to read the description of
NamePolicy= in systemd.link(5), and
inspect the default .link file, '99-default.link', which might be
in either /lib/systemd/network or /usr/lib/systemd/network/. It
appears that the current network name policy is generally going to
be "kernel database onboard slot path", possibly with 'keep'
at the front in addition. In practice, on most servers and desktops,
most network devices will be named based on their PCI slot identifier,
using systemd's 'path' naming policy.
A PCI slot identifier is what ordinary '
lspci' will show you as
the PCIe bus address. As covered in the
lspci manpage, the fully
general form of a PCIe bus address is <domain>:<bus>:<device>.<function>,
and on many systems the domain is always 0000 and is omitted. Systemd
turns this into what it calls a "PCI geographical location", which is
(translated into lspci's terminology):
prefix [P<domain>] p<bus> s<device> [f<function>] [n<phys_port_name> | d<dev_port>]
The domain is omitted if it's 0 and the function is only present
if it's a multi-function device. All of the numbers are in decimal,
while lspci presents them in hex. For Ethernet devices, the prefix is 'en'.
(I can't say anything about the '
n' and '
d' suffixes because
I've never seen them in our hardware.)
The device portion of the PCIe bus address is very frequently 0, because many Ethernet devices are behind PCIe bridges in the PCIe bus topology. This is how my office workstation is arranged, and how almost all of our servers are. The exceptions are all on bus 0, the root bus, which I believe means that they're directly integrated into the core chipset. This means that in practice the network device name primarily comes from the PCI bus number, possibly with a function number added. This gives 'path' based names of, eg, enp6s0 (bus 6, device 0) or enp1s0f0 and enp1s0f1 (bus 1, device 0, function 0 or 1; this is a dual 10G-T card, with each port being one function).
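As an illustration of the hex-to-decimal wrinkle, here is a sketch of constructing 'path' style names from a PCIe bus address. This is my own simplification; real udev works from the full sysfs path and properly handles onboard devices, USB, and multi-function detection:

```python
def pci_path_name(bdf, prefix="en"):
    """Build a systemd 'path' style name from a PCIe bus address like
    '0000:01:00.1'. Simplified: udev knows when a device is
    multi-function and then adds f0 as well (e.g. enp1s0f0), which
    can't be told from a single address alone."""
    domain, bus, devfunc = bdf.split(":")
    device, func = devfunc.split(".")
    name = prefix
    if int(domain, 16) != 0:
        name += "P%d" % int(domain, 16)
    # The hex bus and device numbers become decimal in the name.
    name += "p%ds%d" % (int(bus, 16), int(device, 16))
    if int(func, 16) != 0:
        name += "f%d" % int(func, 16)
    return name
```

So a device at (hex) bus 0x0a becomes enp10s0, not 'enp0as0'.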
(Onboard devices on servers and even desktops are often not integrated
into the core chipset and thus not on PCIe bus 0. Udev may or may
not recognize them as onboard devices and assign them 'eno<N>'
names. Servers from good sources will hopefully have enough correct
DMI and other information so that udev can do this.)
As always, the PCIe bus ordering doesn't necessarily correspond to what you think of as the actual order of hardware. My office workstation has an onboard Ethernet port on its ASUS Prime X370-Pro motherboard and an Intel 1G PCIe card, but they are (or would be) enp8s0 and enp6s0 respectively. So my onboard port has a higher PCIe bus number than the PCIe card.
There is an important consequence of this, which is that systemd's default network device names are not stable if you change your hardware around, even if you didn't touch the network card itself. Changing your hardware around can change your PCIe bus numbers, and since the PCIe bus number is most of what determines the network interface name, it will change. You don't have to touch your actual network card for this to happen; adding, changing, or relocating other hardware between physical PCIe slots can trigger changes in bus addresses (primarily if PCIe bridges are added or removed).
(However, adding or removing hardware won't necessarily change existing PCIe bus addresses even if the hardware changed has a PCIe bridge. It all depends on your specific PCIe topology.)
Sidebar: obtaining udev and PCIe topology information
'udevadm info /sys/class/net/<something>' will give you
a dump of what udev thinks and knows about any given network
interface. The various
ID_NET_NAME_* properties give you the
various names that udev would assign based on that particular
naming policy. The 'enp...' names are ID_NET_NAME_PATH;
on server hardware you may also see ID_NET_NAME_ONBOARD. The '
database' naming scheme comes from information in udev's hardware
database (hwdb).
On modern systems, '
lspci -PP' can be used to show the full PCIe
path to a device (or all devices). On Ubuntu 18.04, you can also
use sysfs to work through your PCIe topology,
in addition to '
lspci -tv'. See also my entry on PCIe bus
addresses, lspci, and working out your PCIe bus topology.
The initramfs for old kernels can hide old versions of things
In a recent entry, I more or less blamed a new minor Linux kernel version for changing the naming of my network interface. I had reasonable reasons to say this beyond just rebooting into 5.12.12 and having the problem appear; I also rebooted back into 5.12.11 and the problem disappeared again (I ended up going back and forth repeatedly and this was consistent). When the only changing thing is the kernel version, you can reasonably suspect it, instead of (say) an upgrade to udev that you also installed between the two kernels. However, I'm not so sure of that any more.
I'm running Fedora on this desktop, and Fedora normally doesn't rebuild the initramfs for existing kernels when you upgrade packages and install new kernels. This means that when I boot my Fedora 5.12.11 kernel, I'm not merely running that kernel, I'm running an initramfs with programs and configuration files that were frozen when that kernel was installed. If there was a udev update that changed its early boot behavior, that update isn't in the 5.12.11 initramfs. Although I thought I only changed the kernel version by booting back and forth between 5.12.11 and 5.12.12, I was also changing the versions of what ran during early boot, along with possibly the configuration files they used. This may well have fooled me about what the cause of my problem was.
(I know, I once said Fedora rebuilt the initramfs for all of your kernels when you installed new DKMS modules. Apparently I was wrong about that, and was seeing something else.)
In short, what looks like an issue in the new kernel may actually be a change in the new initramfs that you get along with the new kernel. It's hard to tell for sure, although you can try rebuilding the initramfs for an older kernel if you can work out how to do this correctly. Of course, if you do rebuild an initramfs for an old kernel to see if it's really the kernel that's at fault, you definitely want to save a copy of your working old initramfs.
(I've seen this before for configuration files, for example when Fedora embedded my current sysctl settings in the initramfs.)
Despite potentially causing issues, not rebuilding is quite sensible. Generally you want to preserve old working initramfses the way they are just as you want to preserve old kernels (certainly I did in this case, since my 5.12.11 environment kept working). People also want to do less work on package upgrades, and not rebuilding four or five initramfses is much less work than doing so.
Giving your Linux network interfaces fixed names (under udevd and networkd)
Suppose, not entirely hypothetically, that you always want your
machine's primary network interface to be called 'em0', regardless
of what the combination of the kernel, networkd, and the systemd
udevd want to call them today (something that has been known to
change). Until recently, my (incorrect) setup for this was a .link file
that looked like this:

[Match]
MACAddress=2c:fd:a1:xx:xx:xx

[Link]
Description=Onboard motherboard port
MACAddressPolicy=persistent
Name=em0
# Stop VLAN renaming
NamePolicy=keep
I had this
NamePolicy because I had VLANs on top of
this interface, and this was how I made them work.
This .link file worked for about a year and a half, and then I
upgraded my Fedora 33 workstation from 5.12.11 to 5.12.12 and
rebooted. It promptly dropped off the network because my interface
had the wrong name
and nothing got configured on it.
What I was trying to do was rename the interface with that MAC
address to em0. What my addition of
NamePolicy=keep did was
create a situation where the interface would be renamed to em0
if and only if nothing else had renamed it before udevd processed
my .link file. In 5.12.12 (but not 5.12.11), something (either
the kernel or udevd) decided to rename my interface to enp8s0
before my .link file took effect, and then the interface didn't
get renamed again to em0.
(This is the implication of '[...] or all policies configured [in
NamePolicy] must fail' in the manpage's
description of '
Name='. If the device hasn't already been given
a name, the 'keep' policy would fail and it would be renamed to
em0 by my 'Name=' setting.)
If you (I) want to give your network interfaces fixed names but have
your .link files apply only to real Ethernet interfaces instead of
matching broadly, what I believe you want is:

[Match]
MACAddress=2c:fd:a1:xx:xx:xx
Type=ether
# Before systemd v245, use eg
# Property=ID_BUS=pci

[Link]
Description=Onboard motherboard port
MACAddressPolicy=persistent
Name=em0
Without a NamePolicy, this will unconditionally rename anything
matching that MAC to em0. With
Type=ether, this will only apply
to real Ethernet devices, not your VLANs or other things that inherit
the MAC from the underlying Ethernet interface.
PS: At this point one may want to read the systemd.net-naming-scheme manpage. I believe that names of the form 'emX' are safe from ever colliding with kernel-assigned interface names, but I'm not completely sure.
PPS: In 5.12.12, my kernel boot logs clearly show that there are two
renamings with this new .link file:

igb 0000:08:00.0 enp8s0: renamed from eth0
[...]
igb 0000:08:00.0 em0: renamed from enp8s0
So my new
.link file doesn't prevent the initial renaming in 5.12.12
to enp8s0; it just allows my
.link to rename the interface again
to the em0 that I want.
Be careful when matching on Ethernet addresses in systemd-networkd
A not uncommon pattern in networkd is to write a .link or .network
file that selects the hardware to work on by MAC address, because that's
often more stable than many of the other alternatives. For instance,
you might write a .link file for your motherboard like this:

[Match]
MACAddress=2c:fd:a1:xx:xx:xx

[Link]
Description=Onboard motherboard port
MACAddressPolicy=persistent
Name=em0
Unfortunately this is dangerous, because some virtual devices
inherit Ethernet addresses from their parent device and networkd
will allow virtual devices to match against just Ethernet addresses.
In particular VLANs inherit the Ethernet address from their underlying
network device, so if you have one or more VLANs on top of this
interface, they will all match this (and then they'll try to rename
themselves to em0). The same can happen
if you have a
.network file that matches with MACAddress in order
to deal with variable network names for the same underlying connection.
(If you have a real device that matches this way and creates VLANs on top of itself, networkd may be smart enough to recognize that it has a recursive situation, or it may blow up. I haven't tested.)
In other words, if you tell networkd that a
.link or a .network
file applies to anything with a specific Ethernet address, networkd
takes that to really mean anything. You may have meant this to apply
(only) to your actual Ethernet device, but the
.link file doesn't
say that and networkd won't infer it.
In systemd v245 or later, what you probably want is to restrict any
Ethernet hardware matches to real Ethernet devices with the additional
requirement of 'Type=ether':

[Match]
MACAddress=2c:fd:a1:xx:xx:xx
Type=ether
(Systemd v245 was released in February of 2020 and is in Ubuntu
20.04 and the current versions of Fedora, but isn't in Debian stable.
Support for the current meaning of
Type= that allows matching
'ether' was added in this commit
as a result of issue #14952. To my surprise,
this significant improvement doesn't seem to have been noted in the
systemd NEWS file.)

The 'ether' type applies to both PCI Ethernet ports and USB
Ethernet devices, but it doesn't apply to wireless devices; those
are type 'wlan'. As the manpage covers, '
networkctl list' can tell
you what your devices are. VLANs are type 'vlan'.
If you have a systemd (and thus a systemd-networkd) that's older
than v245, I think the only thing you can do is match on a property
of the device, obtained from '
udevadm info /sys/class/net/<what>'.
For a lot of physical hardware, the obvious property is that it's
on a PCI bus:
[Match]
MACAddress=2c:fd:a1:xx:xx:xx
Property=ID_BUS=pci
(I have to say that I haven't tested this, I'm just following the manpage.)
However, USB Ethernet devices are '
ID_BUS=usb', not PCI, while
a laptop's onboard wireless most likely is a PCI device, which is
the case on my Dell XPS 13. My laptop's
wireless device is also '
DEVTYPE=wlan', while even now real
Ethernet devices have no
DEVTYPE (as of systemd v248 on a
Fedora 34 virtual machine).
(This elaborates on a tweet of mine.)
PS: I'm not sure whether the matching here is being done by systemd-networkd, the systemd version of udevd, or both of them. It's quite possible that both programs and subsystems are doing it at different times and in different circumstances.
Some notes on what's in Linux's /sys/class/net for network interface status
Due to discovering that one of our servers had had a network
interface at 100 Mbits/sec for some time,
I've become interested in what information is exposed by the Linux
kernel about network interfaces in
/sys, specifically in
/sys/class/net/<interface>. I'm mostly interested in the information
there because it's the source of what the Prometheus host agent exposes as network
interface status metrics, and thus what's easy to monitor and alert
on in our metrics and monitoring setup.
The overall reference for this is the Linux kernel's sysfs-class-net,
which documents the
/sys fields directly. For the flags
file, you also need the kernel's include/uapi/linux/if.h,
and for the
type file, include/uapi/linux/if_arp.h.
Generally sysfs-class-net is pretty straightforward about what
things mean, although you may have to read several entries together.
Not all interfaces have all of the files; some
files aren't present on any servers we have.
The flags file has a number of common values you may see, which I'm
going to write down here for my own reference:
- 0x1003 or 4099 decimal
- This is the common value for active Ethernet
interfaces. It is MULTICAST (0x1000) plus UP (0x1) and BROADCAST (0x2).
ifconfig will report RUNNING as well, but that apparently doesn't appear in sysfs.
- 0x1002 or 4098 decimal
- This is the common value for an inactive
Ethernet interface, whether or not it has a cable plugged in. It
is MULTICAST plus BROADCAST, but without UP.
- 0x9 or 9 decimal
- This is the common value for the loopback interface,
made from UP (0x1) and LOOPBACK (0x8).
- 0x91 or 145 decimal
- This is an UP (0x1), POINTOPOINT (0x10) link that
is NOARP (0x80). This is the
flags value of my Wireguard endpoints.
- 0x1091 or 4241 decimal
- This is an UP (0x1), POINTOPOINT (0x10) link
that is MULTICAST (0x1000) in addition to being NOARP (0x80). This is
the flags value of my PPPoE DSL link's PPP connection.
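These combinations can be decoded mechanically; a small sketch using the flag bits from if.h (a subset of them, picked for the values above):

```python
# Interface flag bits from include/uapi/linux/if.h (a subset).
IFF = {
    "UP": 0x1, "BROADCAST": 0x2, "LOOPBACK": 0x8,
    "POINTOPOINT": 0x10, "NOARP": 0x80, "MULTICAST": 0x1000,
}

def decode_flags(value):
    """Return the set of flag names present in a sysfs 'flags' value."""
    return {name for name, bit in IFF.items() if value & bit}
```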
The 'addr_assign_type' file is about the (Ethernet) hardware
address, not any IP addresses that may be associated with the
interface. A physical interface will normally have a value of 0; a
value of 3 means that you specifically set the MAC address. VLAN
interfaces sitting on top of physical devices have a value of 2
(they take their MAC address from the underlying device's MAC).
The name_assign_type is somewhat random, as far as I can tell.
Our Ubuntu machines all have a name assignment type value of 4
('renamed'), while my Fedora machines mostly have a name assignment
type of 3 ('named by userspace'), with one Ethernet device being a
4. My Fedora home machine's
ppp0 device has a value of 1.
The most common
type values are 1 (Ethernet), 772 (the loopback
interface), 512 (PPP), and 65534 ('none', what my Wireguard tunnels
have). Possibly someday Wireguard will have its own type value
assigned in include/uapi/linux/if_arp.h.
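For reference, those type numbers translated into a small lookup; the names follow the ARPHRD_* constants in if_arp.h (65534 is ARPHRD_NONE):

```python
# A subset of the ARPHRD_* interface types from include/uapi/linux/if_arp.h.
ARPHRD = {
    1: "ether",       # ARPHRD_ETHER
    512: "ppp",       # ARPHRD_PPP
    772: "loopback",  # ARPHRD_LOOPBACK
    65534: "none",    # ARPHRD_NONE (what Wireguard tunnels report)
}

def type_name(type_value):
    """Translate a /sys/class/net/<if>/type value to a readable name."""
    return ARPHRD.get(type_value, "unknown (%d)" % type_value)
```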
The speed value is, as mentioned in sysfs-class-net, in
Mbits/sec. The values I've seen are 100 (100M), 1000 (1G), and 10000
(10G). What gets reported for interfaces without carrier seems to
depend. An UP interface with no carrier will report a speed of -1;
an interface that isn't up has no
speed value and attempts to
read the file will report '
Invalid argument'. The Prometheus host
agent turns all of these into its speed in bytes metric
node_network_speed_bytes by multiplying the speed value by
125000, which normally gives you a metric value of -125000 (UP but
no carrier), 12500000 (100M), 125000000 (1G), or 1250000000 (10G).
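The arithmetic behind those metric values, as a sketch:

```python
def speed_to_bytes(speed_mbits):
    """Convert a sysfs 'speed' value (Mbits/sec) into the Prometheus
    host agent's node_network_speed_bytes: 1 Mbit/sec is 1,000,000
    bits/sec, which is 125,000 bytes/sec."""
    return speed_mbits * 125000
```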
(Some Linux distributions in some situations will set additional interfaces to UP as part of trying to do DHCP on them. Otherwise they'll quietly stay down.)
The Prometheus host agent exposes what it calls 'non-numeric data'
from /sys/class/net in the
node_network_info metric. This
gives you the device's hardware address and broadcast address,
its name, its duplex (which may be blank for things that don't have
a duplex mode, such as Wireguard links or virtual Ethernets), and
its state (from the
operstate file). Somewhat to my surprise, the
operstate of the loopback interface is 'unknown', not 'up'.
Update: it turns out that the
carrier file is only available for
interfaces that are configured 'UP' (and then is either 0 or 1
depending on if carrier is detected). If the interface is not UP,
attempting to read
carrier fails with 'Invalid argument'.