A network interface losing and regaining signal can have additional effects (in Linux)
My office at work features a dearth of electrical sockets and as a result a profusion of power bars and other means of powering a whole bunch of things from one socket. The other day I needed to reorganize some of the mess, and as part of that I wound up briefly unplugging the power supply for the 8-port Ethernet switch that my office workstation is plugged into. Naturally this meant that the network interface lost signal for a bit (twice, because I wound up shuffling the power connection twice). Nothing on my desktop really noticed, including all of the remote X stuff I do, so I didn't think more about it. However, when I got home, parts of my Wireguard tunnel didn't work. I eventually fixed the problem by restarting the work end of my Wireguard setup, which does a number of things, including turning on IP(v4) forwarding on my workstation's main network interface.
I already knew that deleting and then recreating an interface entirely can have various additional effects (as happens periodically when my PPPoE DSL connection goes away and comes back). However this is a useful reminder to me that simply unplugging a machine from the network and then plugging it in can have some effects too. Unfortunately I'm not sure what the complete list of effects is, which is somewhat of a problem. Clearly it includes resetting IP forwarding, but there may be other things.
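One concrete thing worth checking after a carrier bounce is the per-interface forwarding sysctl. A small sketch of such a check (em0 is my interface name here; substitute your own, and treat the script name as hypothetical):

```shell
#!/bin/sh
# check-forwarding.sh: report the per-interface IPv4 forwarding setting,
# which is one of the things that can get reset when an interface loses
# and regains carrier.

show_forwarding() {
    iface="$1"
    f="/proc/sys/net/ipv4/conf/$iface/forwarding"
    if [ -r "$f" ]; then
        echo "$iface forwarding: $(cat "$f")"
    else
        echo "no such interface: $iface"
    fi
}

show_forwarding "${1:-em0}"
```

Re-enabling it afterward is then a matter of 'sysctl -w net.ipv4.conf.em0.forwarding=1' as root.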
(All of this also depends on your system's networking setup. For instance, NetworkManager will deconfigure an interface that goes down, while I believe that without it, the interface's IP address remains set and so on.)
I'm not sure if there's any good way to fix this so that these settings are automatically re-applied when an interface comes up again. Based on this Stackexchange question and answer, the kernel doesn't emit a udev event on a change in network link status (it does emit a netlink event, which is probably how NetworkManager notices these things). Nor is there any sign in the networkd documentation that it supports doing something on link status changes.
(Possibly I need to set 'IgnoreCarrierLoss=true' in my networkd settings for this interface.)
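If networkd is managing the interface, that setting would go in the [Network] section of the interface's .network file. A minimal sketch, assuming an interface named em0 (I haven't verified that this cures the forwarding reset):

```
[Match]
Name=em0

[Network]
# Assumption: keep the interface's configuration across carrier loss,
# instead of tearing it down and reconfiguring when carrier returns.
IgnoreCarrierLoss=true
```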
My unfortunate conclusion here is that if you have a complex networking setup and you lose link carrier on one interface, the simplest way to restore everything may be to reboot the machine. If this is not a good option, you probably should experiment in advance to figure out what you need to do and perhaps how to automate it.
(Another option is to work out what things are cleared or changed in your environment when a network interface loses carrier and then avoid using them. If I turned on IP forwarding globally and then relied on a firewall to block undesired forwarding, my life would probably be simpler.)
Stopping udev from renaming your VLAN interfaces to bad names
Back in early December I wrote about Why udev may be trying to rename your VLAN interfaces to bad names, where modern versions of udev tried to rename VLAN devices from the arbitrary names you give them to the base name of the network device they're on. Since the base name is already taken, this fails.
There turns out to be a simple cause and workaround for this, at least in my configuration, from Zbigniew Jędrzejewski-Szmek. In Fedora, all I need to do is add 'NamePolicy=keep' to the [Link] section of my .link file. This makes my .link file be:

[Match]
MACAddress=60:45:cb:a0:e8:dd

[Link]
Description=Onboard port
MACAddressPolicy=persistent
Name=em0
# Stop VLAN renaming
NamePolicy=keep
Setting 'NamePolicy=keep' doesn't keep the actual network device from being renamed from the kernel's original name for it to 'em0', but it makes udev leave the VLAN devices alone. In turn this means udev and systemd consider them to have been successfully created, so you get the usual systemd sys-subsystem-net-devices-*.device units for them showing up as fully up.
In a way, 'NamePolicy=keep' in a .link file is an indirect way for me to tell apart real network hardware from created virtual devices that share the same MAC, or at least ones created through networkd. As covered in the systemd.netdev manpage, giving a name to your virtual device is mandatory (Name= is a required field), so I think such devices will always be considered to already have a name by udev.
(This was a change in systemd-241, apparently. It changes the semantics of existing .link files in a way that's subtly not backward compatible, but such is the systemd way.)
However, I suspect that things might be different if I didn't use 'biosdevname=0' in my kernel command line parameters. These days this is implemented in udev, so allowing udev to rename your network devices from the kernel-assigned names to the consistent network device naming scheme may be considered a rename for the purposes of 'NamePolicy=keep'. That would leave me with the same problem of telling real hardware apart from virtual hardware that I had in the original entry.
However, for actual matching against physical hardware, I suspect that you can also generally use a Property= on selected attributes (as suggested by Alex Xu in the comments on the original entry). For instance, most people's network devices are on PCI busses, so:
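A minimal sketch of such a match (untested on my part; I'm assuming that ID_BUS=pci is the property value udev reports for PCI network devices, per 'udevadm info'):

```
[Match]
# Hypothetical: match only devices that udev reports as being on the
# PCI bus, a property that virtual devices such as VLANs shouldn't have.
Property=ID_BUS=pci
```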
There are a whole variety of properties that real network hardware has that VLANs don't (based on 'udevadm info' output), although I don't know about other types of virtual network devices. It does seem pretty safe that no virtual network device will claim to be on a PCI bus, though.
(I haven't tested the Property= approach, since 'NamePolicy=keep' is sufficient in my case.)
Fedora 31 has decided to allow (and have) giant process IDs (PIDs)
Every new process and thread on Linux gets a new PID (short for process ID). PIDs are normally assigned sequentially until they hit some maximum value and roll over. The traditional maximum PID value on Unixes has been some number related to a 16-bit integer, either signed or unsigned, and Linux is no exception; the kernel default is generally still 32768 (which is 2^15 exactly, and so not quite authentic to a signed 16-bit int).
(You can find the current limit in /proc/sys/kernel/pid_max, but it may have been increased through sysctls.)
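Checking your current limit is straightforward; a quick sketch:

```shell
#!/bin/sh
# Print the kernel's current maximum PID value. The usual in-kernel
# default is 32768; a sysctl.d snippet may have raised it.
cat /proc/sys/kernel/pid_max
```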
A few years ago I discovered that a Fedora package had raised this limit on me, which I was able to see because it turned out that my Fedora machines routinely go through a lot of PIDs. I reverted this by removing the package for various reasons, including that I don't really like gigantic process IDs (they bulk up the output of ps and other similar tools). Then recently I updated to Fedora 31, and not too long afterward noticed that I was getting giant process IDs again (as I write this, a new shell on one machine gets PID 4,085,915).
This turns out to be a deliberate choice in modern versions of systemd, instead of another stray package deciding it knows best. In Fedora 31 (with systemd 243), /usr/lib/sysctl.d/50-pid-max.conf says:
# Bump the numeric PID range to its maximum of 2^22
# (from the in-kernel default of 2^16), to make PID
# collisions less likely.
kernel.pid_max = 4194304
(Since the PID that new processes get is so close to the maximum, I suspect that I have actually rolled over even this large range a couple of times in the 21 days that this machine has been up since the last time I got around to a kernel update.)
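As a sanity check on that suspicion, the sustained process creation rate needed to cycle a 2^22 PID range twice in 21 days is quite modest:

```shell
#!/bin/sh
# Going through pid_max (4194304) twice in 21 days works out to this
# many new processes or threads per second (integer-truncated; the
# real figure is about 4.6):
echo $(( (4194304 * 2) / (21 * 24 * 60 * 60) ))
```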
Given that this is a new official systemd thing, I'm going to let it be and live with gigantic PIDs. It's not really worth fighting systemd; it generally doesn't end well for me.
(Hopefully there aren't any programs on the system that assume PIDs are small and always fit into five-character fields in ASCII. Or at least no programs that will fail when this assumption is incorrect, as opposed to producing ugly output.)
eBPF based tools are still a work in progress on common Linuxes
These days, it seems that a lot of people are talking about and praising eBPF and tools like BCC and bpftrace, and you can read about places such as Facebook and Netflix routinely using dozens of eBPF programs on systems in production. All of this is true, by and large; if you have the expertise and are in the right environment, eBPF can do great things. Unfortunately, though, a number of things get in the way of more ordinary sysadmins being able to use eBPF and these eBPF tools for powerful things.
The first problem is that these tools (and especially good versions of these tools) are fairly recent, which means that they're not necessarily packaged in your Linux distribution. For instance, Ubuntu 18.04 doesn't package bpftrace or anything other than a pretty old version of BCC. You can add third party repositories to your Ubuntu system to (try to) fix this, but that comes with various sorts of maintenance problems and anyway a fair number of nice eBPF features also require somewhat modern kernels. Ubuntu LTS's standard server kernel doesn't necessarily qualify. The practical result is that eBPF is off the table for us until 20.04 or later, unless we have a serious enough problem that we get desperate.
(Certainly we're very unlikely to try to use eBPF on 18.04 for the kinds of routine monitoring and so on that Facebook, Netflix, and so on use it for.)
Even on distributions with recent packages, such as Fedora, you can run into issues where people working in the eBPF world assume you're in a very current environment. The Cloudflare ebpf_exporter (also) is a great way to get things like local disk latency histograms into Prometheus, but the current code base assumes you're using a version of BCC that was released only in October. That's a bit recent, even for Fedora.
(The ebpf_exporter does have pre-built release binaries available, so that's something.)
Then there's the fact that sometimes all of this is held together with unreliable glue because it's not really designed to all work together. Fedora has just updated Fedora 31 to a 5.4.x kernel, and now all BCC programs (including the examples) fail to compile, with a stream of "error: expected '(' after 'asm'" errors for various bits of the 5.4 kernel headers. Based on some Internet reading, this is apparently a sign of clang attempting to interpret inline assembly that was written for gcc (which is what the Linux kernel is compiled with). Probably this will get fixed at some point, but for now Fedora people get to choose either 5.4 or BCC, but not both.
(bpftrace still works on the Fedora 5.4 kernel, at least in light testing.)
Finally, there's the general problem (shared with DTrace on Solaris and Illumos) that a fair number of the things you might be interested in require hooking directly into the kernel code and the Linux kernel code famously can change all the time. My impression is that eBPF is slowly getting more stable tracepoints over time, but also that a lot of the time you're still directly attaching eBPF hooks to kernel functions.
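As an illustration of the difference, here are two hypothetical bpftrace one-liners that count the same sort of activity per process. The first attaches to a stable syscall tracepoint; the second attaches a kprobe directly to an internal kernel function (vfs_open here), whose existence and signature can change between kernel versions:

```
# stable tracepoint interface
bpftrace -e 'tracepoint:syscalls:sys_enter_openat { @opens[comm] = count(); }'

# direct kprobe on an internal kernel function (version-dependent)
bpftrace -e 'kprobe:vfs_open { @opens[comm] = count(); }'
```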
In time, all of this will settle down. Both eBPF and the eBPF tools will stabilize, current enough versions of everything will be in all common Linux distributions, even the long term support versions, and the kernel will have stable tracepoints and so on that cover most of what you need. But that's not really the state of things today, and it probably won't be for at least a few years to come (and don't even ask about Red Hat Enterprise 7 and 8, which will be around for years to come in some places).
(This more or less elaborates on a tweet of mine.)