Wandering Thoughts archives

2020-01-07

eBPF based tools are still a work in progress on common Linuxes

These days, it seems that a lot of people are talking about and praising eBPF and tools like BCC and bpftrace, and you can read about places such as Facebook and Netflix routinely running dozens of eBPF programs on production systems. All of this is true, by and large; if you have the expertise and are in the right environment, eBPF can do great things. Unfortunately, though, a number of things get in the way of more ordinary sysadmins being able to use eBPF and these eBPF tools for powerful things.

The first problem is that these tools (and especially good versions of these tools) are fairly recent, which means that they're not necessarily packaged in your Linux distribution. For instance, Ubuntu 18.04 doesn't package bpftrace or anything other than a pretty old version of BCC. You can add third party repositories to your Ubuntu system to (try to) fix this, but that comes with various sorts of maintenance problems and anyway a fair number of nice eBPF features also require somewhat modern kernels. Ubuntu LTS's standard server kernel doesn't necessarily qualify. The practical result is that eBPF is off the table for us until 20.04 or later, unless we have a serious enough problem that we get desperate.

(Certainly we're very unlikely to try to use eBPF on 18.04 for the kinds of routine monitoring and so on that Facebook, Netflix, and so on use it for.)
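To make the "somewhat modern kernels" point concrete, here is a minimal Python sketch of the sort of pre-flight check you might do before trying an eBPF tool on a machine. The `MIN_KERNEL` value of 5.4 is a hypothetical placeholder, not a documented requirement of BCC or bpftrace; individual tools and features have their own minimums. For reference, Ubuntu 18.04's standard kernel is from the 4.15 series.

```python
import platform
import re
from typing import Tuple

# Hypothetical minimum version, for illustration only; check each tool's docs.
MIN_KERNEL: Tuple[int, int] = (5, 4)


def parse_kernel_release(release: str) -> Tuple[int, int]:
    """Pull (major, minor) out of a `uname -r` style string like '4.15.0-74-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def kernel_at_least(release: str, minimum: Tuple[int, int] = MIN_KERNEL) -> bool:
    """Compare as integer tuples, so that 5.10 correctly sorts after 5.4."""
    return parse_kernel_release(release) >= minimum


if __name__ == "__main__":
    release = platform.release()  # on Linux, the same string `uname -r` prints
    verdict = "new enough" if kernel_at_least(release) else "older than MIN_KERNEL"
    print(f"{release}: {verdict}")
```

The comparison is done on integer tuples rather than strings because a plain string comparison would put "5.10" before "5.4".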

Even on distributions with recent packages, such as Fedora, you can run into issues where people working in the eBPF world assume you're in a very current environment. The Cloudflare ebpf_exporter is a great way to get things like local disk latency histograms into Prometheus, but the current code base assumes you're using a version of BCC that was released only last October. That's a bit recent, even for Fedora.

(The ebpf_exporter does have pre-built release binaries available, so that's something.)
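Latency histograms like these are generally built by counting events into power-of-two buckets, which is the style of histogram that BCC's biolatency tool prints. The sketch below shows only that bucketing arithmetic, in user-space Python over a list of made-up latencies. It is an illustration of the idea, not how ebpf_exporter or BCC implement it: the real tools do their counting in eBPF maps inside the kernel, and their exact bucket boundaries may differ from these.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def log2_bucket(value: int) -> int:
    """Bucket index for a non-negative integer latency.

    Bucket 0 holds only 0; bucket k >= 1 holds [2**(k-1), 2**k - 1].
    """
    if value < 0:
        raise ValueError("latency cannot be negative")
    return value.bit_length()


def log2_histogram(latencies_us: Iterable[int]) -> Dict[Tuple[int, int], int]:
    """Map each non-empty bucket's (low, high) range to its count."""
    counts = Counter(log2_bucket(v) for v in latencies_us)
    hist: Dict[Tuple[int, int], int] = {}
    for k in sorted(counts):
        low = 0 if k == 0 else 1 << (k - 1)
        high = 0 if k == 0 else (1 << k) - 1
        hist[(low, high)] = counts[k]
    return hist


if __name__ == "__main__":
    # Made-up latencies in microseconds, purely for illustration.
    sample = [0, 1, 3, 3, 5, 100]
    for (low, high), count in log2_histogram(sample).items():
        print(f"{low:>6} -> {high:<6} : {count}")
```

Power-of-two buckets keep the number of counters small (a few dozen cover everything from microseconds to seconds) at the cost of coarse resolution for large values.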

Then there's the fact that sometimes all of this is held together with unreliable glue, because the pieces aren't really designed to work together. Fedora has just updated Fedora 31 to a 5.4.x kernel, and now all BCC programs (including the examples) fail to compile, with a stream of "error: expected '(' after 'asm'" errors reported for various bits of the 5.4 kernel headers. Based on some Internet reading, this is apparently a sign of clang trying to interpret inline assembly that was written for gcc (which is what the Linux kernel is compiled with). Probably this will get fixed at some point, but for now Fedora people get to choose either 5.4 or BCC, not both.

(bpftrace still works on the Fedora 5.4 kernel, at least in light testing.)

Finally, there's the general problem (shared with DTrace on Solaris and Illumos) that a fair number of the things you might be interested in require hooking directly into the kernel code, and the Linux kernel code famously changes all the time. My impression is that the kernel is slowly gaining more stable tracepoints for eBPF to use, but also that a lot of the time you're still directly attaching eBPF hooks to kernel functions (through kprobes).
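Because kprobe targets are ordinary kernel functions, a tool that hooks one can break when a new kernel renames or removes that function. Here is a minimal Python sketch of a pre-flight check, assuming the usual /proc/kallsyms line format (address, type, name, optional [module]) and tracefs mounted at either /sys/kernel/tracing or /sys/kernel/debug/tracing. Both helper functions are hypothetical and not part of BCC or bpftrace. A function appearing in kallsyms is necessary but not sufficient for a kprobe to work, since inlined functions and ones on the kernel's kprobe blacklist can't be probed.

```python
import os
from typing import Iterable

# Common tracefs mount points; which one exists depends on the system.
TRACEFS_ROOTS = ("/sys/kernel/tracing", "/sys/kernel/debug/tracing")


def symbol_in_kallsyms(lines: Iterable[str], name: str) -> bool:
    """True if `name` is listed as a text (code) symbol in /proc/kallsyms-format lines.

    Each line looks like: <address> <type> <name> [module]
    Type 't' or 'T' marks a symbol in the text (code) section.
    """
    for line in lines:
        fields = line.split()
        if len(fields) >= 3 and fields[1] in ("t", "T") and fields[2] == name:
            return True
    return False


def tracepoint_exists(subsystem: str, event: str) -> bool:
    """True if tracefs has an events/<subsystem>/<event> directory."""
    return any(
        os.path.isdir(os.path.join(root, "events", subsystem, event))
        for root in TRACEFS_ROOTS
    )


if __name__ == "__main__":
    try:
        with open("/proc/kallsyms") as f:
            print("vfs_read present:", symbol_in_kallsyms(f, "vfs_read"))
    except OSError as err:
        print("could not read /proc/kallsyms:", err)
    print("block:block_rq_complete present:",
          tracepoint_exists("block", "block_rq_complete"))
```

Reading tracefs usually needs root, and `os.path.isdir` quietly returns False when it can't look, so a False from `tracepoint_exists` as an ordinary user doesn't prove the tracepoint is missing.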

In time, all of this will settle down. Both eBPF and the eBPF tools will stabilize, current enough versions of everything will be in all common Linux distributions (even the long term support versions), and the kernel will have stable tracepoints and so on that cover most of what you need. But that's not really the state of things today, and it probably won't be for at least a few years (and don't even ask about Red Hat Enterprise Linux 7 and 8, which will still be around for a long time in some places).

(This more or less elaborates on a tweet of mine.)

linux/EBPFStillInProgress written at 00:03:38

