Wandering Thoughts archives

2023-06-01

Capturing data you need later when using bpftrace

When using bpftrace, it's pretty common that not all of the data you want to report on is available in one spot, at least when you have to trace kernel functions instead of tracepoints. When this comes up, there is a common pattern that you can use to temporarily capture the data for later use. In short, you save the information in an associative array that's indexed by the thread ID, which creates a per-thread variable. If you have more than one piece of information to save, you use more than one associative array.

Let's start with the simplest case. Suppose that you need both a function's argument (available when it's entered) and its return value (so that you can report only on successful calls). Then the pattern looks like this:

kprobe:afunction
{
  // record argument into @arg0
  // under our thread id (tid)
  @arg0[tid] = (struct something *)arg0;
}

// only act if we have the argument
// recorded
kretprobe:afunction
/@arg0[tid] != 0/
{
  $arg = @arg0[tid];
  printf(...., $arg); // or whatever

  // clean up recorded argument
  delete(@arg0[tid]);
}

This example shows all of the common pieces. At the start, we capture the function argument we care about into an associative array that's indexed by the current thread ID (using the tid builtin variable). Then, provided that we have a recorded argument, we use it when the function returns. At the end, we clean up our associative array by deleting our entry from it; if we didn't do this, we might have an ever-growing associative array (or arrays) as different threads called the function we're tracing. Incidentally, one time we might invoke the kretprobe probe without the argument recorded is if we start tracing while an existing invocation of the function is in flight (which may be especially likely for functions that take a while, such as handling an NFS request and reply).

(This pattern is so common it's mentioned in the documentation as a per-thread variable. Note that the documentation's example delete()s the per-thread entry just as I do here.)
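
As a concrete sketch of this pattern, here is roughly how it might look for vfs_read(), a kernel function chosen here purely for illustration. This assumes that your kernel lets you kprobe vfs_read() and that its third argument (arg2) is the requested byte count; the @count map name is made up for this example. It saves the requested size on entry and reports it alongside the return value for reads that didn't fail.

```bpftrace
// save the requested size under our thread id;
// skipping zero-sized reads keeps the 'do we have
// it' check in the kretprobe simple
// (assumption: arg2 is vfs_read()'s byte count)
kprobe:vfs_read
/arg2 != 0/
{
  @count[tid] = arg2;
}

// only act if we recorded a request size
kretprobe:vfs_read
/@count[tid] != 0/
{
  $ret = (int64)retval;
  // negative return values are errors
  if ($ret >= 0) {
    printf("%s (tid %d): asked for %d bytes, got %d\n",
           comm, tid, @count[tid], $ret);
  }
  // clean up recorded size
  delete(@count[tid]);
}
```

The predicate on the kprobe is there because a stored value of 0 would fail the kretprobe's check and so would never be deleted.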

The reason we didn't use a simple global variable, as I did when I was recording ZFS's idea of available memory (in another bpftrace trick), is that multiple threads may be calling this function at the same time. If they are, a single global variable would be overwritten by each new caller, so a returning thread could wind up reporting some other thread's argument.

Another case that often comes up is that the function we want to trace directly or indirectly calls another function that looks up important information, for example to map some opaque identifier into a more useful piece of data (a string, a structure) and return it. A variant of this is where the function will generate the information we want through a process that we can't hook into, but will then call another function to validate it or act on it, at which point we can grab the data. The full version of this pattern looks something like this:

// set a marker so we know to save info
kprobe:afunction
{
  @aflag[tid] = 1;
}

// if we're marked, save the information
kprobe:subfunction
/@aflag[tid] != 0/
{
  @magicarg[tid] = arg0;
}

// if we have saved information, use it
// and clear it
kretprobe:afunction
/@magicarg[tid] != 0/
{
  // .... do whatever ...
  delete(@magicarg[tid]);
}

// clear the marker
kretprobe:afunction
/@aflag[tid] != 0/
{
  delete(@aflag[tid]);
}

One reason we need to set a marker and only save the subfunction's information if we're marked is that the marker is our guarantee that the saved information will be cleared later. If we unconditionally saved the information whenever subfunction() was called but only cleared it when afunction() returned, any call to subfunction() from a thread that wasn't inside afunction() would leave behind a dead @magicarg entry, and these would slowly accumulate.
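
One possible variation, offered as a sketch and not as the only way to do it, is to fold the two kretprobe clauses into one that is guarded by the marker. This makes the cleanup guarantee visible in a single place: anything saved under the marker is deleted in the same probe that deletes the marker. It uses the same placeholder names as the example above.

```bpftrace
// replaces both kretprobe:afunction clauses
kretprobe:afunction
/@aflag[tid] != 0/
{
  // subfunction() may not have been called at all,
  // so check before using the saved information
  if (@magicarg[tid] != 0) {
    // .... do whatever ...
    delete(@magicarg[tid]);
  }
  // clear the marker
  delete(@aflag[tid]);
}
```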

A variant on this is if our 'subfunction' is actually a peer function to our function of interest (and gets called before it), with both being called from a containing function. The pattern here is more elaborate; the containing function sets the marker and must clean up everything, with the subfunction and our function saving and using the information.
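
A minimal sketch of this more elaborate pattern might look like the following. The names container and peerfunction are hypothetical stand-ins for the containing function and the peer function, and the printf() is only a placeholder for whatever you actually want to report.

```bpftrace
// the containing function sets the marker
kprobe:container
{
  @aflag[tid] = 1;
}

// the peer function runs first; if we're marked,
// save the information
kprobe:peerfunction
/@aflag[tid] != 0/
{
  @magicarg[tid] = arg0;
}

// our function of interest uses the saved information
kprobe:afunction
/@magicarg[tid] != 0/
{
  printf("afunction arg0 0x%lx, peer arg0 0x%lx\n",
         arg0, @magicarg[tid]);
}

// the containing function cleans up everything
kretprobe:container
/@aflag[tid] != 0/
{
  delete(@magicarg[tid]);
  delete(@aflag[tid]);
}
```

Because the cleanup lives in the containing function's kretprobe, the saved information is cleared even if afunction() never gets called on that pass.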

Sidebar: Tracking currently active requests/etc in bpftrace

In DTrace, the traditional way to keep a running count of something (such as how many threads were active inside afunction()) was to use a map with a fixed key that was incremented with sum(1) and decremented with sum(-1) (see map functions), with the decrement generally guarded so that you knew a matching increment had been done. Although I haven't tested it, the bpftrace documentation on the ++ and -- operators seems to imply that these are safe to use on at least maps with keys (including constant keys), and perhaps global variables in general. Even if you have to use maps, this is at least clearer than the sum() version.

(You'll want to guard the decrement even if you use --.)
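
As a sketch of what the guarded ++ and -- version might look like (like the text above, this is based on reading the documentation and not on testing), here is a running count of threads inside afunction(). The @entered and @active map names and the one-second reporting interval are made up for illustration, and this assumes that afunction() isn't called recursively on the same thread.

```bpftrace
// note that we've done the increment for this thread
kprobe:afunction
{
  @entered[tid] = 1;
  @active[0]++;
}

// only decrement if we did the matching increment
kretprobe:afunction
/@entered[tid] != 0/
{
  @active[0]--;
  delete(@entered[tid]);
}

// report the running count once a second
interval:s:1
{
  printf("threads inside afunction(): %d\n", @active[0]);
}
```

The per-thread @entered flag is the guard; without it, returns from calls that were already in flight when tracing started would push the count below zero.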

linux/BpftraceStashingData written at 23:06:34

