
2012-10-02

Some notes on getting durations in DTrace

One of the things you commonly wind up doing in DTrace is computing durations, how long things take; this gives you latencies, among other useful things. Since we have a significant performance problem with our mail spool right now, durations and latencies have very much been my focus recently. In the process of trying to apply DTrace to the problem I've wound up with some things that I want to remember about this.

What I've seen as the standard or classic DTrace pattern for getting durations is to timestamp the start of an event and then calculate the duration at the end of it:

fbt::fop_read:entry
{ self->ts = timestamp; }

fbt::fop_read:return
{ @read[probefunc] = quantize((timestamp - self->ts) / 1000000); }

(The timestamp DTrace variable is in nanoseconds, so here we scale it to milliseconds. Also I've become convinced that you never want to use the anonymous '@[...]' aggregate if you're writing an actual D script, even if you have only one aggregate.)

This works (in this case), but there are some things that can be better. First, you generally want to make the clause that calculates the duration conditional on the initial timestamp being set, especially if the start and end points are different functions or probe spots (the usual symptom of failing to do this is a few extremely large values that result from the calculation being 'timestamp - 0'). You also want to explicitly clear the timestamp after calculating the duration; this helps DTrace use less memory and avoids accidents (again, especially if the start and end points are different). So:

fbt::fop_read:return
/ self->ts > 0 /
{  @read[probefunc] = quantize(....);
   self->ts = 0; }
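
Put together, the whole thing might look like this (this is just the fragments above assembled in one place, using the same fbt::fop_read probes and the same millisecond quantize()):

fbt::fop_read:entry
{ self->ts = timestamp; }

fbt::fop_read:return
/ self->ts > 0 /
{  @read[probefunc] = quantize((timestamp - self->ts) / 1000000);
   self->ts = 0; }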

Sometimes you only want to track some events, not all of them. For example, you might only be interested in low-level ZFS IO stats for read IO:

fbt::zio_create:return
/ args[1]->io_type == 1 /
{ .... }

Once you've made the start conditional, you don't need to also check the condition at the end point; you can simply be conditional on the timestamp being set. If the timestamp is set, you know that you passed the condition at the start. If you do have additional conditions on the end point, you should make sure to unconditionally clear the timestamp. Since DTrace doesn't have an if statement or the equivalent, you get to do this the awkward way with a separate clause:

fbt::fop_read:return
/ self->ts > 0 && (timestamp - self->ts) > 250000 /
{ .... }

fbt::fop_read:return
{ self->ts = 0; }

Don't flip the order of these two. DTrace processes clauses in the order you've written them, so if you had them the other way around you'd clear the start timestamp before calculating anything with it.
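
For concreteness, here's one way the elided bits might be filled in, keeping the clauses in the right order (the @slowread aggregate name is just something I've made up for illustration; the quantize() is the same millisecond one as before and the 250000 nanosecond threshold is the one above):

fbt::fop_read:return
/ self->ts > 0 && (timestamp - self->ts) > 250000 /
{  @slowread[probefunc] = quantize((timestamp - self->ts) / 1000000); }

fbt::fop_read:return
{ self->ts = 0; }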

(On a side note, it would be really great if quantize() and lquantize() could optionally ignore all values outside a specified range.)
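
The closest existing option is probably lquantize(), which at least lets you pick the bucket range you see; values outside the range aren't actually ignored, they just pile up in the underflow and overflow buckets. A rough sketch, with made-up bounds of 0 to 1000 milliseconds in 10 millisecond steps:

fbt::fop_read:return
/ self->ts > 0 /
{  @read[probefunc] = lquantize((timestamp - self->ts) / 1000000, 0, 1000, 10);
   self->ts = 0; }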

In the simple case you can store the start timestamp (or timestamps) in the special self thread-local variable. However, I believe that this is only useful for synchronous operations. If you want to track the latency of something that involves asynchronous operations, callbacks, and similar things, you need to find a unique key and then use an associative array to store the start timestamp. So:

fbt::zio_create:return
/ args[1]->io_type /
{ ziots[args[1]] = timestamp; }

fbt::zio_done:entry
/ ziots[args[0]] > 0 /
{ ....;
  ziots[args[0]] = 0; }

If you're doing this, it's especially important to clear the timestamp by assigning zero to it. Not only do you avoid accidents if the same memory address (or other key value) is later reused for another operation (which is actually quite likely in some cases, especially for the addresses of structures), but it's also the only way that now-finished things will get cleaned out of the ziots associative array.
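
Pulling the zio example together, a minimal complete sketch might look like the following (the @ziolat aggregate name and its "read" key are made up for illustration; the io_type == 1 test for read IO and the millisecond scaling are from the earlier examples):

fbt::zio_create:return
/ args[1]->io_type == 1 /
{ ziots[args[1]] = timestamp; }

fbt::zio_done:entry
/ ziots[args[0]] > 0 /
{ @ziolat["read"] = quantize((timestamp - ziots[args[0]]) / 1000000);
  ziots[args[0]] = 0; }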

(I'm sure that all of these are well known in the active DTrace community and probably are documented in some best practices or common tricks pages. I just haven't been sufficiently exposed to the DTrace world to have found this stuff yet.)

Sidebar: why structure address reuse is quite likely

Solaris does most of its allocation through slab allocation, generally with a structure-specific slab for any commonly used sort of object. It also has caches for many sorts of objects that are allocated and deallocated frequently enough to justify it. Thus, if you're tracking any sort of commonly used object it's very likely that when a specific instance is freed it'll be reused in short order. Either it will be near the top of the applicable 'recently freed' cache or it will get put back in its slab pool and then shortly allocated from the pool again.

solaris/DTraceDurationNotes written at 01:17:20

