Where DTrace aggregates are handled for printing et al

August 22, 2014

DTrace, the system, is split into a kernel component and a user-level component (the most obvious piece of which is the dtrace command). However, the DTrace documentation has very little discussion of which features are handled where. You might reasonably ask why we care; the answer is that anything done at user level can easily be made more sophisticated, while things done at kernel level must be minimal and carefully safe. Which brings us around to DTrace aggregates.

For a long time I've believed that DTrace aggregates had to be mostly manipulated at user level. The sensible design would be for the kernel to ship most or all of the aggregate data to user level by memcpy()ing it into a buffer that user space has set up, and then let user level handle things like printa(). However, I haven't known for sure. Well, now I do: DTrace aggregate normalization and printing are handled at user level.

This means that D (the DTrace language) could have a lot of very useful features if it wanted to. The obvious one is that you could set the sort order for aggregates on a per-aggregate basis. With a bit more work DTrace could support, say, multi-aggregate aware truncation (dealing with one of the issues mentioned back here). If we go further, there's nothing preventing D from allowing much more sophisticated access to aggregates (including explicit key lookup in them for printing things and so on), something that would really come in handy in any number of situations.
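
As a contrast, the aggregate sorting controls that exist today are (as far as I know) global consumer options such as aggsortkey and aggsortrev, which apply to every aggregate a script prints. Here's a minimal sketch, not taken from any of my real scripts:

#pragma D option quiet
/* aggsortkey sorts printed aggregations by their key tuple instead of
   by their values; it affects every printa() in the script. */
#pragma D option aggsortkey

syscall::read:entry
{
   @reads[execname] = count();
}

tick-10sec
{
   printa("%-20s %@d\n", @reads);
   trunc(@reads);
}

Since you can also set these options from outside the script with 'dtrace -x aggsortkey', they're clearly settings of the user-level consumer, which fits with the sorting being done there.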

(I don't expect this to ever happen for reasons beyond the scope of this entry. I expect that the official answer is 'D is low level, if you need sophisticated processing just dump output and postprocess in a script'. One of the reasons that this is a bad idea is that it puts a very sharp cliff in your way at a certain point in D sophistication. Another reason is that it invites you to play in the Turing tarpit of D.)

Sidebar: today's Turing Tarpit D moment

This is a simplified version.

syscall::read:return, syscall::write:return
/ ... /
{
   this->dirmarker = (probefunc == "read") ? 0 : 1;
   this->dir = this->dirmarker == 0 ? "r" : "w";
   @fds[this->dir, self->fd] = avg(self->fd * 10000 + this->dirmarker);
   ....
}

tick-10sec
{
   normalize(@fds, 10000);
   printa("fd %@2d%s: ....\", @fds, @....);
}

If a given file descriptor had both read and write IO, I wanted the read version to always come before the write version instead of potentially flip-flopping back and forth at random. So I artificially inflate the fd number, add a little discriminant in the low digits to make it sort right, and then normalize away the inflation afterwards. I have to normalize away the inflation because it's the aggregation value that gets printed by printa() (via the %@2d), which means that the actual fd number has to come from the value and not from the fd's slot in the key tuple.
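
To make the trick concrete, here's the arithmetic with some made-up file descriptor numbers (these aren't from my real output):

   fd 5,  read   ->  avg() value of  5*10000 + 0 =  50000
   fd 5,  write  ->  avg() value of  5*10000 + 1 =  50001
   fd 12, read   ->  avg() value of 12*10000 + 0 = 120000

The default ascending sort on aggregation values puts these in exactly the order I want (fd 5 read, then fd 5 write, then fd 12 read), and normalize(@fds, 10000) then divides the inflation back out so that the %@2d in printa() prints 5, 5, and 12 as the fd numbers.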

Let me be clear here: this may be clever, but it's clearly a Turing tarpit. I've spent quite a lot of time figuring out how to abuse D features in order to avoid the extra pain of a post-processing script, and now that the dust has settled I'm far from convinced that this was actually a good use of my time.
