2014-08-22
Where DTrace aggregates are handled for printing et al
DTrace, the system, is split into a kernel component and a user
level component (the most obvious piece of which is the dtrace
command). However the DTrace documentation has very little discussion
of what features are handled where. You might reasonably ask why
we care; the answer is that anything done at user level can easily
be made more sophisticated while things done at kernel level must
be minimal and carefully safe. Which brings us around to DTrace
aggregates.
For a long time I've believed that DTrace
aggregates had to be mostly manipulated at the user level. The
sensible design was for the kernel to ship most or all of the
aggregate to user level with memcpy()
into a buffer that user
space had set up, then let user level handle, for example, printa().
However I haven't known for sure. Well, now I do. DTrace aggregate
normalization and printing are handled at user level.
This means that D (the DTrace language) could have a lot of very useful features if it wanted to. The obvious one is that you could set the sort order for aggregates on a per-aggregate basis. With a bit more work DTrace could support, say, multi-aggregate aware truncation (dealing with one of the issues mentioned back here). If we go further, there's nothing preventing D from allowing much more sophisticated access to aggregates (including explicit key lookup in them for printing things and so on), something that would really come in handy in any number of situations.
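(As a hedged aside on the current state of affairs: as far as I can see, the sort order knobs that exist today, such as the aggsortkey and aggsortrev options, are global ones; they apply to every aggregation a script prints, not to a particular one. A minimal sketch, with made up aggregation names:

#pragma D option aggsortkey    /* sort every aggregation by its key tuple */
#pragma D option aggsortrev    /* ...and in descending order */

syscall::read:entry  { @reads[execname] = count(); }
syscall::write:entry { @writes[execname] = count(); }

Both @reads and @writes here get exactly the same sort treatment when they're printed; there's no way to say 'sort this one by key and that one by value'.)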
(I don't expect this to ever happen for reasons beyond the scope of this entry. I expect that the official answer is 'D is low level, if you need sophisticated processing just dump output and postprocess in a script'. One of the reasons that this is a bad idea is that it puts a very sharp cliff in your way at a certain point in D sophistication. Another reason is that it invites you to play in the Turing tarpit of D.)
Sidebar: today's Turing Tarpit D moment
This is a simplified version.
syscall::read:return, syscall::write:return
/ ... /
{
    this->dirmarker = (probefunc == "read") ? 0 : 1;
    this->dir = this->dirmarker == 0 ? "r" : "w";
    @fds[this->dir, self->fd] = avg(self->fd * 10000 + this->dirmarker);
    ....
}

tick-10sec
{
    normalize(@fds, 10000);
    printa("fd %@2d%s: ....\n", @fds, @....);
}
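(For the return clauses to work, self->fd has to have been set beforehand; the usual pattern, and part of what the simplification here leaves out, is an entry clause along these lines, since arg0 at :entry to read() and write() is the file descriptor:

syscall::read:entry, syscall::write:entry
/ ... /
{
    self->fd = arg0;
}

The elided part of the return clauses would then normally clear self->fd again so the thread-local variable doesn't linger.)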
If a given file descriptor had both read and write IO, I wanted the
read version to always come before the write version instead of
potentially flip-flopping back and forth randomly. So I artificially
inflate the fd number, add in a little discriminant in the low
digits to make it sort right, and then normalize away the inflation
afterwards. I have to normalize away the inflation because the value
of the aggregation has to be used in printa(), which means that the
actual FD number has to come from that value and not from its slot
in the key tuple.
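To make the arithmetic concrete (with made up numbers rather than real output), for fd 5:

    read:  avg(5*10000 + 0) -> 50000
    write: avg(5*10000 + 1) -> 50001

Aggregations normally print in ascending order of their values, and 50000 sorts just ahead of 50001, so the read line for a given fd always comes out immediately before its write line. normalize(@fds, 10000) then divides the stored values by 10000 (integer division) when they're printed, so the %@2d conversion shows the real fd, 5, on both lines.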
Let me be clear here: this may be clever, but it's clearly a Turing tarpit. I've spent quite a lot of time figuring out how to abuse D features in order to avoid the extra pain of a post-processing script and I'm far from convinced that this actually was a good use of my time once the dust settled.