Why we haven't taken to DTrace

April 6, 2012

Recently I read 'Barriers to entry for DTrace adoption' (via Twitter). As it happens I have an opinion on this, since we use Solaris and I have done a modest amount with DTrace. My belief is that DTrace has between two and three problems, depending on how you look at it.

(Part of our non-use of DTrace is that I once had a bad experience where starting to use DTrace on a production fileserver had immediate and significant bad effects. I've seen DTrace work okay since then but the uncertainty lingers, especially for writing my own DTrace scripts. But that's only a relatively modest part of it.)

First is that it's pretty hard to really use DTrace if you're not familiar with Solaris kernel internals. This issue takes some explanation (unless you've tried to use DTrace, in which case you're probably awfully familiar with it). What it boils down to is that there are really two DTraces, one for extracting subsystem information from the kernel and one for debugging the kernel, and the first one is incomplete.

In theory, DTrace lets you tap into all sorts of documented trace points that Solaris has put into the kernel, extracting a wide variety of interesting state from each of them (you can read the coverage of the various providers in the DTrace documentation). In practice, the Solaris kernel developers have never provided enough trace points with enough state information to be really useful by themselves. Instead they leave you to fall back on the 'kernel debugging' side of DTrace, where you can intercept and trace almost any function and extract arbitrary information from kernel memory, provided that you know what you're looking for and what it means.

There are two problems with this (at least from my perspective). The first is that most of the really interesting uses of DTrace require using the kernel debugging DTrace, and using the kernel debugging DTrace requires understanding the internals of the kernel. Ideally you need the source code, which has always made things a little bit interesting; even before Solaris went closed source, the OpenSolaris source did not exactly match Solaris. The second is that the DTrace documentation has never tried to address this split, instead throwing everything together in one big pile that (the last time I read it) was probably more oriented towards the person doing a deep dive into the kernel than a sysadmin trying to cleverly extract useful information from what trace points there are.

(One sign of the documentation quality is that there is a plethora of blog entries and web sites that try to explain clever DTrace tricks and how to use it to get interesting results. Personally I would like to see the documentation split into at least two parts, one for sysadmins and one for people debugging the kernel.)
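As a rough illustration of the 'two DTraces' split described above, here is a minimal sketch in D. The first clause uses the documented io provider; the second uses the fbt provider to hook a kernel function directly. The choice of zfs_read is only an assumption for the sake of the example: whether a function with that name exists, and what its arguments mean, depends on the kernel version you are running, which is exactly the kind of thing you need the source code to know.

```dtrace
/* Documented trace point: count block I/O starts by process name. */
io:::start
{
	@stable[execname] = count();
}

/*
 * Kernel debugging style: count entries to one specific kernel function.
 * zfs_read is an illustrative assumption; fbt probe names track kernel
 * internals and can change between releases.
 */
fbt::zfs_read:entry
{
	@internal[execname] = count();
}
```

Saved to a file, this could be run with something like 'dtrace -s' on the file (with sufficient privileges) and interrupted with Ctrl-C to print both aggregations. The first clause needs no knowledge of kernel internals; doing anything useful with the second one does.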

Second (or third, depending on how you view the documentation problem) is that the DTrace scripting language has plenty of annoying awkwardness and pointless artificial limitations. These are situations where DTrace can do what you want but it forces you to jump through all sorts of hoops with no assistance; one example I've already mentioned is pulling information from user space. Many of these issues could be fixed with things like macros and other high level language features (or specific support for various higher level operations), but the DTrace authors seem to have deliberately chosen to keep much of the language at a low level. This is a virtue in a system language but DTrace isn't a system language, it's a way of specifying what information you want to extract from the system and when.

(One unkind way to put this is that the DTrace scripting language is mostly oriented around the needs of the people writing the kernel DTrace components instead of the people who are trying to use DTrace. It's easy to see how this happened but it doesn't make it right.)
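To give a flavour of the hoops involved in pulling information from user space, here is a minimal sketch of a commonly seen pattern for printing the filename passed to open(). It may or may not be the exact case referred to above, and the probe name is an assumption (on some Solaris versions the interesting system call is open64 or openat instead). The script cannot simply print the argument as a string; it has to know that arg0 is a user-space address, copy the string in explicitly, and (in this version) defer the copy to the return probe so that the memory is more likely to have been faulted in.

```dtrace
syscall::open:entry
{
	/* arg0 is a user-space address, not a string; save it for later. */
	self->path = arg0;
}

syscall::open:return
/self->path/
{
	/* Explicitly copy the string in from user space before printing it. */
	printf("%s opened %s\n", execname, copyinstr(self->path));
	self->path = 0;
}
```

None of this is difficult once you know it, but DTrace gives you no help in getting there; this is the sort of operation that a higher-level construct could do for you.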

These issues don't make DTrace impossible to use; as a demonstration of that, lots of people have written lots of very interesting and useful DTrace scripts. But they do significantly raise the barriers to entry for using DTrace; for most serious and interesting uses, you have to be prepared to learn kernel internals and slog through a certain amount of annoyance and make-work. It should not be any surprise that plenty of people haven't had problems that are sufficiently urgent and intractable to cause them to do this.

(It is not just that this stuff has to be learned. It's also that the learning simply takes time, probably significant time, and many people may not have that much time if they're dealing with a non-urgent problem.)


Last modified: Fri Apr 6 03:08:17 2012