Why DTrace does not attract people to Solaris very often

November 16, 2012

Back when Solaris 10 was fresh and new, DTrace was supposed to be one of its significant competitive advantages, one of the major things that made it attractive to system administrators (and organizations). In practice Solaris 10 got a tepid reception, DTrace included, despite all of the banging on drums that supporters of Solaris did (and somewhat to their disbelief, at least in my impression of things). Having now used DTrace somewhat, I have an opinion on why.

It's pretty simple:

DTrace is not really a tool that sysadmins use to fix their own problems; it's a mechanism that your support provider and/or OS vendor uses to provide support.

(This idea was first expressed by a commentator on my entry on why we haven't taken to DTrace and has probably been percolating in the back of my mind ever since.)

You can certainly use DTrace to write your own diagnostic tools; after all, we did. But DTrace's almost complete lack of useful documented tracepoints means that doing this basically requires that you can read and understand (Open)Solaris code. Or, to put it another way, you need a system programmer. And system programmers are now uncommon; most places do not have one around and so will never write their own DTrace scripts (beyond at most very simple ones).

As for DTrace as a mechanism to help your OS vendor, well, sysadmins and organizations don't care how OS vendors provide support, just that they do (or don't). A mechanism to provide diagnostic tools is not a selling feature; at best, the diagnostic tools themselves are.

(Also, as I've sort of written before, most sysadmins don't expect to run into problems.)

The less pleasant way to put this is that people are not attracted to technologies, they are attracted to solutions to problems. DTrace is a fine technology but for most places it is not a solution.

(I don't know what Sun's true vision for DTrace was. If they planned for it to be a serious competitive advantage as opposed to mostly an internal tool, I think that they dropped the ball badly.)

Comments on this page:

From at 2012-11-16 05:54:57:

I only partly agree - most sysadmins these days do not even know what a syscall is and they are unlikely to be able to use DTrace directly. But there are still lots of sysadmins who do have the required skills to use it and they are not necessarily kernel developers. I've been using DTrace since the very beginning and have been able to fix lots of issues thanks to it, sometimes with spectacular outcomes. And I'm not an OS developer.

I also don't agree that one has to understand kernel code or be able to read it to use DTrace - at least it depends on what kind of problems one is trying to solve. Obviously if a problem is in a kernel driver, etc. then it is unlikely it could be solved or even identified without an elementary understanding of C and kernels... but then I've been able to root-cause and fix lots of issues with applications and the way they interact with the OS without having to look into how the kernel works or checking Solaris code. And in many cases, these problems would be very, very hard (and time-consuming) to identify without DTrace.

From at 2013-01-03 03:55:19:


Thanks again - the "why not" issue for DTrace continues to be perplexing, so opinions like this shed light.

In your case, I think you ran out of runway with the stable, documented providers, and had to use the unstable and undocumented fbt provider to answer the questions you had. Using fbt well usually requires reading kernel source code, which isn't something most of today's sysadmins (as opposed to system engineers) can be expected to do. It's certainly a task that can be served by vendor support, similar to providing kernel patches.

In terms of difficulty, using the DTrace providers is a little like:

  • fbt provider: writing a simple kernel patch
  • stable providers: writing a shell script

Sysadmins should be able to handle stable providers (eg, io, proc, sched, vminfo). They are documented - you don't need to reach for kernel code. Programming them may be no more difficult than shell scripting.
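To make the shell-script comparison concrete, here is a minimal sketch of a stable-provider one-liner wrapped in a shell script. It is only an illustration, not one of the DTraceToolkit scripts: it assumes dtrace is installed and that you have the privileges to run it, and the variable name D_PROG is made up for this example. The D program counts successful exec() calls per program name using the proc provider's exec-success probe.

```shell
#!/bin/sh
# Sketch: count successful exec() calls per program name using the
# stable proc provider. No kernel source reading is involved.
# D_PROG is a hypothetical variable name used only in this example.
D_PROG='proc:::exec-success { @[execname] = count(); }'

if command -v dtrace >/dev/null 2>&1; then
    # Usually needs root or DTrace privileges; press Ctrl-C to stop
    # tracing and print the per-program counts.
    dtrace -n "$D_PROG"
else
    echo "dtrace not available on this system" >&2
fi
```

The whole D program is a single probe clause with one aggregation, which is roughly the level of effort the shell-scripting comparison suggests.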

It's fbt scripting, like writing a simple kernel patch, that may not be for everyone. It's great if you have the staff who can, and it's great that it's possible at all, but there aren't many people out there who can (that is, system engineers). I'm hoping we can train more, and increase the pool of DTrace-skilled people out there (I've started running classes again). It should be a skill that all performance engineers, at least, have. In cases where you can't hire or train, you can still ask your support vendor to provide the scripts, like they provide kernel patches.

What about basic observability tools, like iosnoop, execsnoop, opensnoop, etc (from the DTraceToolkit)? These aren't any more complicated (system engineering-y) than truss or snoop. But you don't want to open a support ticket to run snoop, nor do you want to have to write it yourself - I think you'd want basic tools like that provided with the OS, along with man pages and docs. Is this another practical avenue for sysadmins to use DTrace, but without needing to program DTrace themselves?

I think this may be what you are saying with "A mechanism to provide diagnostic tools is not a selling feature; at best, the diagnostic tools themselves are."

Next time I give a DTrace talk, should I teach how to develop DTrace scripts, or, how to use my DTrace scripts? :-)

- Brendan

Written on 16 November 2012.

Last modified: Fri Nov 16 01:33:56 2012