Some thoughts on how I still miss DTrace (and also mdb)

November 29, 2020

Although I'm generally happy with our Linux fileservers, every so often we run into an issue where I miss OmniOS's DTrace and mdb; DTrace for dynamic visibility into what the system was doing, and mdb for static inspection and tracing through kernel data structures. In theory Linux has equivalents of both of these. In practice this Linux future is unevenly distributed. It's likely that our Linux fileservers will have great visibility once we upgrade them to Ubuntu 22.04 in 2022 or 2023, but it's taking some time to get there. This is in stark contrast to Solaris, where DTrace (and mdb) were usable from the very beginnings of our ZFS fileservers, very shortly after Solaris 10 included DTrace at all.

It's my feeling that this difference is ultimately because of how Solaris was developed as a unitary whole by Sun, a single organization, in contrast to how Linux's performance and observability systems have been developed in many separated pieces by different groups. Since there was a single organization controlling all Solaris development, people within Sun were in a position to set overriding priorities and decide that DTrace was a good enough idea that it would be finished and shipped all at once in a ready to go state. Since Solaris was a unitary system, the kernel and user tools could ship together as a coordinated thing and the same people could develop both together.

(Similar things apply for mdb, which needs to move in step with the kernel.)

I do feel that the Linux approach has had some important advantages, but it's undeniable that it's been slower to produce actual results (and those results are unevenly distributed, depending on how up to date your version of Linux is and how rapidly it incorporated various things). My perception is that there's been a back and forth where people put forward needs and prototypes, the kernel adds facilities, people build tools that use and push those facilities, and the usefulness of these tools pushes the kernel forward. Some of these tools have wound up being superseded or abandoned, and the overall selection of tools (and facilities) isn't unified. Since the overall Linux system is not a unified thing under the control of one organization, no one in Linux could write an engineering white paper, build a prototype, or otherwise line up everyone behind one thing to get it done rapidly and out into the world all at once.

In a way, this is what I miss from DTrace and mdb. Solaris could move decisively, for better or worse (and to some extent Illumos still can). Not all of its decisive moves were wins (ask me what I feel about SMF), but there was a real power in its ability to make them, and DTrace is a great illustration of that.

PS: It feels possible that not making a public, supported interface between dtrace and the kernel was one of the things that allowed DTrace as a whole to be shipped early, since it reduced the number of public things that had to be carefully designed and tested. This is another area where having a unitary system helps in several ways.

Written on 29 November 2020.
« The death and life of postmaster@anywhere
Link: wtfpython »

Page tools: View Source, Add Comment.
Login: Password:
Atom Syndication: Recent Comments.

Last modified: Sun Nov 29 23:52:33 2020
This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.