Our DTrace scripts for NFS server, ZFS, and iSCSI initiator monitoring

October 31, 2012

As a result of recent events, I've built up a collection of DTrace scripts for monitoring and reporting on our fileserver environment, where we use NFS v3 on top of ZFS on top of iSCSI. Since I grumbled earlier about the lack of easily findable DTrace scripts for this, I've made our scripts public on Github as siebenmann/cks-dtrace (where you can read more details about what's there). They're written for Solaris 10 update 8 (plus some patches) and do play around with kernel data structures.

These scripts are somewhat specific to our environment and contain various local assumptions (some of them commented). They're also not the best DTrace code possible, and in fact they contain several generations of my DTrace code as I steadily learned more about what I was doing (if I was very keen, I would go back to rewrite the older code in the current best style I know).

In addition to their straightforward use, these scripts may serve as a useful example of both how to do various things with DTrace and how to extract various bits of information from the Solaris kernel. In an ideal world there would be a DTrace wiki with information of the form 'given an X, here's how you get a Y' (such as 'given a vnode, here's how you get its ZFS pool'), but as far as I know right now you have to find various little tricks in various people's DTrace scripts.

(I'd be overjoyed to be wrong about this.)
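As an illustration of the sort of trick involved, here is a rough sketch of the 'given a vnode, here's how you get its ZFS pool' chain. The exact structure field names (z_zfsvfs, z_os, os_spa, spa_name) vary between Solaris and kernel versions, so treat every field here as an assumption to check against your kernel source rather than as a recipe:

```
/* Sketch: map a ZFS vnode to its pool name, shown for zfs_read().
 * All structure field names are assumptions to verify against your
 * kernel source; Solaris 10 in particular may interpose an
 * objset_impl_t between the objset and the spa. */
fbt::zfs_read:entry
{
        /* v_data of a vnode on a ZFS filesystem points at its znode */
        this->zp = (znode_t *)args[0]->v_data;
        this->pool =
            stringof(this->zp->z_zfsvfs->z_os->os_spa->spa_name);
        @reads[this->pool] = count();
}
```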

In the 'giving proper credit' department: I didn't come up with these scripts in a void, using only my ingenuity; instead, I stand on the shoulders of many giants. I would not have gotten anywhere near as far as I have without taking all sorts of DTrace tricks and clever ways of extracting various bits of kernel information from other people's DTrace scripts, often ZFS-related scripts. Useful references that I have in my notes include Brendan Gregg (and various scripts in the DTrace book's chapter 5) and Richard Elling's zilstat.

(And as always, all of these scripts would have been basically impossible without the OpenSolaris kernel code. The fact that a lack of kernel source code cripples DTrace is one reason I remain quite angry about Oracle's decision to close the Solaris 11 source. The more I use DTrace, the more convinced I am that we'll never move to Solaris 11 (unless Oracle has a total change of heart).)


Comments on this page:

From 67.188.160.90 at 2012-11-01 02:26:20:

Thanks for sharing! These are especially useful, not just for direct usage but as an expression of useful ideas. Even if they don't run for other people, they show what can be done (after script updates to match kernel versions).

...

Some minor comments about comments: I'd add your name and blog post or github URL to the script headers - ZFS DTrace scripts are very popular, and I'm guessing these will get emailed around, so that should help people find the origin (and updates!). The level of comments is otherwise great - especially those explaining the reason for tracing zio functions. Note that one script has the "/* DTrace 1.0: no inet functions, no this->strings */" comment, but that comment was missed in others, and it is a pretty alarming piece of code worthy of a comment. :)
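(For context, the pre-inet function workaround being referred to typically looks something like this sketch, which is indeed alarming enough to deserve a comment. The variable name and the assumption that the address is a host-byte-order IPv4 address are both illustrative:)

```
/* Sketch: format an IPv4 address by hand on DTrace versions that
 * lack inet_ntoa() and friends. 'this->addr' holding the address in
 * host byte order is an assumption for illustration. */
printf("%d.%d.%d.%d",
    (this->addr >> 24) & 0xff, (this->addr >> 16) & 0xff,
    (this->addr >> 8) & 0xff, this->addr & 0xff);
```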

Does iscsi-long.d clear started[] if it isn't long? This looks like a very useful script BTW.

- Brendan

By cks at 2012-11-01 12:13:09:

Thank you for the comments, suggestions, and especially the bug report about iscsi-long.d; you're right, it didn't clear started[] if the iSCSI request wasn't a long one. I've fixed that and all of the other issues (I believe) and the updated versions are now on Github.
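(The fix follows the usual pattern for this kind of script: record a timestamp when a request starts, report on completion only if it was slow, and then clear the associative array entry unconditionally so short requests don't leak entries. The probe names and the 100 ms threshold below are illustrative, not the actual ones from iscsi-long.d:)

```
/* Sketch only: probe names are hypothetical stand-ins. */
fbt::iscsi_tx_cmd:entry
{
        /* key by the command pointer, since completion may happen
         * in a different thread */
        started[arg0] = timestamp;
}

fbt::iscsi_cmd_done:entry
/started[arg0] && (timestamp - started[arg0]) > 100000000/
{
        printf("long iSCSI request: %d ms\n",
            (timestamp - started[arg0]) / 1000000);
}

/* clauses fire in script order, so this runs after the report
 * clause; assigning 0 frees the associative array entry */
fbt::iscsi_cmd_done:entry
/started[arg0]/
{
        started[arg0] = 0;
}
```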

From 209.118.197.222 at 2012-12-28 19:16:12:

Monitoring similar stuff. Here is a script that monitors the latency, throughput, and I/O request size of nfs, zfs, and io, broken down into reads and writes; writes are further broken down into non-sync, sync, and filesync, and nfs reads into cached and non-cached reads. NFS requests can also be broken down into latency histograms, and further into latency histograms by size. It runs by default every second, but that can be modified with a command line arg.

https://github.com/khailey/ioh/blob/master/README.md

It can optionally take an IP arg if you want to look at traffic to/from a particular IP.

Simplest output looks like:

date: 1335282287 , 24/3/2012 15:44:47
TCP out:  8.107 MB/s, in:  5.239 MB/s, retrans:        MB/s  ip discards:
----------------
            |       MB/s|    avg_ms| avg_sz_kb|     count
------------|-----------|----------|----------|--------------------
R |      io:|     0.005 |    24.01 |    4.899 |        1
R |     zfs:|     7.916 |     0.05 |    7.947 |     1020
C |   nfs_c:|           |          |          |        .
R |     nfs:|     7.916 |     0.09 |    8.017 |     1011
- 
W |      io:|     9.921 |    11.26 |   32.562 |      312
W | zfssync:|     5.246 |    19.81 |   11.405 |      471
W |     zfs:|     0.001 |     0.05 |    0.199 |        3
W |     nfs:|           |          |          |        .
W |nfssyncD:|     5.215 |    19.94 |   11.410 |      468
W |nfssyncF:|     0.031 |    11.48 |   16.000 |        2

I haven't had a chance to look at your scripts yet. Skimmed the nfs one and bookmarked the github project to look into soon.

- Kyle Hailey

http://dboptimizer.com

Last modified: Wed Oct 31 22:59:57 2012