DTrace: two mistakes you can make in DTrace scripts

October 14, 2012

There are certain mistakes that you can make when writing DTrace scripts that are worth pointing out, partly because I've made them myself and partly because I've seen them in other people's scripts on the net. To serve as an example of this, suppose that you want to track ZFS read and write volume, so you write the following:

fbt::zfs_read:entry
{ @zreads = sum(args[1]->uio_resid); }

fbt::zfs_write:entry
{ @zwrites = sum(args[1]->uio_resid); }

This is perfectly good DTrace code and it works fine most of the time, but it has two significant issues.

The first is what I'll call the initiated versus completed problem. Because we are tracing on entry to zfs_read and zfs_write, we're counting IO activity when it's initiated, not when it's performed (whatever that means) or when it's completed. If IO completes rapidly and we're using a long time base, this doesn't matter but if either are potentially false this may give you misleading or confusing numbers, ones that do not match up to the reality as experienced by applications.

When to account for activity is in general a hard problem with no clear generic answers; how you want to do it can depend very much on what questions you want answers to. But my default assumption is to account for things on completion, especially for IO, because that's when a user-level program can go on; it's also clearly counting the past instead of anticipating the future. You may have different answers, but you definitely want to think about the issue. And if you pick 'counted when initiated', you probably should write some sort of comment about why this makes sense in your particular case.

The second problem is that this blithely ignores the possibility of errors of all sorts. Both zfs_read and zfs_write can be asked to do impossible things and can run into problems; in both cases, some or all of the IO will not get done. Accounting for things when you enter a routine necessarily means ignoring errors, but you can make this mistake even when you trace returns (I speak from personal experience here; fortunately it gave me crazy numbers and some other problems until I realized what was going on).

(Note that you don't need to run into actual errors to get incorrect results with reading, because short reads can easily happen; consider issuing a large read when you're almost at the end of a file.)

Unfortunately, knowing what sort of errors are possible and how to compensate for them generally takes reading (Open)Solaris kernel source and probably some experimentation. I have to admit that right now I don't know how to do fully correct IO size accounting for these routines, for example, because I'm not sure how to get the actual amount read or written for short reads and writes. It may be that these are uncommon enough that you can ignore them and just handle clear errors.

(One of the pernicious issues with both of these mistakes is that much of the time they won't matter. If your IO almost never results in errors or short IO, completes rapidly, and you're only looking at IO volume over a long time, you're not going to be affected. I'm lucky (in a sense) that I've been dealing with a number of situations where all of this was not true so I've been smacked in the nose by this.)

Written on 14 October 2012.
« DTrace: counting, aggregates, and a nice undocumented printa() feature
The anatomy of a performance problem with our mail spool »

Page tools: View Source, Add Comment.
Search:
Login: Password:
Atom Syndication: Recent Comments.

Last modified: Sun Oct 14 02:16:58 2012
This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.