Thinking about uses for (system) activity tracers
System activity tracers are a hot topic, with the best-known one being Sun's DTrace. Having thought about this recently, I believe that there are three sorts of questions that they can be used to answer, or at least that I'm interested in having answered:
- what is my system doing?
Performance-related tracing is one obvious subset of this, both in the 'what is taking all the time' sense and in the 'how long does some operation take' sense.
- why is my system doing X, in the sense of 'what is doing X on my system?'
Here you have some peculiar thing happening on your system and you want to trace it back to the program or system or action that causes it. For example, laptop people are interested in questions like 'what is accessing my hard drive' and 'what is waking up all the time'.
- why is some part of my system doing what it is, or at least what information is it using to make the decisions about what it does?
This last sort is important for solving specific problems. Often you know roughly what is going wrong and which program is responsible, but you don't know why and how it is going wrong. This is because you can't see the program's decision-making process, or even the information it is getting to make the decision. For example, consider 'I can't NFS-mount a filesystem that I think I should be able to'.
In theory you could deal with this by having programs optionally log a lot of information. My personal feeling (partly from having dealt with programs that did copious logging if asked) is that it is better to have a single central interface for deciding what you want to watch and log than to try to give every program options to control all of this; it just scales better, and it's probably easier for program authors too (since they just have to make some hooks available, instead of building a dynamically reconfigurable debug logging system).
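The sketch below illustrates that division of labour: programs only expose hooks, and one central interface decides what gets watched. It is a minimal in-process sketch written in Python purely for illustration. Everything in it is hypothetical, including `ProbeRegistry`, the `nfs:export-check` probe name, and the `check_export` function. Real system tracers such as DTrace work across processes and the kernel, and none of this describes how they are implemented. The "program" declares a named probe and fires it with the information behind its decision, not just the outcome. Separately, the person doing the tracing chooses which probes are enabled and where their data goes.

```python
from typing import Any, Callable, Dict, List

# A sink receives the probe name and whatever data the program supplied.
Sink = Callable[[str, Dict[str, Any]], None]


class ProbeRegistry:
    """Hypothetical central registry of named probe points.

    Programs only declare and fire probes; the tracer decides which
    probes are enabled and where their data is delivered.
    """

    def __init__(self) -> None:
        self._enabled: Dict[str, bool] = {}  # probe name -> currently enabled?
        self._sinks: List[Sink] = []

    def declare(self, name: str) -> None:
        # Program side: advertise a probe; it starts out disabled.
        self._enabled.setdefault(name, False)

    def available(self) -> List[str]:
        # Tracer side: list every probe that any program has declared.
        return sorted(self._enabled)

    def enable(self, name: str) -> None:
        # Tracer side: turn on one probe by name.
        if name not in self._enabled:
            raise KeyError(f"unknown probe: {name}")
        self._enabled[name] = True

    def add_sink(self, sink: Sink) -> None:
        # Tracer side: decide where fired probe data goes.
        self._sinks.append(sink)

    def fire(self, name: str, **data: Any) -> None:
        # Program side: only a dictionary lookup when the probe is disabled.
        if self._enabled.get(name, False):
            for sink in self._sinks:
                sink(name, data)


REGISTRY = ProbeRegistry()

# ---- Program side: a hypothetical export check in an NFS mount server ----
REGISTRY.declare("nfs:export-check")


def check_export(client: str, path: str, exports: Dict[str, List[str]]) -> bool:
    allowed = exports.get(path, [])
    ok = client in allowed
    # Expose the inputs to the decision, not just its result.
    REGISTRY.fire("nfs:export-check", client=client, path=path,
                  allowed=allowed, result=ok)
    return ok


# ---- Tracer side: watch the decision without changing the program ----
if __name__ == "__main__":
    exports = {"/export/home": ["server2"]}
    REGISTRY.add_sink(lambda name, data: print(name, data))
    check_export("laptop", "/export/home", exports)  # probe disabled: prints nothing
    REGISTRY.enable("nfs:export-check")
    check_export("laptop", "/export/home", exports)  # prints the decision's inputs
```

In this sketch, a disabled probe costs only a dictionary lookup. The program can therefore leave its hooks in place permanently. Adding a new thing to watch means declaring one more probe, not growing another logging option.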