Here is one of the best things you can do to improve your ability to find and fix problems: gather statistics.
Over and over I have seen statistics be a big part of solving problems. Not infrequently they are also a big part of identifying the problems in the first place; sometimes having a reliable way of telling whether the problem is happening or not is half the battle.
(There is a world of difference between 'sometimes the NFS fileservers are slow' and 'sometimes the new RAID controller has average IO service times of over a second'.)
You shouldn't just get statistics when you're having problems; you should get them all the time and then keep them as long as possible (ideally forever). Having long term statistics is an even more powerful weapon, both because it lets you answer more questions and because it lets you compare what is different between the past (when you didn't have the problem) and today (when you do).
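As a rough illustration of the 'gather all the time, keep everything, compare then and now' idea, here is a minimal Python sketch. Everything in it is an assumption made up for the example: the CSV layout, the io_service_ms metric name, the helper functions and the sample numbers. A real setup would have something sampling the system on a schedule rather than a hard-coded list.

```python
import csv
import os
import tempfile


def append_sample(path, timestamp, metric, value):
    """Append one raw, timestamped measurement; old data is never overwritten."""
    with open(path, "a", newline="") as f:
        csv.writer(f).writerow([timestamp, metric, value])


def samples_between(path, metric, start, end):
    """Return the raw values of `metric` with start <= timestamp < end."""
    with open(path, newline="") as f:
        return [float(value) for ts, name, value in csv.reader(f)
                if name == metric and start <= int(ts) < end]


# Hypothetical log file and made-up numbers, purely for illustration.
log_path = os.path.join(tempfile.mkdtemp(), "stats.csv")

# The past, when there was no problem.
for ts, ms in [(100, 8.0), (160, 9.0), (220, 7.5)]:
    append_sample(log_path, ts, "io_service_ms", ms)

# Today, when there is.
for ts, ms in [(9000, 8.5), (9060, 1200.0), (9120, 9.5)]:
    append_sample(log_path, ts, "io_service_ms", ms)

past = samples_between(log_path, "io_service_ms", 0, 1000)
today = samples_between(log_path, "io_service_ms", 9000, 10000)
print("worst then:", max(past), "worst now:", max(today))
```

Because the old samples are still there, the comparison against a known-good period is a simple query instead of guesswork.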
Unfortunately all of this is easier said than done, as very few systems come with good setups for gathering and keeping this stuff. In many places, building and maintaining the necessary tools just can't compete with other, more immediately urgent priorities.
(I confess that ours is one of those places.)
Sidebar: don't archive summaries
It is tempting to periodically summarize your stats and then archive only the summaries. While this is better than nothing, you really want to archive the full information. If you only keep summaries, you are counting on only needing the information in the summaries; in other words, you are assuming that you already know what information you will need to know to fix future problems.
This is not an assumption that I would make, to put it one way.
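To make the danger concrete, here is a small Python sketch with invented numbers. The two sample lists, the summarize() helper and the one-second threshold are all hypothetical. It shows two hours of IO service times that produce identical summaries even though only one of them contains the kind of outlier you would later want to find.

```python
from statistics import mean

# Hypothetical raw IO service times (milliseconds) for two one-hour windows.
quiet_hour = [150.0] * 10            # uniformly mediocre
spiky_hour = [20.0] * 9 + [1320.0]   # mostly fast, with one very slow IO


def summarize(samples):
    """The kind of summary that is tempting to archive in place of raw data."""
    return {"count": len(samples), "mean_ms": mean(samples)}


def any_over(samples, threshold_ms):
    """A question nobody thought to ask when the summary format was chosen."""
    return any(s > threshold_ms for s in samples)


summary_quiet = summarize(quiet_hour)
summary_spiky = summarize(spiky_hour)

# Both hours look the same once summarized...
print(summary_quiet == summary_spiky)

# ...but only the raw samples can say whether any IO took over a second.
print(any_over(quiet_hour, 1000.0), any_over(spiky_hour, 1000.0))
```

Adding a maximum to the summary would catch this particular case, but that only helps for questions you anticipated in advance, which is the problem.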