I don't understand how to test complex data structures
One of my weaknesses as a modern programmer is that I don't really understand how to do test-driven development. I understand the basic ideas of automated testing and unit testing and so on, and have used them to reasonable effect every so often; where I fall down is understanding how to test things in more advanced and challenging circumstances.
My current example of this is our ZFS spares handling system. Simplified slightly, the core program works by reading state information on all of the ZFS pools and their components (some of which may be incomplete), making multiple passes over this state information to refine it and generate a collection of higher level information, and then using all of this to detect what problems exist and decide what should be done about them. Because of how ZFS organizes its configuration information, the ZFS pool data structures wind up being multi-level and relatively large and complex (ZFS is in love with things nested inside things nested inside things). Because spares replacement is a global thing, the decisions the spares system makes are based on the entire system state, not on small local attributes of one particular bit of these data structures.
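To make the shape of this concrete, here is a toy sketch of that multi-pass structure. Everything in it is invented for illustration (the function names, the dict-based state layout, the attribute names); the real system is far larger and the real state far more nested, which is exactly the problem.

```python
# A toy sketch of the multi-pass shape described above. All names and
# the state layout are invented for illustration; they are not taken
# from the real spares system.

def refine_state(pools):
    """Pass 1: derive per-component information (here, just a health flag)."""
    for pool in pools.values():
        for vdev in pool["vdevs"]:
            for disk in vdev["disks"]:
                disk["healthy"] = disk["state"] == "ONLINE"
    return pools

def summarize(pools):
    """Pass 2: build higher-level per-pool summaries from the refined state."""
    return {name: {"bad_disks": [d["name"]
                                 for v in pool["vdevs"]
                                 for d in v["disks"] if not d["healthy"]]}
            for name, pool in pools.items()}

def decide_spares(summaries, spares):
    """Final pass: a global decision over all pools at once,
    not a local decision about one component."""
    actions = []
    free = list(spares)
    for pool, info in sorted(summaries.items()):
        for disk in info["bad_disks"]:
            if free:
                actions.append((pool, disk, free.pop(0)))
    return actions

pools = {"tank": {"vdevs": [{"disks": [
    {"name": "sda", "state": "ONLINE"},
    {"name": "sdb", "state": "FAULTED"}]}]}}
print(decide_spares(summarize(refine_state(pools)), ["spare0"]))
# -> [('tank', 'sdb', 'spare0')]
```

Even in this cartoon version you can see the testing problem: to exercise `decide_spares` end to end, you have to manufacture a full, plausible `pools` structure first.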
Doing proper test-based development of this code certainly seems to require somehow manufacturing an entire artificially damaged set of pool configurations, ideally ones that accurately reflect our production fileservers. I don't know how I am supposed to do this in a TDD world, and I don't see any particularly good way to do it.
There are two vaguely plausible approaches. First, I could try to write the base state information from scratch. The problem is that the state information is very large; even for a relatively small production fileserver it's over 500 attributes (some nested), and a full-scale production fileserver that's experiencing problems will have well over a thousand attributes. Hand-writing configurations of this size is sufficiently time-consuming and tedious (and likely error-prone) that I am simply not going to get good situation coverage.
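To give a flavour of the scale problem, here is what a single hand-written component might look like. This fragment is invented for illustration (the attribute names are my own stand-ins, not the real ZFS ones), and it covers just one disk of one vdev of one pool; multiply it out and you see why hand-written coverage doesn't happen.

```python
# An invented fragment of hand-written state: one disk, out of the
# hundreds of attributes a real fileserver's state contains. All of
# the attribute names here are illustrative stand-ins.

one_disk = {
    "type": "disk",
    "path": "/dev/dsk/c1t2d0",
    "state": "ONLINE",
    "guid": 1234567890,       # every component carries its own GUID
    "read_errors": 0,
    "write_errors": 0,
    "checksum_errors": 0,
    # ... plus many more per-disk attributes, and then the vdev that
    # contains this disk, and then the pool that contains the vdev ...
}
```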
Alternately I could start with real state information for a working system and then selectively and automatically break it in various ways, so that it looked like disks had failed, other disks were in the process of being replaced with spares, and so on. The problem is that such modifications to the state information are themselves relatively complex once you get beyond simple situations. I would have to write an entire chunk of code to carefully mutate these data structures, including adding entirely new synthetic nesting elements that were created from scratch. This has much the same problems as complex mock objects; how do I know that my mutation code is correct?
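The simple end of the mutation approach might look something like this minimal sketch (again, the state layout and all names are invented for illustration). The easy cases really are this easy; the trouble is everything past them, such as splicing in entirely synthetic nested elements to represent an in-progress spare replacement.

```python
import copy

# A minimal sketch of the state-mutation idea. The state layout and
# all names are invented for illustration; realistic breakage needs
# far more elaborate surgery than this.

def fail_disk(pools, pool_name, disk_name):
    """Return a copy of the pool state with one disk marked as faulted."""
    mutated = copy.deepcopy(pools)
    for vdev in mutated[pool_name]["vdevs"]:
        for disk in vdev["disks"]:
            if disk["name"] == disk_name:
                disk["state"] = "FAULTED"
                disk["read_errors"] = 37   # synthetic error count
    return mutated

healthy = {"tank": {"vdevs": [{"disks": [
    {"name": "sda", "state": "ONLINE", "read_errors": 0}]}]}}
broken = fail_disk(healthy, "tank", "sda")
print(broken["tank"]["vdevs"][0]["disks"][0]["state"])   # -> FAULTED
```

And the question from the main text applies directly: once `fail_disk` grows to fabricate whole nested sub-structures, what tests the tester?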
(One plausible answer from testing people is that I should not have passive attribute-based data structures but instead hierarchies of objects with complex behaviors, and then I should substitute mock objects to represent broken objects. One of the many issues with this is that it proceeds straight to the complex mock objects issue.)
Presumably test driven development has an answer to this problem. I just don't know enough about how to do this to know what it is.
Sidebar: what I do right now
Right now I test by hand, resorting to ad hoc techniques such as temporarily adding code to deliberately make specific bits go wrong. This has the advantage that I can make bits of the program lie to itself, but also all sorts of disadvantages and limitations; I can't automate it, I can't test truly complex things, my testing is necessarily somewhat indirect and artificial, and so on.
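The 'temporarily added code' is typically a hardcoded override somewhere in the state-reading path, along these lines (an invented sketch; the function names are made up for illustration):

```python
# An invented sketch of the sort of temporary lie I add by hand.
# The function names are made up for illustration.

def real_read_disk_state(disk_name):
    return "ONLINE"   # stand-in for the real status query

def read_disk_state(disk_name):
    state = real_read_disk_state(disk_name)
    # TEMPORARY: pretend this one disk has failed, to exercise the
    # spares logic. Must come out before pushing to production.
    if disk_name == "sdb":
        state = "FAULTED"
    return state

print(read_disk_state("sdb"))   # -> FAULTED
```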
Oh, and I have to take the test code out before pushing updates to production (then put it back in the next time I have something to add or debug).