A realization about one bit of test-driven development

August 31, 2012

One of the standard pieces of instruction for TDD is that when you are about to do some coding, you should not just write the tests before the code but also run the tests and see them fail before starting on the real code. You can find this cycle described in a lot of places: write test, run test, see the failure, write the code, run the test, see the test pass, feel good (monkey got a pellet, yay). Running tests that I knew were going to fail always struck me as stupidly robotic behavior, so even when I wrote tests before my code (e.g., to try out my APIs) I skipped that step.
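(As a concrete illustration of that cycle, here's a minimal sketch using pytest; the sizes module and parse_size() function are hypothetical, made up just for this example.)

    # test_sizes.py -- written first, before sizes.py exists.
    # Running 'python -m pytest' at this point fails with an import
    # error, which is the "see the failure" step of the cycle.
    from sizes import parse_size

    def test_parse_size_kilobytes():
        assert parse_size("10k") == 10 * 1024

    # sizes.py -- written only after seeing that failing run, so the
    # next (passing) run is clearly down to this code.
    _UNITS = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}

    def parse_size(s):
        """Parse a size like '10k' into a byte count."""
        s = s.strip().lower()
        if s and s[-1] in _UNITS:
            return int(s[:-1]) * _UNITS[s[-1]]
        return int(s)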

Recently I came to a realization about why this is actually a sensible thing to do (at least sometimes). The important thing about seeing your test fail first is that it verifies that your code change is what made the test pass.

(This is partly a very basic check that your test is at least somewhat correct and partly a check on the code change itself.)

Sometimes this is a stupid thing to verify because it's already clear and obvious. If you're adding a new doWhatever() method, one that didn't exist before, and calling it from the test, then your code change is clearly responsible for the test succeeding (at least in pretty much any sane codebase; your mileage may vary if you have complex inheritance trees or magic metaprogramming that catches undefined methods and so on).

But not all changes are like that. Sometimes you're making a subtle change deep in the depths of existing code. This is where you most want to verify that the code is behaving as you expect even before you make your modification; in other words, that the tests you expect to fail, and that should fail, do indeed fail. Because if a test already passes even before your code change, you don't understand the existing code as well as you thought and it's not clear what your change actually does. Maybe it does nothing and is redundant; maybe it does something entirely different from what you thought (although if you have good test coverage, it's at least nothing visibly damaging).

(Alternately, your test itself has a problem and isn't actually testing what you think it is.)

There's a spectrum between the two extremes, of course. I'm not sure where most of my code falls on it and I still don't like the robotic nature of routinely running tests that you expect to fail, but this realization has at least given me something to think about.


Comments on this page:

From 85.0.112.218 at 2012-09-01 17:12:43:

Yes, exactly.

This is extremely important when you are writing a test for a bug: first you write the test, then you run it to see that it fails, so that you are sure the test actually exercises the bug, and only then do you fix the bug. Otherwise, the test you wrote to make sure you will not introduce a regression may actually be testing something completely unrelated to the bug, thus leaving you unprotected from regressions. In fact, if the fix you then make actually fixes your bug, that very success costs you the opportunity to realise that you are not protected by a regression test.

So when you are writing a regression test, no matter how robotic it feels, always run the test first, before making any attempt to fix the bug.
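(To make that order of operations concrete: a minimal sketch with a hypothetical clamp() helper whose upper bound is ignored. The names are made up for illustration, not from any real project.)

    # clamp.py, with the hypothetical bug: the upper bound is ignored.
    def clamp(value, low, high):
        return max(low, value)          # bug: 'high' is never applied

    # test_clamp.py -- written and run *before* attempting the fix, so
    # that it visibly fails and we know it really exercises the bug.
    from clamp import clamp

    def test_clamp_upper_bound():
        assert clamp(5, 1, 3) == 3      # fails against the buggy code

    # The fix itself is then a one-liner; rerunning the test turns it
    # from failing to passing:
    #     return max(low, min(value, high))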

For TDD, I agree it often seems superfluous to run the test first. Skipping it is to some extent a trap… even though trying to avoid that trap means a large waste of time. Maybe the answer is a harness like this (followup).

Aristotle Pagaltzis

By cks at 2012-09-07 15:49:44:

I think that what most irritates me is not so much the time running the failed test takes as the interruption in my coding flow (although both are annoying in a 'this is stupid' way). When I know perfectly well what I'm going to do next, it's annoying not to do it for an artificial reason.

By cks at 2012-09-07 15:53:30:

(And now that I've read your links, I see that the person there had the same no-interruptions, no-pauses goal that I was talking about.)

From 85.0.112.218 at 2012-09-09 04:06:52:

Yes… by waste of time I meant the interruption to flow implicitly. It takes time to get back into the swing, and it takes that over and over. It’s not just the test run itself, but the whole interruption from having to stop what you are doing, switch windows, run the test, check the results, consider the reason for any failures, and finally pop the mental stack, and resume what you were doing. What a bother.

(As an aside, that is what I love about the Git index. It allows me to leave committing for after I’ve cleared my mental stack. Without it, making small clean semantic commits requires following a strict procedure irrespective of how it fits into flow.)

Aristotle Pagaltzis

I've been doing test-driven development for almost twenty years now, and I've learned that it's easiest by far to always, always make sure you've seen your test fail (and fail in the way you expect it to) before it passes. It's just way too easy otherwise for something to slip by.

> It’s not just the test run itself, but the whole interruption from having to stop what you are doing, switch windows, run the test, check the results, consider the reason for any failures, and finally pop the mental stack, and resume what you were doing. What a bother.

I find this weird, because what you've described is not an "interruption" to the flow, it is the flow. There's nothing to "stack" normally because whatever the test is testing is what you're working on at the top of the stack already.

Switching keyboard focus back and forth between windows should be trivial in almost any desktop environment (Alt-Tab or whatever), but switching keyboard focus isn't even necessary if you use a "watch" program, a test runner that automatically runs tests when it sees files change, or just something in your editor that kicks off the tests and displays them somewhere convenient. This is really no different from writing in a language with a compiler you need to run.

If the issue is that you need to "switch" to uncover the other window so you can see its contents, yeah, you need to place your edit and build/test windows side-by-side and make sure you can see the contents of both by simply moving your gaze rather than having to push buttons.
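(The "watch" idea mentioned above can be as little as a few lines. Here's a minimal sketch that polls file modification times and reruns pytest; it stands in for the real watcher tools that do this properly.)

    #!/usr/bin/env python
    # A minimal sketch of a "watch" test runner: poll the .py files'
    # modification times and rerun pytest whenever one changes.  Real
    # watcher tools do this better; this just shows the idea.
    import glob
    import os
    import subprocess
    import time

    def snapshot():
        return {f: os.path.getmtime(f) for f in glob.glob("*.py")}

    seen = {}
    while True:
        now = snapshot()
        if now != seen:
            seen = now
            subprocess.call(["python", "-m", "pytest", "-q"])
        time.sleep(1)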

> There's nothing to "stack" normally because whatever the test is testing is what you're working on at the top of the stack already.

I can only speak for myself, though for myself I can speak with authority. I am often holding multiple change intents for the code in my head at once, seemingly inevitably, because parts of the code interrelate to each other, and when I’ve found myself having to change something here, I know this means I will have to also change something over there (and there, and there). But to do an isolated test of an already-made change, I have to set aside all of the change intents I’ve already accumulated, while not forgetting them, then possibly selectively undo some of the changes I’ve already made (e.g. git stash), write the test, check that it has the expected result – and then maybe start a whole other goose chase if not –, and then finally come back to make the other changes I have been holding in my head.

If your cognitive style doesn’t lead you to keeping edits in the air like this, then obviously you won’t experience bother from inserting test running into the proceedings.
