## Unit testing by analogy to scientific hypotheses

September 30, 2011

In the popular and currently dominant view of what makes something a proper scientific hypothesis, an important criterion is falsifiability. To simplify a great deal, you test a scientific hypothesis not just by looking for what it says should be there but also by looking for what it says should not be there. If the hypothesis is 'all swans are white' you don't just look for white swans, you also look for ones that are not white.

Let us consider a theoretical function that returns True if a number is a prime (and False if it is not). We need to write a test for this function, so we fire up an editor:

```
def testPrimeness():
    for i in 2, 3, 5, 7, 883:
        mustBeTrue(isprime(i))
```

We're done, right? (Ignoring that this is only a short list of primes.)

No, not at all. What we've done is the testing equivalent of only looking for white swans. We need to also see if there are any black swans around by testing to see if the function returns False for numbers that are not prime.
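Here is a minimal sketch of what the fuller test might look like. `mustBeTrue()` is the helper used in the test above; `mustBeFalse()` is its assumed counterpart. Both are written here as plain assertions, and `isprime()` is a simple trial-division stand-in, purely so that the example runs on its own. None of them come from a real testing library.

```python
def isprime(n):
    # Stand-in for the function under test (an assumption): plain trial division.
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

def mustBeTrue(value):
    # Hypothetical test helper: fail unless the value is exactly True.
    assert value is True, "expected True, got %r" % (value,)

def mustBeFalse(value):
    # Hypothetical counterpart: fail unless the value is exactly False.
    assert value is False, "expected False, got %r" % (value,)

def testPrimeness():
    # The white swans: numbers that must be reported as prime.
    for i in 2, 3, 5, 7, 883:
        mustBeTrue(isprime(i))

def testNonPrimeness():
    # The search for black swans: numbers that must not be reported as prime.
    for i in 4, 9, 15, 21, 885:
        mustBeFalse(isprime(i))

testPrimeness()
testNonPrimeness()
```

Like the list of primes, the list of non-primes here is only a short sample.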

Another way to look at this is that we are implicitly testing the wrong hypothesis. The hypothesis that this test checks is that `isprime()` returns True for prime numbers, but this is not the correct hypothesis; the actual specification is that it returns True for prime numbers and only for prime numbers. Although it's not literally the case, we have essentially formed a non-falsifiable hypothesis without noticing and are cheerfully testing it.
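To make the weakness concrete, here is a small sketch showing that a function which claims every number is prime sails through the original test. The names `always_true`, `positive_only_test`, `both_halves_test` and `passes` are made up for this illustration, and the checks use bare `assert` in place of `mustBeTrue()`.

```python
def always_true(n):
    # A deliberately wrong "primality" function: it claims everything is prime.
    return True

def positive_only_test(isprime):
    # The same checks as the original test: known primes only.
    for i in 2, 3, 5, 7, 883:
        assert isprime(i) is True

def both_halves_test(isprime):
    # Known primes plus a few numbers that must be rejected.
    positive_only_test(isprime)
    for i in 4, 6, 9:
        assert isprime(i) is False

def passes(test, candidate):
    # Run a test against a candidate implementation and report the outcome.
    try:
        test(candidate)
    except AssertionError:
        return False
    return True

print(passes(positive_only_test, always_true))  # True
print(passes(both_halves_test, always_true))    # False
```

No possible result from the first test could tell `always_true` apart from a correct implementation, which is the sense in which the hypothesis it checks is effectively non-falsifiable.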

It's my gut feeling that this is a relatively easy testing mistake to fall into. It's human nature (or at least our cognitive biases) to look for confirmation of what we think is the case, so we verify that `isprime()` returns True for primes and forget the other half of the specification.

There's a variant of this hypothesis falsification approach for test planning. One way to form tests is to imagine a whole series of hypotheses about how the function might work incorrectly and then attempt to falsify each one of them with a test. For example, I have two such falsification checks in the list of test primes (`2` and `883`), and a test series for `mustBeFalse(isprime(n))` would likely include odd non-primes as well as even ones.

(Checking the proper handling of corner cases is one common instance of this.)
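As a rough sketch of this kind of test planning, each entry below pairs a guess about how an implementation might be broken with an input chosen to expose it. The three buggy functions and the `reference_isprime` baseline are invented for this example, and the pairing of `2` and `883` with particular bugs is only one plausible reading of why they are in the list.

```python
import math

def reference_isprime(n):
    # Straightforward trial division, used here only as a known-good baseline.
    if n < 2:
        return False
    return all(n % d for d in range(2, math.isqrt(n) + 1))

# Three hypothetical ways an implementation might be wrong.
def odd_means_prime(n):
    return n % 2 == 1

def small_table_only(n):
    return n in (2, 3, 5, 7)

def forgets_lower_bound(n):
    # Trial division without the n < 2 check, so 1 slips through as "prime".
    return all(n % d for d in range(2, math.isqrt(n) + 1))

# (hypothesis about the bug, implementation with that bug, input, correct answer)
CHECKS = [
    ("treats every odd number as prime", odd_means_prime, 9, False),
    ("rejects every even number, including 2", odd_means_prime, 2, True),
    ("only knows a few small primes", small_table_only, 883, True),
    ("counts 1 as prime", forgets_lower_bound, 1, False),
]

def falsifies(buggy, n, expected):
    # A check earns its place if the correct function passes it
    # and the buggy one fails it.
    return reference_isprime(n) == expected and buggy(n) != expected

for description, buggy, n, expected in CHECKS:
    print(description, "-> caught by", n, ":", falsifies(buggy, n, expected))
```

The last entry is the corner case: `1` is neither prime nor composite, and it is easy for a trial-division loop to let it through.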

This is of course closely related to testing your error paths, and I've probably written about bits of it in passing in other entries that I can't find right now.