## Unit testing by analogy to scientific hypotheses

September 30, 2011

In the popular and currently dominant view of how to decide whether something is a proper scientific hypothesis, an important criterion is falsifiability. To simplify a great deal, you test a scientific hypothesis not just by looking for what it says should be there but also by looking for what it says should not be there. If the hypothesis is 'all swans are white' you don't just look for white swans, you also look for ones that are not white.

Let us consider a theoretical function that returns True if a number is a prime (and False if it is not). We need to write a test for this function, so we fire up an editor:

```
def testPrimeness():
    for i in (2, 3, 5, 7, 883):
        mustBeTrue(isprime(i))
```

We're done, right? (Ignoring that this is only a short list of primes.)

No, not at all. What we've done is the testing equivalent of only looking for white swans. We need to also see if there are any black swans around by testing to see if the function returns False for numbers that are not prime.
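A minimal sketch of that complementary test might look like the following. Here `mustBeFalse` and the trial-division `isprime()` are hypothetical stand-ins for the real test framework and the real function under test:

```python
def isprime(n):
    # Stand-in trial-division implementation, for illustration only.
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

def mustBeFalse(value):
    # Minimal stand-in for the test framework's assertion.
    assert value is False, "expected False, got %r" % (value,)

def testNonPrimeness():
    # The black swans: the function must return False for non-primes,
    # including odd composites like 9 and 15, not just even numbers.
    for i in (0, 1, 4, 6, 9, 15, 882):
        mustBeFalse(isprime(i))

testNonPrimeness()
```

Together with `testPrimeness()`, this covers both halves of the specification: True only for primes, False for everything else.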

Another way to look at this is that we are implicitly testing the wrong hypothesis. The hypothesis that this test checks is that `isprime()` returns True for prime numbers, but this is not the correct hypothesis; the actual specification is that it returns True only for prime numbers. Although it's not literally the case, we have essentially formed a non-falsifiable hypothesis without noticing and are cheerfully testing it.

It's my gut feeling that this is a relatively easy testing mistake to fall into. It's human nature (or at least our cognitive biases) to look for confirmation of what we think is the case, so we verify that `isprime()` returns True for primes and forget the other half of the specification.

There's a variant of this hypothesis falsification approach for test planning. One way to form tests is to imagine a whole series of hypotheses about how the function might work incorrectly and then attempt to falsify each one of them with a test. For example, I have two such falsification checks in the list of test primes (`2` and `883`), and a test series for `mustBeFalse(isprime(n))` would likely throw in testing odd numbers as well as even ones.
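This planning style can be sketched as a table of hypothesized bugs, each paired with an input chosen to falsify it. The names and the trial-division `isprime()` below are again hypothetical stand-ins, not the post's actual code:

```python
def isprime(n):
    # Stand-in trial-division implementation, for illustration only.
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

# Each entry pairs a hypothesis about how isprime() might be broken
# with an input and expected result that would falsify it.
falsification_cases = [
    ("treats 2 as composite because it is even", 2, True),
    ("only works for small primes", 883, True),
    ("calls every odd number prime", 9, False),
    ("calls 1 prime", 1, False),
    ("calls 0 or negative numbers prime", 0, False),
]

def testFalsificationCases():
    for hypothesis, n, expected in falsification_cases:
        assert isprime(n) is expected, "not falsified: " + hypothesis

testFalsificationCases()
```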

(Checking the proper handling of corner cases is one common instance of this.)

This is of course closely related to testing your error paths, and I've probably written about bits of it in passing in other entries that I can't find right now.

From 82.132.210.165 at 2011-09-30 04:03:46:

That test should fail once... 9 isn't prime.

Probably sats something about the error rate in humans.

From 87.194.122.82 at 2011-09-30 04:45:08:

Draw what conclusions you will from my typo. :-P

By cks at 2011-09-30 08:32:26:

Oops! What an embarrassing error. Now corrected, thanks.

(And here I carefully looked up a higher prime and even at one point looked for a low odd non-prime, yet completely failed to notice that 9 was not prime.)

From 78.35.25.18 at 2011-10-01 05:55:16:

> Although it’s not literally the case, we have essentially formed a non-falsifiable hypothesis without noticing and are cheerfully testing it.

But you didn’t. The hypothesis that `isprime()` returns `True` for prime numbers and is undefined for others is equally as falsifiable as the hypothesis that it returns `True` for prime numbers and `False` for others. All you need to falsify it is to find one prime number for which `isprime()` returns `False`.

In fact I can’t think of any properly working unit test that tests a non-falsifiable hypothesis, since unit tests are all about falsifying some sort of statement. For that you would have to have tests that can fail randomly (for some definition of random) without the failure revealing a bug in the code. That’s certainly a popular kind of problematic test, but it’s not what’s going on in your case.
