I should document my test plans and their results

March 9, 2015

Every so often I rebuild one of our servers because, for instance, we're moving it from Ubuntu 10.04 to Ubuntu 14.04 (seeing as 10.04 is about to stop being supported). When I do this, of course I need to test that the new version of the server (and its services) works properly. So far this has been a basically ad-hoc process and I usually haven't actually written down very much about it. But after a recent experience or two, I've come around to the obvious realization that I should change that.

So, the simple statement: I should actively document how to test our services and the results of doing those tests. Not just as lab notes to myself that I might scribble down while I do this (and then keep), but as some degree of formal documentation. There are several things that such documentation is good for, in my view.

First, it tells us at least some tests that we want to do on the next version (because there will be a next version) and how to do them or at least how they were successfully done the last time around. These will wind up being tests both for basic functionality of the service and to cover any special problem points I found during the current build (even if in theory we've worked around them, it's good to test specifically for them for the same reason that you add tests for program bugs to your test suite).

Put simply, if I've gone to all of the work to come up with how to test the current (new) version, writing it down saves me having to redo all of that work the next time around, lets me document any surprise bear traps I found and worked around, and lowers the chances that next time around I'll miss something important. People are fallible, so the more help I give my future self the better.

(As a side benefit this will document the things that we can't test short of the machine actually being in production, and thus want to explicitly test or monitor when we do put it into production.)

Explicitly documenting the results of all of the tests (even to say 'all of these go/no go tests passed') serves both as a checklist and as a record for the future. If we run into problems later on, we can look back to see what we definitely did and did not test for. If we missed a test, well, we can add it. If we did a test and it passed at the time, we know that something changed since then (or the test was not sufficient, or the test environment was not realistic enough).
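As a rough illustration of what such a record could look like, here is a minimal sketch in Python. Everything in it is an assumption made for the example: the class and field names, the three outcome values, and the sample service and tests are all hypothetical, not an existing tool or our actual test list. The point is only that each entry keeps how the test was done alongside its outcome, and that tests we could not run before production are recorded explicitly instead of being silently left out.

```python
from dataclasses import dataclass, field
from datetime import date

# Hypothetical outcome values for this sketch.
OUTCOMES = ("pass", "fail", "not-tested")


@dataclass
class TestResult:
    name: str        # what was tested
    how: str         # how the test was done, for the next rebuild
    outcome: str     # one of OUTCOMES
    notes: str = ""  # surprises, workarounds, why it could not be tested


@dataclass
class TestRecord:
    service: str
    tested_on: date
    results: list = field(default_factory=list)

    def add(self, name, how, outcome, notes=""):
        if outcome not in OUTCOMES:
            raise ValueError(f"unknown outcome: {outcome!r}")
        self.results.append(TestResult(name, how, outcome, notes))

    def failed(self):
        """Names of tests that were run and did not pass."""
        return [r.name for r in self.results if r.outcome == "fail"]

    def untested(self):
        """Names of tests we could not run; watch these in production."""
        return [r.name for r in self.results if r.outcome == "not-tested"]

    def was_tested(self, name):
        """Did we actually run a test with this name at the time?"""
        return any(r.name == name and r.outcome != "not-tested"
                   for r in self.results)


# Illustrative entries only; the service and tests are made up.
record = TestRecord("example-service", date(2015, 3, 9))
record.add("basic login", "log in with a test account", "pass")
record.add("old config quirk", "repeat the steps that broke last build",
           "pass", notes="worked around during this build")
record.add("load under real traffic", "only possible in production",
           "not-tested", notes="monitor after cutover")

print("failed:", record.failed())
print("watch in production:", record.untested())
```

If a problem shows up later, `was_tested()` answers the "did we test for this?" question directly, and `untested()` doubles as the list of things to check or monitor when the machine goes into production.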

Some of our tests are performance tests, generally to see if something was good enough. Documenting the results of these tests (and their specific circumstances and methods) is especially useful for retrospective analysis and for comparing the results with other, future tests. What sort of initial IO performance did we get on our new iSCSI backends, for example, and under what specific conditions? What did we consider to be 'good enough' 10G network performance, measured with what tools, and how were the machines connected to each other? And so on and so forth. It's one thing to know that we assessed the results as good enough and another thing entirely to be able to go back, see the specific results, and then compare them to what we're getting with new software or new hardware or the like.
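To make the comparison idea concrete, here is a small Python sketch of recording a performance result together with its circumstances. The field names, the `compare()` helper, and especially the numbers are placeholders invented for illustration; they are not real measurements from our systems, and the sketch assumes a metric where higher is better (such as throughput).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PerfResult:
    what: str           # what was measured
    tool: str           # tool and exact invocation used
    conditions: str     # hardware, connection topology, load, and so on
    value: float        # the measured number
    unit: str           # unit of the measured number
    good_enough: float  # the threshold we judged against at the time

    def passed(self):
        # Assumes higher is better, as with throughput.
        return self.value >= self.good_enough


def compare(old, new):
    """Percent change from old to new for the same kind of measurement."""
    if (old.what, old.unit) != (new.what, new.unit):
        raise ValueError("results measure different things")
    return (new.value - old.value) / old.value * 100.0


# Placeholder values, not real results.
old = PerfResult("network throughput", "example-tool (old command line)",
                 "two hosts, direct link", 800.0, "MB/s", 700.0)
new = PerfResult("network throughput", "example-tool (new command line)",
                 "two hosts, through a switch", 1000.0, "MB/s", 700.0)

print(old.passed(), new.passed())
print(f"{compare(old, new):+.1f}% versus the earlier result")
```

Keeping `tool` and `conditions` next to the number is what makes a later comparison meaningful; without them a difference between two results could just as easily come from a different test setup as from the new software or hardware.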

(This is important partly because we don't routinely do performance tests; about the only time we do them is when we're building out a new system and want to make sure that it works well enough, whatever that is. Thus, testing the new system is our single best chance to capture performance information and the exact tests we did, but only if we actually document all of this, in detail.)

Finally and obviously, if we later run into problems with something we can go back to assess both what tests we did and what tests we could reasonably add to cover the problems. If we don't document the tests we did, we're really up in the air about whether we should have caught the problem before we went into production. This is the same theory as preserving your completed checklists so that later you can revisit them to spot what you missed and should do differently next time around.

(This elaborates a lot on a tweet of mine. And yes, part of why I'm writing this is to push myself to actually do this.)

Written on 09 March 2015.