Why I spent a lot of time agonizing over an error message recently

November 10, 2015

I recently spent an inordinate amount of time not so much writing a local script as repeatedly writing, rewriting, and modifying its error messages (the rest of the script is mostly simple). Now, I'll admit up front that I have a general habit of obsessing over small details of program output, and maybe some of the fidgeting with the error messages was because of this. But I maintain that I had a completely sensible reason for caring so much about the script's error messages. You see, the script isn't supposed to fail.

More exactly, it's not supposed to fail, but we think that it might someday do so, because every so often something weird goes on with the operation the script is doing. In fact the script exists to automate certain workarounds we were doing when we did this particular operation 'by hand' (it's actually buried inside another script). So almost all of the time the script is supposed to work, and we certainly hope it works all the time, but there's a rare possibility of failure lurking in the underbrush.

What this means for the script is that by the time we get an error, we'll probably have long since forgotten exactly what's going on. It's likely that the script will work reliably for weeks or months, during which our knowledge of the entire problem will have been displaced by other things. This means it's important for the error message we get to be clear on its own, so we don't have to try to reconstruct all of the surrounding context from scratch. A cryptic error message would make perfect sense to us right now, when the context is clear in our minds, but it won't in six months.

When I was revising the error messages, one part of what I did was to look for things that might be misremembered or misinterpreted by people who'd forgotten the context. A surprisingly large amount of my initial language was at least partially ambiguous when I took a step back and tried my best to read it without context. Things that were obvious or only had one meaning inside the context suddenly took on an uncomfortable new life outside it. The resulting error messages are significantly more verbose now, but at least I can hope that they'll still make sense in six months.

(This is of course a version of the problem of context in programming.)
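
To make the difference concrete, here is a rough Python sketch of a message written inside the context next to one written for someone who has forgotten it. This is not the actual script; the function names, the script name, and all of the message text are invented placeholders for illustration.

```python
def terse_message(step: str) -> str:
    """The kind of message that only makes sense while the context is fresh."""
    return f"error: {step} failed"


def contextual_message(script: str, step: str, purpose: str, next_steps: str) -> str:
    """A message meant to be read by someone who has forgotten the context."""
    return "\n".join([
        f"{script}: {step} failed.",
        f"Why this script exists: {purpose}",
        f"What to do now: {next_steps}",
    ])


if __name__ == "__main__":
    # Every value below is a made-up placeholder, not real output.
    print(terse_message("retry"))
    print(contextual_message(
        script="example-workaround",
        step="the automated retry",
        purpose="it automates a workaround we used to do by hand.",
        next_steps="redo the operation by hand and watch for what goes wrong.",
    ))
```

The second form is a lot more verbose, but it carries its own context with it instead of relying on whoever reads it to remember.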


Last modified: Tue Nov 10 01:33:02 2015