Wandering Thoughts archives

2017-10-29

There are several ways to misread specifications

In an ideal world, everyone reads specifications extremely carefully and arrives at exactly the same results. In the real world, generally specifications are read casually, with people skimming things, not putting together disparate bits, and so on. If the specification is not clear (and perhaps even if it is), one of the results of this is misinterpretations. However, it's my view that not all misinterpretations are the same, and these differences affect both how fast the misreading propagates and how likely it is to be noticed (both of which matter since ultimately specifications are defined by their implementations).

I see at least three ways for misreadings to happen. First, you can accept more than the specification wants; if a field is specified as ASCII, you can also accept UTF-8 in the field. Second, you can accept less than the specification wants; if a field is specified as UTF-8, you can only accept ASCII. Finally, you can have a different interpretation of some aspect; the specification might have a 'hostname' field where it's intended to accept ASCII or UTF-8, but you accept ASCII or IDNA ASCII and decode the IDNA ASCII to UTF-8.

In a sense, the important thing about all of these misreadings is that they're only partially incompatible with the real specification. In all of my examples, a pure-ASCII thing can be successfully interchanged between your misreading and a fully correct implementation. It's only when one side or the other goes outside of the shared base that problems ensue, and that's obviously part of how far a misreading can propagate before it's noticed.

There are also misreadings where you are outright wrong, for example the specification says 'text' and you assume this means UTF-16 instead of ASCII. But these sort of misreadings are going to be detected almost immediately unless the field (or option or whatever) is almost never used, because you'll be completely incompatible with everyone else instead of only partially incompatible.

My belief is that the misreading that will propagate the farthest and be noticed the slowest is a more expansive reading, because there's no way to notice this until someone eventually generates an improper thing and it's accepted by some implementations but not all. If those accepting implementations are popular enough, the specification will probably expand through the weight of enough people accepting this new usage.

(The corollary to this is that if a specification has conformance tests, you should include tests for things that should not be accepted. If your specification says 'field X is ASCII only' or 'field X cannot be HTML', include test vectors where field X is UTF-8 or contains HTML or whatever.)

Misreadings that are more restrictive or different can flourish only when the things that they don't accept are missing, rare, or hard to notice the mistake in. This goes some way to explain why implementation differences are often found in the dark corners or obscure parts of a specification, in that there isn't enough usage for people to notice misreadings and other mistakes.

tech/SpecMisreadingSeveralWays written at 01:22:52; Add Comment


Page tools: See As Normal.
Search:
Login: Password:
Atom Syndication: Recent Pages, Recent Comments.

This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.