The importance of explicitly and clearly specifying things
I was going to write this entry in an abstract way, but it is easier and more honest to start with the concrete specifics and move from there to the general conclusions I draw.
We recently encountered an unusual Linux NFS client behavior, which at the time I called a bug. I have since been informed that this is not actually a bug but is Linux's implementation of what Linux people call "close to open cache consistency", which is written up in the Linux NFS FAQ, section A8. I'm not sure what to call the FAQ's answer; it is partly a description of concepts and partly a description of the nominal kernel implementation. However, this kernel implementation has changed over time, as we found out, with changes in user-visible behavior. In addition, the FAQ doesn't make any attempt to describe how this interacts with NFS locking, or whether NFS locking has any effect on it at all.
As someone who has to deal with this from the perspective of programs that are running on Linux NFS clients today and will likely run on Linux NFS clients for many years to come, what I need is a description of the official requirements for client programs. This is not a description of what works today or what the kernel does today, because as we've seen that can change; instead, it would be a description of what the NFS developers promise will work now and in the future. As with Unix's file durability problem, this would give me something to write client programs to and mean that if I found that the kernel deviated from this behavior I could report it as a bug.
(It would also give the NFS maintainers something clear to point people to if what they report is not in fact a bug but them not understanding what the kernel requires.)
On the Linux NFS mailing list, I attempted to write a specific description of this from the FAQ's wording (you can see my attempt on the list) and then asked some questions about what effect using flock() has on this (since the FAQ is entirely silent on this). This uncovered another Linux NFS developer who apparently has a different (and less strict) view of what the kernel should require from programs here. It has not yet yielded any clarity on what's guaranteed about flock()'s interaction with Linux CTO cache consistency.
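To make the uncertainty concrete, here is a minimal sketch of the sort of conservative pattern a client program might fall back on in the absence of a specification. It is purely illustrative and is not taken from the FAQ or from the mailing list discussion. It assumes that what close to open consistency keys on is a writer closing the file and a reader opening it afresh, and it wraps each access in flock() without relying on the lock for cache consistency (since that is exactly what is unspecified). The names append_record and read_all are hypothetical helpers invented for this example.

```python
import fcntl
import os


def append_record(path: str, record: bytes) -> None:
    """Append one record under an exclusive flock(), then close the file."""
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)
        os.write(fd, record)  # assumes a small record written in one call
        os.fsync(fd)  # ask for the data to be flushed before we unlock
        fcntl.flock(fd, fcntl.LOCK_UN)
    finally:
        # Assumption: the close is the step that close to open consistency
        # depends on, so we never keep the file open between writes.
        os.close(fd)


def read_all(path: str) -> bytes:
    """Open the file afresh, read it under a shared flock(), then close it."""
    # Assumption: a fresh open is what makes earlier closed writes visible,
    # so we reopen on every read instead of reusing a descriptor.
    fd = os.open(path, os.O_RDONLY)
    try:
        fcntl.flock(fd, fcntl.LOCK_SH)
        chunks = []
        while True:
            chunk = os.read(fd, 65536)
            if not chunk:
                break
            chunks.append(chunk)
        fcntl.flock(fd, fcntl.LOCK_UN)
        return b"".join(chunks)
    finally:
        os.close(fd)
```

Whether this is more than is required, less than is required, or only works by accident of the current implementation is precisely the sort of question that I can't answer from what is written down today.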
The importance of explicitly and clearly specifying things is that it deals with all four issues that have been uncovered here. With a clear and explicit specification (which doesn't have to be a formal, legalistic thing), it would be obvious what writers of programs must do to guarantee things working (not just now but also into the future), all of the developers could be sure that they were in agreement about how the code should work (and if there's disagreement, it would be immediately uncovered), any unclear or unspecified areas would at least become obvious (you could notice that the specification says nothing about what flock() does), and it would be much clearer if kernel behavior was a bug or if a kernel change introduced a deviation from the agreed specification.
This is a general thing, not something specific to the Linux kernel or kernels in general. For 'kernel' you can substitute 'any system that other people base things on', like compilers, languages, web servers, and so on. In a sense this applies to anything that you can describe as an API. If you have an API, you want to know how you use the API correctly, what the API actually is (not just the current implementation), whether the API is ambiguous or incomplete, and whether something is a bug (it violates the API) or just a surprise. All of this is very much helped by having a clear and explicit description of the API (and, I suppose I should add, a complete one).