Why I'm usually unnerved when modern SSDs die on us
Tonight, one of the SSDs on our new Linux fileservers died. It's not the first SSD death we've seen and probably not the last one, but as almost always, I found it an unnerving experience because of a combination of how our SSDs tend to die, how much of a black box they are, and how they're solid-state devices.
Like most of the SSDs deaths that we've had, this one was very abrupt; the drive went from perfectly fine to completely unresponsive in at most 50 seconds or so, with no advance warning in SMART or anything else. One moment it was serving read and write IO perfectly happily (from all external evidence, and ZFS wasn't complaining about read checksums) and the next moment there was no Crucial MX300 at that SAS port any more. Or at least at very close to the next moment.
(The first Linux kernel message about failed IO operations came at 20:31:34 and the kernel seems to have declared the drive officially vanished at 20:32:15. But the actual drive may have been unresponsive from the start; the driver messages aren't clear to me.)
What unnerves me about these sorts of abrupt SSD failures is how inscrutable they are and how I can't construct a story in my head of what went wrong. With spinning HDs, drives might die abruptly but you could at least construct narratives about what could have happened to do that; perhaps the spindle motor drive seized or the drive had some other gross mechanical failure that brought everything to a crashing halt (perhaps literally). SSDs are both solid state and opaque, so I'm left with no story for what went wrong, especially when a drive is young and isn't supposed to have come anywhere near wearing out its flash cells (as this SSD was).
(When a HD died early, you could also imagine undetected manufacturing flaws that finally gave way. With SSDs, at least in theory that shouldn't happen, so early death feels especially alarming. Probably there are potential undetected manufacturing flaws in the flash cells and so on, though.)
When I have no story, my thoughts turn to unnerving possibilities, like that the drive was lying to us about how healthy it was in SMART data and that it was actually running through spare flash capacity and then just ran out, or that it had a firmware flaw that we triggered that bricked it in some way.
(We had one SSD fail in this way and then come back when it was pulled out and reinserted, apparently perfectly healthy, which doesn't inspire confidence. But that was a different type of SSD. And of course we've had flaky SMART errors with Crucial MX500s.)
Further, when I have no narrative for what causes SSD failures, it feels like every SSD is an unpredictable time bomb. Are they healthy or are they going to die tomorrow? It feels like I really have to put my hope in statistics, namely that not too many will fail too quickly for us to replace them. And even that hope relies on an assumption that failures are uncorrelated, that what happened to this SSD isn't likely to happen to the ones on either side of it.
(This isn't just an issue in our fileservers; it's also something I worry about for the SSDs in my home machine. All my data is mirrored, but what are the chances of a dual SSD failure?)
In theory I know that SSDs are supposed to be much more reliable than spinning rust (and we have lots of SSDs that have been ticking along quietly for years). But after mysterious abrupt death failures like this, it doesn't feel like it. I really wish we generally got some degree of advance warning about SSDs failing, the way we not infrequently did with HDs (for instance, with one HD in my office machine, even though I ignored its warnings).
A spate of somewhat alarming flaky SMART errors on Crucial MX500 SSDs
We've been running Linux's smartd on all of our Linux machines for a long time now, and over that time it's been solidly reliable (with a few issues here and there, like not always handling disk removals and (re)insertions properly). SMART attributes themselves may or may not be indicative of anything much, but smartd does reliably alert on the ones that it monitors.
Except on our new Linux fileservers. For a significant amount of time now, smartd has periodically been sending us email about various drives now having one 'currently unreadable (pending) sectors' (which is SMART attribute 197). When we go look at the affected drive with smartctl, even within 60 seconds of the event being reported, the drive has always reported that it now has no unreadable pending sectors; the attribute is once again 0.
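As an illustration of the kind of spot check we do, here is a hedged sketch (my own, not our actual tooling) of pulling attribute 197's raw value out of smartctl's output. The device name is a placeholder, and SAS-attached drives may need extra '-d' options:

```python
import subprocess

def parse_attr_197(smartctl_output: str) -> int:
    # 'smartctl -A' prints an attribute table; the attribute ID is the
    # first column and RAW_VALUE is the last one.
    for line in smartctl_output.splitlines():
        fields = line.split()
        if fields and fields[0] == "197":
            return int(fields[-1])
    raise LookupError("attribute 197 not reported")

def pending_sectors(device: str = "/dev/sda") -> int:
    # Hypothetical wrapper; the device path is just an example.
    out = subprocess.run(["smartctl", "-A", device],
                         capture_output=True, text=True).stdout
    return parse_attr_197(out)
```

By the time a check like this runs, the attribute has invariably gone back to 0, which is exactly the puzzle.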
These fileservers use both SATA and SAS for connecting the drives, and we have an assorted mixture of 2TB SSDs; some Crucial MX300s, some Crucial MX500s, and some Micron 1100s. The errors happen for drives connected through both SATA and SAS, but what we hadn't noticed until now is that all of the errors are from Crucial MX500s. All of these have the same firmware version, M3CR010, which appears to be the only currently available one (although at one point Crucial apparently released version M3CR022, it appears to have been quietly pulled since then).
These reported errors are genuine in one sense, in that it's not smartd being flaky. We also track all the SMART attributes through our new Prometheus system, and it also periodically reports a temporary '1' value for various MX500s. However, as far as I can see the Prometheus-noted errors always go away right afterward, just as the smartd ones do. In addition, no other SMART attributes on an affected drive show any unexpected changes (we see increases in, eg, 'power on hours' and other things that always count up). We've also done mass reads, SMART self-tests, and other things on these drives, always without problems reported, and there are no actual reported read errors at the Linux kernel level.
(And these drives are in use in ZFS pools, and we haven't seen any ZFS checksum errors. I'm pretty confident that ZFS would catch any corrupted data the drives were returning, if they were.)
Although I haven't done extensive hand checking, the reported errors do appear to correlate with read and write IO happening on the drive. In spot checks using Prometheus disk metrics, none of the drives appeared to be inactive at the times that errors were reported to us, and they may all have been seeing a combination of read and write IO at the time. Almost all of our MX500 SSDs are in the two in-production fileservers that have been reporting errors; we have one that's in a test machine that's now basically inactive, and while I believe it reported errors in the past (when we were testing things with it), it hasn't for a while.
(Update: It turns out that I was wrong; we've never had errors reported on the MX500 in our test machine.)
I see at least two overall possibilities and neither of them are entirely reassuring. One possibility is that the MX500s have a small firmware bug where occasionally, under the right circumstances, they report an incorrect 'currently unreadable (pending) sectors' value for some internal reason (I can imagine various theories). The second is that our MX500s are detecting genuine unreadable sectors, but then quietly curing them somehow. This is worrisome because it suggests that the drives are actually suffering real errors or already starting to wear out, despite a quite light IO load and being in operation for less than a year.
We don't have any solutions or answers, so we're just going to have to keep an eye on the situation. All in all it's a useful reminder that modern SSDs are extremely complicated things that are quite literally small computers (multi-core ones at that, these days), running complex software that's entirely opaque to us. All we can do is hope that they don't have too many bugs (either software or hardware or both).
(I have lots of respect for the people who write drive firmware. It's a high-stakes environment and one that's probably not widely appreciated. If it all works people are all 'well, of course', and if any significant part of it doesn't work, there will be hell to pay.)
(Open)SSH quiet connection disconnects in theory and in practice
Suppose, not entirely hypothetically, that you are making frequent health checks on your machines by connecting to their SSH ports to see if they respond. You could just connect, read the SSH banner, and then drop the connection, but that's abrupt and also likely to be considered a log-worthy violation of the SSH protocol (in fact it is considered such by OpenSSH; you get a log message about 'did not receive identification string'). You would like to do better, in the hopes of reducing your log volume. It turns out that the SSH protocol holds out the tantalizing prospect of doing this in a protocol-proper way, but it doesn't help in practice.
The first thing we need to do as a nominally proper SSH client is to send a protocol identification string; in the SSH transport layer protocol this is covered in 4.2 Protocol Version Exchange. This is a simple CR LF delimited string that must start with 'SSH-2.0-'. The simple version of this is, say:
SSH-2.0-Prometheus-Checks CR LF
After the client sends its identification string, the server will begin the key exchange protocol by sending a SSH_MSG_KEXINIT packet (section 7.1). If you use nc or the like (I have my own preferred tool) to feed a suitable client version string to a SSH server, you can get this packet dumped back at you; conveniently, almost all of it is in text.
(In theory your client should read this packet so that the TCP connection doesn't wind up getting closed with unread data.)
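The version exchange itself is trivial; a minimal sketch of it (the 'Prometheus-Checks' client name is just the made-up example from above):

```python
import socket

CLIENT_IDENT = b"SSH-2.0-Prometheus-Checks\r\n"

def exchange_versions(sock: socket.socket) -> str:
    # Read the server's identification string (RFC 4253 section 4.2);
    # it's a single CR LF terminated line.
    reader = sock.makefile("rb")
    banner = reader.readline().rstrip(b"\r\n")
    # Send our own identification string back.
    sock.sendall(CLIENT_IDENT)
    return banner.decode("ascii", "replace")
```

A real checker would then read (and discard) the server's SSH_MSG_KEXINIT before dropping the connection.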
At this point, according to the protocol the client can immediately send a disconnect message, as the specification says it is one of the messages that may be sent at any time. A disconnect message is:
      byte      SSH_MSG_DISCONNECT
      uint32    reason code
      string    description in ISO-10646 UTF-8 encoding [RFC3629]
      string    language tag
How these field types are encoded is covered in RFC 4251 section 5, and also the whole disconnect packet then has to be wrapped up in the SSH transport protocol's Binary Packet Protocol. Since we're early in the SSH connection and have not negotiated message authentication (or encryption), we don't have to compute a MAC for our binary packet. If we're willing to not actually use random bytes in our 'random' padding, this entire message can be a pre-built blob of bytes that our checking tool just fires blindly at the SSH server.
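To make this concrete, here is a hedged sketch (not my actual program) of pre-building such a blob with Python's struct module, using RFC 4251's field encodings and RFC 4253's Binary Packet Protocol framing:

```python
import struct

def ssh_string(data: bytes) -> bytes:
    # RFC 4251 'string': a uint32 big-endian length, then the bytes.
    return struct.pack(">I", len(data)) + data

SSH_MSG_DISCONNECT = 1              # RFC 4253 section 12
SSH_DISCONNECT_BY_APPLICATION = 11  # RFC 4253 section 11.1

payload = (
    struct.pack(">B", SSH_MSG_DISCONNECT)
    + struct.pack(">I", SSH_DISCONNECT_BY_APPLICATION)
    + ssh_string(b"Health check")   # description (which OpenSSH logs)
    + ssh_string(b"")               # empty language tag
)

# Binary Packet Protocol framing (RFC 4253 section 6): before any cipher
# is negotiated the block size is 8, padding must be at least 4 bytes,
# and the framed packet must be a multiple of the block size. No MAC yet.
block = 8
padding_len = block - ((len(payload) + 5) % block)
if padding_len < 4:
    padding_len += block
packet = (
    struct.pack(">IB", len(payload) + padding_len + 1, padding_len)
    + payload
    + b"\x00" * padding_len  # 'random' padding that isn't actually random
)
```

The resulting bytes can be saved and fired blindly at any SSH server right after the version exchange.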
In practice, this doesn't work because OpenSSH logs disconnect messages; in fact, it makes things worse because OpenSSH logs both that it received a disconnect message and a 'Disconnect from <IP>' additional message. We can reduce the logging level of the 'received disconnect' message by providing a reason code of SSH_DISCONNECT_BY_APPLICATION instead of something else, but that just turns it down to syslog's 'info' level from a warning. Interestingly, OpenSSH is willing to log our 'description' message, so we can at least send a reason of 'Health check' or something. I'm a little bit surprised that OpenSSH is willing to do this, given that it provides a way for Internet strangers to cause text of their choice to appear in your logs. Probably not very many people send SSH_MSG_DISCONNECT SSH messages as part of their probing.
On the one hand, this is perfectly reasonable on OpenSSH's part. On the other hand, I think it's probably not useful any more to log this sort of thing by default, especially for services that are not infrequently exposed to the Internet.
(I was going to confidently assert that there are a lot of SSH scanners out there, but then I started looking at our logs. There certainly used to be a lot, but our logs are now oddly silent, at least on a first pass.)
Sidebar: Constructing an actual disconnect message
I was going to write out a by-hand construction of an actual sample message, but in the end I had so much trouble getting things encoded that I wrote a Python program to do it for me (through the struct module). Generating and saving such messages is pointless anyway, since they don't reduce the log spam.
Still, building an actual valid SSH protocol message more or less by hand was an interesting exercise, even if having no MAC, no encryption, and no compression makes it the easiest case possible.
(I also left out the 'language tag' field by setting its string length to zero. OpenSSH didn't care, although other SSH servers might.)
The needs of Version Control Systems conflict with capturing all metadata
In a comment on my entry Metadata that you can't commit into a VCS is a mistake (for file based websites), Andrew Reilly put forward a position that I find myself in some sympathy with:
Doesn't it strike you that if your VCS isn't faithfully recording and tracking the metadata associated with the contents of your files, then it's broken?
Certainly I've wished for VCSes to capture more metadata than they do. But, unfortunately, I've come to believe that there are practical issues for VCS usage that conflict with capturing and restoring metadata, especially once you get into advanced cases such as file attributes. In short, what most users of a VCS want are actively in conflict with the VCS being a complete and faithful backup and restore system, especially in practice (ie, with limited programming resources to build and maintain the VCS).
The obvious issue is file modification times. Restoring file modification times on checkout can cause many build systems (starting with make) to not rebuild things if you check out an old version after working on a recent version. More advanced build systems that don't trust file modification timestamps won't be misled by this, but not everything uses them (and not everything should have to).
More generally, metadata has the problem that much of it isn't portable. Non-portable metadata raises multiple issues. First, you need system-specific code to capture and restore it. Then you need to decide how to represent it in your VCS (for instance, do you represent it as essentially opaque blobs, or do you try to translate it to some common format for its type of metadata). Finally, you have to decide what to do if you can't restore a particular piece of metadata on checkout (either because it's not supported on this system or because of various potential errors).
(Capturing certain sorts of metadata can also be surprisingly expensive and strongly influence certain sorts of things about your storage format. Consider the challenges of dealing with Unix hardlinks, for example.)
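Even the portable slice of the problem makes the policy questions visible. A minimal sketch of capturing and restoring mode and mtime (my illustration, not how any particular VCS works):

```python
import os
import stat

def capture_metadata(path: str) -> dict:
    st = os.lstat(path)
    return {
        "mode": stat.S_IMODE(st.st_mode),
        "mtime_ns": st.st_mtime_ns,
        # Captured, but restoring uid/gid needs privileges and a policy
        # decision (numeric IDs versus names) that isn't portable.
        "uid": st.st_uid,
        "gid": st.st_gid,
    }

def restore_metadata(path: str, meta: dict) -> None:
    os.chmod(path, meta["mode"])
    # Restoring mtime is exactly what can mislead make(1) after you
    # check out an older version.
    os.utime(path, ns=(meta["mtime_ns"], meta["mtime_ns"]))
```

Everything beyond this (ownership, file attributes, hardlinks, ACLs) gets progressively more system-specific.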
You can come up with answers for all of these, but the fundamental problem is that the answers are not universal; different use cases will have different answers (and some of these answers may actually conflict with each other; for instance, whether on Unix systems you should store UIDs and GIDs as numbers or as names). VCSes are not designed or built to be comprehensive backup systems, partly because that's a very hard job (especially if you demand cross system portability of the result, which people do very much want for VCSes). Instead they're designed to capture what's important for version controlling things and as such they deliberately exclude things that they think aren't necessary, aren't important, or are problematic. This is a perfectly sensible decision for what they're aimed at, in line with how current VCSes don't do well at handling various sorts of encoded data (starting with JSON blobs and moving up to, say, word processor documents).
Would it be nice to have a perfect VCS, one that captured everything, could restore everything if you asked for it, and knew how to give you useful differences even between things like word processor documents? Sure. But I can't claim with a straight face that not being perfect makes a VCS broken. Current VCSes explicitly make the tradeoff that they are focused on plain text files in situations where only some sorts of metadata are important. If you need to go outside their bounds, you'll need additional tooling on top of them (or instead of them).
(Or, the short version, VCSes are not backup systems and have never claimed to be ones. If you need to capture everything about your filesystem hierarchy, you need a carefully selected, system specific backup program. Pragmatically, you'd better test it to make sure it really does back up and restore unusual metadata, such as file attributes.)
OpenSSH 7.9's new key revocation support is welcome but can't be a full fix
I was reading the OpenSSH 7.9 release notes, as one does, when I ran across a very interesting little new feature (or combination of features):
- sshd(8), ssh-keygen(1): allow key revocation lists (KRLs) to revoke keys specified by SHA256 hash.
- ssh-keygen(1): allow creation of key revocation lists directly from base64-encoded SHA256 fingerprints. This supports revoking keys using only the information contained in sshd(8) authentication log messages.
Any decent security system designed around Certificate Authorities needs a way of revoking CA-signed keys to make them no longer valid. In a disturbingly large number of these systems as people actually design and implement them, you need a fairly decent amount of information about a signed key in order to revoke it (for instance, its full public key). In theory, of course you'll have this information in your CA system's audit records because you'll capture all of it in your audit system when you sign a key. In practice there are many things that can go wrong even if you haven't been compromised.
Fortunately, OpenSSH was never one of these systems; as covered in ssh-keygen(1)'s 'Key Revocation Lists', you could specify keys in a variety of ways that didn't require a full copy of the key's certificate (by serial number or serial number range, by 'key id', or by its SHA1 hash). What's new in OpenSSH 7.9 is that they've reduced the amount of things you need to know in practice, as now you can revoke a key given only the information in your ordinary log messages. This includes but isn't limited to CA-signed SSH keys (as I noticed recently).
(This took both the OpenSSH 7.9 change and an earlier change to log the SHA256 of keys, which happened in OpenSSH 6.8.)
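As a sketch of the workflow (this is my own illustration; the 'hash:' KRL specification syntax is what ssh-keygen(1) documents for OpenSSH 7.9+), going from a logged SHA256 fingerprint to a KRL looks roughly like:

```python
import os
import subprocess
import tempfile

def revoke_by_fingerprint(krl_path: str, fingerprint: str) -> None:
    """Add a key to a KRL given only its SHA256 fingerprint, e.g. a
    'SHA256:...' string copied out of an sshd authentication log line."""
    with tempfile.NamedTemporaryFile("w", suffix=".spec",
                                     delete=False) as spec:
        spec.write(f"hash: {fingerprint}\n")
        spec_path = spec.name
    try:
        args = ["ssh-keygen", "-k", "-f", krl_path]
        if os.path.exists(krl_path):
            args.append("-u")  # update the existing KRL in place
        subprocess.run(args + [spec_path], check=True)
    finally:
        os.unlink(spec_path)
```

You can then check any public key against the KRL with 'ssh-keygen -Q -f <krl> <pubkey>'.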
This new OpenSSH 7.9 feature is a very welcome change; it's now much easier to go from a log message about a bad login to blocking all future use of that key, including and especially if that key is a CA-signed key and so you don't (possibly) have a handy copy of the full public key in someone's authorized_keys file.
However, this isn't and can't be a full fix for the tradeoff of having a local CA. The tradeoff is still there; it's just somewhat easier to deal with either a compromised signed key or the disaster scenario of a compromised CA (or a potentially compromised one).
With a compromised key, you can immediately push it into your system for distributing revocation lists (and you should definitely build such a system if you're going to use a local CA); you don't have to go to your CA audit records first to fish out the full key and other information. With a potentially compromised CA, it buys you some time to roll over your CA certificate, distribute the new one, re-issue keys, and so on, without being in a panic situation where you can't do anything but revoke the CA certificate immediately and invalidate everyone's keys. Of course, you may want to do that anyway and deal with the fallout, but at least now you have more options.
(If you believe that your attacker was courteous enough to use unique serial numbers, you can also do the brute force approach of revoking every serial number range except the ones that you're using for known, currently valid keys. Whether or not you want to use consecutive serial numbers or random ones is a good question, though, and if you use random ones, this probably isn't too feasible.)
PS: I continue to believe that if you use a local CA, you should be doing some sort of (offline) auditing to look for use of signed keys or certificates that are not in your CA audit log. You don't even have to be worried that your CA has been compromised, because CA software (and hardware) can have bugs, and you want to detect them. Auditing used keys against issued keys is a useful precaution, and it shouldn't need to be expensive at most people's scale.
Some tradeoffs of having a Certificate Authority in your model
There are two leading models for checking identity via public key cryptography in a self-contained environment; you can simply maintain and distribute a list of valid keys, or you can set up a little local CA, have it sign the keys, and then verify keys against the CA's public key. One prominent system that offers you the choice of either option is OpenSSH, which famously supports both models and lets you choose between them for both server public keys and user authentication keys. Despite writing a certain amount about what I see as the weaknesses of the CA model, I accept that the CA model has advantages and makes tradeoffs (like many things in security).
The obvious advantage of the CA model is that using a CA means that you don't have to distribute your keylist around. In the non-CA model, everyone needs to have a copy of the entire list of valid keys they may need to deal with; in the CA model, this is replaced by the much smaller job of distributing the CA keys. Given this, the CA model is likely to be especially popular with big places where distributing the keylist is a hard problem; you have lots of updates, a big keylist, many places to send it, or all of the above. Conversely, if your keylist is essentially constant and you have only a few places where it needs to be distributed to, the CA model is not necessarily much of a win.
(The CA model doesn't entirely eliminate moving keys around, because you still need to get keys signed and the signatures transported to the things that need to use the keys. Nor does the CA model prevent the compromise of individual keys; they can still be copied or stolen by attackers.)
By removing the keylist distribution problem, the CA model enables the use of more keys and more key changes than might be feasible otherwise. One potentially important consequence of removing the distribution problem is that new CA-signed keys are instantly valid everywhere. When you get a new key, you can use it immediately; you don't have to wait for systems to synchronize and update.
(Frequent key changes and limited key lifetimes have the traditional security advantages of limiting the impact of key theft and perhaps making it significantly harder in the first place.)
A more subtle advantage of the CA model is that using CAs enables mass invalidation of keys, because the validity of a whole bunch of keys is centrally controlled through the validity of their CA. If you remove or invalidate a CA, all keys signed (only) by it immediately stop working (assuming that your software gets things right by, eg, checking CA validity dates, not just key validity dates).
The drawback of the CA model is the same as it ever was, which is that a local CA is a single point of compromise for your entire authentication system, and having a CA means you can no longer know for sure what keys have access to your systems. If your systems are working properly you haven't signed any improper or unauthorized keys and you have a complete list of what keys you have signed, but you ultimately have to take this on trust (and audit key use to make sure that all keys you see are ones you know about). The story of the modern public CA system over the past few years is a cautionary story about how well that's worked out in the large, which is so well that people are now creating what is in effect a set of giant key distribution systems for TLS.
(That is ultimately what Certificate Transparency is; it's a sophisticated technique to verify that all keys are in a list.)
Using a local CA thus is a tradeoff. You're getting out of key distribution and giving yourself some multi-level control over key validity in exchange for creating a single point of security failure. It remains my view that in most environments, key distribution is not a big problem and properly operating a genuinely secure CA is. However, setting up a sort of secure CA is certainly much easier than readily solving key distribution (especially key distribution to end devices), so I expect using local CAs to remain eternally attractive.
(Or perhaps these days there's easily installed and operated software for local CAs that relies on the security of, say, a Yubikey for actually signing keys. Of course if the CA operator has to touch their Yubikey every time to sign a new key, you're not going to be doing that a lot.)
My non-approach to password management tools
In response to my entry on why I don't set master passwords in programs, Bill asked a good question:
Does your skepticism extend to password-management tools in general? If so, then what do you store passwords in? [...]
There are two answers to this. The first one is that I simply assume that if an attacker compromises my machine, they get essentially everything no matter what I do, so I either record a password in an unencrypted form on the machine or I don't have it on my machine at all. Access to my machine more or less gives you access to my email, and with access to my email you can probably reset all of the passwords that I keep on the machine anyway. In general and in theory, none of these are important passwords.
(In practice, these days I would care a fair bit if I lost control of some of the accounts they're for. But I started doing this back in the days when the only web accounts I had were on places like Slashdot and on vendor sites that insisted that you register for things.)
But that's the anodyne, potentially defensible answer. It's true, as far as it goes, in that I make sure that important, dangerous passwords are never recorded on my machine. But it is not really why I don't have a password manager. The deeper truth is that I've never cared enough to go through the effort of investigating the various alternatives, figuring out which one is trustworthy, competent, has good cryptography, and will be there in ten years, and then putting all of my theoretically unimportant passwords into it. This is the same lack of caring and laziness that had me use unencrypted SSH keypairs for many years until I finally motivated myself to switch over.
(Probably I should motivate myself to start using some encrypted password storage scheme, but my current storage scheme for such nominally unimportant passwords has more than just the password; I also note down all sorts of additional details about the website or registration or whatever, including things like login name, the tagged email address I used for it, and so on. Really I'd want to find a decent app that did a great job of handling encrypted notes.)
I have a long history of such laziness until I'm prodded into finding better solutions, sometimes by writing about my current approaches here and facing up to their flaws. I'll have to see if that happens for this case.
PS: The reason to encrypt passwords at rest even on my machine is the same reason to encrypt my SSH keypairs at rest; it's often a lot easier in practice to read files you're not supposed to have access to than to fully compromise the machine. On the other hand, SSH keypairs are usually in a known or directly findable location, and my collection of password information is not; an attacker would need the ability to hunt around my filesystem.
Why I don't set master passwords in programs
There are any number of programs and systems that store passwords for you, most prominently browsers with their remembered website passwords. It's very common for these programs to ask you to set a master password that will secure the passwords they store and be necessary to unlock those passwords. One of my peculiarities is that I refuse to set up such master passwords; this shows up most often in browsers, but I stick to it elsewhere as well. The fundamental reason why I don't do this is that I don't trust programs to securely handle any such master password.
You might think that everyone manages this, but in practice securely handling a master password requires a lot more than obvious things like not leaking it or leaving it sitting around in memory or the like. It also includes things like not making it easy to recover the master password through brute force, which is a problem that Firefox has (and Thunderbird too); see Wladimir Palant's writeup (via). It seems likely that other master password systems have similar issues, and at the least it's hard to trust them. Cryptography is a hard and famously tricky field, where small mistakes can turn into big problems and there are few genuine experts.
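The brute-force issue comes down to how much each guess costs an attacker. A quick illustration (mine, not a description of what any specific program does): a scheme that hashes the master password once makes a guess roughly 100,000 times cheaper than PBKDF2 with a modern iteration count.

```python
import hashlib

def derive_key(password: str, salt: bytes, iterations: int) -> bytes:
    # Standard PBKDF2-HMAC-SHA256 key derivation; the iteration count
    # is the attacker's per-guess cost multiplier.
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt,
                               iterations)

# 'hunter2' and the salt are placeholder values for illustration.
weak = derive_key("hunter2", b"example-salt", 1)
strong = derive_key("hunter2", b"example-salt", 100_000)
```

Both calls produce a valid 32-byte key; the difference is only in how long an attacker's guessing run takes.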
I have a few core passwords that I use routinely and have memorized; these are things like Unix login passwords and the like. But if I can't trust a program to securely handle its master password, it's not safe to use one of those high value memorized passwords of mine as its master password; I'm not willing to risk the leak of, say, my Unix login password. That means that I need to create a new password to be the program's master password, and additional passwords are all sorts of hassle, especially if I don't use them frequently enough to memorize them. Even having a single password that I used for everything that wanted a master password would be an annoyance, and of course it would be somewhat insecure.
So the upshot of all of this is that I just don't use master passwords. Since all of the passwords that I do allow things to store are not strongly protected, I make sure to never allow my browsers, my IMAP clients, and so on to store the password for anything I consider really important. Sometimes this makes life a bit more inconvenient, but I'm willing to live with that.
(The exception that proves the rule is that I do have a fair bit of trust in my iPhone's security, so I'm willing to have it hold passwords that I don't allow other things to get near. But even on the iPhone, I haven't tried to use one of the password store apps like 1Password, partly because I'm not sure if they'd get me anything over Apple's native features for this.)
I don't have any clever solutions to this in general. The proliferation of programs with separate password management and separate master passwords strikes me as a system design problem, but it's one that's very hard to fix in today's cross-platform world (and it's impossible to fix on platforms without a strong force in control). Firefox, Chrome, and all of those other systems have rational reasons to have their own password stores, and once you have separate password stores you have at least some degree of user annoyance.
PS: One obvious solution to my specific issue is to find some highly trustworthy password store system and have it hold the master passwords and so on. I'm willing to believe that this can be done well on a deeply integrated system, but I primarily use Linux and so I doubt there's any way to have a setup that doesn't require various amounts of cutting and pasting. So far the whole area is too much of a hassle and involves too much uncertainty for me to dig into it.
(This is another personal limit on how much I care about security, although in a different form than the first one.)
The importance of explicitly and clearly specifying things
I was going to write this entry in an abstract way, but it is easier and more honest to start with the concrete specifics and move from there to the general conclusions I draw and my points.
We recently encountered an unusual Linux NFS client behavior, which at the time I called a bug. I have since been informed that this is not actually a bug but is Linux's implementation of what Linux people call "close to open cache consistency", which is written up in the Linux NFS FAQ, section A8. I'm not sure what to call the FAQ's answer; it is partly a description of concepts and partly a description of the nominal kernel implementation. However, this kernel implementation has changed over time, as we found out, with changes in user visible behavior. In addition, the FAQ doesn't make any attempt to describe how this interacts with NFS locking or if indeed NFS locking has any effect on it.
As someone who has to deal with this from the perspective of programs that are running on Linux NFS clients today and will likely run on Linux NFS clients for many years to come, what I need is a description of the official requirements for client programs. This is not a description of what works today or what the kernel does today, because as we've seen that can change; instead, it would be a description of what the NFS developers promise will work now and in the future. As with Unix's file durability problem, this would give me something to write client programs to and mean that if I found that the kernel deviated from this behavior I could report it as a bug.
(It would also give the NFS maintainers something clear to point people to if what they report is not in fact a bug but them not understanding what the kernel requires.)
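To make concrete the kind of client-side pattern the FAQ's description of close-to-open consistency seems to call for (this is my reading of it, not an official statement of the requirements): a client is only promised fresh data if it opens the file after the writer has closed it, so the safe pattern is to bracket every read with a fresh open() and to close promptly after writing. A minimal local sketch of the pattern (on a single machine this is trivially satisfied; on NFS the open-after-close ordering is what buys you the promise):

```python
import os
import tempfile

def cto_write(path, data):
    # Under close-to-open consistency, the writer's changes are only
    # promised to be visible to other clients after the writer closes
    # the file, so write and then close promptly.
    with open(path, "w") as f:
        f.write(data)
    # File is now closed; a subsequent open() elsewhere should see `data`.

def cto_read(path):
    # A reader must do a fresh open() to be promised fresh data; holding
    # a file descriptor open across another client's writes promises
    # nothing about seeing them.
    with open(path) as f:
        return f.read()

# Demonstrate the write-close-then-open-read ordering.
tmp = tempfile.NamedTemporaryFile(delete=False)
tmp.close()
cto_write(tmp.name, "hello from the writer")
print(cto_read(tmp.name))  # hello from the writer
os.unlink(tmp.name)
```

This says nothing about what flock() adds to (or subtracts from) the picture, which is exactly the gap discussed below.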
On the Linux NFS mailing list, I attempted to write a specific description of this from the FAQ's wording (you can see my attempt), and then asked some questions about what effect using flock() has on this (since the FAQ is entirely silent on this). This uncovered another Linux NFS developer who apparently has a different (and less strict) view of what the kernel should require from programs here. It has not yet yielded any clarity on what's guaranteed about flock()'s interaction with Linux CTO cache consistency.
The importance of explicitly and clearly specifying things is that it deals with all four issues that have been uncovered here. With a clear and explicit specification (which doesn't have to be a formal, legalistic thing), it would be obvious what writers of programs must do to guarantee things working (not just now but also into the future), all of the developers could be sure that they were in agreement about how the code should work (and if there's disagreement, it would be immediately uncovered), any unclear or unspecified areas would at least become obvious (you could notice that the specification says nothing about what flock() does), and it would be much clearer whether some kernel behavior was a bug or whether a kernel change introduced a deviation from the agreed specification.
This is a general thing, not something specific to the Linux kernel or kernels in general. For 'kernel' you can substitute 'any system that other people base things on', like compilers, languages, web servers, etc etc. In a sense this applies to anything that you can describe as an API. If you have an API, you want to know how you use the API correctly, what the API actually is (not just the current implementation), if the API is ambiguous or incomplete, and if something is a bug (it violates the API) or just a surprise. All of this is very much helped by having a clear and explicit description of the API (and, I suppose I should add, a complete one).
Explicit manipulation versus indirect manipulation UIs
One of the broad splits in user interfaces in general is the spectrum between what I'll call explicit manipulation and indirect manipulation. Put simply, in an explicit manipulation interface you see what you're working on and you specify it directly, and in an indirect manipulation interface you don't; you specify it indirectly. The archetypal explicit manipulation interface is the classical GUI mouse-based text selection and operations on it; you directly select the text with the mouse cursor and you can directly see your selection.
(This directness starts to slip away once your selection is large enough that you can no longer see it all on the screen at once.)
An example of an indirect manipulation interface is the common interactive Unix shell feature of !-n, for repeating (or getting access to) the Nth previous command line. You aren't directly pointing to the command line and you may not even still have it visible on the screen; instead you're using it indirectly, through knowledge of what relative command number it is.
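The indirection in !-n can be sketched in a few lines: the shell keeps a numbered history and resolves the reference by arithmetic on relative position, never by pointing at anything visible on the screen. A toy model (not any real shell's implementation):

```python
class History:
    """A toy model of shell history with !-n style lookup."""

    def __init__(self):
        self.commands = []

    def record(self, cmd):
        # Every executed command line is appended to the history.
        self.commands.append(cmd)

    def bang_minus(self, n):
        # '!-n' names the nth previous command purely by relative
        # position; the user supplies the context, not the screen.
        return self.commands[-n]

h = History()
for cmd in ["make", "ls -l", "git status", "make test"]:
    h.record(cmd)
print(h.bang_minus(1))  # make test   (the immediately previous command)
print(h.bang_minus(3))  # ls -l
```

The lookup itself is trivial; the work of knowing that the command you want is at -3 and not -2 or -4 happens entirely in the user's head, which is the point made below about context.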
A common advantage of indirect manipulation is that indirect manipulation is compact and powerful, and often fast; you can do a lot very concisely with indirect manipulation. Typing '!-7' is unquestionably a lot faster than scrolling back up through a bunch of output to select and then copy/paste a command line. Even the intermediate version of hitting cursor up a few times until the desired command appears and then CR is faster than the full scale GUI text selection.
(Unix shell command line editing features span the spectrum from strong indirect manipulation through strong explicit manipulation; there's !-n notation, cursor up/down, interactive search, and once you have a command line you can edit it in basically an explicit manipulation interface where you move the cursor around in the line to delete or retype or alter various bits.)
Indirect manipulation also scales and automates well; it's generally clear how to logically extend it to some sort of bulk operation that doesn't require any particular interaction. You specify what you want to operate on and what you want to do, and there you go. Abstraction more or less requires the use of indirect manipulation at some level.
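The scaling property can be illustrated with a small sketch: instead of pointing at each item in turn, you describe the set of things to operate on and the operation, and the bulk version falls out for free (the file names and the rename rule here are made up for illustration):

```python
def bulk_apply(items, selector, action):
    # The selector is the indirect specification ("everything matching
    # X"); no per-item interaction is needed, so the same call works
    # identically on four items or four thousand.
    return [action(item) for item in items if selector(item)]

files = ["report.txt", "notes.txt", "photo.jpg", "draft.txt"]

renamed = bulk_apply(
    files,
    selector=lambda name: name.endswith(".txt"),
    action=lambda name: name.replace(".txt", ".bak"),
)
print(renamed)  # ['report.bak', 'notes.bak', 'draft.bak']
```

The explicit manipulation version of this (click each file, rename it by hand) has no comparable path to automation, which is why abstraction more or less forces you toward indirect specification.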
The downside of indirect manipulation is that it requires you to maintain context in order to use it, in contrast to explicit manipulation, where the context is visible right in front of you. You can't use '!-7' without the context that the command you want is that one, not -6 or -8 or some other number. You need to construct and maintain this context in order to really use indirect manipulation effectively, and if you get the context wrong, bad things happen. I have accidentally shut down a system by being confidently wrong about what shell command line a cursor-up would retrieve, for example, and mistakes about context are a frequent source of production accidents like 'oops we just mangled the live database, not the test one' (or 'oops we modified much more of the database than we thought this operation would apply to').
My guess is that in much the same way that custom interfaces can be a benefit for people who use them a lot, indirect manipulation interfaces work best for frequent and ongoing users, because these are the people who will have the most experience at maintaining the necessary context in their head. Conveniently, these are the people who can often gain the most from using the compact, rapid power of indirect manipulation, simply because they spend so much time doing things with the system. By corollary, people who only infrequently use a thing are not necessarily going to remember context or be good at constructing it in their head and keeping track of it as they work (see also).
(The really great trick is to figure out some way to provide the power and compactness of indirect manipulation along with the low need for context of explicit manipulation. This is generally not easy to pull off, but in my view incremental search shows one path toward it.)
PS: I'm using 'user interface' very broadly here, in a sense that goes well beyond graphical UIs. Unix shells have a UI, programs have a UI in their command line arguments, programs like awk have a UI in the form of their little languages, programming languages and APIs have and are UIs, and so on. If people use it, it's in some sense a user interface.
(I'd like to use the term 'direct manipulation' for what I'm calling 'explicit manipulation' here, but the term has an established, narrower definition. GUI direct manipulation interfaces are a subset of what I'm calling explicit manipulation interfaces.)