2013-03-27
What checksums in your filesystem are usually actually doing
The usual way to talk about the modern trend of filesystems with inherent checksums (such as ZFS and btrfs) is to say that the checksums exist to detect data corruption in your files (and in the filesystem as a whole). In an environment with a certain amount of random bit flips, decaying media, periodic hardware glitches, and other sources of damage, it's no longer good enough to imagine that if you wrote it to disk you're sure to read it back perfectly (or to get a disk error). Filesystems with checksums are sentinels, standing on guard for you and letting you know when this has happened to your data.
But this is not quite what they generally do in practice. This is because they perform this sentinel duty by denying you access to your data. In doing this they implicitly prioritize integrity over availability; it's better to give you no data at all than to give you data that seems to be damaged. The same is true, but even more so, if filesystem metadata seems damaged.
(This is similar to the tradeoff disk encryption makes for you.)
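To make the tradeoff concrete, here's a minimal sketch of the read-path logic such a filesystem implements. The names here are made up for illustration and zlib's crc32() stands in for whatever (stronger) checksum a real filesystem actually uses; the point is just the shape of the logic: if the stored checksum doesn't match what's computed from the block, you get an error instead of the data.

    /* Minimal sketch of a checksum-verifying read path. Names are
       invented and zlib's crc32() is a stand-in checksum; real
       filesystems use stronger checksums and far more involved
       code. Compile with: cc sketch.c -lz */
    #include <errno.h>
    #include <stdio.h>
    #include <string.h>
    #include <zlib.h>

    #define BLOCKSIZE 4096

    struct block {
        unsigned char data[BLOCKSIZE];
        uLong stored_csum;      /* checksum written alongside the data */
    };

    /* Copy out the data only if the checksum verifies; otherwise
       return -EIO and give the caller nothing at all. */
    int read_block(const struct block *b, unsigned char *out)
    {
        uLong csum = crc32(0L, b->data, BLOCKSIZE);
        if (csum != b->stored_csum)
            return -EIO;        /* integrity wins over availability */
        memcpy(out, b->data, BLOCKSIZE);
        return 0;
    }

    int main(void)
    {
        struct block b;
        unsigned char buf[BLOCKSIZE];

        memset(b.data, 'x', BLOCKSIZE);
        b.stored_csum = crc32(0L, b.data, BLOCKSIZE);
        b.data[0] ^= 1;         /* simulate a single bit flip on disk */

        if (read_block(&b, buf) == -EIO)
            printf("checksum mismatch: data withheld entirely\n");
        return 0;
    }

Note what read_block() does on a mismatch: it doesn't hand back the 4095 bytes that may well be perfectly fine along with the flipped bit. You get nothing.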
You may not be exactly happy with this tradeoff. Yes, it's nice to know when you're reading corrupt data, but sometimes you really want to see that data anyway, just to see if you can reconstruct something from it. This goes double for filesystem metadata, especially core metadata; it's not hard to get into a situation where almost all of your data is intact and probably recoverable but the filesystem won't give it to you.
Old filesystems went the other way, and not just by not having any sort of checksums; they often came with quite elaborate recovery tools that would do almost everything they could to get something back. The results might be scattered in little incoherent bits all over the filesystem, but if you cared enough (ie it was important enough), you had a shot at assembling what you could.
(This is still theoretically possible with modern checksumming filesystems but at least some of them are very strongly of the opinion that the answer here is 'restore from backups (of course you have backups)' and so they don't supply any real sort of tools to help you out.)
My opinion is that filesystems ought to support an interface that lets you get access to even data that fails its checksums (perhaps through a special 'no error on checksum error' flag for open()). This wouldn't fix all of the problems (since it wouldn't help in the face of many metadata issues), but it would at least be something, and a gesture toward acknowledging that integrity is not always the most important thing.
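Here is a hypothetical sketch of how a recovery tool might use such a flag. To be completely clear, O_NOCSUMERR is entirely invented; no real kernel or filesystem defines it, and the value given to it below is purely for illustration.

    /* Hypothetical sketch only: O_NOCSUMERR is an invented flag,
       not something any kernel or filesystem currently supports.
       The idea is that a recovery tool could opt in to receiving
       data even when its checksum fails to verify. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    #ifndef O_NOCSUMERR
    #define O_NOCSUMERR 010000000   /* invented value, illustration only */
    #endif

    int main(int argc, char **argv)
    {
        char buf[4096];
        ssize_t n;
        int fd;

        if (argc != 2) {
            fprintf(stderr, "usage: %s file\n", argv[0]);
            return 1;
        }
        /* Ask for the data even if its checksums don't verify.
           A plain open() without the flag would keep today's
           fail-with-an-error behavior. */
        fd = open(argv[1], O_RDONLY | O_NOCSUMERR);
        if (fd < 0) {
            perror("open");
            return 1;
        }
        while ((n = read(fd, buf, sizeof(buf))) > 0)
            write(STDOUT_FILENO, buf, (size_t)n);
        close(fd);
        return 0;
    }

Making this an explicit opt-in flag keeps the default behavior safe; ordinary programs still get errors instead of silently consuming bad data, and only tools that have deliberately asked for damaged data ever see it.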
2013-03-17
The power of suggestion that documentation has
DWikiText (the wikitext dialect of Wandering Thoughts) has a couple of different ways of writing links, less by design and more through evolution over time. DWikiText started out with one form and later added a second that I felt was better, and then of course I could never remove the first one because there were plenty of existing entries using it. When the second form was added I updated the DWikiText help to mention it. More to the point, I more or less tacked mentions of it onto the ends of the appropriate spots in the help text. I didn't revise the help text or the examples it uses to make the second form the prominent one; instead, the first form kept on being the first one mentioned and the one used for examples.
(The two forms are [[text text text|URL]], the first and more awkward one, and the simpler one, [[text text text URL]]. In addition to being simpler to write I think that the second one plain looks better, although it doesn't clearly mark out that the last word is special.)
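To illustrate with a made-up URL, here is the same link written in both forms:

    [[my home page|https://example.org/~me/]]
    [[my home page https://example.org/~me/]]

Both produce a link to https://example.org/~me/ with the text 'my home page'; in the second form the last word of the link text is quietly taken as the URL.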
I switched to using the second form pretty much the moment it was available, but what I've recently noticed is that a lot of other people using DWikiText (either in comments here or on their own DWiki) are using the older, more awkward first form. While I can't know exactly why they're doing that, I suspect that it's because my help text lists the old form first. I'm guessing that people scan the help text for 'how to make a link', scan that section to find the first way that works for them, and then use it; they don't bother to read further and really, why should they? They've got the answer that they came for.
What this says to me is that documentation has much stronger and more subtle powers of suggestion than I was tacitly assuming. It's not enough to document everything; you (ie, I) have to structure things so that readers are led to the best or preferred options first. This makes total sense and I'm sure I've had related thoughts before (probably about other people's documentation), but I haven't looked at my own stuff through this lens before. I think I have some revising to do on the DWiki help documentation and I'm going to have to remember this for any future documentation I write in general.
(To be clear, I think this is perfectly sensible behavior on the part of the people reading the documentation. People generally do not read our documentation as if it were a fascinating book that they want to consume in full. Unless you force them to or write really stunning prose, why shouldn't they spend as little time as possible by skimming and focusing on just what they want to know? And forcing people to read all of your deathless prose is not going to be highly appreciated by your audience, to put it one way.)
2013-03-12
In universities, computers are not an essential service
This is going to sound very odd, but it really is true: in most universities, computers and networking are not a truly essential service. I don't mean that computers aren't important or that losing computing wouldn't be a very serious problem; in a modern university, both are true. If the university as a whole or even a department were to drop off the network, it would be a very big deal and a crisis.
But in the worst case, if the university's computers all went away one day the university would not shut down until it could replace them. There would be major disruption and pain, but people would keep on getting taught and a fair amount of research would still keep happening (probably more than I would expect). It definitely would not be the kind of event where you tell everyone to stay home until further notice because there is no point in them showing up to work.
Partly this is intrinsic to what the core mission of a university is. A university exists to teach undergraduates, to get graduate students to produce theses, and to obtain grant funding; none of these functions universally requires computers (although in some fields they do). Another part of this is due to the patterns of communication inside universities, where for many professors and graduate students it is more important to communicate with people outside the university than with most people inside it.
(This leads to a situation where the disaster recovery plan for many people would be 'take my personal laptop to a coffee shop, get a webmail account, and start mailing people from it to tell them my new address'.)
The one exception to this is HR systems, and in particular payroll. If the university cannot pay people their salaries somehow, it will not have very many people left before too long; not necessarily because people want to leave, but because there's only so long that people can go without being paid before they have to find another job to cover the bills.
(I'll admit that I'm somewhat handwaving the issue of essential data like course records. In the medium term a university without access to its computerized course records might have problems giving out undergraduate degrees, which would mean that its undergraduates would start evaporating.)