How to set up your module exceptions to be useful
Suppose that you are writing a Python module that has exceptions as part of its interface, and you want them to actually be useful to people. From the perspective of a sometimes grumpy user of module exceptions, here are my opinions on what you should do:
- all exceptions that you expect people to actually catch should descend from a common ancestor class, so that people can just do 'except YourException as e:' and be done with it. If you have an exception you raise for internal errors and other impossible situations, do not make it part of this hierarchy; otherwise people will catch it when you don't want them to.
- people will mostly use your error exceptions by turning them into strings (and printing them out). Make sure that this gives them a useful, complete error message.
- you need to document what fields your exceptions have. If you do not, people will just str() your exceptions and ignore any information that this misses.
- if you have multiple exception classes, try to have as many common fields in them as possible. As a pragmatic thing, expect people to mostly ignore unique fields unless they have important information that people need to act on.
- do not ever reuse or subclass standard exceptions, especially IOError and OSError. (If you are ever tempted to break this rule, make absolutely sure that your versions of these exceptions are exactly identical to the real ones.)
- if you call code from other modules, you should capture and wrap up their exceptions, turning them into exceptions of your own. This is much easier for your callers to deal with, since they don't have to know what other modules you use (or care if you change what modules you use).
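Most of this advice can be seen in one small sketch. All of the names here (FredError, frobnicate, and so on) are made-up examples, not from any real module:

```python
class FredError(Exception):
    """Common ancestor for all errors this module raises to callers."""
    def __init__(self, msg, filename=None):
        super().__init__(msg)
        # Common fields, shared by all exception classes in the module.
        self.msg = msg
        self.filename = filename

    def __str__(self):
        # str() of the exception should be a useful, complete message,
        # since that is what most callers will actually use.
        if self.filename is not None:
            return "%s: %s" % (self.filename, self.msg)
        return self.msg

class FredParseError(FredError):
    """Raised for malformed input; shares the common fields above."""

def frobnicate(filename):
    try:
        with open(filename) as fp:
            return fp.read()
    except OSError as e:
        # Wrap other modules' exceptions in our own, so that callers
        # only ever have to catch FredError.
        raise FredError("cannot read file: %s" % e, filename=filename)
```

A caller can then write 'except FredError as e: print(e)' and get something sensible regardless of which specific thing went wrong.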
In general, I would say that you should avoid the temptation to get too complex in error handling. Put yourself in the shoes of the typical person using your module: are you going to care about anything beyond the fact that some sort of error happened? Generally not. In that case you just need a single exception class with a useful string error message, and you're done.
(I admit that I don't always stick to this simple model for my own code; I sometimes have an irrepressible urge to subclass my overall module error class so that I can distinguish this sort of error from that sort of error and so on. Then I never use any of these features.)
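The simple model really is this small (again, the names here are hypothetical examples):

```python
class BarneyError(Exception):
    """The only exception this module raises to callers; str() of it
    is the error message, which is all most callers care about."""

def load_config(filename):
    try:
        with open(filename) as fp:
            return fp.read()
    except OSError as e:
        # One class, one useful message; nothing more to remember.
        raise BarneyError("cannot load %s: %s" % (filename, e))
```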
How disk write caches can corrupt filesystem metadata
It's intuitively obvious how (volatile) disk write caches can result in you losing data in files if something goes wrong; you wrote data, the disk told your program it had been written, the disk lost power, and the data vanishes. But it may be less obvious how this can result in corrupted or destroyed filesystems and thus why you need (working) cache flush operations even just to keep your filesystems intact (never mind what user level programs may want).
Consider a filesystem where you have two pieces of metadata, A and B, where A points to B; A might be a directory and B a file inode, or A might be a file inode and B a table of block pointers. Since filesystem metadata is often some sort of tree, this sort of pointing is common (nodes higher up the tree point to nodes lower down). Now suppose that you are creating a new B (say you are adding a file to a directory). In order to keep the metadata consistent, you want to write things bottom first; you want to write the new B and then the new version of A.
(It's common to have several layers of pointing; A points to B which points to C which points to D and so on. In such cases you usually don't have to write each one by one, pausing before the next. Instead you just need everything else written, in some order, before you make the change visible by writing A.)
In theory disks with volatile write caches don't upset this; your metadata is still consistent if the disk loses power and neither A nor B get written. What breaks metadata consistency is that disks with write caches don't necessarily write things in order; it's entirely possible for a disk to cache both the B and A writes, then write A, then lose power with B unwritten. At this point you have A pointing to garbage. Boom. And disks with write caches are free to keep things unwritten for random but large amounts of time for their own inscrutable reasons (or very scrutable ones, such as 'A keeps getting written to').
(Note that copy-on-write filesystems are especially exposed to this, because they almost never update things in place and so are writing a lot of new B's and changing where the A's point. And the A is generally the very root of the filesystem, so if it points to nowhere you have a very bad problem.)
In the simple case you can get away with just a disk write barrier for metadata integrity, so that you can tell the disk that it can't write A before it's written B out. However, this isn't sufficient when you're dealing with multi-disk filesystems, where A may be on a different disk entirely than B. There you really do need to be able to issue a cache flush to B's disk and know that B has been written out before you queue up A's write on its disk. (Otherwise you could again have A written but not B, because B's disk lost power but A's did not.)
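At user level, the same 'B must be durable before A' ordering looks something like the following sketch. It assumes that fsync() actually forces data out of the disk's write cache, which in turn requires the OS and the disk to honor cache flushes; the file names are made-up stand-ins for A and B:

```python
import os

def write_and_flush(filename, data):
    # Write data and force it to stable storage before returning.
    fd = os.open(filename, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o666)
    try:
        os.write(fd, data)
        os.fsync(fd)
    finally:
        os.close(fd)

# B must be durably on disk before we write A, the thing that points
# to it; otherwise a power loss could leave A pointing at garbage.
write_and_flush("B.dat", b"new metadata block")
write_and_flush("A.dat", b"pointer to B.dat")
```

If the two files live on different disks, this is exactly the multi-disk case: the fsync() on B is what lets you safely go on to write A.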
The multi-disk filesystem case is a specific example of the general case where write barriers aren't good enough: where you're interacting with the outside world, not just with things on the disk itself. Since all sorts of user level programs interact with the outside world, user programs generally need real 'it is on the disk' cache flush support.
(This is the kind of entry that I write to make sure I understand the logic so that I can explain it to other people. As usual, it feels completely obvious once I've written it out.)
Sidebar: write cache exposure versus disk redundancy
I believe that in a well implemented redundant filesystem, the filesystem's metadata consistency should survive so long as the filesystem can find a single good copy of B. For example if you have an N-way mirror, you're still okay even if N-1 disks all lose the write (such as by losing power simultaneously); you're only in trouble if all of them do. This may give you some reassurance even if you have disks that ignore or don't support cache flushes (apparently this includes many common SSDs, much to people's displeasure).
(In disk-level redundancy instead of filesystem-level redundancy you may have problems recognizing what's a good copy of B. Let's assume that you have ZFS-like checksums and so on.)
Of course, power loss events can be highly correlated across multiple disks (to put it one way). Especially if they're all in the same server.