Wandering Thoughts archives

2015-11-14

ZFS pool import needs much better error messages

One of the frustrating things about dealing with sufficiently damaged ZFS pools is that 'zpool import' and friends do not generate very detailed error messages. There are a lot of things that can go wrong with a ZFS pool that will make it not importable, but 'zpool import' has clear explanations for only some of them. For many others, all you get in the 'zpool import' status report is a generic error such as:

The pool cannot be imported due to damaged devices or data.

(Here I'm talking about the results of just running 'zpool import' to see available pools and their states and configuration, not trying to actually import a pool. In that listing, zpool has lots of room to write explicit and detailed messages about what seems to be wrong with your pool's configuration.)

This isn't just an issue of annoying and frustrating people with opaque, generic error messages. Given that the error messages are generic, it's quite easy for people to focus only on the obvious problems that zpool import reports, even if those problems may not be the reason the pool can't be imported. As it happens I have a great example of this in action, in this SuperUser question. When you read this question, can you figure out what's wrong? Neither the SuperUser ZFS community nor the ZFS on Linux mailing list could.

(I believe that everything you need to figure out what's going on is actually in the information in the question and the code behind 'zpool import' actually knows what the problem is. This assumes that my diagnosis is correct, of course.)

Perhaps zpool import should not be fully verbose by default, as there's a certain amount of information that may only make sense to people who know a fair bit about how ZFS works. But it certainly should be possible to get this information with, eg, a verbose switch instead of having to reverse engineer it from zdb output. If nothing else, this means that you can get a verbose report and show it to ZFS experts in the hope that they can tell you what's wrong.

On a purely pragmatic level I think that zpool import should be really verbose and detailed when a pool can't be imported. 'My pool won't import' is one of the most stressful experiences you can have with ZFS; to get unclear, generic errors at this point is extremely frustrating and does not help one's mood in the least. This is exactly the time when large amounts of detail are really, really appreciated, even if they're telling you exactly how far up the creek you are.

(This means that I would very much like a 'zpool import -v <pool>' option that describes exactly what the import is doing or trying to do and then covers all of the problems that it detected with the pool configuration, all the things the kernel said to it, and so on. A report of 'I am asking the kernel to import a pool made up of the following devices in the following vdev structure' is not too verbose.)
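To make the sort of report I'm wishing for a bit more concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the Vdev class, the verbose_report function, and the sample problem text are made up for illustration and are not taken from zpool, libzfs, or any real 'zpool import' output. It only shows the shape of the idea, which is to print the vdev structure that is about to be handed to the kernel and to attach every detected problem to the device it belongs to.

```python
# Hypothetical sketch: none of these names come from zpool or libzfs.
from dataclasses import dataclass, field
from typing import List


@dataclass
class Vdev:
    """One node of an illustrative vdev tree (a disk, a mirror, the pool root)."""
    name: str
    state: str = "ONLINE"
    problems: List[str] = field(default_factory=list)
    children: List["Vdev"] = field(default_factory=list)


def verbose_report(pool: str, root: Vdev) -> str:
    """Render the vdev tree plus every problem noted on any node."""
    lines = [f"asking the kernel to import pool '{pool}' with this vdev structure:"]

    def walk(vdev: Vdev, depth: int) -> None:
        # One line per vdev, indented by its depth in the tree.
        lines.append(f"{'  ' * depth}{vdev.name}  {vdev.state}")
        # Problems are listed directly under the vdev they were found on.
        for problem in vdev.problems:
            lines.append(f"{'  ' * (depth + 1)}problem: {problem}")
        for child in vdev.children:
            walk(child, depth + 1)

    walk(root, 1)
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up pool layout and problem text, purely for illustration.
    tree = Vdev("tank", "FAULTED", children=[
        Vdev("mirror-0", "DEGRADED", children=[
            Vdev("sda1"),
            Vdev("sdb1", "UNAVAIL", problems=["label missing or unreadable"]),
        ]),
    ])
    print(verbose_report("tank", tree))
```

The point of reporting every problem on every device, instead of stopping at the first one, is that the obvious problem is not necessarily the one that is actually blocking the import.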

PS: while this example is from ZFS on Linux and FreeBSD, I've looked at the current Illumos code for zpool and libzfs, and as far as I can see it would have exactly the same problem here.

(Part of the issue is that zpool import and libzfs have what you could call less than ideal reporting if a pool is marked as active on some other system and also has configuration problems. But even if it reported multiple errors I think that the real problem here would remain obscure; the current 'zpool import' code appears to deliberately suppress printing out parts of the information necessary.)

solaris/ZFSImportBetterErrors written at 00:35:51

