An illustration of why it's hard to port outside code into the Linux kernel
Sometimes, people show up with useful kernel-side things that were originally written for other systems and try to put them into Linux as (GPLv2) donations. Often these have been filesystems (SGI and XFS, IBM and JFS), but there have been other attempts at code drops. Most recently, Oracle made DTrace available under GPLv2 and integrated it into their kernel. I've said before that this is not an easy thing to do and can take years to actually get the code in to the Linux kernel (eg, XFS). Part of that is that the Linux kernel developers are picky about the code that they accept and require it to follow Linux kernel conventions (because they'll be supporting it for years), but part of that is because joining up code written for another environment with the Linux kernel is a hard, long process, with lots of subtle issues that can be overlooked despite very good people with the best of intentions.
As it happens, I have an illustration of this. Before I start, I want to say explicitly that I think ZFS on Linux is a very solid and well done project. In fact, that it's so solid and well done makes this case all the more useful as an illustration, because it shows how a bunch of very good people, working very hard (and for a long time) and doing a very good job, can still have something slip by in the interface between the Linux kernel and outside code. So here is the story.
Recently, someone on the ZFS on Linux mailing list reported a
kernel panic related to snapshots
and the ZoL issue report). The first background
detail required is that under some circumstances, accessing
snapshots can get
ELOOP from the kernel's general filesystem code.
The specific crash happened because when the kernel converted a
NFS filehandle that was (apparently)
for a file in a ZFS snapshot into a kernel dentry, the dentry pointer
it got back from ZFS had the value of 0x28 (decimal 40). Although
very low faulting addresses usually have causes related to NULL
pointers, 40 happens to be the errno
ELOOP, which is suspicious.
Internally, the Linux kernel has a somewhat odd way of handling
errors. Probably in order to avoid confusing them with valid values
that might be returned normally, errors are usually returned from
functions as negative errno values; for example, if a pathname
lookup fails because there's no such name in the directory, the
relevant functions will return
-ENOENT (and there's a whole
infrastructure to smuggle these negative errno values in and out
of what are normally pointers). Much of the rest of the world has
kernel functions that return positive errnos to signal errors, and
in particular the original Solaris/Illumos ZFS code uses positive
errnos to return errors, so a ZFS function that wants to signal 'no
such name' will return
The ZFS on Linux code tries to stay as close to the upstream Illumos
ZFS code as possible to make it easier to port fixes and changes
back and forth. As part of this, it has not changed to using Linux
style negative errnos internally; instead, it converts from ZFS
positive errnos (in its own code) to Linux kernel negative errnos
when it returns results to the kernel, such as when it is converting
a NFS filehandle into a ZFS dnode and then a kernel dentry, which may
fail with errors like
ESTALE. This conversion is
done by negating the internal ZFS errno,
turning a positive ZFS errno into a negative kernel errno (which is
then smuggled inside a pointer).
All of this is fine except that there turns out to be a point in converting NFS filehandles where the ZFS on Linux code calls a kernel function to do path lookups and returns its error result unaltered. Since this is a kernel function, it returns negative errnos, which are passed up through the ZFS on Linux call stack and then carefully negated by ZoL before it returns them to the kernel. This careful negation turns what was a negative kernel errno into a positive number that the kernel thinks is not an error, but a dentry pointer. Things do not go well from here.
All of the code involved looks innocent and good on a normal inspection; you're calling functions, you're checking for errors, you're returning errors if there are any, everything looks normal and standard. You need a bunch of contextual knowledge to know that this one function call is special and returns a dangerously different result from everything else (if it encounters an error at all, which it usually doesn't), and it needs special handling. The commit that added this code is over a year old and was reviewed by one of the most experienced ZoL developers, and the code has passed tests and been used in production (where it worked because errors from this kernel function are very rare in this context).
This error is not in ZFS code and it is not in kernel code; it's at the seam between the two, where one world must be carefully converted to the other. Here, one little spot was missed and joined imperfectly, and the result was a kernel panic a year later. And that's part of why porting outside code into the Linux kernel is hard and takes a long time, one way or another.
(It also makes a good illustration of why the Linux kernel developers generally insist that outside code be converted over to use kernel idioms before they'll accept it into the kernel tree.)