DKMS kind of has a problem with its error messages
I generally like DKMS, I really do. Almost all of the time it provides a friction-free way for me to have ZFS on Linux in the face of both kernel upgrades (including upgrading Fedora distributions) and ZoL updates (which happen reasonably frequently, because I generally track the development version). But once in a while things don't go so well, and when this happens the error messages that DKMS spits out are not so helpful at identifying the cause (and how to fix things).
The latest issue is that recently, any time I upgraded ZoL, the upgrade of the zfs-dkms RPM would spit out an error like:
  [...]
  Upgrading : zfs-dkms-0.7.0-rc1_144_g24cdeaf.fc24.noarch
  Removing old zfs-0.7.0 DKMS files...
  -------- Uninstall Beginning --------
  [...]
  DKMS: uninstall completed.
  Loading new zfs-0.7.0 DKMS files...
  Error! DKMS tree already contains: zfs-0.7.0
  You cannot add the same module/version combo more than once.
  warning: %post(zfs-dkms-0.7.0-rc1_144_g24cdeaf.fc24.noarch) scriptlet failed, exit status 3
  Non-fatal POSTIN scriptlet failure in rpm package zfs-dkms
  Non-fatal POSTIN scriptlet failure in rpm package zfs-dkms
  [...]
This error message raises several obvious questions:
- Where is the DKMS tree that this is talking about?
  It's /var/lib/dkms, but DKMS doesn't tell you, and you'll have to search the manpage fairly carefully to find out or guess.
- What specifically makes DKMS think that the tree already contains zfs-0.7.0?
  The answer appears to be that /var/lib/dkms/zfs/0.7.0/source exists as either a symlink or a directory. Here it's normally a symlink to /usr/src/zfs-0.7.0 (at least that's what it is when things are stable, after I fixed DKMS's view of the world; I don't know what it was at the time of the error because DKMS didn't tell me). At least I think that's what it is.
  You find out this answer by carefully reading bits of the large Bash shell script that is /usr/sbin/dkms, which is not something most people will do to troubleshoot problems.
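Since DKMS won't tell you what state that path is in, here is a minimal sketch of checking it by hand. The function name dkms_source_state and its optional third argument (an alternate tree root, useful for testing) are my own inventions and not part of DKMS. It only mirrors my reading of the check as 'source exists as a symlink or a directory', so treat it as an approximation.

```shell
#!/bin/sh
# Hypothetical helper, not part of DKMS: report what sits at
# <tree>/<module>/<version>/source. The default tree of /var/lib/dkms
# is the location discussed above.
dkms_source_state() {
    tree=${3:-/var/lib/dkms}
    src="$tree/$1/$2/source"
    if [ -L "$src" ]; then
        # -L is true even for a dangling symlink, so test it first.
        printf 'symlink -> %s\n' "$(readlink "$src")"
    elif [ -d "$src" ]; then
        printf 'directory\n'
    else
        printf 'absent\n'
    fi
}
```

Running 'dkms_source_state zfs 0.7.0' right after a failed upgrade would at least show whether the symlink survived the uninstall.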
Naturally this leads to the next question, which is why there was
still a /var/lib/dkms/zfs/0.7.0/source symlink after the DKMS uninstall
had finished. A hint to the answer is lurking in output that is not
actually in this transcript, which should come right after the
'DKMS: uninstall completed.' line:
  ------------------------------
  Deleting module version: 0.7.0
  completely from the DKMS tree.
  ------------------------------
The lack of this message means 'for some reason, there were still
directories in /var/lib/dkms/zfs/0.7.0/ after DKMS
theoretically uninstalled everything, so DKMS didn't recursively
delete all of /var/lib/dkms/zfs/0.7.0'. Or at least, there were
directories that didn't match the egrep pattern
'(build|tarball|driver_disk|rpm|deb|source)$', because DKMS
explicitly excludes anything ending in those for this check. Again,
you find this out from the DKMS shell script.
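To see which directories would trip this check, something like the following sketch can list them. The function name dkms_leftovers and the optional tree-root argument are hypothetical, and the find invocation is my approximation of the check described above rather than a copy of what the dkms script runs. Only the egrep pattern is taken from the script.

```shell
#!/bin/sh
# Hypothetical helper, not part of DKMS: list directories directly under
# <tree>/<module>/<version> that the exclusion pattern does NOT cover.
# Any output here would (as I understand it) stop DKMS from deleting
# the version directory.
dkms_leftovers() {
    tree=${3:-/var/lib/dkms}
    dir="$tree/$1/$2"
    [ -d "$dir" ] || return 0
    # -type d skips the 'source' symlink; the pattern skips the rest.
    find "$dir" -mindepth 1 -maxdepth 1 -type d \
        | grep -Ev '(build|tarball|driver_disk|rpm|deb|source)$' \
        | sort
}
```

With no output from 'dkms_leftovers zfs 0.7.0', there is nothing left that this particular check would object to.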
(DKMS could perhaps warn you about this situation, and maybe even report what things are lingering. Maybe you don't want this all of the time, but DKMS is being run here in a special 'during RPM upgrades' mode and it can probably work out that something is bad here.)
In my case, this was a directory hierarchy called
4.7.4-200.fc24.x86_64, which was apparently a lingering remnant
of a much older version of the
zfs-dkms package on a much older
kernel. It's not clear how or why it wound up left there, floating
around and contaminating things, but having gone through this entire
tracing exercise I'm reasonably confident that it was
causing my problems and that removing it was the right answer.
(I'll only find out for sure the next time I upgrade my ZoL packages, which probably won't be for at least a day or two. I don't like to reboot my office workstation all the time, after all; it's nice to have a few days of stability.)
PS: I don't expect the DKMS script to ever get better errors. Why not? Well, it's a 3,882 line Bash shell script (on Fedora 24). Do you want to touch that if you don't absolutely have to? Yeah, that's what I thought. Probably the only way for it to get significantly better is to get completely rewritten in something other than Bash, and that would be a mistake because those almost 4,000 lines undoubtedly contain a lot of painful and hard-won experience about what works and what sort of terrible and peculiar things happen out in the world of people using DKMS. As usual, a from-scratch rewrite would get to rediscover all of those problems and so on for itself.