DKMS kind of has a problem with its error messages

October 31, 2016

I generally like DKMS, I really do. Almost all of the time it provides a friction-free way for me to have ZFS on Linux (ZoL) in the face of both kernel upgrades (including upgrading Fedora distributions) and ZoL updates (which happen reasonably frequently, because I generally track the development version). But once in a while things don't go so well, and when this happens the error messages that DKMS spits out are not very helpful for identifying the cause or how to fix it.

My latest issue was that recently, every time I upgraded ZoL, the upgrade of the zfs-dkms RPM would spit out an error like this:

[...]
  Upgrading   : zfs-dkms-0.7.0-rc1_144_g24cdeaf.fc24.noarch
Removing old zfs-0.7.0 DKMS files...

-------- Uninstall Beginning --------
[...]
DKMS: uninstall completed.
Loading new zfs-0.7.0 DKMS files...
Error! DKMS tree already contains: zfs-0.7.0
You cannot add the same module/version combo more than once.
warning: %post(zfs-dkms-0.7.0-rc1_144_g24cdeaf.fc24.noarch) scriptlet failed, exit status 3
Non-fatal POSTIN scriptlet failure in rpm package zfs-dkms
Non-fatal POSTIN scriptlet failure in rpm package zfs-dkms
[...]

So this error message raises several obvious questions:

  • Where is the DKMS tree that this is talking about?

    Answer: /var/lib/dkms, but DKMS doesn't tell you and you'll have to search the manpage and so on fairly carefully to find out or guess.

  • What specifically makes DKMS think that the tree already contains zfs-0.7.0?

    The answer appears to be that /var/lib/dkms/zfs/0.7.0/source exists as either a symlink or a directory. Here it's normally a symlink to /usr/src/zfs-0.7.0 (at least that's what it is when things are stable, after I fixed DKMS's view of the world; I don't know what it was at the time of the error because DKMS didn't tell me). At least I think that's what it is.

    You find out this answer by carefully reading bits of the large Bash shell script that is /usr/sbin/dkms, which is not something most people will do to troubleshoot problems.
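For illustration, here is a minimal Python sketch of that "already contains" test as described above. It is an approximation based on this description, not DKMS's actual code; the function name dkms_thinks_added and the default tree location of /var/lib/dkms are assumptions.

```python
from pathlib import Path

# Assumed default DKMS tree location; a local DKMS configuration may differ.
DKMS_TREE = Path("/var/lib/dkms")


def dkms_thinks_added(module: str, version: str, tree: Path = DKMS_TREE) -> bool:
    """Hypothetical approximation of DKMS's 'tree already contains' test.

    Returns True if <tree>/<module>/<version>/source exists either as a
    symlink (even a dangling one) or as a directory.
    """
    source = tree / module / version / "source"
    # is_symlink() is True even if the link's target is missing;
    # is_dir() follows symlinks, so a link to a directory also counts.
    return source.is_symlink() or source.is_dir()
```

Calling dkms_thinks_added("zfs", "0.7.0") before reinstalling would, under these assumptions, tell you whether a later "dkms add" is likely to fail with the error above.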

Naturally this leads to the next question, which is why there was still a /var/lib/dkms/zfs/0.7.0/source symlink after DKMS uninstall had finished. A hint to the answer is lurking in output that is not actually in this transcript, which should come right after the 'DKMS: uninstall completed.' line:

------------------------------
Deleting module version: 0.7.0
completely from the DKMS tree.
------------------------------

The lack of this message means 'for some reason, there were still directories in /var/lib/dkms/zfs/0.7.0/ after DKMS theoretically uninstalled everything, so DKMS didn't recursively delete all of /var/lib/dkms/zfs/0.7.0'. Or at least, there were directories that didn't match the egrep pattern (build|tarball|driver_disk|rpm|deb|source)$, because DKMS explicitly excludes anything ending in those for this check. Again, you find this out from the DKMS shell script.

(DKMS could perhaps warn you about this situation, and maybe even report what things are lingering. Maybe you don't want this all of the time, but DKMS is being run here in a special 'during RPM upgrades' mode and it can probably work out that something is bad here.)
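As a rough sketch of what such a report could look like, the lingering-directory check can be approximated in Python. This follows the egrep pattern quoted above rather than DKMS's own code; lingering_dirs is a hypothetical name and /var/lib/dkms is assumed as the tree location.

```python
import os
import re
from pathlib import Path

# Assumed default DKMS tree location.
DKMS_TREE = Path("/var/lib/dkms")

# Names ignored when deciding whether a module version is now empty, from
# the egrep pattern quoted above. Like the egrep, this is anchored only at
# the end, so a name such as "mysource" would also be ignored.
IGNORED = re.compile(r"(build|tarball|driver_disk|rpm|deb|source)$")


def lingering_dirs(module: str, version: str, tree: Path = DKMS_TREE) -> list[str]:
    """Return subdirectories that would, by this approximation, keep DKMS
    from deleting <tree>/<module>/<version> completely."""
    version_dir = tree / module / version
    if not version_dir.is_dir():
        return []
    leftovers = []
    with os.scandir(version_dir) as entries:
        for entry in entries:
            # A shell glob like "$dir/*" skips dotfiles, so skip them here too.
            if entry.name.startswith("."):
                continue
            # Only real directories count; symlinks are not followed.
            if entry.is_dir(follow_symlinks=False) and not IGNORED.search(entry.name):
                leftovers.append(entry.name)
    return sorted(leftovers)


if __name__ == "__main__":
    for name in lingering_dirs("zfs", "0.7.0"):
        print(f"lingering: {DKMS_TREE / 'zfs' / '0.7.0' / name}")
```

A leftover kernel-version directory such as 4.7.4-200.fc24.x86_64 would show up in this list, since its name doesn't end in any of the excluded words.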

In my case, this was a directory hierarchy called 4.7.4-200.fc24.x86_64, which was apparently a lingering remnant of a much older version of the zfs-dkms package on a much older kernel. It's not clear how or why it wound up left there, floating around and contaminating things, but having gone through this entire tracing exercise, I'm reasonably confident that it was causing my problems and that removing it was the right answer.

(I'll only find out for sure the next time I upgrade my ZoL packages, which probably won't be for at least a day or two. I don't like to reboot my office workstation all the time, after all; it's nice to have a few days of stability.)

PS: I don't expect the DKMS script to ever get better errors. Why not? Well, it's a 3,882 line Bash shell script (on Fedora 24). Do you want to touch that if you don't absolutely have to? Yeah, that's what I thought. Probably the only way for it to get significantly better is for it to be completely rewritten in something other than Bash, and that would be a mistake because those almost 4,000 lines undoubtedly contain a lot of painful and hard-won experience about what works and what sort of terrible and peculiar things happen out in the world of people using DKMS. As usual, a from-scratch rewrite would get to rediscover all of those problems for itself.

Written on 31 October 2016.

Last modified: Mon Oct 31 23:12:51 2016