Workarounds are often forever (unless you work to make them otherwise)
Back in 2018, ZFS on Linux had a bug that could panic the system if you NFS-exported ZFS snapshots. We were setting up ZFS-based NFS fileservers and we knew about this bug, so at the time we set things up so that only the filesystems themselves were NFS exported and available on our servers. Any ZFS snapshots of those filesystems were only visible if you directly logged in to the fileservers, which was (and is) something that only core system staff could do. This is somewhat inconvenient; we have to get involved any time people want to get stuff back from snapshots.
It is now 2024. ZFS on Linux became OpenZFS (in 2020) and has long since fixed that issue and released versions with the fix. If I'm retracing Git logs correctly, the fix was in 0.8.0, so it was included (among many others) in Ubuntu 22.04's ZFS 2.1.5 (what our fileservers are currently running) and Ubuntu 24.04's ZFS 2.2.2 (what our new fileservers will run).
When we upgraded the fileservers from 18.04 to 22.04, did we go back to change our special system for generating NFS export entries to allow NFS clients to access ZFS snapshots? You already know the answer to that. We did not, because we had completely forgotten about it. Nor did we go back to do it as we were preparing the 24.04 setup of our ZFS fileservers. It was only today that it came up, as we were dealing with restoring a file from those ZFS snapshots. Since it's come up, we're probably going to test the change and then do it for our future 24.04 fileservers, since it will make things a bit more convenient for some people.
(The good news is that I left comments to myself in one program about why we weren't using the relevant NFS export option, so I could tell for sure that it was this long-since-fixed bug that had caused us to leave it out.)
It's a trite observation that there's nothing so permanent as a temporary solution, but just because it's trite doesn't mean that it's wrong. A temporary workaround that code comments say we thought we might revert later in the life of our 18.04 fileservers has lasted about six years, despite being unnecessary since no later than when our fileservers moved to Ubuntu 22.04 (admittedly, this wasn't all that long ago).
One moral I take from this is that if I want us to ever remove a 'temporary' workaround, I need to somehow explicitly schedule us reconsidering the workaround. If we don't explicitly schedule things, we probably won't remember (unless it's something sufficiently painful that it keeps poking us until we can get rid of it). The purpose of the schedule isn't necessarily to make us do the thing; it's to remind us that the thing exists and maybe it shouldn't.
(As a corollary, the schedule entry should include pointers to a lot of detail, because when it goes off in a year or two we won't really remember what it's talking about. That's why we have to schedule a reminder.)
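One way to picture such a schedule is as a small registry of workarounds, each with a review date and pointers to the details, that something checks periodically. What follows is only a rough sketch in Python, not what we actually run; the Workaround class, the due_for_review function, the entry name, and the review date are all hypothetical and made up for illustration.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Workaround:
    """One 'temporary' workaround that we want to be reminded about."""
    name: str
    reason: str
    review_on: date  # when to reconsider it, not a deadline for removing it
    pointers: list[str] = field(default_factory=list)  # where the details live


def due_for_review(workarounds, today):
    """Return the workarounds whose review date has arrived, oldest first."""
    due = [w for w in workarounds if w.review_on <= today]
    return sorted(due, key=lambda w: w.review_on)


# Hypothetical entry; the name and the review date are invented for this sketch.
WORKAROUNDS = [
    Workaround(
        name="no-nfs-snapshot-export",
        reason="ZFS on Linux could panic if ZFS snapshots were NFS-exported",
        review_on=date(2019, 6, 1),
        pointers=["comments in the program that generates NFS export entries"],
    ),
]

if __name__ == "__main__":
    # Print a reminder for everything that is due as of today.
    for w in due_for_review(WORKAROUNDS, date.today()):
        print(f"Reconsider '{w.name}': {w.reason}")
        for p in w.pointers:
            print(f"  see: {p}")
```

Run from cron or a similar scheduler, something like this would only nag; deciding whether the workaround can actually go away is still up to whoever reads the reminder.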