2019-04-24
How we're making updated versions of a file rapidly visible on our Linux NFS clients
Part of our automounter replacement is a file with a master list of all NFS mounts that client machines should have, which we hold in our central administrative filesystem that all clients NFS mount. When we migrate filesystems from our old fileservers to our new fileservers, one of the steps is to regenerate this list with the old filesystem mount not present, then run a mount update on all of the NFS clients to actually unmount the filesystem from the old fileserver. For a long time, we almost always had to wait a bit of time before all of the NFS clients would reliably see the new version of the NFS mounts file, which had the unfortunate effect of slowing down filesystem migrations.
(The NFS mount list is regenerated on the NFS fileserver for our central administrative filesystem, so the update is definitely known to the server once it's finished. Any propagation delays are purely on the side of the NFS clients, who are holding on to some sort of cached information.)
In the past, I've made a couple of attempts to find a way to reliably get the NFS clients to see that there was a new version of the file, by doing things like flock(1)'ing it before reading it. These all failed. Recently, one of my co-workers discovered a reliable way of making this work, which was to regenerate the NFS mount list twice instead of once. You didn't have to delay between the two regenerations; running them back to back was fine. At first this struck me as pretty mysterious, but then I came up with a theory for what's probably going on and why this makes sense.
You see, we update this file in an NFS-safe way that leaves the old version of the file around under a different name, so that programs on NFS clients that are reading it at the time don't have it yanked out from underneath them.
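As a concrete sketch of that pattern (the file contents, directory, and exact commands here are made up for illustration; our actual regeneration script differs):

```shell
set -e
dir=$(mktemp -d)    # stand-in for the central administrative filesystem
cd "$dir"

# Stand-in for the previously generated mount list.
echo "old mount list" > nfs-mounts

# NFS-safe update: write the new version under a temporary name, keep
# the current version reachable as nfs-mounts.bak via a hard link, then
# atomically rename the new file into place.
echo "new mount list" > nfs-mounts.new
ln -f nfs-mounts nfs-mounts.bak
mv -f nfs-mounts.new nfs-mounts
```

Programs that already have the old version open keep reading a valid file (now also named nfs-mounts.bak), and because the final step is a rename, no reader ever sees a partially written nfs-mounts.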
As I understand it, Linux NFS clients cache the mapping from file names to NFS filehandles for some amount of time, to reduce various sorts of NFS lookup traffic (now that I look, there is a discussion pointing to this in the nfs(5) manpage). When we do one regeneration of our nfs-mounts file, the cached filehandle that clients have for that name mapping is still valid (and the file's attributes are basically unchanged); it's just that it's for the file that is now nfs-mounts.bak instead of the new file that is now nfs-mounts. Client kernels are apparently still perfectly happy to use it, and so they read and use the old NFS mount information. However, when we regenerate the file twice, that old file is removed outright and the cached filehandle is no longer valid. My theory and assumption is that modern Linux kernels detect this situation and trigger some kind of revalidation that winds up with them looking up and using the correct nfs-mounts file (instead of, say, failing with an error).
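(For completeness, nfs(5) also documents a lookupcache mount option that controls this caching of directory lookup results on the client; a hypothetical mount, with a made-up server and mount point, might look like this:

```shell
# 'lookupcache' (see nfs(5)) controls the client's caching of
# directory lookup results:
#   all  - cache positive and negative lookups (the default)
#   pos  - cache only positive lookups
#   none - don't trust cached lookup results at all
# The server name and mount point here are invented for illustration.
mount -t nfs -o lookupcache=none server:/h/admin /h/admin
```

The tradeoff is extra LOOKUP traffic to the fileserver, which is exactly what the cache exists to avoid, so this isn't obviously a win for a filesystem that every client mounts.)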
(It feels ironic that apparently the way to make this work for us here in our NFS environment is to effectively update the file in an NFS-unsafe way for once.)
PS: All of our NFS clients here are using either Ubuntu 16.04 or 18.04, using their stock (non-HWE) kernels, so various versions of what Ubuntu calls '4.4.0' (16.04) and '4.15.0' (18.04). Your mileage may vary on different kernels and in different Linux environments.
The appeal of using plain HTML pages
Once upon a time our local support site was a wiki, for all of the reasons that people make support sites and other things into wikis. Then using a wiki blew up in our faces. You might reasonably expect that we replaced it with a more modern CMS, or perhaps a static site generator of some sort (using either HTML or Markdown for content and some suitable theme for uniform styling). After all, it's a number of interlinked pages that need a consistent style and consistent navigation, which is theoretically a natural fit for any of those.
In practice, we did none of those; instead, our current support site is that most basic thing, a bunch of static .html files sitting in a filesystem (and a static file of CSS, and some Javascript to drop in a header on all pages). When we need to, we edit the files with vi, and there's no deployment or rebuild process.
(If we don't want to edit the live version, we make a copy of the .html file to a scratch name and edit the copy, then move it back into place when done.)
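For illustration, that scratch-copy dance is just the following (file names are made up, and a simple rewrite stands in for the interactive vi session):

```shell
set -e
site=$(mktemp -d)    # stand-in for the support site's directory
echo "<p>old text</p>" > "$site/printing.html"

# Copy the live page to a scratch name and edit the copy instead.
cp "$site/printing.html" "$site/printing.draft.html"
echo "<p>new text</p>" > "$site/printing.draft.html"   # 'vi' stand-in

# Move the edited copy back into place; the rename is atomic, so the
# web server always serves either the old page or the new one.
mv "$site/printing.draft.html" "$site/printing.html"
```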
This isn't a solution that works for everyone. But for us at our modest scale, it's been really very simple to work with. We all already know how to edit files and how to write basic HTML, so there's been nothing to learn or to remember about managing or updating the support site (well, you have to remember where its files are, but that's pretty straightforward). Static HTML files require no maintenance to keep a wiki or a CMS or a generator program going; they just sit there until you change them again. And everything can handle them.
I'm normally someone who's attracted to ideas like writing in a markup language instead of raw HTML and having some kind of templated, dynamic system (whether it's a wiki, a CMS, or a themed static site generator), as you can tell from Wandering Thoughts and DWiki itself. I still think that they make sense at large scale. But at small scale, if I were doing a handful of HTML pages today, it would be quite tempting to skip all of the complexity and just write plain .html files.
(I'd use a standard HTML layout and structure for all the .html files, with CSS to match.)
(This thought is sort of sparked by a question by Pete Zaitcev over on the Fediverse, and then reflecting on our experiences maintaining our support site since we converted it to HTML. In practice I'm probably more likely to update the site now than I was when it was a wiki.)