Fixing Alpine to work over NFS on Ubuntu 18.04 (and probably other modern Linuxes)
Last September we discovered that the Ubuntu 18.04 LTS version of Alpine was badly broken when used over NFS, and eventually traced this to a general issue with NFS in Ubuntu 18.04's kernel and probably all modern Linux kernels. Initially we thought that this was a bug in the Linux NFS client, but after discussions on the Linux NFS mailing list it appears that this is a feature, although I was unable to get clarity on what NFS client behavior is guaranteed in general. To cut a long story short, in the end we were able to find a way to change Alpine to fix our problem, at least on Ubuntu 18.04's normal 4.15.x based server kernel.
To explain the fix, I'll start with the short version of how to reproduce the problem:
- Create a file that is not an exact multiple of 4K in size.
- On a NFS client, open the file read-write and read all the way to the end. Keep the file open.
- On another machine, append to the file.
- In your program on the first NFS client, wait until stat() says the file's size has changed.
- Try to read the new data. The new data from the end of the old file up to the next 4 KB boundary will be zero bytes.
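The reading side of these steps can be sketched in Python roughly as follows. This is a minimal illustration, not the nfsnulls.py test program itself; the helper names (read_to_end, wait_for_growth, count_leading_nulls) are made up for this sketch, and polling with os.stat() is an assumption about how you would wait for the size change. On a local filesystem it will simply read the appended data correctly; the zero bytes only show up over NFS on an affected kernel.

```python
import os
import time


def read_to_end(fd):
    """Read from the current file offset until EOF and return the bytes."""
    chunks = []
    while True:
        chunk = os.read(fd, 65536)
        if not chunk:
            break
        chunks.append(chunk)
    return b"".join(chunks)


def wait_for_growth(path, old_size, timeout=60.0, interval=0.5):
    """Poll stat() until the reported size differs from old_size.

    Returns the new size, or raises TimeoutError if nothing changes.
    """
    deadline = time.monotonic() + timeout
    while True:
        size = os.stat(path).st_size
        if size != old_size:
            return size
        if time.monotonic() >= deadline:
            raise TimeoutError("file size did not change")
        time.sleep(interval)


def count_leading_nulls(data):
    """Count the zero bytes at the start of data (the symptom of the bug)."""
    return len(data) - len(data.lstrip(b"\0"))


def watch_for_append(path, timeout=60.0):
    """Steps 2, 4 and 5: open read-write, read to EOF, wait, read again."""
    fd = os.open(path, os.O_RDWR)
    try:
        old = read_to_end(fd)          # read all the way to the end
        wait_for_growth(path, len(old), timeout)
        new = read_to_end(fd)          # try to read the appended data
    finally:
        os.close(fd)
    return new, count_leading_nulls(new)
```

You would run watch_for_append() on the first NFS client and append to the file from another machine while it is waiting. A non-zero count of leading nulls in data that should not start with zero bytes indicates the problem.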
The general magic fix is to flock() the file after stat() says the file size has changed; in other words, flock() between steps four and five. If you do this, the 18.04 kernel NFS client code magically forgets whatever 'bad' state information it has cached and (re-)reads the real data from the NFS server. It's possible that other variations of this sequence might work, such as flock()'ing after you've finished reading the file but before it changes, but we haven't tested them.
(We haven't tested the flock() behavior on later kernels, either 18.04's HWE kernels or others, and as mentioned I could not get the Linux NFS people to say whether or not this is guaranteed behavior or just a coincidence of the current implementation, in the same way that the non-flock() version working properly was a coincidence.)
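As a rough Python sketch of the flock() workaround: once stat() has reported a size change, take a flock() on the already-open file descriptor and only then read the new data. The function name is made up for this sketch, and both the choice of a shared lock (LOCK_SH) and holding the lock for the duration of the read are assumptions; the only thing we have observed is that flock()'ing before the read makes the 18.04 kernel fetch the real data.

```python
import fcntl
import os


def read_new_data_with_flock(fd, old_size):
    """flock() the open file, then read everything past old_size.

    fd must be a descriptor that was opened before the file grew.
    LOCK_SH is an assumption; the file is open read-write, so LOCK_EX
    would be permitted too.
    """
    fcntl.flock(fd, fcntl.LOCK_SH)
    try:
        new_size = os.fstat(fd).st_size
        want = new_size - old_size
        data = b""
        # pread() may return short reads, so loop until we have it all.
        while len(data) < want:
            chunk = os.pread(fd, want - len(data), old_size + len(data))
            if not chunk:
                break
            data += chunk
        return data
    finally:
        fcntl.flock(fd, fcntl.LOCK_UN)
```

Whether this works on a given kernel is something to verify by testing rather than assume, since it is not documented as guaranteed behavior.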
Even better, Alpine turns out to already flock() mailboxes in general. The reason this is not happening here is that Alpine specifically disables flock() on NFS filesystems on Linux (see flocklnx.c) due to a bug that has now been fixed for more than ten years (really). So all we need to do to Alpine to fix the whole issue (on kernels where the flock() fix works in general) is to take out the check for being on a NFS filesystem and always flock() mailboxes regardless of what filesystem they're on, which as a bonus makes the code simpler (and avoids a fstatfs()).
To save people the effort of developing a patch for this themselves, I have added the patch we use for the Ubuntu 18.04 LTS Alpine package to my directory of stuff on this issue; you want cslab-flock.patch. If you build an updated Alpine package, you will want to put a dpkg hold on Alpine after installing your version, because an errant update to a new version of the stock package would create serious problems.
If you're going to use this on something other than Ubuntu 18.04 LTS, you should use my nfsnulls.py test program to test that the problem exists (and that you can reproduce it) and to verify that using flock() fixes it (with the --flock command line argument). I would welcome reports on what happens on kernels more recent than Ubuntu 18.04's 4.15.x.
For reasons beyond the scope of this entry, so far we have not attempted to report this issue or propagate this change to any of Ubuntu's official Alpine package, Debian's official Alpine package, or the upstream Alpine project and their git repository. I welcome other, more energetic people doing so. My personal view is that using flock() on Linux NFS mounts is the right behavior in Alpine in general, entirely independent of this bug; Alpine flock()s in all other filesystems on Linux, and disabled it on NFS only due to a very old bug (from the days before flock() even worked on NFS).
(I'm writing this entry partly because we've received a few queries about this Alpine issue; other people turn out to have run into the problem too. Somewhat to my surprise, I never explicitly wrote up our solution, so here it is.)