How we're going to be doing custom NFS mount authorization on Linux
We have a long-standing system of custom NFS mount authorization on our current OmniOS-based fileservers. This system has been working reliably for years, but our next generation of fileservers will use a different OS, almost certainly Linux, and our current approach doesn't work on Linux, so we had to develop a new one.
One of the big attributes of our current system is that it doesn't require the clients to do anything special; they do NFS mount requests or NFS activity, and provided that their SSH daemon is running, they get automatically checked and authorized. This is important to making the system completely reliable, which is very important if we're going to use it for our own machines (which are absolutely dependent on NFS working). However, the goals of our NFS authorization have shifted so that we no longer require this for our own machines. In light of that, we decided to adopt a more straightforward approach on Linux, one that requires client machines to explicitly do a manual step on boot before they can get NFS access.
The overall 'authorization' system works via firewall rules, where
only machines in a particular ipset table
can talk to the NFS ports on the fileserver. Control over actual
NFS mounts and NFS level access is still done through
and so on, but you have to be in the ipset table in order to even
get that far. To get authorized, ie to get added to the ipset table,
your client machine makes a connection to a specific TCP port on
the fileserver. This ends up causing a Go program to make a
connection to the SSH server on the client machine and verify its
host key against a
known_hosts file that we maintain; if the key verifies, we add
the client's IP address to the ipset table, and if it fails to
verify, we explicitly remove the client's IP address from the table.
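
To make the add-or-remove decision concrete, here is a minimal sketch of it. It is in Python purely for illustration (the real checker is a Go program), and it assumes the client's host key has already been obtained through a real SSH handshake, which is the part that proves the client holds the private key. The set name `nfsclients` and both function names are made up for the example, and the known_hosts handling is simplified: hashed entries, wildcard patterns and `@` marker lines are skipped.

```python
import base64

SET_NAME = "nfsclients"  # hypothetical ipset table name, not from the real setup


def known_keys(known_hosts_text, host):
    """Return the set of (keytype, key_bytes) pairs listed for host.

    Only plain (unhashed) entries without markers are handled here.
    """
    keys = set()
    for line in known_hosts_text.splitlines():
        line = line.strip()
        # Skip blanks, comments, hashed hosts ("|1|...") and "@marker" lines.
        if not line or line[0] in "#|@":
            continue
        fields = line.split()
        if len(fields) < 3:
            continue
        names, keytype, b64key = fields[0], fields[1], fields[2]
        if host in names.split(","):
            keys.add((keytype, base64.b64decode(b64key)))
    return keys


def ipset_action(ip, keytype, key_bytes, known_hosts_text):
    """Return the ipset argv that matches the verification outcome."""
    if (keytype, key_bytes) in known_keys(known_hosts_text, ip):
        # "-exist" makes re-adding an already present entry a no-op.
        return ["ipset", "add", SET_NAME, ip, "-exist"]
    # Failed verification: make sure the address is not in the table.
    return ["ipset", "del", SET_NAME, ip, "-exist"]
```

The point of returning a command instead of running it is only to keep the sketch testable; a real checker would execute the result (or talk to the kernel directly) and report the outcome back over the TCP connection.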
(This connection can be done as simply as 'nc FILESERVER PORT </dev/null >/dev/null'. In practice clients may want to record the
output from the port, because we spit out status messages, including
potentially important ones about why a machine failed verification.
We syslog them too, but those syslog logs aren't accessible to other
This Go program can actually check and handle multiple IP addresses at once (doing so in parallel). In this mode, it runs from cron every few minutes to re-verify all of the currently authorized hosts. The program is sufficiently fast that it can complete this full re-verification in under a second (and with negligible resource usage); in practice, the speed limit is how long of a timeout we use to wait for machines to respond.
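
The reason the timeout ends up as the speed limit is easiest to see in code. The following is a rough Python stand-in for the parallel mode, not the Go program itself: `verify_all` and its `check` callback are hypothetical names, and `check` is assumed to do the actual SSH host key verification for one IP with its own network timeout.

```python
from concurrent.futures import ThreadPoolExecutor


def verify_all(ips, check, max_workers=64):
    """Run check(ip) for every address concurrently; return {ip: bool}.

    check is expected to enforce its own network timeout. As long as
    there are enough workers, the whole run takes about as long as the
    slowest single check, not the sum of all of them.
    """
    ips = list(ips)
    if not ips:
        return {}

    def safe_check(ip):
        try:
            return bool(check(ip))
        except Exception:
            # Any error (timeout, refused connection, ...) counts as failure.
            return False

    with ThreadPoolExecutor(max_workers=min(max_workers, len(ips))) as pool:
        # pool.map yields results in the same order as its input.
        return dict(zip(ips, pool.map(safe_check, ips)))
```

With every check running at once, a handful of dead machines cost one timeout period in total, which is why the timeout value matters far more than the number of hosts.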
To handle fileserver reboots, verified IPs are persistently recorded by touching a file (with the name of their IP address) in a magic directory. On boot and on re-verification, we merge all of the IPs from this directory with the IPs from the ipset table and verify them all. Any IPs that pass verification but aren't in the ipset table are added back to the table (and any IPs in the ipset table but not recorded on disk are persisted to disk), which means that on boot all IPs will be re-added to the ipset table without the client having to do anything.
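
The merge step is basically set arithmetic, so a small sketch may help. This is again illustrative Python with invented names (`reconcile`, and a `verify` callable that maps a collection of IPs to pass/fail results). Two things here are assumptions on top of the description above: only IPs that still verify get persisted, and IPs that fail are merely reported, since what should happen to their on-disk record is a separate decision.

```python
def reconcile(on_disk, in_table, verify):
    """Merge both sources of IPs, verify them all, and plan the fix-ups.

    on_disk  -- IPs recorded as file names in the state directory
    in_table -- IPs currently in the ipset table
    verify   -- callable mapping an iterable of IPs to {ip: bool}
    """
    on_disk, in_table = set(on_disk), set(in_table)
    everyone = on_disk | in_table
    results = verify(everyone)
    passed = {ip for ip, ok in results.items() if ok}
    return {
        # Verified but missing from the table (the after-reboot case).
        "add_to_table": sorted(passed - in_table),
        # In the table but never recorded on disk. Assumption: only
        # hosts that still verify are worth persisting.
        "persist": sorted((in_table - on_disk) & passed),
        # Everything that failed; these get removed from the table.
        "failed": sorted(everyone - passed),
    }
```

Right after a fileserver reboot the ipset table is empty, so every recorded IP that verifies lands in `add_to_table`, which is the "clients don't have to do anything" behavior.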
Clients theoretically don't have to do anything once they've booted and been authorized, but because things can always go wrong we're going to recommend that they re-poke the magic TCP port every so often from cron, perhaps every five or ten minutes. That will ensure that any NFS outage should have a limited duration and thus hopefully a limited impact.
(In theory the parallel Go checker is so fast that we could just
extract all of the client IPs from our known_hosts and always
try to verify them, say, once every fifteen minutes. In practice I
think we're unlikely to do this, because there are various potential
issues and it probably wouldn't help much.)
We're probably going to provide people with a little Python program
that automatically does the client side of the verification for all
current NFS mounts and all mounts in /etc/fstab, and then logs
the results and so on. This seems more friendly than asking all of
the people involved to write their own set of scripts or commands
for this.
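
As a rough idea of what such a client-side helper might look like (this is a sketch, not the program we'll actually ship), the core is two small pieces: find the NFS servers mentioned in fstab-style text, and poke each one's authorization port while keeping the status output. The function names are invented, and the port is left as a parameter because the real port number is site-specific.

```python
import socket


def nfs_servers(mount_text):
    """Return distinct NFS server names from fstab- or /proc/mounts-style text."""
    servers = []
    for line in mount_text.splitlines():
        fields = line.split()
        if len(fields) < 3 or fields[0].startswith("#"):
            continue
        # Match the filesystem type exactly so "nfsd" is not picked up.
        if fields[2] not in ("nfs", "nfs4"):
            continue
        # NFS sources look like "server:/export/path".
        server, sep, _path = fields[0].partition(":/")
        server = server.strip("[]")  # tolerate "[ipv6-address]:/path"
        if sep and server and server not in servers:
            servers.append(server)
    return servers


def poke(server, port, timeout=10.0):
    """Connect to the authorization port and return its status output."""
    chunks = []
    with socket.create_connection((server, port), timeout=timeout) as conn:
        # We send nothing; we only read whatever status text comes back
        # until the fileserver closes the connection.
        while True:
            data = conn.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", "replace")
```

A wrapper would feed `nfs_servers()` the contents of both /etc/fstab and /proc/mounts, call `poke()` for each server, and log what comes back, since those status messages are the only place a client can see why it failed verification.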
PS: Our own machines on trusted subnets are handled by just having a blanket allow rule in the firewall for those subnets. You only have to be in the ipset table if you're not on one of those subnets.
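
Put together, the access decision the firewall makes can be modeled in a few lines. This is only a Python model of the policy for illustration (the real thing is firewall rules plus the ipset table), and the function name and example addresses are made up.

```python
import ipaddress


def may_reach_nfs(ip, trusted_subnets, authorized_ips):
    """Model of the firewall policy: trusted subnet, or in the ipset table."""
    addr = ipaddress.ip_address(ip)
    # Blanket allow rule for our own trusted subnets.
    if any(addr in ipaddress.ip_network(net) for net in trusted_subnets):
        return True
    # Everyone else must have been verified and added to the table.
    return ip in authorized_ips
```

For example, with a trusted subnet of `192.0.2.0/24` and an authorized set containing only `198.51.100.7`, a machine at `192.0.2.15` is allowed without ever poking the port, `198.51.100.7` is allowed because it verified, and `198.51.100.8` is blocked.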