2009-07-30
How we do custom NFS mount authorization on Solaris 10
Suppose that you want to use some custom method of authenticating and
authorizing NFS mounts, and that your fileservers are Solaris 10 systems
(although the same general approach could probably work elsewhere).
Further suppose that you don't have the source to mountd, or at least
don't want to modify it for various sensible reasons. Fortunately,
there is an evil hack that you can commit that will let you do whatever
authorization checks you need.
(When thinking about all of this, remember that NFS mount security has limits.)
Solaris, like most systems, will let you export filesystems to (NIS)
netgroups. Solaris also has an /etc/nsswitch.conf file to specify how
netgroups (among other things) are looked up (in fact it originated the
idea). And finally, one of the little-used features of nsswitch.conf is
that you can write your own library to be a new lookup service (that you
can then use in nsswitch.conf).
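(For illustration, the nsswitch.conf side of this is just the netgroup
line; 'mntauth' here is a service name I made up, and Solaris will then
go looking for a correspondingly named nss_mntauth.so.1 library to do
the lookups, following the naming of the standard backends:)

    # in /etc/nsswitch.conf; 'mntauth' is a hypothetical service name
    netgroup:   mntauth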
So, the evil hack is to hijack mountd's netgroup lookups to do your own
authorization by having your own custom library set as the service for
netgroups in nsswitch.conf. When you export a filesystem to a netgroup
and a client tries to mount the filesystem, mountd will wind up
calling innetgr() to see if the machine is in the netgroup, which will
wind up calling a function in your lookup service library, and this
function can use whatever mechanism you want to decide whether to say
yes or no.
(Essentially what you're doing is hijacking 'netgroups' to pass magic
tokens through mountd to your authorization library. Note that your
library will get the same information that innetgr() does, which gives
it both the client host and the netgroup name.)
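To make the shape of this concrete, here is a minimal sketch of the
sort of decision function your lookup service might reduce to. The
Solaris NSS backend glue that innetgr() eventually reaches is omitted,
and the /etc/mntauth layout is made up for illustration; this is not
our actual code.

    #include <stdio.h>
    #include <string.h>

    /*
     * The core decision: given the 'netgroup' name the filesystem was
     * exported to and the client host that mountd is asking about, say
     * yes or no.  Here we check the host against a per-netgroup file of
     * allowed host names, one per line (a made-up scheme).
     */
    int
    mount_authorized(const char *netgroup, const char *client_host)
    {
        char path[256], line[256];
        FILE *fp;
        int ok = 0;

        snprintf(path, sizeof(path), "/etc/mntauth/%s", netgroup);
        if ((fp = fopen(path, "r")) == NULL)
            return 0;           /* unknown 'netgroup': deny the mount */
        while (fgets(line, sizeof(line), fp) != NULL) {
            line[strcspn(line, "\n")] = '\0';
            if (strcmp(line, client_host) == 0) {
                ok = 1;
                break;
            }
        }
        fclose(fp);
        return ok;
    }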
One drawback of this approach is that your authorization library must
perform all of the authorization checking, because you can't tell
mountd to export something to host X but only if it's also in netgroup
Y.
We use this here on our Solaris 10 fileservers,
and it works fine (with some caveats that don't fit in this
entry). The basic idea can probably be applied on any Unix with an
/etc/nsswitch.conf and enough documentation on it to let you write
new lookup services.
(The usual disclaimer: I didn't come up with this, I'm just writing it up.)
Sidebar: ways to use this
There are two ways that I can think of for using this:
- for extended host authentication;
you have some mechanism for maintaining netgroup-like lists of
machines, but you do additional verification that the client's IP
address actually is the real host instead of an imposter.
You'll have groups of machines in 'netgroups', and export things
to various different netgroups, and it will all look quite normal
to the casual observer.
- to have a completely new authorization system. Here the only thing the netgroup names are used for is so that your own authorization system knows what it's being asked to authorize.
Our use of this is the first sort, for extended host authentication.
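As a (heavily simplified) sketch of what that extended verification
could look like, the following standalone program forward-resolves a
host name and checks it against the IP address you have on record for
it. In a real lookup service this would live inside the library and
the 'on record' address would come from your own host database instead
of the command line; everything here is illustrative, not our actual
setup.

    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>
    #include <netdb.h>
    #include <stdio.h>
    #include <string.h>

    /*
     * Does 'hostname' really resolve to 'expected_ip'?  Returns 1 if
     * some address for the host matches, 0 otherwise.  A fuller check
     * might also go the other way (reverse lookup plus forward
     * confirmation).
     */
    int
    host_really_is(const char *hostname, const char *expected_ip)
    {
        struct addrinfo hints, *res, *ai;
        char buf[INET_ADDRSTRLEN];
        int ok = 0;

        memset(&hints, 0, sizeof(hints));
        hints.ai_family = AF_INET;
        hints.ai_socktype = SOCK_STREAM;
        if (getaddrinfo(hostname, NULL, &hints, &res) != 0)
            return 0;
        for (ai = res; ai != NULL; ai = ai->ai_next) {
            struct sockaddr_in *sin = (struct sockaddr_in *)ai->ai_addr;
            if (inet_ntop(AF_INET, &sin->sin_addr, buf, sizeof(buf)) != NULL &&
                strcmp(buf, expected_ip) == 0) {
                ok = 1;
                break;
            }
        }
        freeaddrinfo(res);
        return ok;
    }

    int
    main(int argc, char **argv)
    {
        if (argc != 3) {
            fprintf(stderr, "usage: %s host expected-ip\n", argv[0]);
            return 2;
        }
        return host_really_is(argv[1], argv[2]) ? 0 : 1;
    }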
2009-07-10
An unpleasant surprise about ZFS scrubbing in Solaris 10 U6
Here is something that we discovered recently: ZFS will refuse to scrub a pool that is DEGRADED, even if the degraded state is harmless to actual pool redundancy and there is no resilvering going on. In the usual ZFS manner, it doesn't give you any actual errors; it just doesn't do anything when you ask for a pool scrub.
(Now, I can't be completely and utterly sure that it was the DEGRADED
state that blocked the scrub and not coincidence or something
unrelated. But I do know that the moment we zpool detach'd the faulted
device, restoring its vdev and thus the entire pool to the normal ONLINE
state, we could start a scrub that did something.)
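Concretely, that moment looked something like this (the pool and device
names here are made up, not our real ones):

    # detach the faulted device, returning the vdev and the pool to ONLINE
    zpool detach tank c1t10d0
    # now this actually starts a scrub instead of silently doing nothing
    zpool scrub tank
    zpool status tank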
Regardless of what exactly is causing this, this behavior is bad (and a number of other words). When your pool is experiencing problems is exactly when you most want to scrub it, so you have the best information possible about how bad the problem is (and where it is) and you don't take actions in haste that actually make your problems worse.
I don't know what you could do if you couldn't detach the device. It's
possible that ZFS somehow thought that the pool was still resilvering
and thus that either exporting and importing the pool or rebooting
the server would have fixed the problem (both of these often reset
information about scrubs in zpool status output).
(Neither was an option on a production fileserver, so we didn't try them; this is pure speculation.)
Sidebar: exactly what happened
Last week, one side of one mirror in a pool on one of our ZFS fileservers started reporting read errors (and the iSCSI backend started reporting drive errors to go with it). Since we were shorthanded due to vacations, we opted to not immediately replace the disk; instead we added a third device to that vdev to make it a three-way mirror, so that we would still have a two-way mirror even if the disk failed completely. That evening, the disk started throwing up enough read errors that ZFS declared it degraded and pulled in one of the configured spares for that pool and reconstructed the mirror, exactly as it was supposed to.
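(For reference, turning the two-way mirror into a three-way one was
just a matter of attaching a third device to the existing mirror vdev;
the pool and device names below are invented:

    # attach a third device alongside c1t0d0's mirror vdev,
    # making it a three-way mirror
    zpool attach tank c1t0d0 c2t0d0

ZFS then resilvers the new device in the background.)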
However, this put that vdev and thus the entire pool into a DEGRADED
state, which is sort of reasonably logical from the right perspective
(the pool is still fully redundant, but it has degraded from the
configuration you set up). And, as mentioned, we couldn't scrub the
pool; attempts to do so did nothing except reset the nominal completion
time of the 'add-the-spare' resilver to the current time in the output
of zpool status.