Wandering Thoughts archives

2009-08-28

ZFS changes filesystem device numbers on reboot

Here is an unpleasant discovery that we just finished making: in at least Solaris 10 U6, ZFS changes the device numbers of all of your filesystems when your system reboots. More specifically (and even worse), the device numbers of a pool's filesystems change whenever the pool is imported, including the implicit import that happens when a system boots up.

(This raises interesting questions of how NFS filehandles keep working across system reboots, which they do. Presumably they have some other stable identifier for a given ZFS filesystem that still fits in the NFS filehandle size limits.)

So, you might ask, what difference does this make? Unfortunately for ZFS (and people using it), some programs still count on device numbers being stable so that they can use the device number plus inode as a long term identifier for a particular file. One such program is GNU tar, which uses this assumption for its incremental backups. If the assumption is violated, GNU tar's incremental backups turn into complete 'level 0' backups, since it thinks that everything is new because no device numbers match.
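(To make the assumption concrete, here is a minimal Python sketch of the 'device number plus inode' identity check. It is only an illustration of the general idea, not GNU tar's actual code; the function names file_identity and looks_unchanged are made up for this example.)

```python
import os

def file_identity(path):
    """Return the (device number, inode number) pair for path."""
    st = os.stat(path)
    return (st.st_dev, st.st_ino)

def looks_unchanged(path, recorded_identity):
    """True if path still has the identity recorded on an earlier run.

    This is the fragile part: if the filesystem's device number changes
    (for example after a pool import), every file fails this check even
    though nothing about the file itself is different.
    """
    return file_identity(path) == tuple(recorded_identity)
```

A program that stores file_identity() results between runs will treat every file on a filesystem as new the moment that filesystem's st_dev changes.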

GNU tar, as it happens, is at the heart of our backup system. (Not that you have much choice with ZFS, since it has no dump equivalent and no, zfs send doesn't count.)

So, if you have ZFS machines and a backup system that might possibly assume that device numbers are stable, be careful. If you haven't tested rebooting a machine with enough backed-up data that you can tell if you're suddenly doing a full backup instead of an incremental one, well, you might want to plan such a test.
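(A cheaper first check than a full backup run is to record the device number of each filesystem before a reboot and compare afterwards. The following is a rough Python sketch under my own assumptions; the helper names and the idea of saving the results as JSON are invented for illustration, and it does not replace actually testing your backup system.)

```python
import json
import os

def snapshot_devices(paths):
    """Map each path to the device number of the filesystem it is on."""
    return {p: os.stat(p).st_dev for p in paths}

def changed_devices(old, new):
    """Return the paths whose device number differs between two snapshots."""
    return sorted(p for p in old if p in new and old[p] != new[p])

def save_snapshot(snapshot, filename):
    """Write a snapshot to a file so it survives the reboot."""
    with open(filename, "w") as f:
        json.dump(snapshot, f)

def load_snapshot(filename):
    """Read back a snapshot written by save_snapshot()."""
    with open(filename) as f:
        return json.load(f)
```

Before the reboot you would call save_snapshot(snapshot_devices(mountpoints), somefile) with your own list of filesystem mount points; after it, changed_devices(load_snapshot(somefile), snapshot_devices(mountpoints)) lists the filesystems whose device numbers moved.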

(For GNU tar specifically it is possible to fix the situation by directly modifying GNU tar's index files, which are fortunately stored in plain text format. How to do so doesn't fit within the margins of this entry.)

ZFSVariableDeviceNumbers written at 00:39:10

2009-08-02

Limitations on custom NFS mount authorization on Solaris

As it turns out, we've discovered some limitations with our custom NFS mount authorization hack. First, Solaris 10 puts some security restrictions on what mountd can do; the specific restriction that we ran into is that it can't fork() and exec() shell scripts. Mountd can make outgoing socket connections, so we got around this by putting most of the complex work in an inetd-spawned 'daemon'. (There is probably a way to turn this security feature off, but we didn't bother trying to find it.)

Second, things won't stall endlessly waiting for an answer to an NFS mount authorization. I don't know how long the various timeouts are, but there is definitely a limit on how much time your authorization code can use. This probably won't be a real limit in most environments, since the timeout seems to be at least several seconds, but bear it in mind.
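(As an illustration of the general pattern of asking a separate helper over a socket while respecting a time limit, here is a small Python sketch. It is not our actual code, which lives inside mountd; the one-line 'yes'/'no' protocol, the function names, and the two second default timeout are all assumptions made up for this example.)

```python
import socket

def ask_helper(host, port, request, timeout=2.0):
    """Send one request line to a helper daemon and return its one-line reply.

    Returns None if the connection fails or the helper does not answer
    within `timeout` seconds, so the caller can apply a default.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.sendall(request.encode("ascii") + b"\n")
            reply = conn.makefile("rb").readline()
    except OSError:
        # Covers refused connections and timeouts alike.
        return None
    return reply.decode("ascii", errors="replace").strip() or None

def is_allowed(host, port, client, timeout=2.0):
    """Fail closed: anything other than an explicit 'yes' is a denial."""
    return ask_helper(host, port, client, timeout) == "yes"
```

The important design choice is that a slow or dead helper turns into a prompt denial instead of a stalled mount request; whether failing closed is the right default depends on your environment.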

We've also seen indications that mountd will only allow a certain (low) number of outstanding mount requests; my current guess is 20. Once it hits the limit, further requests are either dropped or get permission denied errors (it's not clear which, and we've lacked the time to debug things in detail when the issue has come up). This is also going to be an issue if your authorization code can take some time.

Finally, it seems that under some circumstances some part of the overall system will cache the answers to mount requests, with the effect that an authorization decision can be 'sticky' for some amount of time even if re-running the innetgr() query from scratch would get a different answer. It's possible that this mostly happens for permission denied answers, since spurious mount denials are how we notice this.

(Thus, this is probably not the right hack for you if your code needs to take a shot at absolutely every mount request mountd handles, no matter what.)

All of this is inevitable given that the entire thing is a hack; one can't expect the rest of the system to cooperate perfectly, the way one could if this were a real feature. It follows that the more your code deviates from normal innetgr() behavior, the more possibility there is for something to go wrong. If you need a seriously different NFS mount authorization scheme, you may have to hack the OpenSolaris mountd code base directly (and hope that it works okay on regular Solaris 10, which it may not).

CustomMountAuthLimits written at 23:33:02


