Wandering Thoughts archives

2009-11-25

Why I love Unix, number N (for some N)

Suppose that you want to find all two- and three-digit primes where all of the component digits are also prime (eg, 37 would qualify but 41 would not, since 4 is not prime).

Here is the simple Unix approach:

factor $(seq 10 999) | awk 'NF == 2 {print $2}' | egrep '^[12357]*$'

(I won't claim that this is obvious until you're immersed in the Unix pipe-bashing mindset, at which point it becomes all too easy to get sucked into the Turing tar pit.)

On a side note, seq is one of those programs that gets more useful the more I use it. It's not at all interesting on its own, but it really comes into its own when used as an ingredient in shell script iteration.
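
(As a trivial illustration, here is seq driving a shell loop; the 'scratch.N' directory names are just made up for the example.)

# make ten scratch directories, scratch.1 through scratch.10
for i in $(seq 1 10); do mkdir "scratch.$i"; done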

And, oh yeah, it's a GNU tool. Another way that they've contributed to Unix in the Unix way.

(Okay, credit where credit is due; I believe that seq first showed up in Plan 9. But I will point out that the GNU tools people are the only people smart enough to reuse the idea.)

Update: Oops. As pointed out by a commentator, 1 is not a prime. This shows one drawback of these neat one-line things; they're so short and simple that you may be tempted not to double-check your work. (This is especially embarrassing for me because I looked at the output of 'factor $(seq 1 9)' to make sure that I had the right set of single-digit primes, but discounted what factor printed for 1 without looking into it further.)
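
For the record, the corrected version just drops 1 from the digit class; the rest of the pipeline is unchanged:

factor $(seq 10 999) | awk 'NF == 2 {print $2}' | egrep '^[2357]*$'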

UnixLove written at 01:29:24

2009-11-05

Why the NFS client is at fault in the multi-filesystem NFS problem

In yesterday's entry, I said that the NFS clients were at fault in dealing with the duplicate inode number problem. Now it's time for the details, because at first glance this seems a bit odd; how can it be the client's responsibility to avoid duplicate inode numbers when the server is the one that gives it the inode numbers?

In the NFS v3 specification, inode numbers appear in only one spot: the file attribute structure that the server returns for GETATTR requests. GETATTR is the NFS analog of the stat() system call (although it is used for more than just stat()), and the fattr3 structure that it returns is the analog of the kernel's struct stat that stat() fills in; much the same information appears in both.

In particular, the fattr3 structure has both a fileid (the inode number) and a fsid, the 'file system identifier for [the file's] file system'. While NFS v3 requires that the inode number be unique, it only requires that it be unique within a single server filesystem, that is, among files with the same fsid. And an NFS server is free to give you files with different fsids even though you have made only one NFS mount from it, of what you think is a single filesystem.

The simple way for a client to map GETATTR results to stat() results is to turn the fileid into the inode number, fill in st_dev based on some magic internal number it uses for this NFS mount, and throw away the fsid. A kernel that does this has the duplicate inode number problem.
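
You can see the result from such a client with GNU coreutils stat; the paths here are hypothetical, but two unrelated files that live on different server filesystems can wind up reporting the same device:inode pair:

# %d is st_dev and %i is the inode number (hypothetical paths)
stat -c '%d:%i %n' /home/fred/a /home/group1/jim/b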

Unfortunately, fixing this is complicated. The NFS client cannot simply use the fsid for st_dev, because st_dev must be unique on the local machine and the fsid comes from the server; thus, it can potentially collide both with local filesystems and with filesystems from other NFS servers. Using fsid at all in the stat() results requires somehow inventing a relatively persistent and unique st_dev value for every different fsid that every NFS server gives you, which is non-trivial.

(If you have a very big st_dev you can deal with the problem by mangling the fsid together with a unique local number for this NFS mount. But fsid is a 64-bit number, so you'd need a pretty epic st_dev.)

Sidebar: the Linux solution to this problem

The Linux NFS client has a creative solution to this problem: it actually creates new NFS-mounted filesystems on the fly, complete with new local st_dev values, every time you traverse through a point where the fsid changes. Comments in the source code say that this has the side effect of making df work correctly, at least as long as you are not dealing with something like ZFS.
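
You can watch this happen from the client; the mount point here is hypothetical, and I'm assuming a reasonably current Linux client where the extra submounts show up in /proc/mounts:

# after looking inside a directory where the fsid changes, an extra
# NFS mount for it should appear here
grep nfs /proc/mounts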

NFSMultiFSFault written at 00:02:51

2009-11-04

The cause of the multi-filesystem NFS export problem

There is a famous irritation with managing NFS filesystems, which boils down to the fact that NFS clients have to know about your filesystem boundaries. It goes like this: suppose that /home and /home/group1 are separate filesystems and you NFS export both of them. What you would like is for clients to NFS mount /home and automatically get /home/group1 too, because this lets you transparently add /home/group2 next month. However, this doesn't work (although some systems will try hard to fake it if you tell them to).

(This issue is a lot more pertinent these days in light of things like ZFS, where filesystems are cheap objects.)
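
As for systems that will try to fake it: on a Linux NFS server, for example, the knob for this is the crossmnt export option. Here is a sketch of an /etc/exports line, with a made-up client name:

# made-up client name; crossmnt asks the server to let clients cross
# into filesystems mounted under /home
/home    client.example.org(rw,crossmnt)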

Although it superficially looks like the NFS re-export problem, the problem here isn't telling NFS filehandles for the different real filesystems apart. Provided that all of the filesystems can be NFS exported normally, your NFS server can just give out the same filehandles it would if the client had explicitly mounted the filesystems separately (the filehandle is opaque to the client, after all).

The real problem is what common NFS clients expect about inode numbers; specifically, they expect each inode number to be unique within the client's view of the filesystem, and from the client's view it has only mounted one filesystem. Meanwhile, on the server there are multiple filesystems, and their inode numbers are almost certain to overlap. The result is explosions in some programs on the client under some circumstances, as the programs see duplicate inode numbers for files that are not actually hardlinks to each other.
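
One quick way to spot this from such a client is to look for inode numbers that repeat among files with a link count of one, something that can't legitimately happen; with GNU find, that's:

# files with a link count of 1 should never share an inode number
find /home -type f -links 1 -printf '%i\n' | sort -n | uniq -d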

(The client kernels generally don't care; the inode numbers that user programs see are unrelated to the NFS filehandles that the kernel uses.)

Technically this is a client-side problem, but I doubt that any NFS client implementation actually gets it right. (And it is very hard to get right, since the client has to somehow make up unique yet ideally persistent inode numbers.)

(This is the kind of thing that I write down in part so that I can remember the logic the next time I wonder about it.)

Sidebar: the more subtle failures

Okay, that's not quite all that goes wrong if the server lets NFS clients transparently cross filesystem boundaries, because there are various operations that don't work across server filesystem boundaries despite looking like they should on the client. For example, if /home on the client is all one single NFS mount, a program is rationally entitled to believe that it can hardlink /home/fred/a to /home/group1/jim/b. In practice this is going to fail with an error because on the server that's a cross-filesystem hardlink.
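
To make it concrete, on such a client the operation looks like an ordinary hardlink inside one mount, but the server will reject it; I'd expect it to surface as a cross-device link error (EXDEV):

# looks like a same-filesystem hardlink to the client, but isn't on the server
ln /home/fred/a /home/group1/jim/b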

MultiFilesystemNFSIssue written at 00:48:11

