2011-02-13
A humbling experience with '#' characters in filenames
It's always a humbling experience to realize that you've made a terrible mistake when configuring a program, even when it only really affects you.
Once upon a time, I was building and thus configuring my first MH setup. MH puts each message in its own file and normally 'removes' messages by renaming them with a prefix character on the front; this lets you relatively easily un-delete messages for some amount of time if you made a mistake in removing them. When you are configuring MH, you can choose what this prefix character is.
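As a rough sketch of that mechanism (this is not MH's actual code), removal by renaming might look like the following in Python. The function names and the PREFIX constant are hypothetical, picked only for illustration.

```python
import os

PREFIX = ","  # hypothetical setting; stands in for MH's configurable prefix character

def remove_message(folder, name):
    """'Remove' a message by renaming it with the prefix on the front."""
    os.rename(os.path.join(folder, name), os.path.join(folder, PREFIX + name))

def undelete_message(folder, name):
    """Undo a removal by renaming the prefixed file back to its original name."""
    os.rename(os.path.join(folder, PREFIX + name), os.path.join(folder, name))
```

Because the message file is only renamed, undeleting it is just the reverse rename, for as long as the prefixed file is still around.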
For some reason (perhaps because it was mentioned as an option in the
configuration documentation), I picked the prefix character '#'.
This seemed to work well enough, and so I carried this particular MH
configuration choice forward to every version of MH I configured for
the next, oh, fifteen years. Then in the fall of 2006 I moved to an
environment where for the first time in many years I was not using
a version of MH that I had compiled myself; instead I was using a
prepackaged one. This version used the normal MH default character
of ','.
Let me assure you that it is much easier to work with files that start
with a ',' than it is to work with files that start with a '#'.
Until my MH environment switched, I had not really been conscious of
how subtly annoying the old way was, but it turned out that it was. As
usual, it wasn't an issue of it being impossible or very hard to work
with filenames with a '#'; it's that such filenames added friction to
the process, enough friction that I avoided dealing with them. Friction
matters more than we think it does.
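I haven't spelled out where the friction comes from. One plausible source (an assumption for this example, not something stated above) is the shell: an unquoted word that starts with '#' begins a comment, so such filenames have to be quoted, while ',' has no special meaning. Here is a small Python sketch using the standard shlex module, with made-up message names:

```python
import shlex

# Hypothetical 'removed' message files under each prefix choice.
comma_name = ",12"
hash_name = "#12"

# shlex.quote() adds quotes only when a string is unsafe to use bare in a shell.
quoted_comma = shlex.quote(comma_name)  # comes back unchanged
quoted_hash = shlex.quote(hash_name)    # comes back wrapped in single quotes

# With comment handling turned on, an unquoted '#12' is swallowed as a comment,
# so the command never sees the filename at all.
comma_args = shlex.split("ls -l ,12", comments=True)
hash_args = shlex.split("ls -l #12", comments=True)

print(quoted_comma, quoted_hash)
print(comma_args, hash_args)
```

shlex only approximates real shell parsing, but for a word that starts with '#' it behaves the same way a POSIX shell does.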
Fortunately my bad configuration decision from a very long time ago didn't affect very many people. In all of the environments I've worked in, only a few people ever used MH, and in several of them I was basically the sole user.
2011-02-04
Why people put NFS mounts in subdirectories
One of the little pieces of Unix wisdom is that you should put NFS
mounts (well, their mount points) in their own subdirectory. You don't
mount NFS filesystems directly in /, you don't mount them in a
directory with local subdirectories that you care about, and ideally you
don't mix filesystems from different servers in the same subdirectory.
(In other words, an ideal mount point is, say, '/nfs/<server>/<fs>'.)
What is behind this is a combination of two things: Unix directory
traversal, and the fact that if you stat() or otherwise attempt to touch
an NFS mount point from a server that isn't responding, your program
hangs. In a classical
Unix system a surprising number of programs walk directories and
stat() at least some of what they find, even programs you might not
think of like pwd. Some of them walk up the filesystem hierarchy,
or at least wind up looking at the root directory.
(Even if an NFS server is responding it might be rather slow.)
It's unavoidable that programs that really want to deal with filesystems
from an unavailable NFS server will have problems. But we would like
unrelated processes to not be hampered by a hung NFS server; if your
process or session doesn't care about the unavailable filesystems
and would be unaffected if they weren't mounted at all, it shouldn't
hang. Which means that any directory traversal that you do needs to
be kept away from such NFS mounts, so that you don't wind up stalling
yourself because you stat()'ed a directory entry for an NFS mount that
you don't care about.
Segregating NFS mount points from regular directories and then further segregating them by their server minimizes the chances that you'll trip over an unrelated NFS filesystem during this sort of directory traversal.
(And putting NFS mounts directly in / means that any program that
looks at the root directory and stat()'s things in it might hang or be
delayed due to any of the NFS servers having problems.)
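To illustrate why the segregation helps, here is a minimal Python sketch of a directory walk that prunes whole directories by name. The walk_local function and the idea of passing it something like '/nfs' are hypothetical; the point is only that when all NFS mount points live under one parent directory, a single skip entry keeps the walk away from every one of them.

```python
import os

def walk_local(top, skip_dirs):
    """Yield paths of non-directories under top, pruning skip_dirs by name.

    Pruned directories are skipped before anything is asked about them,
    so nothing beneath them (such as NFS mount points) is ever touched.
    """
    skip = {os.path.abspath(p) for p in skip_dirs}
    stack = [os.path.abspath(top)]
    while stack:
        current = stack.pop()
        with os.scandir(current) as entries:
            for entry in entries:
                if entry.path in skip:
                    continue  # decided from the name alone; no stat() needed
                # is_dir() may need a stat() on some filesystems, which is
                # exactly the call we want to keep away from NFS mounts.
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)
                else:
                    yield entry.path
```

If the NFS mount points were instead mixed in with local directories, the skip list would have to name each one individually and be kept up to date as mounts came and went.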
As a pragmatic matter, some of this is no longer applicable on many modern Unix systems. So this is probably on its way to sliding into a Unix superstition (or at least a sysadmin one).
Sidebar: how classic pwd works
The classic version of pwd has a simple algorithm:
- stat '.' and remember its identity
- read through '..', stat'ing every entry until we find the one for '.'; we now know the name of the current directory
- go up one directory and repeat the process
- stop when we hit a directory where '..' points to itself, because that means we've hit the root directory and we should be done
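The steps above can be sketched in Python roughly as follows. This is an illustration of the algorithm as described, not the historical C source; the names classic_pwd and same_file are made up, and taking a file's 'identity' to be its device and inode number pair is an assumption of the sketch.

```python
import os

def same_file(a, b):
    """Two stat results name the same file if device and inode both match."""
    return (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)

def classic_pwd():
    """Work out the current directory by walking up through '..'."""
    names = []
    here = "."
    while True:
        here_id = os.stat(here)
        parent = os.path.join(here, "..")
        if same_file(here_id, os.stat(parent)):
            break  # '..' points to itself: this is the root directory
        for name in os.listdir(parent):
            try:
                # This is the stat() that can hang on a dead NFS mount point.
                entry_id = os.lstat(os.path.join(parent, name))
            except OSError:
                continue  # entry vanished or is unreadable; keep looking
            if same_file(entry_id, here_id):
                names.append(name)
                break
        else:
            raise OSError("current directory not found in its parent")
        here = parent
    return "/" + "/".join(reversed(names))
```

Note that the inner loop stat()s entries it has no interest in, including every sibling it passes before finding the right one. On a modern system the result should normally agree with os.getcwd().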
I call this pwd's algorithm, but it also appeared as getcwd() and
was used by anything that needed to know the current directory, such
as 'df .'.
Modern systems make getcwd() into a system call because they can; they
keep enough extra information in kernel memory to return the information
immediately.