My own configuration files don't have to be dotfiles in $HOME
Back when I started with Unix (a long time ago), programs had a
simple approach to where to look for or put the little files that they
needed: they went into your
$HOME as dotfiles, or if a program
was going to have a bunch of them it might create a dot-directory
for itself. This started with shells and
spread steadily from there, especially for early open source programs.
When I started writing shell scripts, setup scripts for my X
environment, and other bits and pieces that needed configuration
files or state files, the natural, automatic thing to do was to
imitate this and put my own dotfiles and dot-directories in my
$HOME. The entirely unsurprising outcome of this is that my home
directories have a lot of dotfiles (some of them very old, which
can cause problems). How many is a lot? Well,
in my oldest actively used
$HOME, I have 380 of them.
(Because dotfiles are normally invisible, it's really easy for them
to build up and build up to absurd levels. Not that my $HOME is
especially neat in general, but I have many fewer non-dotfiles cluttering it up.)
Recently it slowly dawned on me that my automatic reflex to put
my own files in $HOME as dotfiles is both not necessary and not really
a good idea. It's not necessary because I can make my own code look
wherever I want it to, and it's not a good idea because
dotfiles are a jumbled mess where it's very hard to keep track of
things or even to see them. Instead I'm better off if I put my own
files in non-dotfile directory hierarchies somewhere else, with
sensible names and sensible separation into different subdirectories
and all of that.
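As a concrete sketch of what "my own code can look wherever I want it to" means in practice, a personal script can simply read its configuration from a visible, sensibly named directory. The $HOME/lib/conf path, the MY_CONF_DIR variable, and the xsetup.conf file name here are all made-up examples, not anything canonical:

```shell
#!/bin/sh
# Sketch: a personal script reading its configuration from a visible,
# sensibly named directory instead of a dotfile in $HOME.
# $HOME/lib/conf, MY_CONF_DIR, and xsetup.conf are hypothetical names.
CONF_DIR="${MY_CONF_DIR:-$HOME/lib/conf}"
CONF_FILE="$CONF_DIR/xsetup.conf"

if [ -r "$CONF_FILE" ]; then
    # Configuration is plain shell variable assignments, so just source it.
    . "$CONF_FILE"
else
    # Fall back to built-in defaults if no config file exists yet.
    echo "note: no config at $CONF_FILE, using built-in defaults" >&2
fi
```

Because the directory isn't hidden, an ordinary `ls $HOME/lib/conf` shows everything at a glance, which is exactly what a pile of dotfiles doesn't give you.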
(I'm not quite sure when and why this started to crystalize for me, but it might have been when I was revising my X resources and X setup stuff on my laptop and realized that there was no particular reason to put them in $HOME/.X<something> the way I had on my regular machines.)
I'm probably not going to rip apart my current
$HOME and its
collection of dotfiles. Although the idea of a scorched earth
campaign is vaguely attractive, it'd be a lot of hassle for no
visible change. Instead, I've decided that any time I need to make
any substantial change to things that are currently dotfiles, I'll
take the opportunity to move them out of $HOME.
(The first thing I did this with was my X resources,
which had to change on my home machine due to a new and rather
different monitor. Since I was basically
gutting them to start with, I decided it made no sense to do it in place in $HOME.)
PS: Modern Unix (mostly Linux) has the XDG Base Directory
specification, which tries to move a lot of things under
$HOME/.config and $HOME/.cache. In theory I could move
my own things under there too. In practice I'm not particularly
interested in hiding them away that way; I'd rather put them somewhere
more obvious.
Being reminded that an obvious problem isn't necessarily obvious
The other day we had a problem with one of our NFS fileservers, where a ZFS filesystem filled up to its quota limit, people kept writing to the filesystem at high volume, and the fileserver got unhappy. This nice neat description hides the fact that it took me some time to notice that the one filesystem that our DTrace scripts were pointing to as having all of the slow NFS IO was a full filesystem. Then and only then did the penny finally start dropping (which led me to a temporary fix).
(I should note that we had Amanda backups and a ZFS pool scrub happening on the fileserver at the time, so there were a number of ways it could have been overwhelmed.)
In the immediate aftermath, I felt a bit silly for missing such an obvious issue. I'm pretty sure we've seen the 'full filesystem plus ongoing writes leads to problems' issue before, and we've certainly seen similar problems with full pools. In fact, four years ago I wrote an entry about remembering to check for this sort of stuff in a crisis. Then I thought about it more and realized that I was mostly experiencing hindsight bias.
The reality of sysadmin life is that in many situations, there are too many obvious problem causes to keep track of them all. We will remember common 'obvious' things, by which I mean things that keep happening to us. But fallible humans with limited memories simply can't keep track of infrequent things that are merely easy to spot if you remember where to look. These things are 'obvious' in a technical sense, but they are not in a practical sense.
This is one reason why having a pre-written list of things to check is so potentially useful; it effectively remembers all of these obvious problem causes for you. You could just write them all down by themselves, but generally you might as well start by describing what to check and only then say 'if this check is positive ...'. You can also turn these checks (or some of them) into a script that you run and that reports anything it finds, or create a dashboard in your monitoring and alert system. There are lots of options.
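For instance, the 'full filesystem' check from this incident is easy to script. This is only a sketch: the 95% threshold is an arbitrary assumption, and on a real ZFS fileserver you would likely want to check quotas with `zfs list` rather than relying on plain `df` output:

```shell
#!/bin/sh
# Sketch: surface nearly-full filesystems automatically instead of
# having to remember to look for them during a crisis.
# The 95% threshold is an arbitrary assumption; on ZFS you would
# probably check quota usage via 'zfs list' as well.

report_full() {
    # Reads 'df -P' style output on stdin and reports any filesystem
    # at or above $1 percent capacity.
    awk -v max="$1" '
        NR > 1 {
            pct = $5; sub(/%/, "", pct)
            if (pct + 0 >= max)
                printf "NEARLY FULL: %s at %s%%\n", $6, pct
        }'
}

df -P | report_full 95
```

Run periodically (or at the start of incident diagnosis), a script like this effectively remembers the 'obvious' check for you, which is the whole point.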
(Will we try to create such a checklist or diagnosis script? Probably not for our current fileservers, since they're getting replaced with a completely different OS in hopefully not too much time. Instead we'll just hope that we don't have more problems over their remaining lifetime, and probably I'll remember to check for full filesystems if this happens again in the near future.)
Sidebar: Why our (limited) alerting system didn't tell us anything
The simple version is that our system can't alert us only on the combination of a full filesystem, NFS problems with that fileserver, and perhaps an observed high write volume to it. Instead the best it can do is alert us on full filesystems alone, and that happens too often to be useful (especially since it's not something we can do anything about).