2011-02-07
My brute force email archive
Years ago, I had a brainwave about archiving my email. The brainwave was simple: 'disk is cheap'. So I changed my .forward to save a copy of all of my email to a file, in addition to the other filtering I was doing with it. I don't point a mail client at the file or otherwise use it for anything in my regular email setup; it is purely a backup and completely separate from my regular email client.
(To be honest, it might have evolved out of my caution when I started using procmail. Since I didn't entirely trust procmail, I think that I set up my .forward to save a backup copy of all of my email in a file. At some point I then realized that disk space was cheap, so I never actually cleared the file and just let it accumulate.)
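As a rough illustration of the 'append every message to one file' idea, here is a minimal Python sketch. It is not how my setup actually works (that is just a .forward entry); the function name archive_message and the choice of mbox format are assumptions made for the example. It uses the standard library's mailbox module, which adds a 'From ' separator line and escapes body lines that start with 'From ', so the stored copy is not quite byte-for-byte identical to what arrived.

```python
import mailbox


def archive_message(raw: bytes, archive_path: str) -> None:
    """Append one raw message to an mbox file, creating the file if needed.

    Hypothetical helper for illustration only. Messages are only ever
    appended, so the file keeps them in arrival order.
    """
    box = mailbox.mbox(archive_path, create=True)
    box.lock()  # guard against two deliveries writing at the same time
    try:
        box.add(raw)  # mbox.add() accepts raw bytes
        box.flush()   # make sure the message is written out to disk
    finally:
        box.unlock()
        box.close()


# Usage idea (an assumption, not part of the original setup): a small
# wrapper script would read the message from sys.stdin.buffer and call
# archive_message(data, "/path/to/archive.mbox").
```

The important property is that nothing ever rewrites or reorders the file; it only grows.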
More recently I realized that it needed one more thing to be really complete and useful: it needed to get a copy of my outgoing email, not just my incoming email. So over the past few years I've switched to cc'ing myself on everything I write (generally done automatically by my MUA, instead of having the MUA save the sent messages itself).
There are two important attributes of this brute force archive that make it so useful. First, it is truly comprehensive; it has everything, not just the things that I thought I was going to want (or need) later. I wouldn't say that I'm bad at picking what I'll need later, but I'm not completely accurate at it. Having a complete archive as a backup means that I don't have to be; my accuracy is more a matter of convenience than of necessity.
Second, it's separate from my regular mail environment so that my full archive doesn't clutter up (and slow down) my normal mail folders. This matters because how I want to use my regular folders is very different from how I use a comprehensive archive. If I tried to use only a comprehensive archive, I would immediately start losing important things in it; there would be so much volume and even so many false positives in searches that it would be pretty much useless. I need my regular folders to be curated and sorted (and, sometimes, pruned), to contain the things that I think matter and that I want to be paying attention to. This is nothing like a comprehensive archive.
In theory I could do all of this within a single mail environment. I would just have to be very disciplined about always saving a copy of every message (both received and sent) to my special 'archive' set of folders, no matter how trivial the message was, and then also having it in my regular set of folders as I processed it and perhaps saved it again.
In practice, having the system handle it all by just writing everything to a file is simpler and more reliable (and it's grunt work; computers exist to automate grunt work). Also, since this file exists entirely outside of any MUA I may use, I know for sure that no MUA is touching it and doing things to the messages; they are archivally perfect, exactly as originally received (and exactly in the order they were originally received).
PS: when I say everything I really do mean everything. So yes, this does mean that I get kind of irritated at people who email us ten-megabyte files out of the blue (which happens every so often). But disk space really is cheap, and if I need to I can always bzip2 the archives.
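For what it's worth, compressing an old archive file is about as simple as it sounds. The sketch below does the equivalent of running bzip2 on a file from Python, using the standard bz2 module; the function name compress_archive is made up for the example, and it assumes the file is an old archive that nothing is still appending to. Unlike the bzip2 command, it leaves the original file in place.

```python
import bz2
import shutil


def compress_archive(src_path: str) -> str:
    """Write a bzip2-compressed copy of src_path next to it.

    Hypothetical helper for illustration only. Returns the path of the
    compressed copy; the original file is not removed.
    """
    dst_path = src_path + ".bz2"
    with open(src_path, "rb") as src, bz2.open(dst_path, "wb") as dst:
        # Stream in chunks so a large archive is never held in memory.
        shutil.copyfileobj(src, dst)
    return dst_path
```

Decompressing the copy gives back exactly the original bytes, so the archive stays as received.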