My brute force email archive

February 7, 2011

Years ago, I had a brainwave about archiving my email. The brainwave was simple: 'disk is cheap'. So I changed my .forward to save a copy of all of my email to a file, in addition to the other filtering I was doing with it. I don't point a mail client at the file or otherwise use it for anything in my regular email setup; it is purely a backup and completely separate from my regular email client.

(To be honest, it might have evolved out of my caution when I started using procmail. Since I didn't entirely trust procmail, I think that I set up my .forward to save a backup copy of all of my email in a file. At some point I then realized that disk space was cheap and didn't actually clear the file, just let it accumulate.)
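
As an illustration of the 'save a copy of everything to a file' step, here is a minimal Python sketch. It assumes the mailer can pipe each incoming message to a program (many can, through a `|` entry in .forward, but that is an assumption and not a description of the setup above) and that the archive is an mbox file. The script name, the function name and the archive path are all hypothetical.

```python
import mailbox
import sys


def append_to_archive(raw_message: bytes, archive_path: str) -> None:
    """Append one raw message to an mbox file, creating the file if needed."""
    box = mailbox.mbox(archive_path)  # create=True is the default
    box.lock()  # guard against two deliveries writing at once
    try:
        # mbox.add() accepts raw bytes and supplies the "From " separator line.
        box.add(raw_message)
        box.flush()
    finally:
        box.unlock()
        box.close()


if __name__ == "__main__" and len(sys.argv) == 2:
    # Hypothetical usage: archive_copy.py /path/to/archive.mbox < message
    append_to_archive(sys.stdin.buffer.read(), sys.argv[1])
```

Messages are only ever appended, so the file keeps them in arrival order. One caveat is that the mbox format escapes body lines that begin with "From ", so this particular sketch is not quite byte-for-byte identical to what was received.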

More recently I realized that it needed one more thing to be really complete and useful: a copy of my outgoing email, not just my incoming email. So over the past few years I've switched to cc'ing myself on everything I write (generally done automatically by my MUA, in place of having the MUA save the messages itself).

There are two important attributes of this brute force archive that make it so useful. First, it is truly comprehensive; it has everything, not just the things that I thought I was going to want (or need) later. I wouldn't say that I'm bad at picking what I'll need later, but I'm not completely accurate at it. Having a complete archive as a backup means that I don't have to be; my accuracy is more a matter of convenience than of necessity.

Second, it's separate from my regular mail environment so that my full archive doesn't clutter up (and slow down) my normal mail folders. This matters because how I want to use my regular folders is very different from how I use a comprehensive archive. If I tried to use only a comprehensive archive, I would immediately start losing important things in it; there would be so much volume and even so many false positives in searches that it would be pretty much useless. I need my regular folders to be curated and sorted (and, sometimes, pruned), to contain the things that I think matter and that I want to be paying attention to. This is nothing like a comprehensive archive.

In theory I could do all of this within a single mail environment. I would just have to be very disciplined about always saving a copy of every message (both received and sent) to my special 'archive' set of folders, no matter how trivial the message was, and then also having it in my regular set of folders as I processed it and perhaps saved it again.

In practice, having the system handle it all by just writing everything to a file is simpler and more reliable (and it's grunt work; computers exist to automate grunt work). Also, since this file exists entirely outside of any MUA I may use, I know for sure that no MUA is touching it and doing things to the messages; they are archivally perfect, exactly as originally received (and exactly in the order they were originally received).

PS: when I say everything I really do mean everything. So yes, this does mean that I get kind of irritated at people who email us ten megabyte files out of the blue (which happens every so often). But disk space really is cheap, and if I need to I can always bzip2 the archives.
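
For the 'I can always bzip2 the archives' fallback, a rough Python sketch using the standard bz2 module might look like the following. The function names and paths are hypothetical, and it assumes the file being compressed is an older archive that nothing is appending to any more.

```python
import bz2
import shutil


def compress_archive(src_path: str, dst_path: str) -> None:
    """Write a bzip2-compressed copy of src_path to dst_path."""
    with open(src_path, "rb") as src, bz2.open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst)


def read_compressed(path: str) -> bytes:
    """Return the decompressed contents, e.g. to check nothing was lost."""
    with bz2.open(path, "rb") as f:
        return f.read()
```

Comparing read_compressed() on the new file against the original bytes before deleting the original is a cheap way to keep the archive trustworthy.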

Written on 07 February 2011.
