2010-10-29
What problems the Maildir mail storage format solves
There is a relatively widespread view that the Maildir mail storage format is the solution to any issue that one is having with traditional mail message storage (cf the comment here). This is not exactly so, so it is important to understand which problems Maildir actually solves and which ones it does nothing about. Or for that matter, which problems Maildir can make worse.
(In this entry I'm going to compare Maildir against the traditional Unix mbox format.)
Maildir excels at all single-message updates, including deleting single messages and moving them into another folder. These can be done with relatively inexpensive filesystem operations or, at worst, by rewriting a single message. By contrast, in mbox format a true delete or a significant message modification generally requires copying the entire rest of the mailbox, and indexes can't help mbox with this at all.
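As a rough illustration of why these updates are cheap, here is a minimal Python sketch. The function names and the folder paths are hypothetical; it assumes each folder is a Maildir with the usual `cur` subdirectory and that both folders live on the same filesystem, so that the rename does not turn into a copy.

```python
import os

def move_message(src_folder, dst_folder, filename):
    """Move one message between two Maildir folders (hypothetical helper).

    Assumes both folders are on the same filesystem, so os.rename()
    only updates directory entries and never copies the message body.
    """
    src = os.path.join(src_folder, "cur", filename)
    dst = os.path.join(dst_folder, "cur", filename)
    os.rename(src, dst)
    return dst

def delete_message(folder, filename):
    """Truly delete one message; its space is reclaimed immediately."""
    os.unlink(os.path.join(folder, "cur", filename))
```

Neither operation touches any other message in the folder, which is the whole point; the mbox equivalent of `delete_message` would have to rewrite everything after the deleted message.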
(You can get fast 'delete' in mbox by just marking the message as deleted, but this doesn't reclaim any space and reclaiming space may be important to the user.)
If a message has known flags, Maildir gives you immediate access to it because you can open it by filename. If the message has unknown flags but you know its identifier, you need to scan the directory; this is likely to still be better than a linear read through the mbox file. Indexes can make mbox just as fast as Maildir, though.
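The two lookup cases might be sketched like this in Python. The helper names are made up for illustration; the sketch assumes the common Maildir naming scheme where a file in `cur` is called `unique:2,FLAGS`.

```python
import os

def path_with_known_flags(folder, unique_id, flags):
    """Build the filename directly; no directory scan is needed."""
    return os.path.join(folder, "cur", f"{unique_id}:2,{flags}")

def find_by_unique_id(folder, unique_id):
    """Flags unknown: scan the directory for a name with this unique part.

    Returns the full path, or None if no such message exists.
    """
    cur = os.path.join(folder, "cur")
    for name in os.listdir(cur):
        # Everything before the first ':' is the unique identifier.
        base, _, _info = name.partition(":")
        if base == unique_id:
            return os.path.join(cur, name)
    return None
```

The scan only reads directory entries, not message contents, which is why it is likely to beat a linear read through an unindexed mbox file.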
Maildir is at best a modest help with any sort of sequential scan. If your mail client wants to see four headers from every message in a folder, you still have to open each message and read the first chunk of it. Whether reading the first chunk of a bunch of files is faster than a linear read through an mbox format file is an interesting question, and certainly Maildir isn't going to be faster than doing the equivalent with an indexed mbox file. Reading most of each message is all but certain to be worse than working on an mbox file, because using separate files defeats a lot of OS level readahead and the file data may well be more spread out across the disk than it would have been in a single file.
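To make the cost of such a scan concrete, here is a small Python sketch of the 'four headers from every message' case. The function names and the default header list are illustrative assumptions, not anything a particular mail client does. Note that it still needs one open and at least one read per message, no matter how few headers it wants.

```python
import os
from email.parser import BytesHeaderParser

def read_headers(path):
    """Read a message file only up to the blank line that ends its headers."""
    lines = []
    with open(path, "rb") as f:
        for line in f:
            if line in (b"\n", b"\r\n"):
                break  # end of headers; the body is never parsed
            lines.append(line)
    return BytesHeaderParser().parsebytes(b"".join(lines))

def scan_folder(folder, wanted=("From", "To", "Subject", "Date")):
    """Return {filename: {header: value}} for every message in folder/cur."""
    cur = os.path.join(folder, "cur")
    result = {}
    for name in sorted(os.listdir(cur)):
        msg = read_headers(os.path.join(cur, name))
        # A header that is absent from the message comes back as None.
        result[name] = {h: msg[h] for h in wanted}
    return result
```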
(If you want to accelerate this sort of scan, you need the relevant headers indexed and cached in some more conveniently digestible format. This is true of both mbox and Maildir.)
All of this assumes that you have either small directories (ie small folders) or non-linear directories and thus can assume that looking up a file in a directory is a very cheap operation. If you're on a Unix that still uses linear directories and you have large folders, simple filename lookup and manipulation can become quite an expensive operation itself (as can scanning the directory).