What problems the Maildir mail storage format solves

October 29, 2010

There is a relatively widespread view that the Maildir mail storage format is the solution to any issue that one is having with traditional mail message storage (cf. the comment here). This is not exactly so, which makes it important to understand which problems Maildir actually solves and which ones it does nothing about or, for that matter, can make worse.

(In this entry I'm going to compare Maildir against the traditional Unix mbox format.)

Maildir excels at all single-message updates, including deleting single messages and moving them into another folder. These can be done with relatively inexpensive filesystem operations (a rename or an unlink) or, at worst, by rewriting a single message. By contrast, in mbox format a true delete or a significant message modification generally requires copying the entire rest of the mailbox, and an index can't help mbox here at all.

(You can get fast 'delete' in mbox by just marking the message as deleted, but this doesn't reclaim any space and reclaiming space may be important to the user.)
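To make the cost difference concrete, here is a minimal Python sketch. The function names are hypothetical, the Maildir helpers assume the message already lives in the folder's cur/ subdirectory and that both folders are on the same filesystem, and the mbox helper assumes you already know the byte offsets of the message (which is what an index would give you). It also does no locking, which a real mbox writer would need.

```python
import os
import shutil

def maildir_delete(folder, filename):
    """Delete one message: a single unlink, no other message is touched."""
    os.unlink(os.path.join(folder, "cur", filename))

def maildir_move(src_folder, dst_folder, filename):
    """Move one message to another folder with a single rename."""
    os.rename(os.path.join(src_folder, "cur", filename),
              os.path.join(dst_folder, "cur", filename))

def mbox_delete(path, start, end):
    """Truly delete bytes [start, end) from an mbox file.

    Knowing the offsets only tells us where the message is; everything
    after it still has to be copied. (No locking: illustration only.)
    """
    tmp = path + ".tmp"
    with open(path, "rb") as src, open(tmp, "wb") as dst:
        dst.write(src.read(start))    # messages before the deleted one
        src.seek(end)
        shutil.copyfileobj(src, dst)  # the entire rest of the mailbox
    os.replace(tmp, path)
```

The Maildir operations cost the same no matter how big the folder is (directory lookup costs aside), while mbox_delete does work proportional to the size of the mailbox.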

If you know both a message's identifier and its current flags, Maildir gives you immediate access to it, because the flags are part of the filename and you can open the message by name. If the message's flags are unknown but you know its identifier, you need to scan the directory; this is still likely to be better than a linear read through the mbox file. Indexes can make mbox just as fast as Maildir, though.
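The two cases can be sketched roughly as follows. This assumes the common Maildir naming convention of `identifier:2,flags` for files in cur/, and that the caller passes the flags in the same order they appear in the filename; the function names are made up for illustration.

```python
import os

def open_known(folder, unique, flags):
    """Identifier and flags both known: build the filename and open it directly."""
    return open(os.path.join(folder, "cur", f"{unique}:2,{flags}"), "rb")

def find_by_unique(folder, unique):
    """Flags unknown: scan cur/ for the entry whose name starts with the identifier."""
    cur = os.path.join(folder, "cur")
    for name in os.listdir(cur):
        # Everything before the first ':' is the identifier part of the name.
        if name.split(":", 1)[0] == unique:
            return os.path.join(cur, name)
    return None
```

The first function costs one directory lookup; the second reads the whole directory, which is usually still far less I/O than reading an entire mbox file.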

Maildir is at best a modest help with any sort of sequential scan. If your mail client wants to see four headers from every message in a folder, it still has to open each message and read the first chunk of it. Whether reading the first chunk of a bunch of files is faster than a linear read through an mbox file is an interesting question, and Maildir certainly isn't going to be faster than doing the equivalent with an indexed mbox file. Reading most of each message is all but certain to be worse than working on an mbox file, because using separate files defeats a lot of OS-level readahead and the file data may well be more spread out across the disk than it would have been in a single file.

(If you want to accelerate this sort of scan, you need the relevant headers indexed and cached in some more conveniently digestible format. This is true of both mbox and Maildir.)
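As an illustration of the uncached scan described above, here is a rough Python sketch. The function name, the choice of headers and the 8 KB chunk size are all assumptions made for the example; a message whose headers run past the chunk would be parsed incompletely.

```python
import os
from email.parser import BytesHeaderParser

def scan_headers(folder, wanted=("From", "To", "Subject", "Date"), chunk=8192):
    """Return {filename: {header: value}} for every message in cur/.

    Each message costs one open() plus one read of its first chunk,
    however few headers we actually want from it.
    """
    parser = BytesHeaderParser()
    results = {}
    cur = os.path.join(folder, "cur")
    for name in sorted(os.listdir(cur)):
        with open(os.path.join(cur, name), "rb") as f:
            head = f.read(chunk)
        msg = parser.parsebytes(head)
        # Headers that are absent from the message come back as None.
        results[name] = {h: msg[h] for h in wanted}
    return results
```

A header cache would replace the loop body with a lookup keyed on the filename, which is the same trick you would use to speed up mbox.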

All of this assumes that you have either small directories (i.e. small folders) or non-linear directories, and thus can assume that looking up a file in a directory is a very cheap operation. If you're on a Unix that still uses linear directories and you have large folders, simple filename lookup and manipulation can themselves become quite expensive (as can scanning the directory).


Comments on this page:

From 89.27.97.237 at 2010-10-29 04:01:09:

Dovecot using Maildir and (self-healing) indexes is pretty good, though.

From 78.35.25.18 at 2010-10-29 04:27:18:

The use of indices is orthogonal to the storage format; e.g. mutt will maintain a header cache for Maildir folders if you ask it to.

Maildir solves two issues:

  1. From-quoting – entirely obviated. Each message is a separate entity.
  2. Concurrent delivery – safe by design with no locks necessary. Filename manipulations are atomic.

Those are significant upsides. Other considerations may always negate them of course.

Aristotle Pagaltzis

By cks at 2010-10-31 13:20:49:

That's a good point; I hadn't thought of either issue, especially the From-quoting one. I personally don't think that concurrent delivery is a particular problem these days (even over NFS), but certainly Maildir has a much simpler and completely lock-free approach.
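For reference, the lock-free delivery mentioned above looks roughly like this in Python. This is a simplified sketch: the function name is made up, and the unique name is built from the time, process ID and hostname, which is weaker than what a real delivery agent would use.

```python
import os
import socket
import time

def deliver(folder, data):
    """Deliver one message into a Maildir folder without taking any locks."""
    # Simplified unique name; real delivery agents add more entropy.
    unique = f"{time.time_ns()}.{os.getpid()}.{socket.gethostname()}"
    tmp_path = os.path.join(folder, "tmp", unique)
    new_path = os.path.join(folder, "new", unique)
    # O_EXCL makes creation fail instead of clobbering an existing file.
    fd = os.open(tmp_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())
    # The rename is atomic: readers see either nothing or the whole message.
    os.rename(tmp_path, new_path)
    return new_path
```

Because each delivery writes its own file, two deliveries running at once never touch the same data, and a reader scanning new/ never sees a half-written message.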

Written on 29 October 2010.

Last modified: Fri Oct 29 00:24:05 2010