How our IMAP server wound up running out of inodes

December 28, 2017

On Twitter, I mentioned that we'd run out of inodes on a server, and then a few weeks later I made a comment about an IMAP feature:

I'm coming to really dislike IMAP clients that don't use subscriptions, even though the consequences for our server are sort of our own fault.

These two tweets are very closely related, and there is a sad story here (since it's sort of our own fault).

In the IMAP protocol, there are two ways to get a list of the mailboxes and folders that you have: the LIST command and the LSUB command. The difference between the two is that LSUB restricts itself to things that you have SUBSCRIBE'd to (another IMAP command), while the LIST command just lists, well, everything that the IMAP server can discover. When the IMAP server is backed by some sort of database, that 'what it can discover' comes from the database engine; when the IMAP server is storing things in the filesystem as a directory hierarchy, that just translates to a directory listing.

Many IMAP clients use IMAP subscriptions both to track what folders they know about and synchronize the list of known folders between clients, since your IMAP subscriptions are remembered by the server and stored there. However, some clients can't be bothered with this; they simply use LIST to ask the IMAP server for absolutely everything (and presumably then show some or all of it to you).
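To make the client-side difference concrete, here is a minimal sketch using Python's standard imaplib module, whose connection objects have both a list() and an lsub() method. The folder_names function and its response parsing are my own illustration, not what any particular client does, and the decoding is simplified (real mailbox names use IMAP's modified UTF-7).

```python
import re

# Matches one LIST/LSUB response line such as:
#   (\HasNoChildren) "/" "INBOX"
_LIST_LINE = re.compile(
    rb'\((?P<flags>[^)]*)\) (?P<delim>"[^"]*"|NIL) (?P<name>.+)'
)


def folder_names(conn, subscribed_only=True):
    """Return folder names from an imaplib.IMAP4-style connection.

    With subscribed_only=True this issues LSUB, which only reports
    folders the user has SUBSCRIBE'd to. Otherwise it issues LIST,
    which reports everything the server can discover.
    """
    typ, data = conn.lsub() if subscribed_only else conn.list()
    if typ != "OK":
        raise RuntimeError("server rejected the request: %r" % (typ,))
    names = []
    for line in data:
        # imaplib returns [None] when nothing matches; skip non-bytes items.
        if not isinstance(line, bytes):
            continue
        m = _LIST_LINE.match(line)
        if m:
            # Simplification: treat the name as UTF-8 and drop any quotes.
            names.append(m.group("name").strip(b'"').decode("utf-8", "replace"))
    return names
```

With a real server you would pass in a logged-in imaplib.IMAP4_SSL object. A client that respects subscriptions would only ever call this with subscribed_only=True.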

Even when your IMAP server is storing mailboxes and folders in the filesystem, the difference between LIST and LSUB is normally not particularly important, because the IMAP server is using an area that's only for mailboxes and the only thing found there is mailboxes. Then, unfortunately, there's us. Due to the ongoing requirements of backwards compatibility, the root of our IMAP server's mailbox storage is people's $HOME. It is quite possible for people's $HOME to contain a lot of things that aren't mailboxes and mail folders, at which point the difference between LIST and LSUB becomes very important to us. If a client uses IMAP subscriptions, what else is in $HOME doesn't matter; the client will only try to look through things you've subscribed to, which are presumably actually mailboxes (and limited in number). But if the client ignores IMAP subscriptions and just uses LIST, it winds up trying to look through everything, and then when it finds directories, it recurses down through them in turn.
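As a rough model of why this matters when the mail root is $HOME, the following sketch compares what a recursive, unrestricted LIST ends up visiting with what a subscription-based client sees. It is an illustration only, not how Dovecot implements either command; the function names are made up, and it counts only directories, since those are what the client recurses into.

```python
import os


def discoverable_by_list(root):
    """Every directory under root, as a path relative to root.

    This approximates what a recursive LIST makes a filesystem-backed
    server look at. Symlinks are not followed (os.walk's default).
    """
    found = []
    for dirpath, dirnames, _filenames in os.walk(root):
        for name in dirnames:
            found.append(os.path.relpath(os.path.join(dirpath, name), root))
    return sorted(found)


def visible_via_lsub(subscriptions, root):
    """Only the subscribed names that actually exist under root."""
    return sorted(
        s for s in subscriptions if os.path.exists(os.path.join(root, s))
    )
```

For a $HOME with one mail folder and a big source tree, the second function returns one entry while the first returns one entry per directory in the tree.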

A year and a half ago, our problem was runaway LIST searches that either ran into symlink cycles or escaped into the wider filesystem, hanging Dovecot and hammering our fileservers. That's basically stopped being a problem. Today's problem is that some people who use these clients have fairly large $HOMEs, with things like significant version-controlled source trees and datasets with lots of files and subdirectories. Dovecot maintains index files in a directory hierarchy for every mailbox and mail folder that it knows about; when a client uses LIST recursively, this translates to 'at least every directory that Dovecot runs across'. We have Dovecot store its indexes on the IMAP server's local mirrored system disks, because that's a lot faster than getting them over NFS.

This is how we wound up running out of inodes on our IMAP server. Dovecot was just trying to store too many index files and directories. Discarding people's index data didn't help for long, because of course their clients did it again and recreated it all after a few days.

(Our short term brute force solution was to put in a larger set of SSDs and create a partition just for Dovecot's index data, with the number of inodes set to the maximum value. This has managed to keep us out of danger so far.)
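If you want to keep an eye on this sort of thing, inode usage is what 'df -i' reports. A minimal Python sketch of the same check, using the standard os.statvfs call (Unix only), might look like the following. The inode_usage name is my own, and any path you pass it is up to you.

```python
import os


def inode_usage(path):
    """Return (used, total, fraction_used) inodes for path's filesystem.

    Some filesystems allocate inodes dynamically and report a total
    of 0; in that case fraction_used is None.
    """
    st = os.statvfs(path)
    total = st.f_files            # total inodes on the filesystem
    used = total - st.f_ffree     # f_ffree is the number of free inodes
    fraction = (used / total) if total else None
    return used, total, fraction
```

A monitoring job could call this on the index partition and alert when the fraction crosses some threshold well before it reaches 1.0.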

I suspect that clients doing this unrestricted LIST usage can't be giving the people using them a really good experience, but apparently it's not so terrible that people stop using them. Unfortunately we don't really have any idea which specific clients are involved, partly because more and more people are using multiple clients across many different devices.

(Our long term fix is going to have to be migrating away from our backwards compatibility settings, but that's going to be a very slow process and probably a lot of work. Helpfully it can be done fairly easily for people who actually use IMAP subscriptions, but discussing the issues involved is for another entry.)

Sidebar: How many inodes we're talking about

At the moment, our most prolific user has over 1.3 million Dovecot index files and directories, and the next two most prolific users have over 730k and 600k respectively (fortunately it falls off fairly rapidly from there). The overall result of this is that our filesystem for storing this Dovecot index data has over 4.6 million inodes used.
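Numbers like these can be gathered with a simple tree walk. The sketch below assumes a hypothetical layout where the index area has one top-level directory per user (this is an assumption for illustration; the actual layout depends on how Dovecot is configured), and it counts every file and directory beneath each one, since each costs roughly one inode.

```python
import os


def entries_per_user(index_root):
    """Map each top-level directory (assumed to be one per user) to the
    number of files and directories beneath it."""
    with os.scandir(index_root) as it:
        users = [e.name for e in it if e.is_dir(follow_symlinks=False)]
    counts = {}
    for user in users:
        total = 0
        for _dirpath, dirnames, filenames in os.walk(
            os.path.join(index_root, user)
        ):
            total += len(dirnames) + len(filenames)
        counts[user] = total
    return counts


def top_users(counts, n=3):
    """The n users with the most entries, largest first."""
    return sorted(counts.items(), key=lambda kv: kv[1], reverse=True)[:n]
```

On a tree with millions of entries this walk is itself slow and I/O heavy, so it is something to run occasionally rather than constantly.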


Comments on this page:

You should consider setting mailbox_list_index=yes. This makes Dovecot use the dovecot.list.index files for LIST instead of doing a file system walk.

Have you considered storing Dovecot indexes on a file system that dynamically creates inodes? I've LONG used ReiserFS for news pools for this very reason. - I simply don't care about the number of files ~> inodes.

I have LONG been a ReiserFS (3) fan. ZFS has recently wooed me away for some of its other features.

My first thought in this situation would be to have the server log SELECT commands, so that you can track what each client is actually looking at, completely independently of how it likes to handle its folders. Is this not a workable approach?

By cks at 2017-12-29 15:44:01:

Dovecot doesn't already have built in logging of server commands, so we'd have to modify it to add that, and at the moment we're not willing to take on the various burdens of making this sort of change to Dovecot. Our IMAP server is a critical component where a lot of people will notice if anything goes wrong, and Dovecot is a complicated code base, which makes it non-trivial.

(I've also thought about putting a logging IMAP proxy in front of Dovecot, but on a casual look I couldn't spot a suitable one. There are obvious performance and security issues involved in such a thing, of course, in addition to the 'what if it explodes' worries.)

Written on 28 December 2017.

Last modified: Thu Dec 28 02:40:26 2017