A learning experience about the performance of our IMAP server

April 13, 2018

Our IMAP server has never been entirely fast, and over the years it has slowly gotten slower and more loaded down. Why this was so seemed reasonably obvious to us; handling mail over IMAP required a fair amount of network bandwidth and a bunch of IO (often random IO) to our NFS fileservers, and there was only so much of that to go around. Things were getting slowly worse over time because more people were reading and storing more mail, while the hardware wasn't changing.

We have a long-standing backwards compatibility issue with our IMAP server, where people's IMAP clients have full access to their $HOME and would periodically go searching through all of it. Recently this started causing us serious problems, like running out of inodes on the IMAP server, and it became clear that we needed to do something about it. After a number of false starts, we wound up doing two important things over the past two months. First, we blocked Dovecot from searching through a lot of directories, and then we started manually migrating users one by one to a setup where their IMAP sessions could only see their $HOME/IMAP instead of all of their $HOME. The two changes together significantly reduce the number of files and directories that Dovecot is scanning through (and sometimes opening to count messages).
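To get a rough feel for how much scanning is at stake for a single user, one could compare the number of filesystem entries under all of $HOME against the number under $HOME/IMAP alone. The following is only a sketch of that idea, not what we actually ran; the function names `count_entries` and `scan_comparison` are made up for illustration, and it merely counts what a full recursive walk would touch, which is at best an approximation of what Dovecot really does.

```python
import os


def count_entries(root):
    """Count the directories and files found by a full recursive walk of root.

    Symlinked directories are not descended into. A root that does not
    exist simply yields (0, 0), because os.walk ignores the error.
    """
    dirs = 0
    files = 0
    for _dirpath, dirnames, filenames in os.walk(root, followlinks=False):
        dirs += len(dirnames)
        files += len(filenames)
    return dirs, files


def scan_comparison(home, mail_subdir="IMAP"):
    """Compare a walk of the whole home directory with a walk of only
    the mail subdirectory (assumed here to be home/IMAP)."""
    return {
        "whole_home": count_entries(home),
        "mail_only": count_entries(os.path.join(home, mail_subdir)),
    }


if __name__ == "__main__":
    result = scan_comparison(os.path.expanduser("~"))
    for label, (dirs, files) in result.items():
        print(f"{label}: {dirs} directories, {files} files")
```

Running something like this over a handful of heavy users would probably have hinted at the size of the gap; on NFS the walk itself generates real IO, so it is best done sparingly.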

Well, guess what. Starting immediately with our first change and increasing as we migrated more and more high-impact users, the load on our IMAP server has been dropping dramatically. This is most clearly visible in the load average itself, where it's now entirely typical for the daytime load average to be under one (a level that was previously only achieved in the dead of night). The performance of my test Thunderbird setup has clearly improved, too, rising almost up to the level that I get on a completely unloaded test IMAP server. The change has basically been night and day; it's the most dramatic performance shift I can remember us managing (larger than finding our iSCSI problem in 2012). While the IMAP server's performance is not perfect and it can still bog down at some times, it's become clear that all of the extra scanning that Dovecot was doing was behind a great deal of the performance problems we were experiencing and that getting rid of it has had a major impact.

Technically, we weren't actually wrong about the causes of our IMAP server being slow; it definitely was due to network bandwidth and IO load issues. It's just that a great deal of that IO was completely unproductive and entirely avoidable, and if we had really investigated the situation we might have been able to improve the IMAP server long ago.

(And I think it got worse over time partly because more and more people started using clients, such as the iOS client, that seem to routinely use expensive scanning operations.)

The short and pungent version of what we learned is that IMAP servers go much faster if you don't let them do stupid things, like scan all through people's home directories. The corollary to this is that we shouldn't just assume that our servers aren't doing stupid things.

(You could say that another lesson is that if you know that your servers are occasionally doing stupid things, as we did, perhaps you should try to measure the impact of those things. But that's starting to smell a lot like hindsight bias.)

