A learning experience about the performance of our IMAP server
Our IMAP server has never been entirely fast, and over the years it has slowly gotten slower and more loaded down. Why this was so seemed reasonably obvious to us; handling mail over IMAP required a fair amount of network bandwidth and a bunch of IO (often random IO) to our NFS fileservers, and there was only so much of that to go around. Things were getting slowly worse over time because more people were reading and storing more mail, while the hardware wasn't changing.
We have a long standing backwards compatibility with our IMAP
server, where people's IMAP clients have
full access to their
$HOME and would periodically go searching
through all of it. Recently this started causing us serious problems,
like running out of inodes on the IMAP server,
and it became clear that we needed to do something about it. After
a number of false starts (eg), we
wound up doing two important things over the past two months. First
we blocked Dovecot from searching through a lot of directories, and then we started manually migrating
users one by one to a setup where their IMAP
sessions could only see their
$HOME/IMAP instead of all of their
$HOME. The two changes together significantly reduce the number
of files and directories that Dovecot is scanning through (and
sometimes opening to count messages).
Well, guess what. Starting immediately with our first change and increasing as we migrated more and more high-impact users, the load on our IMAP server has been dropping dramatically. This is most clearly visible in the load average itself, where it's now entirely typical for the daytime load average to be under one (a level that was previously only achieved in the dead of night). The performance of my test Thunderbird setup has clearly improved, too, rising almost up to the level that I get on a completely unloaded test IMAP server. The change has basically been night and day; it's the most dramatic performance shift I can remember us managing (larger than finding our iSCSI problem in 2012). While the IMAP server's performance is not perfect and it can still bog down at some times, it's become clear that all of the extra scanning that Dovecot was doing was behind a great deal of the performance problems we were experiencing and that getting rid of it has had a major impact.
Technically, we weren't actually wrong about the causes of our IMAP server being slow; it definitely was due to network bandwidth and IO load issues. It's just that a great deal of that IO was completely unproductive and entirely avoidable, and if we had really investigated the situation we might have been able to improve the IMAP server long ago.
(And I think it got worse over time partly because more and more people started using clients, such as the iOS client, that seem to routinely use expensive scanning operations.)
The short and pungent version of what we learned is that IMAP servers go much faster if you don't let them do stupid things, like scan all through people's home directories. The corollary to this is that we shouldn't just assume that our servers aren't doing stupid things.
(You could say that another lesson is that if you know that your servers are occasionally doing stupid things, as we did, perhaps you should try to measure the impact of those things. But that's starting to smell a lot like hindsight bias.)