2018-03-24
Our current ugly hacks to Dovecot to help mitigate our IMAP problems
Back in the comments of this entry from the end of December, I said that we weren't willing to take on the various burdens of changing our local Dovecot to add some logging of things like the mailboxes that people's clients were accessing. In yesterday's entry I mentioned that we actually had hacked up our Dovecot to do exactly that. You might wonder what happened between December and now to cause us to change our minds. The short version is that from our perspective, things on our IMAP server got worse and so we became more willing to do things to mitigate our problems (especially since our migration plans were clearly not going to give us any short term improvements).
(It's not clear to me if the problems got worse in the past few months, which is certainly possible, or if we just noticed more and more about how bad things were once we started actively looking into the state of the server.)
We wound up making two changes to help mitigate our problem; our
added logging is actually the second and less alarming one. Our
first and most significant change was that we hacked Dovecot so that
LIST operations would ignore all names that started with '.' or
were called exactly public_html, which is the name of the symlink
that we drop into people's home directories to point to their web
space. We made this change because monitoring runaway Dovecot
processes that were rummaging through people's $HOME showed that
many of them were traversing hierarchies that went through
subdirectories like .git, .hg, $HOME/.local, $HOME/.cache, and so
on. None of those have actual mailboxes but all of them are great
places to find a lot of files, which is not a good thing in our
environment. The public_html part of this had a similar motivation;
we saw a significant number of Dovecot sessions that had staged
great escapes into collections of data and other files that people
had published in their home pages. Making this change didn't
eliminate our problems but it clearly helped; we saw less load and
less inode usage for Dovecot's indexes.
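
To make the rule concrete, here is a minimal sketch of the test in C. This is not our actual patch and not Dovecot's code; the function name list_skip_name is made up for illustration, and I'm assuming it gets handed a single directory entry name (not a full path) while LIST is walking a directory.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper (not Dovecot's real code): should a directory
 * entry seen during a LIST traversal be ignored?
 * Assumes 'name' is one path component, not a full path. */
bool list_skip_name(const char *name)
{
    assert(name != NULL);

    /* Anything starting with '.' is skipped: .git, .hg, .cache and so
     * on. This also covers "." and "..". */
    if (name[0] == '.')
        return true;

    /* The symlink to people's web space, matched exactly. */
    return strcmp(name, "public_html") == 0;
}
```

The match on public_html is deliberately exact, so something like public_html2 would still be listed.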
(While this sounds like a big change, it was a very small code modification. However, the scary part of making it was not being entirely sure that the effects of the change were only confined to IMAP LIST operations. Yes, we tested.)
Once we'd broken the ice with this change, it was much less of a
deal to add some logging to capture information about what IMAP
mailboxes people were using. We started out by logging for SELECT,
but seeing our logging in action made it obvious that clients used
a variety of IMAP commands and we needed to add logging to all of
them to be confident that we were going to see all of the mailboxes
they were using. To reduce the log volume, we skip logging SELECTs
of INBOX; it turns out that clients do this all the time, and it's
not interesting for our uses of the information.
(I had fun hunting through the IMAP RFC for commands that take mailbox names as one of their arguments, and I'm not sure I got them all. But I'm reasonably confident that we log almost all of them; we currently log for LIST, APPEND, MOVE, COPY, and RENAME. I didn't bother with CREATE, on the grounds that clients would probably do some other operation after CREATE'ing a mailbox if it mattered.)
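
As a rough sketch of the logging decision described above (again in C, and again not our actual code), the following hypothetical function should_log_mailbox says whether a given command and mailbox name would get a log line. The function name and the idea of a single central check are assumptions for illustration; in practice the logging lives in each command's own source file. The case-insensitive comparisons reflect that IMAP command names and the special name INBOX are case-insensitive.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <strings.h>   /* strcasecmp (POSIX) */

/* Hypothetical helper: decide whether to log the mailbox argument of
 * an IMAP command. Not Dovecot's real code. */
bool should_log_mailbox(const char *cmd, const char *mailbox)
{
    /* The commands we log for, as listed in the text. */
    static const char *const logged[] = {
        "SELECT", "LIST", "APPEND", "MOVE", "COPY", "RENAME"
    };

    assert(cmd != NULL && mailbox != NULL);

    /* Clients SELECT INBOX all the time; skip it to cut log volume. */
    if (strcasecmp(cmd, "SELECT") == 0 && strcasecmp(mailbox, "INBOX") == 0)
        return false;

    for (size_t i = 0; i < sizeof logged / sizeof logged[0]; i++) {
        if (strcasecmp(cmd, logged[i]) == 0)
            return true;
    }
    return false;   /* e.g. CREATE, which we don't bother with */
}
```

Note that only SELECT of INBOX is suppressed; a COPY or APPEND into INBOX would still be logged under this sketch.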
Once we were adding logging, I decided to throw in logging of LIST
arguments so we could understand when and how it was being used.
This turned out to be very valuable, partly because I was starting
from a position of relative ignorance about the IMAP protocol and
how real IMAP clients behave. A fair bit of what I wrote about
yesterday came from that logging, especially
the realization that clients could scan through all of $HOME
without leaving tell-tale signs in Dovecot's indexes, which meant
that our problems were worse than we'd realized. Unfortunately the
one current limitation of our LIST logging is that we can't log how
many entries were returned by the LIST command. For obvious reasons,
it would be very handy to be able to tell the difference between a
LIST command that returned ten names and one that returned 5,000
of them.
I was quite pleasantly surprised to discover that the Dovecot source
code is very nicely structured and organized, which made these
changes much easier than they might otherwise have been. In particular,
each IMAP command is in a separate source file, all with obvious
names like 'cmd-list.c', and their main operation was pretty
self-contained and obvious. Logging was really easy to add and even
the change to make LIST skip some names wasn't too difficult (partly
because this part of the code was already skipping '.' and '..',
which gave me a starting point). As I noted yesterday, I hacked
this directly into the main Dovecot source rather than trying to
figure out the plugin API (which is undocumented as far as I can
see). I believe that we could do all of the logging we're currently
doing through the plugin API, and that's clearly the more generally
correct approach to it.
Knowing what mailboxes people are using is a relatively important part of our current migration plans (which have completely changed from what I wrote up for various reasons), but that's going to be another entry.