Wandering Thoughts archives

2018-03-24

Our current ugly hacks to Dovecot to help mitigate our IMAP problems

Back in the comments of this entry from the end of December, I said that we weren't willing to take on the various burdens of changing our local Dovecot to add some logging of things like the mailboxes that people's clients were accessing. In yesterday's entry I mentioned that we actually had hacked up our Dovecot to do exactly that. You might wonder what happened between December and now to cause us to change our minds. The short version is that from our perspective, things on our IMAP server got worse and so we became more willing to do things to mitigate our problems (especially since our migration plans were clearly not going to give us any short term improvements).

(It's not clear to me if the problems got worse in the past few months, which is certainly possible, or if we just noticed more and more about how bad things were once we started actively looking into the state of the server.)

We wound up making two changes to help mitigate our problem; our added logging is actually the second and less alarming one. Our first and most significant change was that we hacked Dovecot so that LIST operations would ignore all names that started with . or were called exactly public_html, which is the name of the symlink that we drop into people's home directories to point to their web space. We made this change because monitoring runaway Dovecot processes that were rummaging through people's $HOME showed that many of them were traversing directory hierarchies under subdirectories like .git, .hg, $HOME/.local, $HOME/.cache, and so on. None of those have actual mailboxes but all of them are great places to find a lot of files, which is not a good thing in our environment. The public_html part of this had a similar motivation; we saw a significant number of Dovecot sessions that had staged great escapes into collections of data and other files that people had published in their home pages. Making this change didn't eliminate our problems but it clearly helped; we saw less load and less inode usage for Dovecot's indexes.

(While this sounds like a big change, it was a very small code modification. However, the scary part of making it was not being entirely sure that the effects of the change were only confined to IMAP LIST operations. Yes, we tested.)
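To make the rule concrete, here is a minimal sketch in C of the name test described above. It is not the actual Dovecot change and does not use any Dovecot API; the function name skip_in_list is made up for illustration, and where such a check would hook into the LIST code is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper (not Dovecot's real code): decide whether a
 * directory entry name should be hidden from IMAP LIST results. */
static bool skip_in_list(const char *name)
{
    assert(name != NULL);

    /* Anything starting with a dot: .git, .hg, .local, .cache, ...
     * This also covers the "." and ".." entries. */
    if (name[0] == '.')
        return true;

    /* The symlink to a user's web space; exact match only, so a
     * name like "public_html_old" is still listed. */
    if (strcmp(name, "public_html") == 0)
        return true;

    return false;
}
```

The exact-match comparison for public_html is deliberate in this sketch, since the text says names "called exactly public_html" are skipped, while the dot rule is a prefix test.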

Once we'd broken the ice with this change, it was much less of a big deal to add some logging to capture information about what IMAP mailboxes people were using. We started out by logging for SELECT, but seeing our logging in action made it obvious that clients used a variety of IMAP commands and we needed to add logging to all of them to be confident that we were going to see all of the mailboxes they were using. To reduce the log volume, we skip logging SELECTs of INBOX; it turns out that clients do this all the time, and it's not interesting for our uses of the information.

(I had fun hunting through the IMAP RFC for commands that take mailbox names as one of their arguments, and I'm not sure I got them all. But I'm reasonably confident that we log almost all of them; we currently log for LIST, APPEND, MOVE, COPY, and RENAME. I didn't bother with CREATE, on the grounds that clients would probably do some other operation after CREATE'ing a mailbox if it mattered.)
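The logging policy described above (log the mailbox for each command, but stay quiet for SELECT of INBOX) can be sketched roughly as follows. This is an illustration under assumptions, not the real patch: the function names and the log line layout are invented, and nothing here calls into Dovecot. The comparison is case-insensitive because IMAP command names and the special name INBOX are both case-insensitive.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>

/* ASCII case-insensitive string equality. */
static bool ascii_ieq(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return false;
        a++;
        b++;
    }
    /* Equal only if both strings ended at the same point. */
    return *a == *b;
}

/* Hypothetical policy: log every mailbox access except SELECT of
 * INBOX, which clients do constantly and which tells us nothing. */
static bool should_log_mailbox(const char *cmd, const char *mailbox)
{
    assert(cmd != NULL && mailbox != NULL);
    return !(ascii_ieq(cmd, "SELECT") && ascii_ieq(mailbox, "INBOX"));
}

/* Format one log line into buf. The layout is made up for this
 * example. Returns what snprintf() returns. */
static int format_mailbox_log(char *buf, size_t len, const char *user,
                              const char *cmd, const char *mailbox)
{
    return snprintf(buf, len, "imap-mailbox: user=%s cmd=%s mailbox=%s",
                    user, cmd, mailbox);
}
```

A caller would check should_log_mailbox() first and only then format and emit the line, so the common SELECT INBOX case costs one comparison and produces no log output.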

Once we were adding logging, I decided to throw in logging of LIST arguments so we could understand when and how it was being used. This turned out to be very valuable, partly because I was starting from a position of relative ignorance about the IMAP protocol and how real IMAP clients behave. A fair bit of what I wrote about yesterday came from that logging, especially the realization that clients could scan through all of $HOME without leaving tell-tale signs in Dovecot's indexes, which meant that our problems were worse than we'd realized. Unfortunately the one current limitation of our LIST logging is that we can't log how many entries were returned by the LIST command. For obvious reasons, it would be very handy to be able to tell the difference between a LIST command that returned ten names and one that returned 5,000 of them.
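As a sketch of what a more complete LIST log record might carry if the count were available, here is a small self-contained C example. Everything in it is hypothetical: the struct, the function names, and the log format are invented, and it says nothing about how one would actually obtain the count inside Dovecot, which is exactly the part that is missing today.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical per-command record; none of these names come from Dovecot. */
struct list_summary {
    const char *reference;   /* LIST reference argument */
    const char *pattern;     /* LIST mailbox pattern, e.g. "*" or "%" */
    unsigned long returned;  /* number of names sent back to the client */
};

static void list_summary_init(struct list_summary *s,
                              const char *reference, const char *pattern)
{
    assert(s != NULL);
    s->reference = reference;
    s->pattern = pattern;
    s->returned = 0;
}

/* Would be called once for every name actually sent in a LIST reply. */
static void list_summary_count(struct list_summary *s)
{
    s->returned++;
}

/* Format the summary line into buf; returns what snprintf() returns. */
static int list_summary_format(const struct list_summary *s,
                               char *buf, size_t len)
{
    return snprintf(buf, len,
                    "imap-list: ref=\"%s\" pattern=\"%s\" returned=%lu",
                    s->reference, s->pattern, s->returned);
}
```

With a record like this, a LIST that returned ten names and one that returned 5,000 would be distinguishable from the log line alone.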

I was quite pleasantly surprised to discover that the Dovecot source code is very nicely structured and organized, which made these changes much easier than they might otherwise have been. In particular, each IMAP command is in a separate source file, all with obvious names like 'cmd-list.c', and their main operation is pretty self-contained and obvious. Logging was really easy to add and even the change to make LIST skip some names wasn't too difficult (partly because this part of the code was already skipping . and .., which gave me a starting point). As I noted yesterday, I hacked this directly into the main Dovecot source rather than trying to figure out the plugin API (which is undocumented as far as I can see). I believe that we could do all of the logging we're currently doing through the plugin API, and that's clearly the more generally correct approach to it.

Knowing what mailboxes people are using is a relatively important part of our current migration plans (which have completely changed from what I wrote up for various reasons), but that's going to be another entry.

sysadmin/DovecotOurEmergencyHacks written at 00:53:36
