Using a local database to get consistent device names is a bad idea
People like consistent device names, and one of the ways that Unixes have historically tried to get them is to keep a local database of known devices and their names, based on some sort of fingerprint of the device (the MAC address is a popular fingerprint for Ethernet interfaces, for example). Over the years various Unixes have implemented this in different ways; for example, some versions of Linux auto-created udev rules for some devices, and Solaris and derivatives have /etc/path_to_inst. Unfortunately, I have to tell you that trying to get consistent device names this way turns out to be a bad idea.
The fundamental problem is that if you keep a database of local device names, your device names depend on the history of the system. This has two immediate bad results. First, if you have two systems with identical hardware running identical software they won't necessarily use the same device names, because one system could have previously had a different hardware configuration. Second, if you reinstall an existing system from scratch you won't necessarily wind up with the same device names, because your new install won't necessarily have the same history as the current system does.
(Depending on the scheme, you may also have the additional bad result that moving system disks from one machine to an identical second machine will change the device names because things like MAC addresses changed.)
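To make the history dependence concrete, here is a minimal sketch of a name database keyed by MAC address. The `assign_names` function, the `ethN` naming, and the MAC addresses are all hypothetical; this is not how udev or /etc/path_to_inst actually work internally, just a model of the general idea.

```python
# Hypothetical model of a persistent device-name database keyed by MAC address.
def assign_names(db, macs):
    """Give each MAC a name, reusing any name already recorded in db."""
    for mac in macs:
        if mac not in db:
            db[mac] = "eth%d" % len(db)   # next unused index; old entries are kept
    return {mac: db[mac] for mac in macs}

# Machine A once had a different card, which still occupies eth0 in its database.
machine_a_db = {}
assign_names(machine_a_db, ["aa:aa:aa:00:00:01"])            # old card, later removed
names_a = assign_names(machine_a_db, ["aa:aa:aa:00:00:02"])  # current card

# Machine B is a fresh install with the same kind of hardware.
machine_b_db = {}
names_b = assign_names(machine_b_db, ["bb:bb:bb:00:00:02"])

print(list(names_a.values()), list(names_b.values()))
```

The two machines have the same current hardware, but machine A calls its only interface eth1 and machine B calls it eth0, purely because of what machine A used to contain.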
Both of these problems are bad once you start dealing with multiple systems. They make your systems inconsistent, which increases the work required to manage them, and they make it potentially dangerous to reinstall systems. You wind up either having to memorize the differences from system to system or needing to assemble your own layer of indirection on top of the system's device names so you can specify things like 'the primary network interface, no matter what this system calls it'.
Now, you can have these machine-to-machine variation problems even with schemes that derive names from the hardware configuration. But with such schemes, at least you only have these problems on hardware that's different, not on hardware that's identical. If you have truly identical hardware, you know that the device names are identical. By extension you know that the device names will be identical after a reinstall (because the hardware is the same before and after).
I do understand the urge to have device names that stay consistent even if you change the hardware around a bit, and I sometimes quite like them myself. But I've come to think that such names should be added as an extra optional layer on top of a system that creates device names that are 'stateless' (ie don't care about the past history of the system). It's also best if these device aliases can be based on general properties (or set up by hand in configuration files), because often what I really want is an abstraction like 'the network interface that's on network X' or 'the device of the root filesystem'.
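As an illustration of such an alias layer, here is a small sketch that resolves 'the network interface that's on network X' from a table of interface addresses. The interface names and addresses are made up, and a real tool would have to read them from the system instead of a hard-coded dictionary.

```python
import ipaddress

def interface_on_network(addresses, network):
    """Return the name of the first interface whose address is inside network."""
    net = ipaddress.ip_network(network)
    for name, addr in sorted(addresses.items()):
        if ipaddress.ip_address(addr) in net:
            return name
    return None

# Hypothetical stateless interface names and their addresses.
addresses = {"enp3s0": "192.0.2.10", "enp4s0": "198.51.100.7"}
primary = interface_on_network(addresses, "198.51.100.0/24")
```

The alias is derived from a general property (which network the interface is on), so it doesn't depend on the history of the system.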
Our revised Dovecot IMAP configuration migration plans (and processes)
Back at the start of January, I wrote up the goals and problems
of our Dovecot IMAP migration, and in
an appendix at the end I outlined what became our initial migration
plans. We would build an entirely new Dovecot server that was set
up with people's IMAP mail folder storage being a subdirectory of
$HOME/mail (call this the IMAP root), and
then we would get people to move to this server one by one. Migration
would require them to change their clients and might require them
(or us) to move files in Unix. Eventually we would tell the remaining
holdouts that we were just going to turn off the old IMAP server
and they had to migrate now.
Initially, the great virtue I saw in this plan was that it was entirely user driven and didn't require us to do anything. The users did everything, could go at their own speed, and were completely responsible for what happened. In an environment where we couldn't count on clients using IMAP subscriptions so we could know what people's mailboxes actually were, things had to be user-driven anyway, and we generally try to stay out of doing per-user things because it doesn't scale; we have a lot of users and not very many people looking after our central systems (including the IMAP server).
As we talked more and more about this, we realized that the central problem with this plan is that everyone had to migrate and this involved the users doing things (often at the Unix level), or getting someone to help them. As mentioned, we have a lot of users, and some of them are quite important (eg, professors) and can't just be abandoned to their fate. There was no way to make this not be disruptive to people. At the same time, most of our users were not causing any problems, which meant that we'd be forcing a lot of people to do disruptive things (on all of their devices, better not miss one) to deal with a problem created by a much smaller number of users.
If this was the only way to deal with things, we might still have
gone ahead with it. But as I sort of alluded to in passing in the
January entry, it's possible to do
this on a per-user basis in Dovecot using a shell script (see the
bottom of MailLocation).
After we talked it over, we decided that this was the way we wanted
to handle the migration to people's IMAP sessions being confined
to a subdirectory of their
$HOME; it would be done on a per-user
basis and we'd directly target high-priority problem cases. The
vast majority of our current users would forever stay un-migrated,
while new users would be set up to be confined to a
subdirectory from the start (ie, using the new IMAP root).
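The per-user decision can be sketched roughly as follows. This is only an illustration in Python, not the shell script Dovecot actually runs; the `MIGRATED` set and the `imap_root` function are hypothetical names, and $HOME/IMAP is the new root mentioned later in this entry.

```python
import posixpath

# Hypothetical set of accounts that have been moved to the new IMAP root.
MIGRATED = {"alice"}

def imap_root(user, home, is_new_user=False):
    """Pick the directory that a user's IMAP session should be confined to."""
    if is_new_user or user in MIGRATED:
        return posixpath.join(home, "IMAP")   # confined to a subdirectory
    return home                                # un-migrated: all of $HOME
```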
As much as possible, we wanted this migration to be transparent to
users (or at least important ones). That meant that the IMAP mailbox
names as seen by the clients couldn't change, and that meant that
no matter what we were going to have to move files around; there's
no other way for this to be transparent to clients when you change
the IMAP root. Given that, it wasn't important to pick a new IMAP
root that people already used for mailboxes, so we picked $HOME/IMAP
for various reasons (including that calling it this made it clear
what it was for).
Since this plan means that we're moving user mailboxes around at
least some of the time (in order to migrate problem users), knowing
what those mailboxes were became important enough to get us to
hack some mailbox logging into Dovecot.
Having this information has been extremely reassuring. Even when
it just duplicates the information in a user's .subscriptions
file, it also confirms that that information is accurate and complete.
We started out with plans for a two-stage operation for most users,
where we'd first tell them to move all of their IMAP mailboxes under
'mail/' in their client (ie,
$HOME/mail) before some deadline,
then at the deadline we'd make
and flip the server setting that made
$HOME/IMAP their IMAP root.
In practice it's turned out to be easier to do the file moving
ourselves, based on both
.subscriptions and the logs, so our
current approach is to just tell various users 'unless you object,
at time X we'll be improving your IMAP client experience by ...'
and then at time X we do everything ourselves. It's been a little
bit surprising how few actual active mailboxes some of these users
have, especially relative to how much of an impact they've been
having on the server.
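Working out which files to move can be sketched like this, assuming we already have the mailbox names from a user's .subscriptions file and from the logs as two lists. The `plan_moves` function is hypothetical, and it only plans the moves; a real version would also have to create parent directories and cope with name collisions.

```python
from pathlib import PurePosixPath

def plan_moves(home, subscribed, logged, new_root="IMAP"):
    """Return (source, destination) pairs for every known mailbox."""
    names = set(subscribed) | set(logged)
    names.discard("INBOX")          # assumption: INBOX does not live under $HOME
    home = PurePosixPath(home)
    moves = []
    for name in sorted(names):
        if name == new_root or name.startswith(new_root + "/"):
            continue                # already under the new root
        # Keep the same relative name so clients see no change.
        moves.append((str(home / name), str(home / new_root / name)))
    return moves
```

Because each mailbox keeps its name relative to the IMAP root, the client-visible mailbox names stay the same once the root is switched.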
(This genuinely does improve the IMAP client experience for people,
for obvious reasons. An IMAP client that is scanning all of your
$HOME and maybe opening all the files there
is generally not a responsive client, not if your
$HOME is at all
PS: Although I haven't been writing about it here on Wandering Thoughts until recently, our IMAP situation has been consuming a lot of my attention and time at work. It's turned into a real learning experience in several ways.
Our current ugly hacks to Dovecot to help mitigate our IMAP problems
Back in the comments of this entry from the end of December, I said that we weren't willing to take on the various burdens of changing our local Dovecot to add some logging of things like the mailboxes that people's clients were accessing. In yesterday's entry I mentioned that we actually had hacked up our Dovecot to do exactly that. You might wonder what happened between December and now to cause us to change our minds. The short version is that from our perspective, things on our IMAP server got worse and so we became more willing to do things to mitigate our problems (especially since our migration plans were clearly not going to give us any short term improvements).
(It's not clear to me if the problems got worse in the past few months, which is certainly possible, or if we just noticed more and more about how bad things were once we started actively looking into the state of the server.)
We wound up making two changes to help mitigate our problem; our
added logging is actually the second and less alarming one. Our
first and most significant change was we hacked Dovecot so that
LIST operations would ignore all names that started with '.' or
were called exactly
public_html, which is the name of the symlink
that we drop into people's home directories to point to their web
space. We made this change because monitoring runaway Dovecot
processes that were rummaging through people's
$HOME showed that
many of them were traversing through subdirectory hierarchies that
went through subdirectories like
$HOME/.cache, and so on. None of those have actual mailboxes but
all of them are great places to find a lot of files, which is not
a good thing in our environment. The
public_html part of this
had a similar motivation; we saw a significant number of Dovecot
sessions that had staged great escapes into collections of data and
other files that people had published in their home pages. Making
this change didn't eliminate our problems but it clearly helped;
we saw less load and less inode usage for Dovecot's indexes.
(While this sounds like a big change, it was a very small code modification. However, the scary part of making it was not being entirely sure that the effects of the change were only confined to IMAP LIST operations. Yes, we tested.)
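The rule itself is simple enough to show as a sketch. This is Python and not the actual C change to Dovecot, and it assumes the skipped names are dot-prefixed ones (such as .cache) plus public_html; the `should_skip` name is made up.

```python
def should_skip(name):
    """True if a directory entry should be hidden from LIST results."""
    return name.startswith(".") or name == "public_html"

entries = [".cache", "public_html", "mail", "public_html_old", "Sent"]
visible = [e for e in entries if not should_skip(e)]
```

Note that the public_html test is an exact match, so something like public_html_old would still be listed.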
Once we'd broken the ice with this change, it was much less of a
deal to add some logging to capture information about what IMAP
mailboxes people were using. We started out by logging for SELECT,
but seeing our logging in action made it obvious that clients used
a variety of IMAP commands and we needed to add logging to all of
them to be confident that we were going to see all of the mailboxes
they were using. To reduce the log volume, we skip logging SELECTs
of INBOX; it turns out that clients do this all the time, and it's
not interesting for our uses of the information.
(I had fun hunting through the IMAP RFC for commands that take mailbox names as one of their arguments, and I'm not sure I got them all. But I'm reasonably confident that we log almost all of them; we currently log for LIST, APPEND, MOVE, COPY, and RENAME. I didn't bother with CREATE, on the grounds that clients would probably do some other operation after CREATE'ing a mailbox if it mattered.)
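Here is a rough sketch of the kind of decision the logging makes: given one IMAP command line, which mailbox names (if any) are worth logging. It is not our Dovecot change; the `MAILBOX_ARGS` table and `mailboxes_to_log` are hypothetical, the table is deliberately incomplete, and the parsing is simplified (it ignores IMAP literals, for instance).

```python
import shlex

# Position(s) of the mailbox argument after the command name.
# A deliberately partial table; other commands are ignored.
MAILBOX_ARGS = {
    "SELECT": [0], "EXAMINE": [0], "APPEND": [0],
    "COPY": [1], "MOVE": [1], "RENAME": [0, 1],
}

def mailboxes_to_log(line):
    """Return the mailbox names worth logging for one IMAP command line."""
    words = shlex.split(line)[1:]            # drop the client's tag
    if words and words[0].upper() == "UID":  # UID COPY, UID MOVE
        words = words[1:]
    if not words:
        return []
    cmd, args = words[0].upper(), words[1:]
    names = [args[i] for i in MAILBOX_ARGS.get(cmd, []) if i < len(args)]
    if cmd == "SELECT":
        # SELECTs of INBOX happen constantly and aren't interesting.
        names = [n for n in names if n.upper() != "INBOX"]
    return names
```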
Once we were adding logging, I decided to throw in logging of LIST
arguments so we could understand when and how it was being used.
This turned out to be very valuable, partly because I was starting
from a position of relative ignorance about the IMAP protocol and
how real IMAP clients behave. A fair bit of what I wrote about
yesterday came from that logging, especially
the realization that clients could scan through all of $HOME
without leaving tell-tale signs in Dovecot's indexes, which meant
that our problems were worse than we'd realized. Unfortunately the
one current limitation of our LIST logging is that we can't log how
many entries were returned by the LIST command. For obvious reasons,
it would be very handy to be able to tell the difference between a
LIST command that returned ten names and one that returned 5,000.
I was quite pleasantly surprised to discover that the Dovecot source
code is very nicely structured and organized, which made these
changes much easier than they might otherwise have been. In particular,
each IMAP command is in a separate source file, all with obvious
names like 'cmd-list.c', and their main operation was pretty self
contained and obvious. Logging was really easy to add and even the
change to make LIST skip some names wasn't too difficult (partly
because this part of the code was already skipping some names,
which gave me a starting point). As I noted yesterday, I hacked
this directly into the main Dovecot source rather than trying to
figure out the plugin API (which is undocumented as far as I can
see). I believe that we could do all of the logging we're currently
doing through the plugin API, and that's clearly the more generally
correct approach to it.
Knowing what mailboxes people are using is a relatively important part of our current migration plans (which have completely changed from what I wrote up for various reasons), but that's going to be another entry.
Some things about Dovecot, its index files, and the IMAP LIST command
We have a backwards compatibility issue
with our IMAP server, where people's IMAP roots are
$HOME, their home directory,
and then clients ask the IMAP server to search all through the IMAP
namespace; this causes various bad things to happen, including
running out of inodes. The reason we ran
out of inodes is that Dovecot maintains some index files for every mailbox it looks at.
We have Dovecot store its index files on our IMAP server's local
disk, under /var/local/dovecot/<user>. Dovecot puts these in a
hierarchy that mirrors the actual Unix (and IMAP) hierarchy of the
mailboxes; if there is a subdirectory Mail with a mailbox
Drafts in it, the Dovecot index files will be in
.../<user>/Mail/.imap/Drafts/. It follows that you can hunt
through someone's Dovecot index files to see what mailboxes their
clients have looked at, although this may tell you less than you
think about what their active mailboxes are.
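Going from an index directory back to a mailbox name can be sketched like this. It assumes the layout is exactly as in the example path above, with the '.imap' component being the only thing added to the mailbox's own path; the `mailbox_from_index_dir` helper is hypothetical.

```python
from pathlib import PurePosixPath

def mailbox_from_index_dir(index_dir, user_index_root):
    """Map an index directory back to the mailbox name it was built for."""
    rel = PurePosixPath(index_dir).relative_to(user_index_root)
    # Drop the '.imap' component that sits between the parent directory
    # and the mailbox name in the index hierarchy.
    return "/".join(p for p in rel.parts if p != ".imap")

name = mailbox_from_index_dir(
    "/var/local/dovecot/alice/Mail/.imap/Drafts",
    "/var/local/dovecot/alice",
)
```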
(One reason that Dovecot might look at a mailbox is that your client
has explicitly asked it to, with an IMAP
SELECT command or perhaps
MOVE operation. However, there are other
When I began digging into our IMAP pain and working on our planned migration (which has drastically changed directions since then), I was operating under the charming idea that most clients used IMAP subscriptions and only a few of them asked the IMAP server to inventory everything in sight. One of the reasons for this is that only a few people had huge numbers of Dovecot index files, and I assumed that the two were tied together. It turns out that both sides of this are wrong.
Perhaps I had the idea that it was hard to do an IMAP LIST
operation that asked the server to recursively descend through
everything under your IMAP root. It isn't; it's trivial. Here's
the IMAP command to do it:
m LIST "" "*"
That's all it takes (the unrestricted * is the important bit). The sort of good news is that this operation by itself won't cause Dovecot to actually look at those mailboxes and thus to build index files for them. However, there is a close variant of this LIST command that does force Dovecot to look at each file, because it turns out that you can ask your IMAP server to not just list all your mailboxes but to tell you which ones have unseen messages. That looks like this:
m LIST "" "*" RETURN (SPECIAL-USE STATUS (UNSEEN))
Some clients use one LIST version, some use the other, and some seem to use both. Importantly, the standard iOS Mail app appears to use the 'LIST UNSEEN' version at least some of the time. iDevices are popular around the department, and it's not all that easy to find the magic setting for what iOS calls the 'IMAP path prefix'.
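If you're looking at logged LIST commands, telling the variants apart can be sketched as follows. The `classify_list` function and its category names are hypothetical, and it only understands quoted arguments and the exact unrestricted form shown above, so treat it as a starting point at most.

```python
import re

LIST_RE = re.compile(
    r'\S+\s+LIST\s+(?:\([^)]*\)\s+)?"([^"]*)"\s+"([^"]*)"(.*)$', re.I)

def classify_list(command):
    """Classify an IMAP LIST command line by how expensive it is likely to be."""
    m = LIST_RE.match(command)
    if not m:
        return "not-a-list"
    reference, pattern, rest = m.groups()
    recursive = reference == "" and pattern == "*"
    # RETURN (... STATUS ...) makes the server look at each mailbox.
    wants_status = re.search(r"RETURN\s*\(.*STATUS", rest, re.I) is not None
    if recursive and wants_status:
        return "full-scan-with-status"
    if recursive:
        return "full-scan"
    return "limited"
```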
For us, a user with a lot of Dovecot index files was definitely
someone who had a client with the 'search all through $HOME'
problem (especially if the indexes were for things that just aren't
plausible mailboxes). However, a user with only a few index files
wasn't necessarily someone without the problem, because their client
could be using the first version of the
LIST command and thus not
creating all those tell-tale index files. As far as I know, stock
Dovecot has no way of letting you find out about these people.
(We hacked logging in to the Ubuntu version of Dovecot, which involved some annoyances. In theory Dovecot has a plugin system that we might have been able to use for this; in practice, figuring out the plugin API seemed likely to be at least as much work as hacking the Dovecot source directly.)
Sidebar: Limited LISTs
IMAP LIST commands can be limited in two ways, both of which have more or less the same effect for us:
m LIST "" "mail/*"
m LIST "mail/" "*"
For information on what the arguments to the basic LIST command mean, I will refer you to the IMAP RFC. The extended form is discussed in RFC 5819 and is based on things from, I believe, RFC 5258. See also RFC 6154 and here for the special-use stuff.
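To see why the two limited forms come out the same, here is a sketch of LIST pattern matching. It assumes the server simply concatenates the reference and the pattern (servers are allowed to interpret the reference differently) and that '/' is the hierarchy delimiter; in the base IMAP RFC, '*' matches anything and '%' matches anything except the delimiter. The `list_matches` function is made up for illustration.

```python
import re

def list_matches(reference, pattern, mailbox, delimiter="/"):
    """True if mailbox would be matched by LIST reference pattern."""
    regex = ""
    for ch in reference + pattern:
        if ch == "*":
            regex += ".*"                                 # crosses delimiters
        elif ch == "%":
            regex += "[^" + re.escape(delimiter) + "]*"   # one level only
        else:
            regex += re.escape(ch)
    return re.fullmatch(regex, mailbox) is not None
```

Under these assumptions both `list_matches("", "mail/*", name)` and `list_matches("mail/", "*", name)` accept exactly the names under mail/, and neither will wander into something like .cache.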
Why Let's Encrypt's short certificate lifetimes are a great thing
I recently had a conversation on Twitter about what we care about in TLS certificate sources, and it got me to realize something. I've written before about how our attraction to Let's Encrypt has become all about the great automation, but what I hadn't really thought about back then was how important the short certificate lifetimes are. What got me to really thinking about it was a hypothetical; suppose we could get completely automatically issued and renewed free certificates but they had the typical one or more year lifetime of most TLS certificates to date. Would we be interested? I realized that we would not be, and that we would probably consider the long certificate lifetime to be a drawback, not a feature.
There is a general saying in modern programming to the effect that if you haven't tested it, it doesn't work. In system administration, we tend towards a modified version of that saying; if you haven't tested it recently, it doesn't work. Given our generally changing system environments, the recently is an important qualification; it's too easy for things to get broken by changes around them, so the longer it's been since you tried something, the less confidence you can have in it. The corollary for infrequent certificate renewal is obvious, because even in automated systems things can happen.
With Let's Encrypt, we don't just have automation; the short certificate lifetime ensures that we exercise it frequently. Our client of choice (acmetool) renews certificates when they're 30 days from expiring, so although the official Let's Encrypt lifetime is 90 days, we roll over certificates every sixty days. Having a rollover happen once every two months is great for building and maintaining our confidence in the automation, in a way that wouldn't happen if it was once every six months, once a year, or even less often. If it was that infrequent, we'd probably end up paying attention during certificate rollovers even if we let automation do all of the actual work. With the frequent rollover due to Let's Encrypt's short certificate lifetimes, they've become things we trust enough to ignore.
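The sixty day figure is just the 90 day lifetime minus the 30 day renewal window, which a few lines of arithmetic can confirm. The start date here is arbitrary, and the sketch assumes every renewal succeeds on the first day it becomes due.

```python
from datetime import date, timedelta

LIFETIME = timedelta(days=90)       # Let's Encrypt certificate lifetime
RENEW_BEFORE = timedelta(days=30)   # renew when this close to expiry

def renewal_dates(first_issued, count):
    """Return the dates of the next count renewals."""
    dates, issued = [], first_issued
    for _ in range(count):
        issued = issued + LIFETIME - RENEW_BEFORE   # renewed 30 days early
        dates.append(issued)
    return dates

start = date(2017, 1, 1)            # arbitrary example date
dates = renewal_dates(start, 3)
gaps = [(b - a).days for a, b in zip([start] + dates, dates)]
```

Each gap works out to 60 days, so the automation gets exercised about six times a year per certificate.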
(Automatic certificate renewal for long duration certificates is not completely impossible here, because the university central IT has already arranged for free certificates for the university. Right now they're managed through a website and our university-wide authentication system, but in theory there could be automation for at least renewals. Our one remaining non Let's Encrypt certificate was issued through this service as a two year certificate.)