Wandering Thoughts archives

2018-03-31

Using a local database to get consistent device names is a bad idea

People like consistent device names, and one of the ways that Unixes have historically tried to get them is to keep a local database of known devices and their names, based on some sort of fingerprint of the device (the MAC address is a popular fingerprint for Ethernet interfaces, for example). Over the years various Unixes have implemented this in different ways; for example, some versions of Linux auto-created udev rules for some devices, and Solaris and derivatives have /etc/path_to_inst. Unfortunately, I have to tell you that trying to get consistent device names this way turns out to be a bad idea.
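
For illustration, this is roughly what the auto-created udev approach looked like on Linux: a generated /etc/udev/rules.d/70-persistent-net.rules file full of lines keyed on MAC addresses. I'm reconstructing the rule from memory, so treat the details as approximate (and the MAC address is made up):

# approximate auto-generated rule: fingerprint the NIC by MAC, pin its name
SUBSYSTEM=="net", ACTION=="add", ATTR{address}=="00:16:3e:aa:bb:cc", NAME="eth0"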

The fundamental problem is that if you keep a database of local device names, your device names depend on the history of the system. This has two immediate bad results. First, if you have two systems with identical hardware running identical software they won't necessarily use the same device names, because one system could have previously had a different hardware configuration. Second, if you reinstall an existing system from scratch you won't necessarily wind up with the same device names, because your new install won't necessarily have the same history as the current system does.

(Depending on the scheme, you may also have the additional bad result that moving system disks from one machine to an identical second machine will change the device names because things like MAC addresses changed.)

Both of these problems are bad once you start dealing with multiple systems. They make your systems inconsistent, which increases the work required to manage them, and they make it potentially dangerous to reinstall systems. You wind up either having to memorize the differences from system to system or needing to assemble your own layer of indirection on top of the system's device names so you can specify things like 'the primary network interface, no matter what this system calls it'.

Now, you can have these machine to machine variation problems even with schemes that derive names from the hardware configuration. But with such schemes, at least you only have these problems on hardware that's different, not on hardware that's identical. If you have truly identical hardware, you know that the device names are identical. By extension you know that the device names will be identical after a reinstall (because the hardware is the same before and after).

I do understand the urge to have device names that stay consistent even if you change the hardware around a bit, and I sometimes quite like them myself. But I've come to think that such names should be added as an extra optional layer on top of a system that creates device names that are 'stateless' (ie don't care about the past history of the system). It's also best if these device aliases can be based on general properties (or set up by hand in configuration files), because often what I really want is an abstraction like 'the network interface that's on network X' or 'the device of the root filesystem'.
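
As a sketch of what such an optional extra layer can look like on a modern Linux, here's a hypothetical systemd .link file that pins a name to whatever network interface is plugged into a particular PCI slot; the PCI path and the name are made up for illustration, and the point is that it's set up by hand from a stateless hardware property rather than from a database of past history:

# hypothetical /etc/systemd/network/10-uplink.link
[Match]
Path=pci-0000:03:00.0

[Link]
Name=uplink0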

NoConsistentNamesDB written at 20:18:02

2018-03-25

Our revised Dovecot IMAP configuration migration plans (and processes)

Back at the start of January, I wrote up the goals and problems of our Dovecot IMAP migration, and in an appendix at the end I outlined what became our initial migration plans. We would build an entirely new Dovecot server that was set up with people's IMAP mail folder storage being a subdirectory of their $HOME, say $HOME/mail (call this the IMAP root), and then we would get people to move to this server one by one. Migration would require them to change their clients and might require them (or us) to move files in Unix. Eventually we would tell the remaining holdouts that we were just going to turn off the old IMAP server and they had to migrate now.

Initially, the great virtue I saw in this plan was that it was entirely user driven and didn't require us to do anything. The users did everything, could go at their own speed, and were completely responsible for what happened. In an environment where we couldn't count on clients using IMAP subscriptions so we could know what people's mailboxes actually were, things had to be user-driven anyway, and we generally try to stay out of doing per-user things because it doesn't scale; we have a lot of users and not very many people looking after our central systems (including the IMAP server).

As we talked more and more about this, we realized that the central problem with this plan is that everyone had to migrate and this involved the users doing things (often at the Unix level), or getting someone to help them. As mentioned, we have a lot of users, and some of them are quite important (eg, professors) and can't just be abandoned to their fate. There was no way to make this not be disruptive to people. At the same time, most of our users were not causing any problems, which meant that we'd be forcing a lot of people to do disruptive things (on all of their devices, better not miss one) to deal with a problem created by a much smaller number of users.

If this was the only way to deal with things, we might still have gone ahead with it. But as I sort of alluded to in passing in the January entry, it's possible to do this on a per-user basis in Dovecot using a shell script (see the bottom of MailLocation). After we talked it over, we decided that this was the way we wanted to handle the migration to people's IMAP sessions being confined to a subdirectory of their $HOME; it would be done on a per-user basis and we'd directly target high-priority problem cases. The vast majority of our current users would forever stay un-migrated, while new users would be set up to be confined to a $HOME subdirectory from the start (ie, using the new IMAP root).
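
I'm not going to describe our exact setup here, but as a rough illustration of the general mechanism, Dovecot lets a userdb lookup return a per-user 'mail' field that overrides the global mail_location, so migrated (and new) users can be pointed at the new IMAP root while everyone else stays untouched. A minimal sketch, with the mbox format and all of the paths here purely for illustration:

# global default: legacy users keep $HOME as their IMAP root
mail_location = mbox:~/:INBOX=/var/mail/%u

# an extra userdb listing only the overridden users; its entries carry
# a 'mail' extra field, eg:
#   someuser:::::::mail=mbox:~/IMAP:INBOX=/var/mail/someuser
userdb {
  driver = passwd-file
  args = /etc/dovecot/imap-root-overrides
}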

As much as possible, we wanted this migration to be transparent to users (or at least important ones). That meant that the IMAP mailbox names as seen by the clients couldn't change, and that meant that no matter what we were going to have to move files around; there's no other way for this to be transparent to clients when you change the IMAP root. Given that, it wasn't important to pick a new IMAP root that people already used for mailboxes, so we picked $HOME/IMAP for various reasons (including that calling it this made it clear what it was for).

Since this plan means that we're moving user mailboxes around at least some of the time (in order to migrate problem users), knowing what those mailboxes were became important enough to get us to hack some mailbox logging into Dovecot. Having this information has been extremely reassuring. Even when it just duplicates the information in a user's .subscriptions file, it confirms that that information is accurate and complete.

We started out with plans for a two-stage operation for most users, where we'd first tell them to move all of their IMAP mailboxes under 'mail/' in their client (ie, $HOME/mail) before some deadline, then at the deadline we'd make $HOME/mail into $HOME/IMAP/mail and flip the server setting that made $HOME/IMAP their IMAP root. In practice it's turned out to be easier to do the file moving ourselves, based on both .subscriptions and the logs, so our current approach is to just tell various users 'unless you object, at time X we'll be improving your IMAP client experience by ...' and then at time X we do everything ourselves. It's been a little bit surprising how few actual active mailboxes some of these users have, especially relative to how much of an impact they've been having on the server.

(This genuinely does improve the IMAP client experience for people, for obvious reasons. An IMAP client that is scanning all of your $HOME and maybe opening all the files there is generally not a responsive client, not if your $HOME is at all big.)
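
For the curious, the file moving itself is nothing fancy. A stripped-down sketch of the sort of thing involved (not our actual script, and with all of the error checking, logging, and special cases left out) might look like this, assuming that the mailbox names in .subscriptions map directly to paths under $HOME:

#!/bin/sh
# hypothetical per-user move; usage: move-imap-root LOGIN
user="$1"
home=$(getent passwd "$user" | cut -d: -f6)

mkdir -p "$home/IMAP"
while IFS= read -r mbox; do
    # INBOX lives in /var/mail and stays put
    [ "$mbox" = "INBOX" ] && continue
    [ -e "$home/$mbox" ] || continue
    mkdir -p "$home/IMAP/$(dirname "$mbox")"
    mv "$home/$mbox" "$home/IMAP/$mbox"
done < "$home/.subscriptions"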

PS: Although I haven't been writing about it here on Wandering Thoughts until recently, our IMAP situation has been consuming a lot of my attention and time at work. It's turned into a real learning experience in several ways.

IMAPMigrationRevised written at 03:02:48

2018-03-24

Our current ugly hacks to Dovecot to help mitigate our IMAP problems

Back in the comments of this entry from the end of December, I said that we weren't willing to take on the various burdens of changing our local Dovecot to add some logging of things like the mailboxes that people's clients were accessing. In yesterday's entry I mentioned that we actually had hacked up our Dovecot to do exactly that. You might wonder what happened between December and now to cause us to change our minds. The short version is that from our perspective, things on our IMAP server got worse and so we became more willing to do things to mitigate our problems (especially since our migration plans were clearly not going to give us any short term improvements).

(It's not clear to me if the problems got worse in the past few months, which is certainly possible, or if we just noticed more and more about how bad things were once we started actively looking into the state of the server.)

We wound up making two changes to help mitigate our problem; our added logging is actually the second and less alarming one. Our first and most significant change was that we hacked Dovecot so that LIST operations would ignore all names that started with . or were called exactly public_html, which is the name of the symlink that we drop into people's home directories to point to their web space. We made this change because monitoring runaway Dovecot processes that were rummaging through people's $HOME showed that many of them were descending through subdirectory hierarchies like .git, .hg, $HOME/.local, $HOME/.cache, and so on. None of those have actual mailboxes but all of them are great places to find a lot of files, which is not a good thing in our environment. The public_html part of this had a similar motivation; we saw a significant number of Dovecot sessions that had staged great escapes into collections of data and other files that people had published on their home pages. Making this change didn't eliminate our problems but it clearly helped; we saw less load and less inode usage for Dovecot's indexes.

(While this sounds like a big change, it was a very small code modification. However, the scary part of making it was not being entirely sure that the effects of the change were only confined to IMAP LIST operations. Yes, we tested.)

Once we'd broken the ice with this change, it was much less of a deal to add some logging to capture information about what IMAP mailboxes people were using. We started out by logging for SELECT, but seeing our logging in action made it obvious that clients used a variety of IMAP commands and we needed to add logging to all of them to be confident that we were going to see all of the mailboxes they were using. To reduce the log volume, we skip logging SELECTs of INBOX; it turns out that clients do this all the time, and it's not interesting for our uses of the information.

(I had fun hunting through the IMAP RFC for commands that take mailbox names as one of their arguments, and I'm not sure I got them all. But I'm reasonably confident that we log almost all of them; we currently log for LIST, APPEND, MOVE, COPY, and RENAME. I didn't bother with CREATE, on the grounds that clients would probably do some other operation after CREATE'ing a mailbox if it mattered.)

Once we were adding logging, I decided to throw in logging of LIST arguments so we could understand when and how it was being used. This turned out to be very valuable, partly because I was starting from a position of relative ignorance about the IMAP protocol and how real IMAP clients behave. A fair bit of what I wrote about yesterday came from that logging, especially the realization that clients could scan through all of $HOME without leaving tell-tale signs in Dovecot's indexes, which meant that our problems were worse than we'd realized. Unfortunately the one current limitation of our LIST logging is that we can't log how many entries were returned by the LIST command. For obvious reasons, it would be very handy to be able to tell the difference between a LIST command that returned ten names and one that returned 5,000 of them.

I was quite pleasantly surprised to discover that the Dovecot source code is very nicely structured and organized, which made these changes much easier than they might otherwise have been. In particular, each IMAP command is in a separate source file, all with obvious names like 'cmd-list.c', and the main operation of each is pretty self-contained and obvious. Logging was really easy to add and even the change to make LIST skip some names wasn't too difficult (partly because this part of the code was already skipping . and .., which gave me a starting point). As I noted yesterday, I hacked this directly into the main Dovecot source rather than trying to figure out the plugin API (which is undocumented as far as I can see). I believe that we could do all of the logging we're currently doing through the plugin API, and that's clearly the more generally correct approach to it.

Knowing what mailboxes people are using is a relatively important part of our current migration plans (which have completely changed from what I wrote up for various reasons), but that's going to be another entry.

DovecotOurEmergencyHacks written at 00:53:36

2018-03-23

Some things about Dovecot, its index files, and the IMAP LIST command

We have a backwards compatibility issue with our IMAP server, where people's IMAP roots are $HOME, their home directory, and then clients ask the IMAP server to search all through the IMAP namespace; this causes various bad things to happen, including running out of inodes. The reason we ran out of inodes is that Dovecot maintains some index files for every mailbox it looks at.

We have Dovecot store its index files on our IMAP server's local disk, in /var/local/dovecot/<user>. Dovecot puts these in a hierarchy that mirrors the actual Unix (and IMAP) hierarchy of the mailboxes; if there is a subdirectory Mail in your home directory with a mailbox Drafts, the Dovecot index files will be in .../<user>/Mail/.imap/Drafts/. It follows that you can hunt through someone's Dovecot index files to see what mailboxes their clients have looked at, although this may tell you less than you think about what their active mailboxes are.

(One reason that Dovecot might look at a mailbox is that your client has explicitly asked it to, with an IMAP SELECT command or perhaps an APPEND, COPY, or MOVE operation. However, there are other reasons.)
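
Given that layout, a quick and dirty way to see which mailboxes Dovecot has built indexes for is just to walk the per-user index directory. Something like this works (the username here is a placeholder, and this assumes the layout described above):

# list the mailboxes that have Dovecot index directories for this user
find /var/local/dovecot/someuser -type d -path '*/.imap/*' |
    sed -e 's|^/var/local/dovecot/someuser/||' -e 's|\.imap/||' | sort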

When I began digging into our IMAP pain and working on our planned migration (which has drastically changed directions since then), I was operating under the charming idea that most clients used IMAP subscriptions and only a few of them asked the IMAP server to inventory everything in sight. One of the reasons for this is that only a few people had huge numbers of Dovecot index files, and I assumed that the two were tied together. It turns out that both sides of this are wrong.

Perhaps I had the idea that it was hard to do an IMAP LIST operation that asked the server to recursively descend through everything under your IMAP root. It isn't; it's trivial. Here's the IMAP command to do it:

m LIST "" "*"

That's all it takes (the unrestricted * is the important bit). The sort of good news is that this operation by itself won't cause Dovecot to actually look at those mailboxes and thus to build index files for them. However, there is a close variant of this LIST command that does force Dovecot to look at each file, because it turns out that you can ask your IMAP server to not just list all your mailboxes but to tell you which ones have unseen messages. That looks like this:

m LIST "" "*" RETURN (SPECIAL-USE STATUS (UNSEEN))

Some clients use one LIST version, some use the other, and some seem to use both. Importantly, the standard iOS Mail app appears to use the 'LIST UNSEEN' version at least some of the time. iDevices are popular around the department, and it's not all that easy to find the magic setting for what iOS calls the 'IMAP path prefix'.
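
If you're curious what a client like this actually gets back from your own server, it's easy to poke at by hand over TLS with a test account; the host name and credentials here are obviously placeholders:

$ openssl s_client -quiet -crlf -connect imap.example.org:993
a1 LOGIN testuser testpassword
a2 LIST "" "*" RETURN (SPECIAL-USE STATUS (UNSEEN))
a3 LOGOUT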

For us, a user with a lot of Dovecot index files was definitely someone who had a client with the 'search all through $HOME' problem (especially if the indexes were for things that just aren't plausible mailboxes). However, a user with only a few index files wasn't necessarily someone without the problem, because their client could be using the first version of the LIST command and thus not creating all those tell-tale index files. As far as I know, stock Dovecot has no way of letting you find out about these people.
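
What you can do with stock Dovecot is spot the people with a lot of index files, since the indexes are just files and directories on disk. A rough sketch of that check, assuming GNU du and our local index location:

# count index inodes per user and show the heaviest users
du --inodes -s /var/local/dovecot/* | sort -rn | head -20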

(We hacked logging into the Ubuntu version of Dovecot, which involved some annoyances. In theory Dovecot has a plugin system that we might have been able to use for this; in practice, figuring out the plugin API seemed likely to be at least as much work as hacking the Dovecot source directly.)

Sidebar: Limited LISTs

IMAP LIST commands can be limited in two ways, both of which have more or less the same effect for us:

m LIST "" "mail/*"
m LIST "mail/" "*"

For information on what the arguments to the basic LIST command mean, I will refer you to the IMAP RFC. The extended form is discussed in RFC 5819 and is based on things from, I believe, RFC 5258. See also RFC 6154 for the special-use stuff.

(The unofficial IMAP protocol wiki may be something I'll be consulting periodically now that I've stumbled over it, eg this matrix of all of the IMAP RFCs.)

DovecotIndexesAndLIST written at 01:53:22

2018-03-14

Why Let's Encrypt's short certificate lifetimes are a great thing

I recently had a conversation on Twitter about what we care about in TLS certificate sources, and it got me to realize something. I've written before about how our attraction to Let's Encrypt has become all about the great automation, but what I hadn't really thought about back then was how important the short certificate lifetimes are. What got me to really thinking about it was a hypothetical; suppose we could get completely automatically issued and renewed free certificates but they had the typical one or more year lifetime of most TLS certificates to date. Would we be interested? I realized that we would not be, and that we would probably consider the long certificate lifetime to be a drawback, not a feature.

There is a general saying in modern programming to the effect that if you haven't tested it, it doesn't work. In system administration, we tend towards a modified version of that saying; if you haven't tested it recently, it doesn't work. Given our generally changing system environments, the 'recently' is an important qualification; it's too easy for things to get broken by changes around them, so the longer it's been since you tried something, the less confidence you can have in it. The corollary for infrequent certificate renewal is obvious: the less often your renewal automation actually runs, the less confidence you can have that it still works.

With Let's Encrypt, we don't just have automation; the short certificate lifetime ensures that we exercise it frequently. Our client of choice (acmetool) renews certificates when they're 30 days from expiring, so although the official Let's Encrypt lifetime is 90 days, we roll over certificates every 60 days. Having a rollover happen once every two months is great for building and maintaining our confidence in the automation, in a way that wouldn't happen if it was once every six months, once a year, or even less often. If it was that infrequent, we'd probably end up paying attention during certificate rollovers even if we let automation do all of the actual work. With the frequent rollover due to Let's Encrypt's short certificate lifetimes, they've become things we trust enough to ignore.
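
The exercising mostly happens without us thinking about it because the renewal check runs periodically from cron. I'm not going to quote our exact setup, but a hypothetical cron.d entry for acmetool would look something like this:

# /etc/cron.d/acmetool (hypothetical): check for and renew anything due
32 0 * * *   root   /usr/bin/acmetool --batch reconcile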

(Automatic certificate renewal for long duration certificates is not completely impossible here, because the university central IT has already arranged for free certificates for the university. Right now they're managed through a website and our university-wide authentication system, but in theory there could be automation for at least renewals. Our one remaining non-Let's Encrypt certificate was issued through this service as a two year certificate.)

LetsEncryptDurationGood written at 01:24:45

