Wandering Thoughts archives

2007-09-30

Understanding Exim's weird way of doing retries

First, some terminology. A top level address is an address that a message starts out being sent to; for example, every (accepted) RCPT TO in SMTP creates a top level address for the message. A destination is a place that a message is ultimately going to be delivered to, and may include things like files. A top level address may turn into more than one destination through means like .forward files, aliases, and mailing list files. At a conceptual level, all MTAs have two main jobs: mapping top level addresses to destinations, and then delivering to destinations.

The MTA we use now takes what I think of as the straightforward two-phase approach to this. Top level addresses are mapped to destinations in a process called 'routing'; the MTA then tries to deliver to all destinations, and any destinations that hit a temporary delivery failure are remembered to be retried later. When the MTA does retries, it looks up all still-undelivered destinations and tries to deliver to them again.

(This ignores retry times, bounces, incomplete DNS lookups, and so on.)

The important effect of the two-phase approach is that a given message's destinations never change; they are determined once when it is received and then frozen.

Exim does not operate this way. Instead of remembering undelivered destinations and retrying them directly, it remembers delivered destinations and whether or not a top level address was completely delivered. Then each time Exim retries a message, it re-maps any top level address that has not been completely delivered to destinations, throws away any destination that has already been delivered to, and (re)attempts delivery to any remaining destinations. (If there are no remaining destinations, the top level address is done.)
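To make the difference concrete, here is a minimal sketch (in Python, and emphatically not Exim's actual code or data structures) of the retry behaviour just described: each retry re-expands every incompletely delivered top level address with the *current* mapping, then skips anything already recorded as delivered. All names here are invented for illustration.

```python
def exim_style_retry(top_addrs, expand, try_deliver, delivered):
    """One retry pass, Exim-style.

    top_addrs:   set of top level addresses not yet completely delivered
    expand:      maps a top level address to its *current* destinations
    try_deliver: attempts delivery to one destination, returns True on success
    delivered:   set of destinations already delivered to (persisted state)

    Returns the set of top level addresses that are now completely done.
    """
    done = set()
    for addr in top_addrs:
        # re-map the address from scratch, then drop destinations that
        # the persisted state says were already delivered to
        remaining = [d for d in expand(addr) if d not in delivered]
        for dest in remaining:
            if try_deliver(dest):
                delivered.add(dest)
        # the top level address is finished only when every one of its
        # current destinations has been delivered to
        if all(d in delivered for d in expand(addr)):
            done.add(addr)
    return done
```

Because `expand` is consulted again on every pass, a destination added to a mailing list file between retries gets picked up (and delivered to) even though the message predates it, which is exactly the new-subscriber effect noted below.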

So a given message's destinations can change during retries. The consequence of this is that messages being retried will pick up changes to .forwards, file-based mailing lists, and so on; as the Exim documentation notes, this can result in things like new subscribers to a mailing list receiving messages that were sent to the mailing list before they actually subscribed.

(Exim has a one_time option for redirect-based routers that will turn destination addresses into top level addresses. But because top level addresses have to be real addresses, Exim has to outlaw pipe and file destinations if you turn this on, and this is not an option for us.)

This approach does let you correct a routing problem on the fly; you don't need to change a routing rule and then manually change the destinations of a pile of stalled messages. But it makes it hard to see what destination is causing messages to stall (and what the error message is), since undelivered destinations only exist during retries; mailq and so on will only tell you what top level addresses haven't been completely delivered (and what destinations have been delivered).

(Technically the information can be dug out of the logs with sufficient work.)

(This is one of those entries I write to make sure that I understand the issue myself.)

UnderstandingEximRetries written at 23:16:53

2007-09-24

Assume the existence of folklore among your users

One thing I assume about how our users deal with the local computing environment is that there exists a significant body of oral tradition and folklore circulating among (at least) the graduate students. When you think about it, this is inevitable; when new graduate students show up, they are going to ask the people around them for help, and those people are the older graduate students.

(This gives you oral tradition; you get folklore because there's no guarantee that the information the older graduate students will pass on is correct or current.)

It's important to remember this when planning changes, because we can't assume that changes in how to do things will propagate around the grad student population instantly. The new ways will have the advantage of being written up in various places and working (hopefully), but they are still going to have to fight to displace old traditions, and it will probably take significant time.

(And your new ways had better actually work, and work well, or they do not really have a fighting chance. If the new procedures are ineffective in practice, they are probably not going to displace the old ones, especially if the old ones work at least as well as the new ones.)

I also think that it's useful to try to work out just what oral traditions are circulating around. It's hard to get good answers by just asking people, so I think you mostly need to infer the folklore from the odd things that your users do.

One reason that working out the existing folklore is useful is a corollary to its existence: the less the folklore has to change to be correct, the easier that change will be.

UserFolklore written at 23:13:00

2007-09-23

Names are not cheap

A recent sysadmin discussion here wound up with one of the participants suggesting that we deal with a particular issue by creating a new hostname, because after all names are cheap. This remark crystallized something for me.

I disagree. Names are not cheap; they are actually expensive. Names look cheap because they are easy to create, but they create clutter and uncertainty (over which names are still being used by what), and that makes them expensive in the long run.

(Once you have enough names and enough uncertainty, you become overwhelmed; cleaning anything up requires an extraordinary amount of work, so you just pile more names on top without even bothering to try to figure out if you really need them. Maybe you could reuse an existing name, and maybe not; it is simpler and faster just to assume you can't.)

This holds for all sorts of names, not just hostnames. I've also seen it in at least Ethernet addresses in DHCP registrations and in Unix logins that aren't assigned to specific identifiable people (whether they are for bits of software, collaborative projects, or generic visitor accounts).

(The bad effects of clutter aren't restricted to just names, of course. Many things can become so cluttered up that they're all but impossible to understand, so the only way to proceed is to try to build your own addition on the side.)

ExpensiveNames written at 22:39:55

2007-09-12

Mass scanning via POP3

For our backwards compatibility sins, we have plain POP3 exposed to the general Internet. This morning, we discovered that our POP3 server had more or less locked up around 6:30 am, apparently because someone decided to open up hundreds of connections to it and perform a brute force mass account/password attempt.

(In fact it took down our IMAP server too, because we're running Dovecot. This is one of the annoying limitations of Dovecot; there is a central server process that has a couple of file descriptors for every active connection. Run out of file descriptors here and everything grinds to a halt.)

In retrospect, this makes a great deal of sense for the attacker, because POP3 is a great protocol to go after. Not only do you get to try brute-forcing plaintext passwords, but you don't even have to spend any time and effort on handling an encryption protocol the way you do with mass ssh scanning; it's all in plain ASCII. Some testing suggests that you can even pipeline the commands.
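To show just how little work the protocol demands of an attacker's client, here is a sketch of the complete client side of a POP3 login and a response check. The host, user name, and password are made up; the point is only that the whole exchange is a few CRLF-terminated ASCII lines, with no encryption layer to implement.

```python
def pop3_login_lines(user, password):
    """The complete client side of a POP3 login attempt: two
    CRLF-terminated ASCII lines, nothing more."""
    return f"USER {user}\r\nPASS {password}\r\n".encode("ascii")

def pop3_ok(reply_line):
    """Server replies are simply '+OK ...' on success or '-ERR ...'
    on failure, so checking the outcome is a prefix test."""
    return reply_line.startswith(b"+OK")
```

Contrast this with ssh scanning, where the client has to complete a full key exchange before it can even offer a password.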

(You still need many parallel connections, because Dovecot pauses for a while after a failed password, even though this doesn't do anything effective here.)

I've found myself wondering if the attacker might have been a would-be spammer wanting to mine people's email for highly useful live addresses rather than a cracker looking for logins, since there is no guarantee that a POP3 account will get you a real login.

MassPOP3Attack written at 23:39:52

2007-09-09

Rethinking my views of Fibrechannel

The more I look at Fibrechannel's competitors, the more attractive Fibrechannel looks. After all, FC's only real problem is that it costs a lot of money, and even that may be changing as FC vendors come to their senses; we recently got a quite reasonable quote from a storage vendor.

(Well, FC switch vendors are still insane, but if you are willing to go for second-hand equipment you can apparently get 1 and 2 gigabit FC switches for quite cheap as people upgrade to 4G.)

Admittedly, I might not feel this way if we didn't already have a FC-based SAN that has been solidly reliable and performs well. That gives me certain biases, especially when I have been in the grimy depths of equipment evaluations.

(Also, we can't simply carry our existing FC SAN setup forward; we need to move from our highly customized Solaris 8 on SPARC setup to Solaris 10 on x86.)

RethinkingFC written at 12:19:08

2007-09-06

When you don't want RAID-5

Here's a paradox that we only realized recently: in some situations, using RAID-5 can be less reliable overall than using no RAID at all.

This comes about because while RAID-5 preserves your data over a single-drive failure, it loses all your data if there is ever a double drive failure.

Our specific case is a disk-based incremental backup system. Right now a day's backups take up about a third of a disk, and we have it set up so each day's backups go to a different disk (eventually cycling around). The older a backup is the less useful it is. If we lose the disk with yesterday's incrementals we will still have several previous days, so we are not too bad off even after the worst single disk failure (and losing older disks is less damaging). If we lose two disks we are much better off than with RAID-5, since we still have all the remaining backups and thus can (at worst) get back three days ago.
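The rotation above can be sketched in a few lines. This is an illustration only, with assumed numbers (six disks, one day of incrementals per disk in a fixed cycle); the real setup packs roughly three days per disk, but the failure arithmetic is the same.

```python
def disk_for_day(days_ago, ndisks):
    """Which disk holds the incrementals from 'days_ago' days back,
    assuming a simple fixed rotation (day 0 = yesterday's backups)."""
    return days_ago % ndisks

def surviving_days(ndisks, retained_days, failed_disks):
    """Days of incrementals still readable after losing some disks.
    With RAID-5 this list would be empty after any double failure;
    here each lost disk costs only the days that lived on it."""
    return [d for d in range(retained_days)
            if disk_for_day(d, ndisks) not in failed_disks]
```

Losing the disk with the newest incrementals still leaves every older day intact, and a double failure costs two days of backups rather than all of them.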

And of course, not using RAID-5 gets us three more days of online incrementals.

(This is not our only backup system; we do less frequent backups to tape. These have the full backups that serve as the baseline for the incrementals.)

What makes this situation work is that losing some of the data is not really a fatal thing while losing all of the data would be fairly alarming, combined with the fact that we can fit each 'unit' of data on a single disk.

SkippingRAID5 written at 23:13:14

2007-09-02

I wish mailers had a real programming language

I really wish that the authors of mail transport agents would give me an actual language to program the behavior of their mailers. Well, not the low-level behavior; what I want is a high-level language I can use to cleanly write out the logic of what should happen to messages, the sequence of checks and actions and destinations that a particular message should go through.

I feel that our message flow logic is not particularly complicated, when written out directly. The problem is that as far as I can see, current mailers don't let me actually do that; instead I am forced to glue the logic together indirectly through the mailer's (limited) configuration system. Effectively the mailer's configuration system becomes an extremely peculiar assembly language.
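Purely as a hypothetical, here is the sort of thing I mean, written as if the high-level language were just Python. None of these names correspond to any real mailer's API; the point is only that the message flow reads top to bottom as a sequence of decisions rather than being scattered across configuration stanzas.

```python
def route_message(msg, local_users, aliases):
    """Decide what to do with each recipient of a message.

    Returns a list of (action, detail) decisions, in order.
    """
    actions = []
    for rcpt in msg["recipients"]:
        if rcpt in aliases:
            # alias / mailing-list expansion, written out directly
            for target in aliases[rcpt]:
                actions.append(("deliver", target))
        elif rcpt in local_users:
            # a plain local mailbox
            actions.append(("deliver", rcpt))
        else:
            # everything else goes out to the world
            actions.append(("relay", rcpt))
    return actions
```

A sysadmin can read this and see the entire routing logic at a glance, which is exactly what a pile of router and transport definitions makes difficult.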

This is more than an annoyance; it is a danger. When the logic is only expressed indirectly, it's harder to see what it actually is and thus harder to make sure that your configuration really works the way you want it to. (This should be no surprise, since it is one of the general drawbacks of assembly languages.)

This is all the more frustrating with mailers like exim, which already more or less have an embedded programming language, just at too low a level to do this. They're so close yet so far, and at the same time the best choice I have.

ProgrammableMailers written at 22:57:14

