Exim's (log) identifiers are basically unique on a given machine

October 22, 2014

Exim gives each incoming email message an identifier; these look like '1XgWdJ-00020d-7g'. Among other things, this identifier is used for all log messages about the particular email message. Since Exim normally splits information about each message across multiple lines, you routinely need to reassemble or at least match multiple lines for a single message. As a result of this need to aggregate multiple lines, I've quietly wondered for a long time just how unique these log identifiers were. Clearly they weren't going to repeat over the short term, but if I gathered tens or hundreds of days of logs for a particular system, would I find repeats?

The answer turns out to be no. Under normal circumstances Exim's message IDs will be permanently unique on a single machine. You can't count on global uniqueness across multiple machines, although the odds are pretty good. The details of how these message IDs are formed are in section 3.4 of the Exim documentation. On most Unixes and with most Exim configurations they are a per-second timestamp, the process ID, and a final subsecond timestamp, and Exim takes care to guarantee that the timestamps will be different for the next possible message with the same PID.

(Thus a cross-machine collision would require the same message time down to the subsecond component plus the same PID on both machines. This is fairly unlikely but not impossible. Exim has a setting, localhost_number, that can force more cross-machine uniqueness.)
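To make the structure concrete, here is a minimal Python sketch that pulls one of these IDs apart. It assumes the usual layout for Exim versions of this era: three base-62 fields separated by dashes, with the digit alphabet being 0-9, then A-Z, then a-z. That alphabet and the function names (base62_decode, decode_exim_id) are my assumptions for illustration, not something taken from Exim's source, and this won't apply to builds that use base 36 or to newer Exim versions with longer IDs.

```python
import string
from datetime import datetime, timezone

# Assumed base-62 digit order: 0-9, A-Z, a-z.
BASE62 = string.digits + string.ascii_uppercase + string.ascii_lowercase


def base62_decode(text):
    """Turn a base-62 string into an integer."""
    value = 0
    for ch in text:
        value = value * 62 + BASE62.index(ch)
    return value


def decode_exim_id(msgid):
    """Split an ID like '1XgWdJ-00020d-7g' into its three parts."""
    secs, pid, frac = msgid.split("-")
    return {
        # Per-second Unix timestamp of message reception.
        "received": datetime.fromtimestamp(base62_decode(secs), tz=timezone.utc),
        # PID of the Exim process that received the message.
        "pid": base62_decode(pid),
        # Raw sub-second counter; its units depend on the Exim configuration.
        "fraction": base62_decode(frac),
    }
```

Under those assumptions, the example ID from the start of this entry decodes to a reception time of 2014-10-21 10:25:41 UTC from PID 7727, which at least looks plausible for when I was poking at this.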

This means that aggregation of multi-line logs can be done with simple brute force approaches that rely on ID uniqueness. Heck, to group all the log lines for a given message together you can just sort on the ID field, assuming you do a stable sort so that things stay in timestamp order when the IDs match.
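As a rough sketch of the sort-based approach, the following Python assumes the common main log layout where each line starts with a date field and a time field and the message ID, if any, is the third whitespace-separated field. The regular expression and the function names are made up for illustration, and lines that aren't about a specific message are simply dropped. Python's sorted() is a stable sort, so lines with the same ID stay in their original (timestamp) order.

```python
import re

# Assumed shape of an ID: 6, 6 and 2 alphanumeric characters.
ID_RE = re.compile(r"^[0-9A-Za-z]{6}-[0-9A-Za-z]{6}-[0-9A-Za-z]{2}$")


def message_id(line):
    """Return the message ID on a log line, or None if it has none."""
    fields = line.split(None, 3)
    if len(fields) >= 3 and ID_RE.match(fields[2]):
        return fields[2]
    return None


def group_by_id(lines):
    """Collect all lines for each message together via a stable sort."""
    with_ids = [ln for ln in lines if message_id(ln) is not None]
    # sorted() is stable, so lines sharing an ID keep their log order.
    return sorted(with_ids, key=message_id)
```

Feeding this an interleaved log gives you every message's lines as one contiguous run, which is all that a lot of quick log analysis needs.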

(As they say, this is relevant to my interests and I finally wound up looking it up today. Writing it down here ensures I don't have to try to remember where I found it in the Exim documentation the next time I need it.)

PS: like many other uses of Unix timestamps, all of this uniqueness potentially goes out the window if you allow time on your machine to actually go backwards. On a moderate volume machine you'd still have to be pretty unlucky to have a collision, though.

Written on 22 October 2014.

Last modified: Wed Oct 22 00:20:33 2014