2014-10-22
Exim's (log) identifiers are basically unique on a given machine
Exim gives each incoming email message an identifier; these look like '1XgWdJ-00020d-7g'. Among other things, this identifier is used for all log messages about the particular email message. Since Exim normally splits information about each message across multiple lines, you routinely need to reassemble or at least match multiple lines for a single message. As a result of this need to aggregate multiple lines, I've quietly wondered for a long time just how unique these log identifiers were. Clearly they weren't going to repeat over the short term, but if I gathered tens or hundreds of days of logs for a particular system, would I find repeats?
The answer turns out to be no. Under normal circumstances Exim's message IDs here will be permanently unique on a single machine, although you can't count on global uniqueness across multiple machines (although the odds are pretty good). The details of how these message IDs are formed are in the Exim documentation's chapter 3.4. On most Unixes and with most Exim configurations they are a per-second timestamp, the process PID, and a final subsecond timestamp, and Exim takes care to guarantee that the timestamps will be different for the next possible message with the same PID.
(Thus a cross-machine collision would require the same message time down to the subsecond component plus the same PID on both machines. This is fairly unlikely but not impossible. Exim has a setting that can force more cross-machine uniqueness.)
This means that aggregation of multi-line logs can be done with
simple brute force approaches that rely on ID uniqueness. Heck, to
group all the log lines for a given message together you can just
sort
on the ID field, assuming you do a stable sort so that things
stay in timestamp order when the IDs match.
(As they say, this is relevant to my interests and I finally wound up looking it up today. Writing it down here insures I don't have to try to remember where I found it in the Exim documentation the next time I need it.)
PS: like many other uses of Unix timestamps, all of this uniqueness potentially goes out the window if you allow time on your machine to actually go backwards. On a moderate volume machine you'd still have to be pretty unlucky to have a collision, though.