Good and bad formats for your log messages

November 4, 2012

So your program logs things. That's good. But does it log things in a useful format? As a system administrator who winds up reading program logs, I have certain opinions on this.

The best log format has things in fixed fields, so you can always look at field 3 or field 5 or whatever to get it (and it tries very hard to have whitespace-separated fields with no other whitespace, because this is the easiest format for basically all tools). The very best logs do as much as possible to put the common information in the same field even in different log messages, because this makes cross-message analysis much easier.

(There are other mistakes you should avoid here if you're aiming for really great log messages.)

The next best log format has lines with variable fields but puts the fields in 'key=value' format (still whitespace-separated with no interior whitespace). This makes it relatively easy to extract the fields you want with 'egrep -o' (or in various other, cruder ways).

The not so good log format at least keeps things at constant offsets from the keys; you might have 'key1=A B key2=...' when you want B. Matching and extracting B will not make you entirely happy with the authors of the software, but it's still possible to do without tearing your hair out or writing custom matching code. Program authors should avoid this approach; please just add a second key entirely for the B information. Yes, it slightly lengthens your log lines, but it makes dealing with those lines easier.

The worst log formats have variable offsets from a (variable) key position. If you want B, the log might have all of 'key=B', 'key=A B', and 'key=A C B'. If you are ever tempted to have your program output things in this format, don't. Use explicit keys for all of the elements; it makes working with your logs much easier.

In related news, I do not like Exim's log format for hostnames in SMTP server related log messages. Exim logs hostnames as follows, more or less:

H=<id> <optional-HELO> [<IP>]

(With other key=value fields after and possibly before the H= connection information.)

<IP> is the IP address (the [ and ] are literal). <id> is either the (verified, I hope) DNS hostname or '(<IP>)' if the hostname could not verify. <optional-HELO> is '(<HELO name>)' if the remote machine EHLO'd or HELO'd with something other than what Exim considers a valid HELO name for the machine (which I believe is either the verified hostname or '[<IP>]', the IP address literal HELO name). If the machine used a valid HELO name, the field isn't present at all.

From recent personal experience I can testify that this makes extracting the IP address of a connection rather more work than you'd like, and probably less reliable to boot.

Written on 04 November 2012.
« Another go-around on the drawbacks and balances of automation
Your logs should always include IP addresses (in addition to hostnames) »

Page tools: View Source, Add Comment.
Login: Password:
Atom Syndication: Recent Comments.

Last modified: Sun Nov 4 01:26:06 2012
This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.