Wandering Thoughts archives

2012-11-04

Your logs should always include IP addresses (in addition to hostnames)

Although I was grumpy about Exim's logging yesterday, there is one thing it gets absolutely right: it always logs the IP address as well as the hostname. The importance of this goes well beyond doing reverse lookups right; even if your program is fully validating the hostname, you should log the IP address as well. In fact, my view is that either the IP address or the (verified) hostname alone is incomplete information and you should always log both. Why is quite simple:

What IP address a hostname resolves to can change over time.

Yes, right now logging just the hostname lets me determine the IP address your program saw. But that doesn't necessarily help if I'm going back to the logs a day, a week, a month, or six months later, because then I can't be sure that the IP address I'm resolving today is the same IP address that I would have looked up immediately after your program logged its message. For that matter, the hostname may have stopped resolving entirely by the time I look it up.

(In some cases hostnames can change their IP addresses much faster, including 'right after you verified the IP address'.)

The same is true of logging the IP address alone; by the time I go to look up the hostname it may have changed (especially if I'm doing this well after the fact). Logging the hostname too is especially important if your program made access decisions or whatever based on the hostname instead of the IP address, because then I really want to know what your program was using as the basis for its decision. However, if your program doesn't normally look up the hostname at all it's okay to just log the IP address; there's no need to go to extra work and extra time just to satisfy my moderate curiosity.

(Technically even the hostname and the IP address isn't really sufficient information to reconstruct 'who the IP address belonged to' at a later date; you also need to know things like which ASN it belonged to at the time and its routing and so on. But that is way more than is required in anything but the most cautious organizations. And if you care about this stuff, there are cautions.)

programming/AlwaysLogIPAddress written at 22:29:41; Add Comment

Good and bad formats for your log messages

So your program logs things. That's good. But does it log things in a useful format? As a system administrator who winds up reading program logs, I have certain opinions on this.

The best log format has things in fixed fields, so you can always look at field 3 or field 5 or whatever to get it (and it tries very hard to have whitespace-separated fields with no other whitespace, because this is the easiest format for basically all tools). The very best logs do as much as possible to put the common information in the same field even in different log messages, because this makes cross-message analysis much easier.

(There are other mistakes you should avoid here if you're aiming for really great log messages.)

The next best log format has lines with variable fields but puts the fields in 'key=value' format (still whitespace-separated with no interior whitespace). This makes it relatively easy to extract the fields you want with 'egrep -o' (or in various other, cruder ways).

The not so good log format at least keeps things at constant offsets from the keys; you might have 'key1=A B key2=...' when you want B. Matching and extracting B will not make you entirely happy with the authors of the software, but it's still possible to do without tearing your hair out or writing custom matching code. Program authors should avoid this approach; please just add a second key entirely for the B information. Yes, it slightly lengthens your log lines, but it makes dealing with those lines easier.

The worst log formats have variable offsets from a (variable) key position. If you want B, the log might have all of 'key=B', 'key=A B', and 'key=A C B'. If you are ever tempted to have your program output things in this format, don't. Use explicit keys for all of the elements; it makes working with your logs much easier.

In related news, I do not like Exim's log format for hostnames in SMTP server related log messages. Exim logs hostnames as follows, more or less:

H=<id> <optional-HELO> [<IP>]

(With other key=value fields after and possibly before the H= connection information.)

<IP> is the IP address (the [ and ] are literal). <id> is either the (verified, I hope) DNS hostname or '(<IP>)' if the hostname could not verify. <optional-HELO> is '(<HELO name>)' if the remote machine EHLO'd or HELO'd with something other than what Exim considers a valid HELO name for the machine (which I believe is either the verified hostname or '[<IP>]', the IP address literal HELO name). If the machine used a valid HELO name, the field isn't present at all.

From recent personal experience I can testify that this makes extracting the IP address of a connection rather more work than you'd like, and probably less reliable to boot.

programming/OnLogMessageFormats written at 01:26:06; Add Comment


Page tools: See As Normal.
Search:
Login: Password:
Atom Syndication: Recent Pages, Recent Comments.

This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.