Some thoughts on structured logging, especially in and for databases

January 19, 2014

In a comment on my entry on having an audit log, dozzie recommended structured logging. Roughly speaking, instead of just logging formatted text message you log all of the various pieces of the same information in a way that's explicitly labeled, encoded, and serialized somehow. These days, for example, you might encode things as JSON.

In part this is a great ongoing debate between two sides of logging. The advantage of pure textual logs (whether split into database columns or smashed together in one line) is that they are easy to create and (much) easier to read without additional tools. The advantage of structured logs is that they are easier to analyse reliably, especially over time (ie, over the same time when a text message might change format repeatedly). They have the drawback that they need tools to make them comprehensible; the more information you load into structured log messages, the more you need one.

(Encoding to JSON and then pretty-printing it doesn't work once you're putting enough information into log messages that the JSON dump of a single one is twenty or thirty lines. Take it from a sysadmin; I'm just not going to be able to do very much by staring at that much data.)

What this does is it makes textual logging the simple, low effort way. Yes, it might be best to have comprehensive structured logging and a suite of tools and code to work with those logs, but in real life that may not be the choice you get because of the perfection trap. Text logging is better than no logging.

Databases add complications to this in two ways, one for using database for log storage and one for recording DB-related things. If you're recording your audit logs in a database (as I am), you have to worry about the audit table schema. One big issue is what information is common enough to be broken out into separate structured fields versus slammed into what are basically blob fields (for text or serialized structured data). Separate fields allow in-database searching and filtering (at least in ordinary DB engines that don't, eg, look inside JSON blobs for you) but they exact a cost in flexibility and possibly schema complexity. At the extreme you probably don't want your audit table schema to have fifty fields, most of them unused in most records, just so that you can copy specific fields from other DB tables as you make audit entries.

The other side is actually recording database entries. At one level this looks simple in structured logging; if you're logging changes to a database, you can just JSON-encode the whole DB record you're affecting and then record it (perhaps in before and after versions). But this is not sufficient because of foreign key relationships. Your straightforward JSON encoding may capture, say, 'sponsor 385', but the sponsor column is a foreign key and when you're looking at the logs (much) later you don't necessarily know who sponsor 385 was. At the best you would have to painstakingly replay the audit logs to reconstruct the state of the database at the time so you can figure out who it was. At the worst case the information is simply unrecoverable. So for real logging you need to peer through things like foreign key relations to capture the important extra data in your log records. In some schemas you may need to peer through multiple levels of relationships.

Text based logging makes this explicit because you see the message right in front of you. If you write a log message that says in part 'sponsor 385', everyone who reads the message is going to immediately ask you who that is. You get your nose rubbed in to the issue and you have to explicitly consider how to make the information you're logging comprehensible by itself.

Written on 19 January 2014.
« Your web application should have an audit log
A thought about the popularity of server-side JavaScript »

Page tools: View Source, Add Comment.
Login: Password:
Atom Syndication: Recent Comments.

Last modified: Sun Jan 19 03:36:57 2014
This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.