In search of modest scale structured syslog analysis

September 30, 2016

Every general issue should start from a motivating use case, so here's ours: we want to be able to find users who haven't logged in with SSH or used IMAP in the past N months (this perhaps should include Samba authentication as well). As a university department that deals with graduate students, postdocs, visiting researchers, and various other sorts of ongoing or sporadic collaborations, we have a user population that's essentially impossible to keep track of centrally (and sometimes at all). So we want to be able to tell people things like 'this account that you sponsor doesn't seem to have been used for a year'.
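(The core of this query is simple once you have structured login events; here's a minimal sketch in Python, assuming you've already extracted (user, timestamp) pairs from the logs somehow. The function name and the rough 30-day month are my own inventions for illustration.)

```python
from datetime import datetime, timedelta

def stale_users(events, now, months=12):
    """Given (user, datetime) login events, return the users whose most
    recent event is more than roughly `months` months before `now`."""
    last_seen = {}
    for user, when in events:
        # Track only the most recent event per user.
        if user not in last_seen or when > last_seen[user]:
            last_seen[user] = when
    cutoff = now - timedelta(days=30 * months)  # approximate a month as 30 days
    return sorted(u for u, when in last_seen.items() if when < cutoff)

# Toy example: alice logged in recently, bob hasn't in over a year.
events = [
    ("alice", datetime(2016, 9, 1)),
    ("bob", datetime(2015, 3, 15)),
    ("alice", datetime(2015, 1, 1)),
]
print(stale_users(events, now=datetime(2016, 9, 30)))  # → ['bob']
```

All of the actual difficulty lives in producing those (user, timestamp) pairs from raw syslog in the first place, which is the subject of the rest of this entry.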

As far as I can tell from Internet searches and so on, there's an assorted bunch of log aggregation, analysis, and querying tools out there. Logstash is the big one that many people have heard of, but then there's Graylog and fluentd and no doubt others. In theory any of these ought to be the solution to our problem. In practice, there seem to be two main drawbacks:

  • They all seem to be designed for large to very large environments. We have what I tend to call a midsized environment; what's relevant here is that we only have on the order of 20 to 30 servers. Systems designed for large environments seem to be both complicated and heavyweight, requiring things like JVMs and multiple servers and so on.

  • None of them appear to come with or have a comprehensive set of parsers to turn syslog messages from various common programs into the sort of structured information that these systems seem designed to work with. You can write your own parsers (usually with regular expressions), but doing that well requires a relatively deep knowledge of just what messages the programs can produce.
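(To make the parsing problem concrete, here's a minimal sketch of what such a parser looks like, covering exactly one variant of sshd's successful-login message. The pattern and function are my own illustration, not taken from any existing tool; real sshd emits many other message forms, failures, disconnects, and so on, which is why writing these patterns well requires deep knowledge of the programs involved.)

```python
import re

# Matches one common sshd success message, e.g.:
#   Accepted publickey for chris from 128.100.3.51 port 58899 ssh2
# sshd produces many other message variants that this does not cover.
SSHD_ACCEPTED = re.compile(
    r"Accepted (?P<method>\S+) for (?P<user>\S+) "
    r"from (?P<ip>\S+) port (?P<port>\d+)"
)

def parse_sshd(msg):
    """Return a dict of structured fields, or None if this isn't a
    message form we know how to parse."""
    m = SSHD_ACCEPTED.search(msg)
    return m.groupdict() if m else None

print(parse_sshd("Accepted publickey for chris from 128.100.3.51 port 58899 ssh2"))
```

Multiply this by every message variant from sshd, Dovecot, Samba, and whatever else you care about, and you can see why a pre-built collection of patterns would be so valuable.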

(In general all of these systems feel as if they're primarily focused on application level logging of structured information, where you have your website or backend processing system or whatever emit structured messages into the logging backend. Or perhaps I don't understand how you're supposed to use these systems.)

We can undoubtedly make these systems solve our problem. We can set up the required collection of servers and services and get them talking to each other (and our central syslog server), and we can write a bunch of grok patterns to crack apart sshd and Dovecot and Samba messages. But all of this feels as if we are using the back of a large and very sharp axe to hammer in a small nail. It works, awkwardly, but it's probably not the right way.

It certainly feels as if structured capture and analysis of syslog messages from common programs like sshd, Dovecot, and so on in a moderately sized environment ought to be a well-solved problem. We can't be the first people to want to do this, so this particular wheel must have been reinvented repeatedly by now. But I can't find even a collection of syslog parsing patterns for common Unix daemons, much less a full system for this.

(If people know of systems or resources for doing this, we would of course be quite interested. There are some SaaS services that do log analysis for you, but as a university department we're not in a position to pay for this (staff time is free, as always).)
