Wandering Thoughts archives

2023-03-13

What I like using Grafana Loki for (and where I avoid it)

These days we have a Grafana Loki server that collects system logs from our Linux servers (which has sometimes been an exciting learning experience), along with our long-standing central syslog server and, of course, the system logs on the servers themselves (both in the systemd journal and in the files written to /var/log by syslog and programs like Exim). As I've written before, we have both because Loki doesn't duplicate our central syslog server, but that old entry raises the question of when I use Grafana Loki instead of looking at another source of logs.

I've wound up reaching for Grafana Loki in two cases. The most common case is when I want a quick, narrow look at some log messages, either for a particular host or across all of our servers. Frequently, I'm looking for only a few messages; I might want to see if a particular sort of error has been logged by a local script recently, for example. Although LogQL has its peculiarities, I've internalized enough of it that I can toss off and refine this sort of ad-hoc query on the fly:

{syslog_identifier="rsync-cssite", level!="info"}

(I do these through Grafana's 'Explore' mode, hooked up to a Loki datasource.)

Loki's big advantage over grep'ing the logs on our central syslog server is that it's easy to write a query that runs rapidly. A secondary advantage is that the results will typically have a lot of metadata that can be used to narrow the query down further, so I get only what I'm interested in. Part of that query speed is that Loki makes it easy to search over a narrow or a broad time range on an ad-hoc basis. The Explore interface to Loki in Grafana also has a convenient timeline view that shows concentrations of messages at different levels over the time range, which lets me narrow right down if some area has unusual message levels or an unusual number of messages.

One important thing for fast queries is to use labels to refine things as much as possible. The narrower you can be in your labels, the less log text Loki has to search. This is especially useful if you have some low-frequency labels. However, even if I have to make a query with no label restrictions that I filter with the LogQL equivalent of grep patterns, I find that Loki is often faster than grep in practice (and using Loki has the advantage that I'm only affecting the Loki server). Part of this is that it's natural to query on a narrow time range (at least to start with), which cuts down how much data Loki has to search and so speeds things up a lot.
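As a rough illustration of combining narrow labels, a grep-like line filter, and a short time window, here is a minimal Python sketch that builds a LogQL query and a URL for Loki's HTTP `query_range` endpoint. It is not how I actually run queries (I use Grafana's Explore); the server address `loki.example.org:3100`, the helper names, and the 15-minute window are all assumptions made up for the example, and the label name is simply borrowed from the query earlier in this entry.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode


def _quote(text):
    """Escape backslashes and double quotes for a LogQL string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def build_selector(labels, line_filter=None):
    """Build a LogQL stream selector with equality matchers.

    An optional line_filter is appended as a |= (substring) filter,
    the rough LogQL equivalent of a fixed-string grep.
    """
    if not labels:
        raise ValueError("this sketch requires at least one label matcher")
    matchers = [f'{name}="{_quote(value)}"' for name, value in sorted(labels.items())]
    query = "{" + ", ".join(matchers) + "}"
    if line_filter is not None:
        query += f' |= "{_quote(line_filter)}"'
    return query


def query_range_url(base_url, query, end, window, limit=100):
    """Return a query_range URL covering the `window` that ends at `end`.

    Loki accepts start and end as Unix timestamps in nanoseconds.
    """
    start = end - window
    params = {
        "query": query,
        "start": int(start.timestamp()) * 1_000_000_000,
        "end": int(end.timestamp()) * 1_000_000_000,
        "limit": limit,
    }
    return base_url.rstrip("/") + "/loki/api/v1/query_range?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical server address; only the label name comes from this entry.
    q = build_selector({"syslog_identifier": "rsync-cssite"}, line_filter="error")
    now = datetime.now(timezone.utc)
    print(query_range_url("http://loki.example.org:3100", q, now, timedelta(minutes=15)))
```

Starting with a small window and widening it only if nothing turns up mirrors what I do by hand in Explore, and it keeps the amount of log text Loki has to scan small.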

The less common case is if I want to derive some information by parsing and summarizing the logs in some way, as I did in getting things from our ntpdate logs. This is a much less frequent thing because it's much more work, but LogQL usually makes it easier than using grep and awk on our text logs. I haven't done very much ad-hoc information extraction, although I did build some dashboard panels that draw from Loki in our Grafana setup.
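To give a flavour of what 'summarizing' can mean, here is a small Python sketch that tallies log lines per label value from the JSON that Loki's HTTP API returns for a log query. This is an illustration and not what I did for our ntpdate logs; the `host` label, the function name, and the sample response are made-up placeholders. Inside LogQL itself, a similar tally can usually be written as a metric query along the lines of `sum by (host) (count_over_time({...}[1h]))`.

```python
from collections import Counter


def count_by_label(response, label):
    """Count log lines per value of `label` in a Loki 'streams' result."""
    data = response["data"]
    if data["resultType"] != "streams":
        raise ValueError("expected a log (streams) result, not a metric result")
    counts = Counter()
    for stream in data["result"]:
        # Each stream carries its label set plus a list of [timestamp, line] pairs.
        value = stream["stream"].get(label, "(none)")
        counts[value] += len(stream["values"])
    return counts


# Made-up placeholder data in the shape Loki uses for log query results;
# the hosts, timestamps, and messages are illustrative only.
sample = {
    "status": "success",
    "data": {
        "resultType": "streams",
        "result": [
            {
                "stream": {"host": "hosta", "syslog_identifier": "ntpdate"},
                "values": [
                    ["1678744800000000000", "example message one"],
                    ["1678744860000000000", "example message two"],
                ],
            },
            {
                "stream": {"host": "hostb", "syslog_identifier": "ntpdate"},
                "values": [["1678744805000000000", "example message three"]],
            },
        ],
    },
}

if __name__ == "__main__":
    for host, n in sorted(count_by_label(sample, "host").items()):
        print(host, n)
```

Doing the counting in LogQL is normally preferable because Loki then only returns the numbers; a client-side tally like this is mostly useful when you want to poke at the raw lines as well.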

Where I don't find Loki useful is any search where there's going to be a lot of log messages, or any time when I want a broad overview of things. Paging through a syslog text file or the output of 'journalctl --since ...' is much easier than trying to look at the roughly equivalent data in Grafana's Explore interface (or in logcli). This is often true even if I have to use grep to narrow things down a bit.

Related to this, I've found that Loki is not particularly great for watching logs live. I would almost always rather use 'tail -f' on our central log server, or some version of 'journalctl -f' on a particular host. The interface in Explore for live logs is clumsy, and logcli has its own issues (some of which I believe are outright bugs, but my experiences with past bug reports to Grafana haven't encouraged me to submit more).

(I'm aware that Loki can do clever things like derive metrics from logs and trigger alerts based on things seen in logs, but so far we haven't tried to do anything in this area.)

sysadmin/GrafanaLokiWhatILikeItFor written at 22:45:50

