2022-09-30
Your Grafana Loki setup needs security and access control
Grafana Loki is a nice, easy-to-set-up way to send your (Linux) systemd logs (and other logs too, with more work) to a central log server where you can then conveniently search them and do some other things. In a simple setup, you're going to set up a single 'does everything' Loki server, install Promtail on your client machines to ship their logs to Loki, and then probably install Grafana (perhaps behind a web server frontend) and connect it to Loki so you have a convenient way to query your logs. All of this is very similar to how you might set up Prometheus with Grafana. However, you don't want to stop here, because Loki really needs some security around who can access it, often more so than the metrics systems (like Prometheus) that it resembles.
The direct reason that you want access control for Loki is that Loki provides direct access to your logs, in full and basically raw form. All of your logs, from all of the systems that you're having feed in to Loki, with all of the potentially sensitive information that might be appearing in them. In many situations, you don't want to provide this sort of log access to everyone internally and you would be much more restrictive about who had access to read the logs on, say, a central syslog server. This applies both to direct access to Loki's HTTP API endpoints and to access to Loki through, say, Grafana's 'Explore' ad-hoc query system (which is a convenient way to poke through your Loki logs in a browser, instead of using LogCLI to do it from the command line).
(Even if you (collectively) don't have any concerns about co-workers and other internal users having access to the logs, consider how much of a potential treasure trove they could be to an attacker who gains access to your internal systems. For example, such an attacker could get a great deal of information about what user accounts have (SSH) access to which systems, and how they authenticate, as well as internal processing flows. And by default all of this can be accessed via HTTP, which means that any vulnerability that allows an attacker to make HTTP requests to internal web servers can probably be used to extract logs.)
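To make the exposure concrete, here is a minimal sketch of what asking an unprotected Loki server for logs looks like. It only builds the URL for Loki's `query_range` HTTP endpoint and doesn't send it. The server name, the port (3100 is the usual default), and the `job="systemd-journal"` label are assumptions for illustration; your labels depend on your Promtail configuration.

```python
from urllib.parse import urlencode


def loki_query_url(base_url, logql, start_ns, end_ns, limit=100):
    """Build a URL for Loki's /loki/api/v1/query_range endpoint.

    start_ns and end_ns are Unix timestamps in nanoseconds.
    """
    params = {
        "query": logql,
        "start": str(start_ns),
        "end": str(end_ns),
        "limit": str(limit),
    }
    return base_url.rstrip("/") + "/loki/api/v1/query_range?" + urlencode(params)


# Hypothetical server and label names; note that nothing here is a credential.
url = loki_query_url(
    "http://loki.example.org:3100",
    '{job="systemd-journal"} |= "sshd"',
    1664496000000000000,
    1664499600000000000,
)
print(url)
```

On a default all-in-one setup with no frontend in the way, a plain HTTP GET of a URL like this is all it takes to read matching log lines, which is why anything that can make internal HTTP requests matters here.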
Grafana Loki doesn't make this entirely clear, since nothing in the current documentation explicitly says that leaving a normally set up 'all in one' Loki server exposed to your intranet gives everyone on the intranet the ability to ask it for logs. Unfortunately, adding access control to Loki is not entirely easy, because Loki is a 'push' system where client machines running Promtail must be able to talk to some Loki API endpoints (and if you make them require credentials to do so, it's likely that a sufficiently privileged person or attacker on a client machine can get those credentials).
That Loki is a push-based system has another effect, which is that the Loki server isn't really in control of what logs it ingests from clients, the way a Prometheus server is in control of what metrics it pulls from whom. Again, unless you go out of your way, your Loki server will probably accept logs from anyone on your intranet who cares to ship them to you, and it will normally believe a lot of labels in those logs (such as the hostname they're allegedly from). Loki is almost certainly not the log capturing system you want to use in a hostile environment, or even in an environment where other people may copy your system configurations (complete with your setting for the Loki server).
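As a sketch of why the labels can't be trusted, here is roughly what a JSON body for Loki's push endpoint (`/loki/api/v1/push`) looks like. The label names (`job`, `host`), the host name, and the log line are all made up for illustration. The point is that the sender chooses them, and nothing in the payload ties them to the machine that actually sent it.

```python
import json
import time


def build_push_payload(labels, lines, now_ns=None):
    """Build a JSON-style body for Loki's /loki/api/v1/push endpoint.

    Each log line becomes a [timestamp, line] pair, with the timestamp
    given as a string of nanoseconds since the Unix epoch.
    """
    if now_ns is None:
        now_ns = time.time_ns()
    values = [[str(now_ns + i), line] for i, line in enumerate(lines)]
    return {"streams": [{"stream": dict(labels), "values": values}]}


# Any sender can claim to be any host; 'fileserver1' is a hypothetical name.
payload = build_push_payload(
    {"job": "systemd-journal", "host": "fileserver1"},
    ["sshd[123]: Accepted publickey for root"],
    now_ns=1664496000000000000,
)
body = json.dumps(payload)
print(body)
```

A client would POST this body with a `Content-Type: application/json` header. Unless something in front of Loki checks who is sending it, the claimed `host` label is simply accepted.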
(We wound up with a brute force Apache solution to this, which is made more complicated because of how Loki co-mingles its various API endpoints.)
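This is not our Apache configuration, but the decision a frontend has to make can be sketched in a few lines. The sketch assumes that clients only need the push endpoint while Grafana needs the query-side endpoints, and that both live under the same `/loki/api/v1/` prefix, so you have to sort them out path by path. The network, the addresses, and the exact list of read paths are illustrative assumptions; check the paths against your Loki version.

```python
import ipaddress

# Endpoint that Promtail clients need (log ingestion).
INGEST_PATHS = {"/loki/api/v1/push"}

# Query-side endpoints that only trusted readers (such as Grafana) need.
# The '/query' prefix covers query and query_range; '/label' covers
# labels and label/<name>/values.
READ_PREFIXES = (
    "/loki/api/v1/query",
    "/loki/api/v1/label",
    "/loki/api/v1/series",
    "/loki/api/v1/tail",
)


def allowed(path, client_ip, ingest_net, reader_ips):
    """Decide whether client_ip may access path on the Loki frontend."""
    ip = ipaddress.ip_address(client_ip)
    if path in INGEST_PATHS:
        # Any machine on the client network may ship logs.
        return ip in ipaddress.ip_network(ingest_net)
    if path.startswith(READ_PREFIXES):
        # Only explicitly listed hosts may read logs.
        return client_ip in reader_ips
    # Everything else is denied by default.
    return False


# Hypothetical addresses: 10.0.0.2 is the Grafana host.
print(allowed("/loki/api/v1/push", "10.0.0.5", "10.0.0.0/24", {"10.0.0.2"}))
print(allowed("/loki/api/v1/query_range", "10.0.0.5", "10.0.0.0/24", {"10.0.0.2"}))
```

Address-based rules like this only limit who can reach which endpoint. They don't stop a machine on the client network from pushing logs under someone else's labels.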
PS: I don't blame the Grafana Loki developers for not addressing this. Access control is a hard problem and it's probably best solved by a frontend, which is much better placed to take on all of the various potential complexities. I do wish Loki gave you somewhat more control over log ingestion and that the documentation had a bigger warning about this issue.
PPS: System and application metrics can be sensitive too, but my general sense is that they're less dangerous, partly because they're generally less specific and more aggregated. Logs are extremely specific; that's their purpose.