Topic: Prometheus and Grafana

This collects most or all of the entries I've written on Prometheus and Grafana, in reverse chronological order. You can also see the overall index of entries (or the chronological index).

2020-10-17: A potential Prometheus issue for labeled metrics for infrequent events
2020-08-18: The Prometheus host agent can disturb Linux CPU frequency measurements
2020-08-07: How we choose our time intervals in our Grafana dashboards
2020-07-14: Link: The Anatomy of a PromQL Query
2020-06-29: How Prometheus Blackbox's TLS certificate metrics would have reacted to AddTrust's root expiry
2020-06-25: What Prometheus Blackbox's TLS certificate expiry metrics are checking
2020-06-05: Why we put alert start and end times in our Prometheus alert messages
2020-06-04: Formatting alert start and end times in Prometheus Alertmanager messages
2020-05-22: Working out how frequently your ICMP pings fail in Prometheus
2020-03-30: Notes on Grafana 'value groups' for dashboard variables
2020-03-28: The Prometheus host agent's CPU utilization metrics can be a bit weird
2020-03-19: Make sure to keep useful labels in your Prometheus alert rules
2020-02-29: OpenBSD versus Prometheus (and Go)
2020-02-27: Some alert inhibition rules we use in Prometheus Alertmanager
2020-02-26: The magic settings to make a bar graph in Grafana
2020-01-26: How big our Prometheus setup is (as of January 2020)
2020-01-05: Why I prefer the script exporter for exposing script metrics to Prometheus
2020-01-04: Three ways to expose script-created metrics in Prometheus
2019-12-30: The history and background of us using Prometheus
2019-12-29: Prometheus and Grafana after a year (more or less)
2019-12-28: Our setup of Prometheus and Grafana (as of the end of 2019)
2019-12-02: You can have Grafana tables with multiple values for a single metric (with Prometheus)
Calculating usage over time in Prometheus (and Grafana)
2019-11-30: Counting the number of distinct labels in a Prometheus metric
2019-11-25: In Prometheus, don't be afraid of high cardinality metrics if they're valuable enough
2019-10-08: How we implement reboot notifications when our machines reboot in Prometheus
2019-09-17: Finding metrics that are missing labels in Prometheus (for alert metrics)
2019-09-02: Another way to do easy configuration for lots of Prometheus Blackbox checks
2019-08-26: A lesson of (alert) scale we learned from a power failure
2019-07-28: A note on using the Go Prometheus client package to expose labeled metrics
2019-06-28: Using Prometheus's statsd exporter to let scripts make metrics updates
2019-06-02: Exploring the start time of Prometheus alerts via ALERTS_FOR_STATE
2019-05-20: Understanding how to pull in labels from other metrics in Prometheus
2019-05-03: Some implications of using offset instead of delta() in Prometheus
2019-04-28: A gotcha with stale metrics and *_over_time() in Prometheus
2019-04-26: Brief notes on making Prometheus instant queries with curl
2019-04-21: My view on upgrading Prometheus (and Grafana) on an ongoing basis
2019-04-18: A pattern for dealing with missing metrics in Prometheus in simple cases
2019-04-13: Remembering that Prometheus expressions act as filters
2019-03-24: Prometheus's delta() function can be inferior to subtraction with offset
2019-03-18: Prometheus subqueries pick time points in a surprising way
2019-03-12: An easy optimization for restricted multi-metric queries in Prometheus
2019-03-11: Testing Prometheus alert conditions through subqueries
2019-03-10: What the default query step is for Prometheus subqueries
2019-03-06: Using Prometheus subqueries to look for spikes in rates
2019-02-27: Using Prometheus subqueries to do calculations over time ranges
2019-02-17: Some notes on heatmaps and histograms in Prometheus and Grafana
2019-01-23: A little surprise with Prometheus scrape intervals, timeouts, and alerts
2018-12-14: Why our Grafana URLs always require HTTP Basic Authentication
2018-12-12: One situation where you absolutely can't use irate() in Prometheus
2018-12-03: Linux disk IO stats in Prometheus
2018-11-25: How we monitor our Prometheus setup itself
2018-11-20: When Prometheus Alertmanager will tell you about resolved alerts
2018-11-11: Easy configuration for lots of Prometheus Blackbox checks
2018-11-10: Why Prometheus turns out not to be our ideal alerting system
2018-11-09: Getting CPU utilization breakdowns efficiently in Prometheus
2018-11-05: rate() versus irate() in Prometheus (and Grafana)
2018-10-28: How I'm visualizing health check history in Grafana
2018-10-22: Using group_* vector matching in Prometheus for database lookups
2018-10-18: Some things on delays and timings for Prometheus alerts
2018-10-17: When metrics disappear on updates with Prometheus Pushgateway
2018-10-13: Getting a CPU utilization breakdown in Prometheus's query language, PromQL
How Prometheus's query steps (aka query resolution) work
2018-10-11: Some notes on Prometheus's Blackbox exporter

This is a Category/PageManagement page.

Last modified: Thu Sep 19 14:53:20 2019
This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.