Topic: Prometheus and Grafana

This collects most or all of the entries I've written on Prometheus and Grafana, in reverse chronological order. You can also see the overall index of entries (or the chronological index).

2024-06-22: A Prometheus Blackbox gotcha: (UDP) DNS replies have a low size limit
2024-06-13: Using prime numbers for our Prometheus scrape intervals
2024-06-11: The size of our Prometheus setup as of June 2024
2024-05-27: Some notes on Grafana Loki's new "structured metadata" (as of 3.0.x)
2024-05-22: The Prometheus host agent's 'perf' collector can be kind of expensive
2024-04-02: An issue with Alertmanager inhibitions and resolved alerts
What Prometheus Alertmanager's group_interval setting means
2024-03-30: The Prometheus scrape interval mistake people keep making
2024-03-28: The effects of silences (et al) in Prometheus Alertmanager
2024-03-26: How I would automate monitoring DNS queries in basic Prometheus
2024-03-25: Options for diverting alerts in Prometheus
2024-03-15: The problem of using basic Prometheus to monitor DNS query results
2024-03-01: Options for your Grafana panels when your metrics change names
2024-02-28: Detecting absent Prometheus metrics without knowing their labels
2024-01-29: What I think goes wrong periodically with our Grafana Loki on restarts
2024-01-21: The expected size of a gap in a Prometheus range vector (sometimes)
2024-01-20: An example of how Prometheus's delta() function will extrapolate time ranges
2024-01-16: What Prometheus exporters we use (as of the end of 2023)
2023-12-28: The various phases of Prometheus Blackbox's HTTP probe
2023-12-17: Prometheus's group_left() and group_right() operators
2023-12-10: Some notes on using the logcli program to query Grafana Loki
2023-12-09: I recently used Grafana Loki for fast, flexible log searching
2023-11-28: Why we scrape Prometheus Blackbox's metrics endpoint
2023-10-27: Alerting on sticky configuration reload failures for Prometheus
2023-08-03: Prometheus scrape failures can cause alerts to be 'resolved'
2023-08-02: The Prometheus host agent's metrics for systemd unit restarts
2023-07-29: Prometheus Blackbox probes and DNS lookups
2023-06-04: Why Prometheus exporters really need fixed TCP ports
2023-05-27: How I set up a server for testing new Grafana versions and other things
2023-05-26: In practice, Grafana has not been great at backward compatibility
2023-04-27: I can't recommend serious use of an all-in-one local Grafana Loki setup
2023-04-08: A Prometheus Alertmanager alert grouping conundrum
2023-03-19: Easily adjusting the minimum interval on panels in Grafana dashboards
2023-03-13: What I like using Grafana Loki for (and where I avoid it)
2023-02-21: Grafana Loki doesn't compact log chunks and what this means for you
2023-02-07: What I want in Prometheus (as a whole) is aggregating alert notifications
2023-02-05: Some things on Prometheus's new feature to keep alerts firing for a while
2023-02-02: A gotcha when making partial copies of Prometheus's database with rsync
2023-01-21: How Prometheus makes good use of the HTTP Accept: header
2023-01-03: Some thoughts on Prometheus Alertmanager's alert reminders
2022-12-21: The Prometheus cardinality issues with systemd unit-related metrics
2022-12-14: Your options for displaying status over time in Grafana 9
2022-12-11: Prometheus Blackbox 0.23.0 has added a nice improvement to its DNS checks
2022-10-21: The Prometheus timestamp() function can be used on expressions, sort of
2022-10-11: When Promtail seems to make position checkpoints (as of v2.6.1)
2022-09-30: Your Grafana Loki setup needs security and access control
2022-09-14: Grafana Loki doesn't duplicate a central syslog server (or vice versa)
2022-09-12: What's lost when running the Prometheus host agent as a non-root user on Linux
2022-09-08: Grafana's problem with the order of dashboard panel legends and Prometheus
2022-09-06: Machine room temperatures and the value of long Prometheus metrics history
2022-09-03: Our Prometheus host metrics saved us from some painful experiences
2022-09-02: An rsyslog(d) syslog forwarding setup for Grafana Loki (via Promtail)
2022-08-28: Getting USB TEMPer2 temperature sensor readings into Prometheus (on Linux)
2022-08-22: Some notes on Grafana annotations sourced from Prometheus metrics
2022-08-19: I wish Prometheus had a table-driven label remapping feature
2022-08-12: My adventure with URLs in a Grafana that's behind a reverse proxy
2022-08-08: Two example Grafana Loki log queries to get things from ntpdate logs
2022-07-31: Using Prometheus's recent '@ end()' PromQL feature to reduce graph noise
2022-07-26: To be fully useful, Prometheus histograms want their cumulative sums
2022-07-21: You can sensibly move or copy Prometheus's database with rsync
2022-07-19: We won't be sending systemd logs to Grafana Loki in JSON format
2022-07-18: Grafana Loki and what can go wrong with label cardinality
2022-06-16: I wish Grafana dashboards and panels could have easy, natural comments
2022-06-06: Doing a selective alert about a host's additional exporters in Prometheus
2022-06-05: Checking a few metrics (time series) at once in Prometheus's query language
2022-05-28: It's a bit risky to give people access to your Prometheus Blackbox exporter
2022-05-06: Filtering Prometheus metrics with deliberately repeated labels
2022-03-20: Prometheus: using gauge-like things as if they were counters
2022-03-09: Linux disk names you can encounter in your Prometheus host metrics
2022-03-07: The convenience of multi-purpose monitoring (in Prometheus)
2022-02-04: Some notes on Grafana relative time ranges
2022-01-15: You should do lint checks on your Prometheus alert (and recording) rules
2022-01-14: Link: Histograms in Grafana (a howto)
2022-01-11: Some things about Prometheus Alertmanager's notification metrics
2022-01-10: The complexity of seeing if your Prometheus Alertmanager is truly healthy
2021-11-30: Prometheus will make persistent connections to agents (scrape targets)
2021-09-30: Moving averages (and rates) for metrics in Prometheus (and Grafana)
2021-09-11: How many Prometheus metrics a typical host here generates
2021-09-04: Adding a "host" label to all of your per-host Prometheus metrics
2021-09-03: How I try out and test new versions of Grafana
2021-08-13: Prometheus alerts and the idea of "deadbands" (or maybe hysteresis) (with an implementation)
2021-08-12: Prometheus, Alertmanager, and maybe avoiding flapping alerts
2021-08-04: When you do and don't get stuck query results from a down Prometheus
2021-06-29: Monitoring the status of Linux network interfaces with Prometheus
2021-06-17: In Prometheus queries, on and ignoring don't drop labels from the result
2021-05-27: Some thoughts on having set up a personal Alertmanager instance
2021-05-22: I don't know how much memory our Prometheus setup needs
2021-05-15: The size of our Prometheus setup as of May 2021
2021-04-12: Counting how many times something started or stopped failing in Prometheus
2021-04-04: Some uses for Prometheus's resets() function
2021-03-31: Understanding Prometheus' changes() function and what it can do for me
2021-03-14: I wish Prometheus had some features to deal with 'missing' metrics
2021-03-13: Prometheus and the case of the stuck metrics
2021-02-24: How convenience in Prometheus labels for alerts led me into a quiet mistake
How (and where) Prometheus alerts get their labels
2021-02-22: How I set up testing alerts in our Prometheus environment
2021-01-10: What timestamps you get back along with Prometheus query results
2021-01-08: How to extract raw time series data from Prometheus
2020-12-30: Some ways to do a Prometheus query as of a given time
A Prometheus wish: easy ways to evaluate a PromQL query at a given time
2020-12-15: How to make Grafana properly display a Unix timestamp
In Prometheus, it's hard to work with when metric points happened
2020-12-02: Prometheus 2.23.0 now lets you display graphs in local time
2020-11-17: Grafana and the case of the infinite serial number
2020-10-31: A gotcha with combining single-label and multi-label Prometheus metrics
2020-10-17: A potential Prometheus issue for labeled metrics for infrequent events
2020-08-18: The Prometheus host agent can disturb Linux CPU frequency measurements
2020-08-07: How we choose our time intervals in our Grafana dashboards
2020-07-14: Link: The Anatomy of a PromQL Query
2020-06-29: How Prometheus Blackbox's TLS certificate metrics would have reacted to AddTrust's root expiry
2020-06-25: What Prometheus Blackbox's TLS certificate expiry metrics are checking
2020-06-05: Why we put alert start and end times in our Prometheus alert messages
2020-06-04: Formatting alert start and end times in Prometheus Alertmanager messages
2020-05-22: Working out how frequently your ICMP pings fail in Prometheus
2020-03-30: Notes on Grafana 'value groups' for dashboard variables
2020-03-28: The Prometheus host agent's CPU utilization metrics can be a bit weird
2020-03-19: Make sure to keep useful labels in your Prometheus alert rules
2020-02-29: OpenBSD versus Prometheus (and Go)
2020-02-27: Some alert inhibition rules we use in Prometheus Alertmanager
2020-02-26: The magic settings to make a bar graph in Grafana
2020-01-26: How big our Prometheus setup is (as of January 2020)
2020-01-05: Why I prefer the script exporter for exposing script metrics to Prometheus
2020-01-04: Three ways to expose script-created metrics in Prometheus
2019-12-30: The history and background of us using Prometheus
2019-12-29: Prometheus and Grafana after a year (more or less)
2019-12-28: Our setup of Prometheus and Grafana (as of the end of 2019)
2019-12-02: You can have Grafana tables with multiple values for a single metric (with Prometheus)
Calculating usage over time in Prometheus (and Grafana)
2019-11-30: Counting the number of distinct labels in a Prometheus metric
2019-11-25: In Prometheus, don't be afraid of high cardinality metrics if they're valuable enough
2019-10-08: How we implement reboot notifications when our machines reboot in Prometheus
2019-09-17: Finding metrics that are missing labels in Prometheus (for alert metrics)
2019-09-02: Another way to do easy configuration for lots of Prometheus Blackbox checks
2019-08-26: A lesson of (alert) scale we learned from a power failure
2019-07-28: A note on using the Go Prometheus client package to expose labeled metrics
2019-06-28: Using Prometheus's statsd exporter to let scripts make metrics updates
2019-06-02: Exploring the start time of Prometheus alerts via ALERTS_FOR_STATE
2019-05-20: Understanding how to pull in labels from other metrics in Prometheus
2019-05-03: Some implications of using offset instead of delta() in Prometheus
2019-04-28: A gotcha with stale metrics and *_over_time() in Prometheus
2019-04-26: Brief notes on making Prometheus instant queries with curl
2019-04-21: My view on upgrading Prometheus (and Grafana) on an ongoing basis
2019-04-18: A pattern for dealing with missing metrics in Prometheus in simple cases
2019-04-13: Remembering that Prometheus expressions act as filters
2019-03-24: Prometheus's delta() function can be inferior to subtraction with offset
2019-03-18: Prometheus subqueries pick time points in a surprising way
2019-03-12: An easy optimization for restricted multi-metric queries in Prometheus
2019-03-11: Testing Prometheus alert conditions through subqueries
2019-03-10: What the default query step is for Prometheus subqueries
2019-03-06: Using Prometheus subqueries to look for spikes in rates
2019-02-27: Using Prometheus subqueries to do calculations over time ranges
2019-02-17: Some notes on heatmaps and histograms in Prometheus and Grafana
2019-01-23: A little surprise with Prometheus scrape intervals, timeouts, and alerts
2018-12-14: Why our Grafana URLs always require HTTP Basic Authentication
2018-12-12: One situation where you absolutely can't use irate() in Prometheus
2018-12-03: Linux disk IO stats in Prometheus
2018-11-25: How we monitor our Prometheus setup itself
2018-11-20: When Prometheus Alertmanager will tell you about resolved alerts
2018-11-11: Easy configuration for lots of Prometheus Blackbox checks
2018-11-10: Why Prometheus turns out not to be our ideal alerting system
2018-11-09: Getting CPU utilization breakdowns efficiently in Prometheus
2018-11-05: rate() versus irate() in Prometheus (and Grafana)
2018-10-28: How I'm visualizing health check history in Grafana
2018-10-22: Using group_* vector matching in Prometheus for database lookups
2018-10-18: Some things on delays and timings for Prometheus alerts
2018-10-17: When metrics disappear on updates with Prometheus Pushgateway
2018-10-13: Getting a CPU utilization breakdown in Prometheus's query language, PromQL
How Prometheus's query steps (aka query resolution) work
2018-10-11: Some notes on Prometheus's Blackbox exporter


This is a Category/PageManagement page.



Last modified: Sun Feb 5 22:38:12 2023