2025-02-08: |
The Prometheus host agent is missing some Linux NFSv4 RPC stats (as of 1.8.2)
|
2024-11-28: |
My life has been improved by my quiet Prometheus alert status monitor
|
2024-11-22: |
My new solution for quiet monitoring of our Prometheus alerts
|
2024-11-21: |
Our Prometheus alerting problem if our central mail server isn't working
|
2024-11-12: |
Finding a good use for keep_firing_for in our Prometheus alerts
|
2024-11-11: |
Prometheus makes it annoyingly difficult to add more information to alerts
|
2024-09-29: |
Brief notes on making Prometheus's SNMP exporter use additional SNMP MIB(s)
|
2024-09-27: |
Brief notes on how the Prometheus SNMP exporter's configurations work
|
2024-06-22: |
A Prometheus Blackbox gotcha: (UDP) DNS replies have a low size limit
|
2024-06-13: |
Using prime numbers for our Prometheus scrape intervals
|
2024-06-11: |
The size of our Prometheus setup as of June 2024
|
2024-05-27: |
Some notes on Grafana Loki's new "structured metadata" (as of 3.0.x)
|
2024-05-22: |
The Prometheus host agent's 'perf' collector can be kind of expensive
|
2024-04-02: |
An issue with Alertmanager inhibitions and resolved alerts
What Prometheus Alertmanager's group_interval setting means
|
2024-03-30: |
The Prometheus scrape interval mistake people keep making
|
2024-03-28: |
The effects of silences (et al) in Prometheus Alertmanager
|
2024-03-26: |
How I would automate monitoring DNS queries in basic Prometheus
|
2024-03-25: |
Options for diverting alerts in Prometheus
|
2024-03-15: |
The problem of using basic Prometheus to monitor DNS query results
|
2024-03-01: |
Options for your Grafana panels when your metrics change names
|
2024-02-28: |
Detecting absent Prometheus metrics without knowing their labels
|
2024-01-29: |
What I think goes wrong periodically with our Grafana Loki on restarts
|
2024-01-21: |
The expected size of a gap in a Prometheus range vector (sometimes)
|
2024-01-20: |
An example of how Prometheus's delta() function will extrapolate time ranges
|
2024-01-16: |
What Prometheus exporters we use (as of the end of 2023)
|
2023-12-28: |
The various phases of Prometheus Blackbox's HTTP probe
|
2023-12-17: |
Prometheus's group_left() and group_right() operators
|
2023-12-10: |
Some notes on using the logcli program to query Grafana Loki
|
2023-12-09: |
I recently used Grafana Loki for fast, flexible log searching
|
2023-11-28: |
Why we scrape Prometheus Blackbox's metrics endpoint
|
2023-10-27: |
Alerting on sticky configuration reload failures for Prometheus
|
2023-08-03: |
Prometheus scrape failures can cause alerts to be 'resolved'
|
2023-08-02: |
The Prometheus host agent's metrics for systemd unit restarts
|
2023-07-29: |
Prometheus Blackbox probes and DNS lookups
|
2023-06-04: |
Why Prometheus exporters really need fixed TCP ports
|
2023-05-27: |
How I set up a server for testing new Grafana versions and other things
|
2023-05-26: |
In practice, Grafana has not been great at backward compatibility
|
2023-04-27: |
I can't recommend serious use of an all-in-one local Grafana Loki setup
|
2023-04-08: |
A Prometheus Alertmanager alert grouping conundrum
|
2023-03-19: |
Easily adjusting the minimum interval on panels in Grafana dashboards
|
2023-03-13: |
What I like using Grafana Loki for (and where I avoid it)
|
2023-02-21: |
Grafana Loki doesn't compact log chunks and what this means for you
|
2023-02-07: |
What I want in Prometheus (as a whole) is aggregating alert notifications
|
2023-02-05: |
Some things on Prometheus's new feature to keep alerts firing for a while
|
2023-02-02: |
A gotcha when making partial copies of Prometheus's database with rsync
|
2023-01-21: |
How Prometheus makes good use of the HTTP Accept: header
|
2023-01-03: |
Some thoughts on Prometheus Alertmanager's alert reminders
|
2022-12-21: |
The Prometheus cardinality issues with systemd unit-related metrics
|
2022-12-14: |
Your options for displaying status over time in Grafana 9
|
2022-12-11: |
Prometheus Blackbox 0.23.0 has added a nice improvement to its DNS checks
|
2022-10-21: |
The Prometheus timestamp() function can be used on expressions, sort of
|
2022-10-11: |
When Promtail seems to make position checkpoints (as of v2.6.1)
|
2022-09-30: |
Your Grafana Loki setup needs security and access control
|
2022-09-14: |
Grafana Loki doesn't duplicate a central syslog server (or vice versa)
|
2022-09-12: |
What's lost when running the Prometheus host agent as a non-root user on Linux
|
2022-09-08: |
Grafana's problem with the order of dashboard panel legends and Prometheus
|
2022-09-06: |
Machine room temperatures and the value of long Prometheus metrics history
|
2022-09-03: |
Our Prometheus host metrics saved us from some painful experiences
|
2022-09-02: |
An rsyslog(d) syslog forwarding setup for Grafana Loki (via Promtail)
|
2022-08-28: |
Getting USB TEMPer2 temperature sensor readings into Prometheus (on Linux)
|
2022-08-22: |
Some notes on Grafana annotations sourced from Prometheus metrics
|
2022-08-19: |
I wish Prometheus had a table-driven label remapping feature
|
2022-08-12: |
My adventure with URLs in a Grafana that's behind a reverse proxy
|
2022-08-08: |
Two example Grafana Loki log queries to get things from ntpdate logs
|
2022-07-31: |
Using Prometheus's recent '@ end()' PromQL feature to reduce graph noise
|
2022-07-26: |
To be fully useful, Prometheus histograms want their cumulative sums
|
2022-07-21: |
You can sensibly move or copy Prometheus's database with rsync
|
2022-07-19: |
We won't be sending systemd logs to Grafana Loki in JSON format
|
2022-07-18: |
Grafana Loki and what can go wrong with label cardinality
|
2022-06-16: |
I wish Grafana dashboards and panels could have easy, natural comments
|
2022-06-06: |
Doing a selective alert about a host's additional exporters in Prometheus
|
2022-06-05: |
Checking a few metrics (time series) at once in Prometheus's query language
|
2022-05-28: |
It's a bit risky to give people access to your Prometheus Blackbox exporter
|
2022-05-06: |
Filtering Prometheus metrics with deliberately repeated labels
|
2022-03-20: |
Prometheus: using gauge-like things as if they were counters
|
2022-03-09: |
Linux disk names you can encounter in your Prometheus host metrics
|
2022-03-07: |
The convenience of multi-purpose monitoring (in Prometheus)
|
2022-02-04: |
Some notes on Grafana relative time ranges
|
2022-01-15: |
You should do lint checks on your Prometheus alert (and recording) rules
|
2022-01-14: |
Link: Histograms in Grafana (a howto)
|
2022-01-11: |
Some things about Prometheus Alertmanager's notification metrics
|
2022-01-10: |
The complexity of seeing if your Prometheus Alertmanager is truly healthy
|
2021-11-30: |
Prometheus will make persistent connections to agents (scrape targets)
|
2021-09-30: |
Moving averages (and rates) for metrics in Prometheus (and Grafana)
|
2021-09-11: |
How many Prometheus metrics a typical host here generates
|
2021-09-04: |
Adding a "host" label to all of your per-host Prometheus metrics
|
2021-09-03: |
How I try out and test new versions of Grafana
|
2021-08-13: |
Prometheus alerts and the idea of "deadbands" (or maybe hysteresis) (with an implementation)
|
2021-08-12: |
Prometheus, Alertmanager, and maybe avoiding flapping alerts
|
2021-08-04: |
When you do and don't get stuck query results from a down Prometheus
|
2021-06-29: |
Monitoring the status of Linux network interfaces with Prometheus
|
2021-06-17: |
In Prometheus queries, on and ignoring don't drop labels from the result
|
2021-05-27: |
Some thoughts on having set up a personal Alertmanager instance
|
2021-05-22: |
I don't know how much memory our Prometheus setup needs
|
2021-05-15: |
The size of our Prometheus setup as of May 2021
|
2021-04-12: |
Counting how many times something started or stopped failing in Prometheus
|
2021-04-04: |
Some uses for Prometheus's resets() function
|
2021-03-31: |
Understanding Prometheus' changes() function and what it can do for me
|
2021-03-14: |
I wish Prometheus had some features to deal with 'missing' metrics
|
2021-03-13: |
Prometheus and the case of the stuck metrics
|
2021-02-24: |
How convenience in Prometheus labels for alerts led me into a quiet mistake
How (and where) Prometheus alerts get their labels
|
2021-02-22: |
How I set up testing alerts in our Prometheus environment
|
2021-01-10: |
What timestamps you get back along with Prometheus query results
|
2021-01-08: |
How to extract raw time series data from Prometheus
|
2020-12-30: |
Some ways to do a Prometheus query as of a given time
A Prometheus wish: easy ways to evaluate a PromQL query at a given time
|
2020-12-15: |
How to make Grafana properly display a Unix timestamp
In Prometheus, it's hard to work with when metric points happened
|
2020-12-02: |
Prometheus 2.23.0 now lets you display graphs in local time
|
2020-11-17: |
Grafana and the case of the infinite serial number
|
2020-10-31: |
A gotcha with combining single-label and multi-label Prometheus metrics
|
2020-10-17: |
A potential Prometheus issue for labeled metrics for infrequent events
|
2020-08-18: |
The Prometheus host agent can disturb Linux CPU frequency measurements
|
2020-08-07: |
How we choose our time intervals in our Grafana dashboards
|
2020-07-14: |
Link: The Anatomy of a PromQL Query
|
2020-06-29: |
How Prometheus Blackbox's TLS certificate metrics would have reacted to AddTrust's root expiry
|
2020-06-25: |
What Prometheus Blackbox's TLS certificate expiry metrics are checking
|
2020-06-05: |
Why we put alert start and end times in our Prometheus alert messages
|
2020-06-04: |
Formatting alert start and end times in Prometheus Alertmanager messages
|
2020-05-22: |
Working out how frequently your ICMP pings fail in Prometheus
|
2020-03-30: |
Notes on Grafana 'value groups' for dashboard variables
|
2020-03-28: |
The Prometheus host agent's CPU utilization metrics can be a bit weird
|
2020-03-19: |
Make sure to keep useful labels in your Prometheus alert rules
|
2020-02-29: |
OpenBSD versus Prometheus (and Go)
|
2020-02-27: |
Some alert inhibition rules we use in Prometheus Alertmanager
|
2020-02-26: |
The magic settings to make a bar graph in Grafana
|
2020-01-26: |
How big our Prometheus setup is (as of January 2020)
|
2020-01-05: |
Why I prefer the script exporter for exposing script metrics to Prometheus
|
2020-01-04: |
Three ways to expose script-created metrics in Prometheus
|
2019-12-30: |
The history and background of us using Prometheus
|
2019-12-29: |
Prometheus and Grafana after a year (more or less)
|
2019-12-28: |
Our setup of Prometheus and Grafana (as of the end of 2019)
|
2019-12-02: |
You can have Grafana tables with multiple values for a single metric (with Prometheus)
Calculating usage over time in Prometheus (and Grafana)
|
2019-11-30: |
Counting the number of distinct labels in a Prometheus metric
|
2019-11-25: |
In Prometheus, don't be afraid of high cardinality metrics if they're valuable enough
|
2019-10-08: |
How we implement reboot notifications when our machines reboot in Prometheus
|
2019-09-17: |
Finding metrics that are missing labels in Prometheus (for alert metrics)
|
2019-09-02: |
Another way to do easy configuration for lots of Prometheus Blackbox checks
|
2019-08-26: |
A lesson of (alert) scale we learned from a power failure
|
2019-07-28: |
A note on using the Go Prometheus client package to expose labeled metrics
|
2019-06-28: |
Using Prometheus's statsd exporter to let scripts make metrics updates
|
2019-06-02: |
Exploring the start time of Prometheus alerts via ALERTS_FOR_STATE
|
2019-05-20: |
Understanding how to pull in labels from other metrics in Prometheus
|
2019-05-03: |
Some implications of using offset instead of delta() in Prometheus
|
2019-04-28: |
A gotcha with stale metrics and *_over_time() in Prometheus
|
2019-04-26: |
Brief notes on making Prometheus instant queries with curl
|
2019-04-21: |
My view on upgrading Prometheus (and Grafana) on an ongoing basis
|
2019-04-18: |
A pattern for dealing with missing metrics in Prometheus in simple cases
|
2019-04-13: |
Remembering that Prometheus expressions act as filters
|
2019-03-24: |
Prometheus's delta() function can be inferior to subtraction with offset
|
2019-03-18: |
Prometheus subqueries pick time points in a surprising way
|
2019-03-12: |
An easy optimization for restricted multi-metric queries in Prometheus
|
2019-03-11: |
Testing Prometheus alert conditions through subqueries
|
2019-03-10: |
What the default query step is for Prometheus subqueries
|
2019-03-06: |
Using Prometheus subqueries to look for spikes in rates
|
2019-02-27: |
Using Prometheus subqueries to do calculations over time ranges
|
2019-02-17: |
Some notes on heatmaps and histograms in Prometheus and Grafana
|
2019-01-23: |
A little surprise with Prometheus scrape intervals, timeouts, and alerts
|
2018-12-14: |
Why our Grafana URLs always require HTTP Basic Authentication
|
2018-12-12: |
One situation where you absolutely can't use irate() in Prometheus
|
2018-12-03: |
Linux disk IO stats in Prometheus
|
2018-11-25: |
How we monitor our Prometheus setup itself
|
2018-11-20: |
When Prometheus Alertmanager will tell you about resolved alerts
|
2018-11-11: |
Easy configuration for lots of Prometheus Blackbox checks
|
2018-11-10: |
Why Prometheus turns out not to be our ideal alerting system
|
2018-11-09: |
Getting CPU utilization breakdowns efficiently in Prometheus
|
2018-11-05: |
rate() versus irate() in Prometheus (and Grafana)
|
2018-10-28: |
How I'm visualizing health check history in Grafana
|
2018-10-22: |
Using group_* vector matching in Prometheus for database lookups
|
2018-10-18: |
Some things on delays and timings for Prometheus alerts
|
2018-10-17: |
When metrics disappear on updates with Prometheus Pushgateway
|
2018-10-13: |
Getting a CPU utilization breakdown in Prometheus's query language, PromQL
How Prometheus's query steps (aka query resolution) work
|
2018-10-11: |
Some notes on Prometheus's Blackbox exporter
|