Wandering Thoughts archives

2020-12-30: Some ways to do a Prometheus query as of a given time
2020-12-30: A Prometheus wish: easy ways to evaluate a PromQL query at a given time
2020-12-27: Our alerts are quiet most of the time (as they should be)
2020-12-15: How to make Grafana properly display a Unix timestamp
2020-12-15: In Prometheus, it's hard to work with when metric points happened
2020-12-12: Sometimes a problem really is just a coincidence
2020-12-07: Exploring when the network is up on a machine
2020-12-02: Prometheus 2.23.0 now lets you display graphs in local time
2020-11-30: Our monitoring of our OpenBSD machines, such as it is (as of November 2020)
2020-11-23: My views on when you should use the official upstream versions of software
2020-11-22: Sometimes it's best to use the official upstream versions of software
2020-11-21: Github based projects have RSS syndication feeds for their releases
2020-11-17: Grafana and the case of the infinite serial number
2020-11-11: The problems inherent in building your own copies of software packages
2020-11-09: Seriously using virtualization clashes with our funding model
2020-11-08: Thinking about two different models of virtualization hosts
2020-10-31: A gotcha with combining single-label and multi-label Prometheus metrics
2020-10-30: A sysadmin learning experience courtesy of some UPS issues
2020-10-26: Sometimes alerts have inobvious reasons for existing
2020-10-24: Why configuration file snippets in a directory should have some extension
2020-10-23: An inconvenience of physical hardware is that it has to be delivered
2020-10-17: A potential Prometheus issue for labeled metrics for infrequent events
2020-10-13: As an outsider, I prefer issue tracking to be in its own application
2020-10-11: Our current usage and views of UPSes (late 2020 edition)
2020-10-10: Wanting to be able to monitor for electrical power quality issues
2020-10-09: Whether extra disks should be live or spare now depends on HDs versus SSDs
2020-09-29: Implementing 'and' conditions in Exim SMTP ACLs the easy way (and in Exim routers too)
2020-09-28: Making product names of what you use visible to people is generally a mistake
2020-09-27: Remote power control for your machines comes in two flavours
2020-09-26: We rebooted all of our servers remotely (more or less) and it all worked
2020-09-02: Why I want something like Procmail with a dedicated mail filtering language
2020-08-20: What you're looking for with a Grafana dashboard affects its settings
2020-08-16: "It works on my laptop" is a blame game
2020-08-07: How we choose our time intervals in our Grafana dashboards
2020-08-03: Exim's change to 'taint' some Exim variables is going to cause us pain
2020-07-31: Putting some extra 'obvious' information into our temperature alerts
2020-07-30: Putting IPMIs on a port isolated network to deal with shared network interfaces
2020-07-29: The problem of 'shared' IPMI network interfaces
2020-07-17: Not all sysadmin tools should be silent by default
2020-07-12: Running servers and Fred Brooks on transforming programs to products
2020-07-02: The work that's not being done from home is slowly accumulating for us
2020-06-29: How Prometheus Blackbox's TLS certificate metrics would have reacted to AddTrust's root expiry
2020-06-25: What Prometheus Blackbox's TLS certificate expiry metrics are checking
2020-06-20: The additional complications in DNS updates that secondary DNS servers add
2020-06-12: Dual displays contrasting with virtual screens (aka multiple desktops)
2020-06-06: Why sysadmins don't like changing things, illustrated
2020-06-05: Why we put alert start and end times in our Prometheus alert messages
2020-06-04: Formatting alert start and end times in Prometheus Alertmanager messages
2020-06-01: Watching the recent AddTrust root CA certificate expiry has been humbling
2020-05-29: What sort of SSH keys our users use or have listed in their authorized keys files
2020-05-25: My failure with Xpra (probably because what I want is almost impossible)
2020-05-22: Working out how frequently your ICMP pings fail in Prometheus
2020-05-17: Some views on having your system timezone set to UTC
2020-05-15: Why we use city names when configuring system timezones
2020-05-11: Why we have several hundred NFS filesystems in our environment
2020-05-09: How big our fileserver environment is (as of May 2020)
2020-05-03: What OSes we use here (as of May 2020)
2020-04-22: More on chown in combination with symlinks
2020-04-20: An important safety note about chown and symlinks (also chmod and chgrp)
2020-04-15: Some ways that servers make their disks not hot-swappable
2020-04-15: We're (temporarily) moving to three way mirrored disks on our servers
2020-04-10: Why my commit messages for configuration files describe my changes
2020-03-30: It's worth documenting the obvious (before it stops being obvious)
2020-03-30: Notes on Grafana 'value groups' for dashboard variables
2020-03-28: The Prometheus host agent's CPU utilization metrics can be a bit weird
2020-03-26: Any KVM over IP systems need to be on secure networks
2020-03-25: The problem of your (our) external mail gateway using internal DNS views
2020-03-23: Why we use 1U servers, and the two sides of them
2020-03-20: Wishing for a remote resilient server environment (now that it's too late)
2020-03-19: Make sure to keep useful labels in your Prometheus alert rules
2020-03-15: Why the choice of DNS over HTTPS server needs to be automatic (a sysadmin view)
2020-03-04: Unix's iowait% is a narrow and limited measure that can be misleading
2020-02-29: OpenBSD versus Prometheus (and Go)
2020-02-27: Some alert inhibition rules we use in Prometheus Alertmanager
2020-02-26: The magic settings to make a bar graph in Grafana
2020-02-23: Our (unusual) freedom to use alerts as notifications
2020-02-19: Load average is now generally only a secondary problem indicator
2020-02-19: How and why we regularly capture information about running processes
2020-02-16: With sudo, complex argument validation is best in cover scripts
2020-02-08: Ways that I have lost the source code for installed programs
2020-01-26: How big our Prometheus setup is (as of January 2020)
2020-01-24: Go compared to Python for small scale system administration scripts and tools
2020-01-23: What we've written in Go at work and how it came about (as of January 2020)
2020-01-20: The value of automation having ways to shut it off (a small story)
2020-01-18: CUPS's page log, its use of SNMP, and (probably) why CUPS PPDs turn that off
2020-01-05: Why I prefer the script exporter for exposing script metrics to Prometheus
2020-01-04: Three ways to expose script-created metrics in Prometheus
