2020-10-26
Sometimes alerts have inobvious reasons for existing
Somewhat recently I saw people saying negative things about common alerting practices, such as generating some sort of alert when a TLS certificate is getting close to expiring. This got me to tweet something:
We don't have 'your TLS certificate is X days from expiring' alerts to tell me that we need to renew a certificate; we have them to tell us that our Let's Encrypt automation broke, and early enough that we have plenty of time to deal with the situation.
(All of our alerts go to email and are only dealt with during working hours.)
Certbot normally renews our TLS certificates when they're 30 days from expiring, and we alert if a certificate is less than 23 days from expiring. This gives Certbot a week of leeway for problems (including the machine being down for a day or three), and gives us three weeks to deal with the problem in some way, including by manually getting a certificate from another source if we have to. We also have a 'how many days to expiry' table for TLS certificates in our overall dashboard, so we can notice even before the alert if a certificate isn't getting renewed when it should be.
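To make the arithmetic concrete, here is a minimal sketch of the two thresholds in Python. The 30 and 23 day numbers are the ones from above; the function names and the three status strings are made up for illustration and aren't how our actual alerting is implemented.

```python
from datetime import datetime, timedelta, timezone

RENEW_AT_DAYS = 30  # Certbot normally renews at this point (from the text)
ALERT_AT_DAYS = 23  # we alert below this many days (from the text)


def days_to_expiry(not_after, now):
    """Return the (fractional) number of days until the certificate expires."""
    return (not_after - now) / timedelta(days=1)


def cert_status(not_after, now):
    """Classify a certificate by how close it is to expiring.

    The status names are hypothetical, chosen only for this sketch.
    """
    remaining = days_to_expiry(not_after, now)
    if remaining < ALERT_AT_DAYS:
        # Certbot has had at least a week to renew and hasn't managed it.
        return "alert"
    if remaining < RENEW_AT_DAYS:
        # Inside Certbot's leeway window; visible on a dashboard, no alert yet.
        return "renewal-pending"
    return "ok"
```

The gap between the two constants is Certbot's week of leeway, and whatever is left below the alert threshold is the time we have to fix things by hand.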
But none of this is visible in a simple description of what we alert on. The bare fact that we alert if a TLS certificate is less than 23 days from expiring doesn't tell you why that alert exists, and the why can have a good reason behind it (as I feel we do for this alert). As a corollary you can't tell whether an alert is sensible or not just from its description.
Another very important corollary is the same thing we saw for configuration management and procedures, which is that by themselves your alerts don't tell you why you put them into place, just what you're alerting on. Understanding why an alert exists is important, so you want to document that too, in comments or in the alert messages or both. Even if the bare alert seems to be obviously sensible, you should document why you picked that particular thing to alert on to tell you about the overall problem. It's probably useful to describe what high level problem (or problems) the alert is trying to pick up on, since that isn't necessarily obvious either.
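One way to keep the 'why' attached to the alert is to make it a required part of the alert's definition, so it can be carried into the alert message. The following is only a sketch; the AlertRule class and its field names are hypothetical and don't correspond to any particular alerting system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlertRule:
    """A hypothetical alert definition that records intent, not just the check."""

    name: str
    condition: str  # what we alert on
    problem: str    # the high level problem the alert is trying to pick up on
    why: str        # why this particular condition and threshold were chosen

    def message(self):
        """Build an alert message that carries the intent along with it."""
        return "%s: %s\nProblem: %s\nWhy: %s" % (
            self.name, self.condition, self.problem, self.why)


cert_rule = AlertRule(
    name="TLSCertExpiringSoon",
    condition="certificate is less than 23 days from expiring",
    problem="our Let's Encrypt automation has broken",
    why="Certbot renews at 30 days, so 23 days gives it a week of leeway "
        "and still leaves us about three weeks to fix things",
)
```

Because every field is required, you can't add an alert this way without writing down what it's for.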
Having this sort of 'why' documentation is especially important for alerts because alerts are notorious for drifting out of sync with reality, at which point you need to bring them back in line for them to be useful. This is effectively debugging, and now I will point everyone to Code Only Says What it Does and paraphrase a section of the lead paragraph:
Fundamentally, [updating alerts] is an exercise in changing what [an alert] does to match what it should do. It requires us to know what [an alert] should do, which isn't captured in the alert.
So, alerts have intentions, and we should make sure to document those intentions. Without the intentions, any alert can look stupid.
Link: [Firefox] Navigational Instruments
Mike Hoye's Navigational Instruments (via) is about a bunch of underknown or underappreciated features in Firefox. I learned some really useful tricks from the article, including this one:
- Holding down Alt while selecting text allows you to select text within a link without triggering the link
(In fact I used the tip to copy the title of the article from the article, because in the article the title is a link to itself.)
You may already know some of these tricks, as I did, and not care about others (I don't make much use of the URL bar, now called the 'Quantumbar'), but there's likely valuable stuff here for every Firefox user.
Fifteen years of DWiki, the Python engine of Wandering Thoughts
DWiki, the wiki engine that underlies Wandering Thoughts (this blog), is fifteen years old. That makes it my oldest Python program that's in active, regular, and even somewhat demanding use (we serve up a bunch of requests a day, although mostly from syndication feed fetchers and bots on a typical day). As is usual for my long-lived Python programs, DWiki's not in any sort of active development, as you can see in its GitHub repo, although I did add an important feature just last year (that's another story, though).
DWiki has undergone a long process of sporadic development, where I've added important features slowly over time (including performance improvements). This sporadic development generally means that I come back to DWiki's code each time having forgotten many of the details and have to recover them. Unfortunately this isn't as easy as I'd like and is definitely complicated by historical decisions that seemed right at the time but which have wound up creating some very tangled and unclear objects that sit at the core of various important processes.
(I try to add comments for what I've worked out when I revisit code. It's probably not always successful at helping future me on the next time through.)
DWiki itself has been extremely stable in operation and has essentially never blown up or hit an unhandled exception that wasn't caused by a very recent code change of mine. This stability is part of why I can ignore DWiki's code for long lengths of time. However, DWiki operates in an environment where DWiki processes are either transient or restarted on a regular basis; if it was a persistent daemon, more problems might have come up (or I might have been forced to pay more attention to reference leaks and similar issues).
Given that it's a Unix based project started in 2005, Python has been an excellent choice out of the options available at the time. Using Python has given me long life, great stability in the language (since I started as Python 2 was reaching stability and slowing down), good enough performance, and a degree of freedom and flexibility in coding that was probably invaluable as I was ignorantly fumbling my way through the problem space. Even today I'm not convinced that another language would make DWiki better or easier to write, and most of the other options might make it harder to operate in practice.
(To put it one way, the messy state of DWiki's code is not really because of the language it's written in.)
Several parts of Python's standard library have been very useful in making DWiki perform better without too much work, especially pickle. The various pickle modules make it essentially trivial to serialize an object to disk and then reload it later, in another process, which is at the core of DWiki's caching strategies. That you can pickle arbitrary objects inside your program without having to make many changes to them has let me easily add pickle based disk caches to various things without too much effort.
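As an illustration of how little code a pickle based disk cache needs, here is a minimal sketch. It is not DWiki's actual caching code: it's written for Python 3 (it uses os.replace), the function names are made up, and it assumes that the cache directory is only writable by you, since unpickling untrusted data is unsafe.

```python
import os
import pickle
import tempfile


def cache_store(path, obj):
    """Pickle obj to path, writing to a temporary file and renaming it.

    The rename means another process never sees a partially written file.
    """
    dirname = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=dirname)
    try:
        with os.fdopen(fd, "wb") as f:
            pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)
        os.replace(tmp, path)
    except BaseException:
        # Don't leave the temporary file behind if anything went wrong.
        os.unlink(tmp)
        raise


def cache_load(path):
    """Return the cached object, or None if it's missing or unreadable."""
    try:
        with open(path, "rb") as f:
            return pickle.load(f)
    except (OSError, EOFError, pickle.UnpicklingError):
        # Treat a missing or damaged entry as a cache miss. Other errors
        # (for example a class that has since been renamed) still propagate.
        return None
```

A caller would try cache_load() first and fall back to doing the real work followed by cache_store() on a miss. Since the cache is just files on disk, it's naturally shared between separate processes.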
At the same time, the very strong performance split in CPython between things implemented in C and things implemented in Python has definitely affected how DWiki is coded, not necessarily for the better. This is particularly obvious in the parsing of DWikiText, which is almost entirely done with complex regular expressions (some of them generated by code) because that's by far the fastest way to do it in CPython. The result is somewhat fragile in the face of potential changes to DWikiText and definitely hard for me to follow when I come back to it.
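To give a flavour of what 'regular expressions generated by code' can look like, here is a small sketch that builds one combined regular expression from a table of rules and uses it to split a line into tokens, so that all of the scanning happens inside the C regexp engine. The three rules are invented for illustration; they aren't DWikiText's real syntax or DWiki's real code.

```python
import re

# Hypothetical inline markup rules: (token name, pattern). The patterns
# contain no capturing groups of their own, so each match's lastgroup
# is the name of the rule that matched.
INLINE_RULES = [
    ("strong", r"\*[^*\s][^*]*\*"),
    ("code", r"`[^`]+`"),
    ("link", r"\[\[[^\]]+\]\]"),
]

# Generate one big alternation of named groups from the rule table.
INLINE_RE = re.compile(
    "|".join("(?P<%s>%s)" % (name, pat) for name, pat in INLINE_RULES)
)


def tokenize(text):
    """Split text into (kind, string) pairs, with plain runs as 'text'."""
    tokens = []
    pos = 0
    for m in INLINE_RE.finditer(text):
        if m.start() > pos:
            tokens.append(("text", text[pos:m.start()]))
        tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    if pos < len(text):
        tokens.append(("text", text[pos:]))
    return tokens
```

Even in this toy version the fragility shows up. The rules interact through their order in the alternation, so adding a new one means rereading all of the existing patterns.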
(With that said, I feel that parsing all wikitext dialects is a hard problem and a high performance parser is probably going to be tricky to write and follow regardless of the implementation language.)
DWiki is currently written in Python 2, but will probably eventually be ported to Python 3. I have no particular plans for when I'll try to do that for various reasons, although one of the places where I run a DWiki instance will probably drop Python 2 sooner or later and force my hand. Right now I would be happy to leave DWiki as a Python 2 program forever; Python 3 is nicer, but since I'm not changing DWiki much anyway I'll probably never use many of those nicer things in it.