The brute force cron-based way of flexibly timed repeated alerts
Suppose, not hypothetically, that you have a cron job that monitors something important. You want to be notified relatively fast if your Prometheus server is down, so you run your cron job frequently, say once every ten minutes. However, now we have the problem that cron is stateless, so if our Prometheus server goes down and our cron job starts alerting us, it will re-alert us every ten minutes. This is too much noise (at least for us).
There's a standard pattern for dealing with this in cron jobs that send alerts; once the alert happens, you create a state file somewhere and as long as your current state is the same as the state file, you don't produce any output or send out your warning or whatever. But this leads to the next problem, which is that you alert once and are then silent forever afterward, leaving it to people to remember that the problem (still) exists. It would be better to re-alert periodically, say once every hour or so. This isn't too hard to do; you can check to see if the state file is more than an hour old and just re-send the alert if it is.
(One way to do this is with 'find <file> -mmin +... -print'. Although it may not be Unixy, I do rather wish for newerthan utilities as a standard and widely available thing. I know I can write them in a variety of ways, but it's not the same.)
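As a rough sketch of this state file pattern with periodic re-alerts, something like the following could work. The function names should_alert and record_alert, and the idea of passing the threshold in minutes, are illustrative assumptions rather than the actual checker script; it also assumes a find that supports -mmin (GNU and BSD find do, but it is not in POSIX).

```shell
#!/bin/sh
# Hypothetical sketch: suppress repeated alerts with a state file, but
# allow a re-alert once the state file is old enough.

# should_alert STATEFILE MINUTES
# Succeeds (exit status 0) if no alert has been recorded yet, or if the
# state file was last modified more than MINUTES minutes ago.
should_alert() {
    statefile=$1
    minutes=$2
    # No state file means this is a new problem, so alert.
    [ -e "$statefile" ] || return 0
    # find prints the file name only if it is older than the threshold,
    # so non-empty output means it is time to re-alert.
    [ -n "$(find "$statefile" -mmin +"$minutes" -print)" ]
}

# record_alert STATEFILE
# Create the state file, or refresh its timestamp after a re-alert.
record_alert() {
    touch "$1"
}

# Example use in a checker script (path and message are placeholders):
#   if should_alert /var/tmp/promcheck.alerted 60; then
#       echo "Prometheus appears to be down"
#       record_alert /var/tmp/promcheck.alerted
#   fi
```

When the problem goes away, the checker would remove the state file so that the next failure alerts immediately.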
But this isn't really what we want, because we aren't around all of the time. Re-sending the alert once an hour in the middle of the night or the middle of the weekend will just give us a big pile of junk email to go through when we get back into the office; instead we want repeats only once every hour or two during weekdays.
When I was writing our checker script, I got to this point and started planning out how I was going to compare against the current hour and day of week in the script to know when I should clear out the state file and so on. Then I had a flash of the obvious and realized that I already had a perfectly good tool for flexibly specifying various times and combinations of time conditions, namely cron itself. The simple way to reset the state file and cause re-alerts at whatever flexible set of times and time patterns I want is to do it through crontab entries.
So now I have one cron entry that runs every ten minutes for the main script, and another cron entry that clears the state file (if it exists) several times a day on weekdays. If we decide we want to be re-notified once a day during the weekend, that'll be easy to add as another cron entry. As a bonus, everyone here understands cron entries, so it will be immediately obvious when things run and what they do in a way that it wouldn't be if all of this were embedded in a script.
(It's also easy for anyone to change. We don't have to reach into a script; we just change crontab lines, something we're already completely familiar with.)
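A crontab along these lines would express that setup. The script path, the state file path, and the specific hours are made-up placeholders; only the overall shape (one frequent check entry plus separate reset entries) comes from the description above. It also assumes a cron that accepts step values like */10, which Vixie-style crons do.

```
# Run the main check every ten minutes.
*/10 * * * *       /usr/local/sbin/check-prometheus

# Clear the state file a few times a day on weekdays, so that a
# still-present problem re-alerts on the next check.
0 9,12,15 * * 1-5  rm -f /var/tmp/check-prometheus.alerted

# Possible later addition: one re-alert a day on weekends.
0 10 * * 6,0       rm -f /var/tmp/check-prometheus.alerted
```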
As it stands this is slightly too simplistic, because it clears the state file without caring how old it is. In theory we could generate an alert shortly before the state file is due to be cleared, clear the state file, and then immediately re-alert. To deal with that I decided to go the extra distance and only clear the state file if it was at least a minimum age (using find to see if it was old enough, because we make do with the tools Unix gives us).
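One way the minimum-age check might look is sketched below. The function name clear_if_old and its arguments are hypothetical, and as before this assumes a find with -mmin support.

```shell
#!/bin/sh
# Hypothetical sketch: clear the 'do not re-alert' state only if the
# state file has existed (unmodified) for at least a minimum time.

# clear_if_old STATEFILE MINUTES
# Remove STATEFILE if it was last modified more than MINUTES minutes ago.
# A missing or too-young state file is left alone.
clear_if_old() {
    # Nothing to clear if there is no state file.
    [ -e "$1" ] || return 0
    # find only runs rm when the file passes the age test.
    find "$1" -mmin +"$2" -exec rm -f {} \;
}

# Example (placeholder path): only clear state that is over 30 minutes old.
#   clear_if_old /var/tmp/check-prometheus.alerted 30
```

With this, an alert sent a few minutes before a scheduled reset is not repeated right away; it simply waits for the next reset time.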
(In my actual implementation, the main script takes a special argument that makes it just clear the state file. This way only the script has to know where the state file is or even just what to do to clear the 'do not re-alert' state; the crontab entry just runs the script with that argument.)
Written on 05 December 2018.