Wandering Thoughts archives

2018-08-10

Fetching really new Fedora packages with Bodhi

Normal Fedora updates that have been fully released are available through the regular updates repository, which is (or should be) already configured into dnf on your Fedora system. More recent (and less well tested) updates are available through the updates-testing repository, which you can selectively enable in order to see if what you're looking for is there. Right now I'm interested in Rust 1.28, because it's now required to build the latest Firefox from source, so:

# dnf --enablerepo=updates-testing check-update 'rust*'
Last metadata expiration check: 0:00:56 ago on Fri 10 Aug 2018 02:12:32 PM EDT.
#

However, sometimes, as in this case and past ones, any update that actually exists is too new to have made it even into the updates-testing DNF repo. Fedora does their packaging through Fedora Bodhi (see also), and as part of this, packages can be built and available in Bodhi even before they're pushed to updates-testing. So if you want the very freshest bits, you want to check in Bodhi.

There are two ways to check Bodhi: through the command line using the bodhi client (which comes from the bodhi-client package), or through the website. Perhaps I should use the client all the time, but I tend to reach for the website as my first check. The URL for a specific package on the website is of the form:

https://bodhi.fedoraproject.org/updates/?packages=<source package>

For example, https://bodhi.fedoraproject.org/updates/?packages=rust is the URL for Rust (and there's a RSS feed if you care a lot about a particular package). For casual use, it's probably easier to just search from Bodhi's main page.

Through the command line, checking for and downloading an update looks like this:

; bodhi updates query --packages rust --releases f28 --status pending
============================= [...]
     rust-1.28.0-2.fc28
============================= [...]
   Update ID: FEDORA-2018-42024244f2
[...]
       Notes: New versions of Rust and related tools -- see the release notes
            : for [1.28](https://blog.rust-lang.org/2018/08/02/Rust-1.28.html).
   Submitter: jistone
   Submitted: 2018-08-10 14:35:56
[...]

We insist on the pending status because that cuts the listing down and normally gives us only one package, for which we get to see detailed information; I believe that there's normally only one package in pending status for a particular Fedora release. If there are multiple ones, you get a less helpful summary listing that gives you only the full package name instead of the update ID. If you can't get the update ID through bodhi, you can always get it through the website by clicking on the link to the specific package version on the package's page.

To fetch all of the binary RPMs for an update:

; cd /tmp/scratch
; bodhi updates download --updateid FEDORA-2018-42024244f2
[...]

Or:

; cd /tmp/scratch
; bodhi updates download --builds rust-1.28.0-2.fc28
[...]

Both versions of the bodhi command download things to the current directory, which is why I change to a scratch directory first. Then you can do 'dnf update /tmp/scratch/*.rpm'. If the resulting packages work and you feel like it, you can leave feedback on the Bodhi page for the package, which may help get it released into the updates-testing repo and then eventually the updates repo.

(In theory you can leave feedback through the bodhi command too, but it requires more setup and I think has somewhat fewer options than the website.)

As far as I've seen, installing RPMs this way will cause the package system to remember that you installed them by hand, even when they later become available through the updates-testing or the updates repo. This is probably not important to you.

(I decided I wanted an actual entry on this process that I can find easily later, instead of having to hunt around for my postscript in this entry the next time I need it.)

PS: For my future use, here is the Bodhi link for the kernel, which is probably the package I'm most likely to want to fish out of Bodhi regularly. And just in case, openssl and OpenSSH.

linux/FedoraBodhiGetPackages written at 14:58:56

The benefits of driving automation through cron

In light of our problem with timesyncd, we needed a different (and working) solution for time synchronization on our Ubuntu 18.04 machines. The obvious solution would have been to switch over to chrony; Ubuntu even has chrony set up so that if you run it, timesyncd is automatically blocked. I like chrony so I was tempted by this idea briefly, but then I realized that using chrony would mean having yet another daemon that we have to care about. Instead, our replacement for timesyncd is running ntpdate from cron.
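
As a concrete illustration, a cron-driven ntpdate setup can be as small as a single crontab entry. This is a hypothetical sketch: the fifteen-minute interval and the NTP server name are invented for the example, not our actual configuration.

```shell
# /etc/cron.d/ntpdate -- hypothetical example; the interval and the
# NTP server name are illustrative, not our real settings.
# Redirecting stdout keeps successful runs silent; anything written to
# stderr on a failed run gets mailed to us by cron.
*/15 * * * *  root  /usr/sbin/ntpdate ntp.example.com >/dev/null
```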

There are a number of quiet virtues to driving automation out of cron entries. The whole approach is simple and brute force, but this creates a great deal of reliability. Cron basically never dies, and if it were ever to die, it's so central to how our systems operate that we'd probably notice fairly fast. If we're ever in any doubt, cron logs to syslog when it runs things (and thus to our central syslog server), and if jobs fail or produce output, cron has a very reliable and well-tested system for reporting that to us. A simple cron entry that runs ntpdate has no ongoing state that can get messed up, so if cron is running at all, ntpdate is running at its scheduled interval and our clocks will stay synchronized. If something goes wrong on one run, it doesn't really matter, because cron will run it again later. Network down temporarily? DNS resolution broken? NTP servers unhappy? Cure the issue and we'll automatically get time synchronization back.

A cron job is simple blunt force; it repeats its activities over and over and over again, throwing itself at the system until it batters its way through and things work. Unless you program it otherwise, it's stateless and so indifferent to what happened the last time around. There's a lot to be said for this in many system tasks, including synchronizing the clock.

(Of course this can be a drawback if you have a cron job that's failing and generating email every failure, when you'd like just one email on the first failure. Life is not perfect.)

There's always a temptation in system administration to make things complicated, to run daemons and build services and so on. But sometimes the straightforward brute force way is the best answer. We could run an NTP daemon on our Ubuntu machines, and on a few of them we probably will (such as our new fileservers), but for everything else, a cron job is the right approach. Probably it's the right approach for some of our other problems, too.

(If timesyncd worked completely reliably on Ubuntu 18.04, we would likely stick with it simply because it's less work to use the system's default setup. But since it doesn't, we need to do something.)

PS: Although we don't actively monitor cron right now, there are ways to notice if it dies. Possibly we should add some explicit monitoring for cron on all of our machines, given how central it is to things like our password propagation system. Sure, we'd notice sooner or later anyway, but noticing sooner is good.

sysadmin/CronAutomationBenefits written at 13:37:44

One simple general pattern for making sure things are alive

One perpetual problem in system monitoring is detecting when something goes away. Detecting the presence of something is often easy because it reports itself, but detecting absence is usually harder. For example, it generally doesn't work well to have some software system email you when it completes its once-a-day task, because the odds are only so-so that you'll actually notice on the day when the expected email isn't in your mailbox.

One general pattern for dealing with this is what I'll call a staleness timer. In a staleness timer you have a timer that effectively slowly counts down; when the timer reaches 0, you get an alert. When systems report in that they're alive, this report resets their timer to its full value. You can implement this as a direct timer, or you can write a check that is 'if system last reported in more than X time ago, raise an alert' (and have this check run every so often).

(More generally, if you have an overall metrics system you can presumably write an alert for 'last metric from source <X> is more than <Y> old'.)
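
A minimal sketch of the 'last reported in more than X time ago' form of the check, in shell; the function name, the message, and the thresholds are invented for illustration, and the echo stands in for whatever raises a real alert:

```shell
# check_stale NAME LAST_REPORT_EPOCH MAX_AGE_SECONDS
# Prints an alert line when NAME's last report is more than
# MAX_AGE_SECONDS old; prints nothing if the report is fresh enough.
check_stale() {
    _now=$(date +%s)
    _age=$(( _now - $2 ))
    if [ "$_age" -gt "$3" ]; then
        echo "ALERT: $1 last reported ${_age}s ago (limit ${3}s)"
    fi
}
```

Resetting the timer is just recording a new epoch timestamp whenever a report comes in; running a check like this from cron every few minutes completes the pattern.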

In a way this general pattern works because you've flipped the problem around. Instead of the default state being silence and exceptional things having to happen to generate an alert, the default state is an alert and exceptional things have to happen to temporarily suppress the alert.

There are all sorts of ways of making programs and systems report in, depending on what you have available and what you want to check. Traditional low-rent approaches are touching files and sending email to special dedicated email aliases (which may write incoming email to a file, or simply run a program on incoming email that touches a relevant file). These can have the drawback that they depend on multiple different systems all working, but they often have the advantage that you have them working already (and sometimes it's a feature to verify all of the systems at once).
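
For the touch-a-file approach, the staleness check can be a one-liner around find's -mmin. This is a hedged sketch: the function name and the thresholds are made up for the example.

```shell
# is_stale FILE MINUTES: succeed (exit 0) if FILE's mtime is more than
# MINUTES old, i.e. whatever is supposed to touch it has gone quiet.
is_stale() {
    # A stamp file that doesn't exist at all also counts as stale.
    [ ! -e "$1" ] && return 0
    # find prints the file only when its mtime is older than -mmin,
    # so a non-empty result means the file is stale.
    [ -n "$(find "$1" -mmin +"$2")" ]
}
```

The monitored job does 'touch' on its stamp file on each successful run, and a cron job calls is_stale and alerts when it succeeds.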

(If you have a real monitoring system, it hopefully already provides a full selection of ways to submit 'I am still alive' notifications to it. There probably is a very simple system that just does this based on netcat-level TCP messages or the like, too; it seems like the kind of thing sysadmins write every so often. Or perhaps we are just unusual in never having put together a modern, flexible, and readily customizable monitoring system.)

All of this is a reasonably obvious and well known thing around the general community, but for my own reasons I want to write it down explicitly.

sysadmin/SimpleAliveCheckPattern written at 00:42:26

