Wandering Thoughts archives

2018-10-02

Thinking about what we probably want for monitoring, metrics, etc

I tweeted:

The more I poke at this, the more it feels like we should almost completely detach collecting metrics from our alerting. Most of what we want to alert on aren't metrics, and most plausible ongoing metrics aren't alertable. (Machine room temperature is a rare exception.)

Partly this is because there is almost no use in alerting on high system-level metrics that we can't do anything about. Our alertable conditions are mostly things like 'host down'.

(Yes, we are an all-pets place.)

Right now, what we have for all of this is basically a big ball of mud, hosted on an Ubuntu 14.04 machine (so we have to do something about it pretty soon). Today I wound up looking at Prometheus because it was mentioned to me that they'd written code to parse Linux's /proc/self/mountstats, and I was impressed by their 'getting started' demo, and it started thoughts circulating in my head.

Prometheus is clearly a great low-effort way to pull a bunch of system level metrics out of our machines (via their node exporter). But a significant amount of what we use for alerts with our current software is status checks such as 'is the host responding to SSH connections', and it isn't clear that status checks fit very well into a Prometheus world. I'm sure we could make things work, but perhaps a better choice is to not try to fit a square peg into a round hole.
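(To make 'status check' concrete, here is a minimal sketch in Python of the sort of 'is the host responding to SSH connections' check I mean. This is not what our current software does; the function name, the default timeout, and the choice to look for an SSH banner are all assumptions made for illustration. Note that it produces a yes or no answer, not a number to graph.)

```python
import socket


def ssh_responds(host: str, port: int = 22, timeout: float = 5.0) -> bool:
    """Hypothetical status check: does host:port answer like an SSH server?

    Returns True if a TCP connection succeeds and the first bytes received
    start with "SSH-". This is a simplification: the SSH protocol allows a
    server to send other lines before its version string, and such a server
    would be reported as not responding here.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.settimeout(timeout)
            banner = conn.recv(256)
    except OSError:
        # Connection refused, unreachable, timed out, name lookup failed, etc.
        return False
    return banner.startswith(b"SSH-")
```

(Something like `ssh_responds("somehost.example.org")` gives a boolean that can drive an alert directly, with no history needed. Whether to treat a timeout differently from a refused connection is a decision this sketch deliberately skips.)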

In contemplating this, I think we have four things all smashed together currently: metrics (how fast do IMAP commands work, what network bandwidth is one of our fileservers using), monitoring (amount of disk space used on filesystems, machine room temperature), status checks (does a host respond to SSH, is our web server answering queries), and alerting, which is mostly driven by status checks but sometimes comes from things we monitor (eg, machine room temperature). Metrics are there for their history alone; we'll never alert on them, often because there's nothing we can do about them in the first place. For monitoring we want both history and alerting, at least some of the time (although who gets the alerts varies). Our status checks are almost always there to drive alerts, and at the moment we mostly don't care about their history in that we never look at it.

(It's possible that we could capture and use some additional status information to help during investigations, to see the last captured state of things before a crash, but in practice we almost never do this with our existing status information.)
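(One way to see the split described above is to reduce it to two questions per thing we collect: do we keep its history, and can it raise an alert? The following Python sketch is only an illustration of that reading; the `Signal` class, the `kind` function, and the example names are all hypothetical, and the real boundaries are fuzzier than two booleans.)

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signal:
    """Hypothetical description of one thing we collect."""
    name: str
    keep_history: bool  # do we ever want to look at past values?
    can_alert: bool     # can this ever notify someone?


def kind(sig: Signal) -> str:
    """Classify a signal using the split described in the text."""
    if sig.keep_history and sig.can_alert:
        return "monitoring"    # e.g. disk space used, machine room temperature
    if sig.keep_history:
        return "metric"        # history only; we never alert on it
    if sig.can_alert:
        return "status check"  # drives alerts; history mostly goes unread
    return "unused"            # neither graphed nor alerted on


examples = [
    Signal("IMAP command latency", keep_history=True, can_alert=False),
    Signal("machine room temperature", keep_history=True, can_alert=True),
    Signal("host responds to SSH", keep_history=False, can_alert=True),
]
```

(Alerting itself isn't a kind of signal here; it's the consumer of anything where `can_alert` is true, which is mostly status checks plus a few monitored things.)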

In the past when I focused my attention on this area I was purely thinking about adding metrics collection alongside our existing system of alerting, status checking, monitoring, and some metrics. I don't think I had considered actively yanking alerting and status checks out from the others (for various reasons), and now it at least feels more likely that we'll do something this time around.

(Four years ago I planned to use graphite and collectd for metrics, but that never went anywhere. I don't know what we'd use today and I'm wary of becoming too entranced with Prometheus after one good early experience, although I do sort of like how straightforward it is to grab stats from hosts. Nor do I know if we want to try to connect our metrics & monitoring solution with our status checks & alerting solution. It might be better to use two completely separate systems that each focus on one aspect, even if we wind up driving a few alerts from the metrics system.)

sysadmin/MetricsAndAlertsForUs written at 21:09:14

An irritating limitation or two of addons in Firefox Quantum

It's reasonably well known that Firefox addons in Firefox Quantum (ie, WebExtensions addons) are more limited than pre-Quantum addons were. One of these limitations is the places where addons work at all. Some addons are not deeply affected by these limitations, but ones that deeply modify Firefox's UI, such as a gestures addon or an addon that adds a Vim style interface, are strongly affected because the limitations restrict where they can be used and thus where the UI works as you expect. In other words, where gestures work for me.

One limitation is explained directly in Foxy Gestures' GitHub README, so I'll just quote it:

More importantly, the mouse gestures will not work until the document body of the website you are visiting has parsed. In other words, the DOM must be at least partially parsed but content does not have to be loaded. [...] This is an inherent limitation of web extensions at the moment, because there is no API to get mouse events from browser chrome. In practice this is rarely an issue as mouse events are typically available very quickly.

This is almost always true in practice, because Firefox Quantum loads web pages very fast. Well, it loads them very fast when their web site is responding. When their web site isn't really responding, when you're sitting there with a blank page as Firefox tries to load things and you decide that you're going to give up and close the tab, then you run into this issue. I close most tabs through a mouse gesture, or at least I would like to, but when a new tab hangs during the initial page load (or sometimes during subsequent ones), my mouse gesture doesn't work and I have to turn to Ctrl-W on the keyboard or clicking the appropriate tab control.

The other big limitation of addons is that they can't act on pages that Firefox considers sensitive pages, especially including internal chrome pages. Unfortunately it turns out that a number of pages that you wouldn't expect are considered chrome pages, and these are pages that you may use all the time. Specifically, pages in Firefox's Reader mode are all considered chrome pages and off limits to addons, as are all pages that are showing PDFs using Firefox's internal PDF viewer. The Reader mode limitation is especially irritating and makes Reader mode quite a bit less attractive to me; if you're going to break my UI and not always work, I wonder what I'm really getting out of it.

(With both Reader mode and PDFs, there's no indication in the displayed URL itself that you're in some special internal Firefox chrome page context, since they display the normal URL. This is especially striking and irritating in Reader mode, at least to me.)

Two more important cases of chrome pages are Firefox's network errors page (what you get if you leave one of those slow-loading web pages to actually time out) and about:blank, the completely blank page that shows up under some circumstances. For instance, if you open a URL in a new window or tab except that Firefox decides that the URL should be downloaded instead of shown, you're left with an about:blank page.

(A small but irritating additional case is 'view source', which is of course another internal chrome page these days.)

I'm sure that Firefox has good internal reasons for preventing addons from injecting things into these pages, but the resulting UI glitches (where gestures suddenly stop working on some page and I have to remember that oh yeah, it's a PDF or whatever) are reasonably painful. I really wish there was some way to tell Firefox that no, really, I actually do trust Foxy Gestures that much.

(The gestures that I would like to use on all pages include general window functions like 'close tab' and 'iconify'; on PDFs, I would also like things like 'increase/decrease font size'. None of these are specific to HTML content, and the window manipulation ones are basically global.)

web/FirefoxQuantumAddonLimit written at 00:32:38


