Wandering Thoughts archives


My uncertainty about swapping and swap sizing for SSDs and NVMe drives

The traditional reason to avoid configuring a lot of swap space on your servers, and to avoid using swap in general, was that lots of swap space made it much easier for your system to thrash itself into total overload. But that's wisdom (and painful experience) from back in the days of system-wide 'global' swapping and of swap living on spinning rust (ie, hard drives). Paging evicted memory back in (whether from swap or from its original files) is largely random IO, and spinning rust had hard limits on how many IO operations a second it could do, which often had to be shared between swapping and regular IO. And with global swapping, any process could be victimized by having to page things back in, or have its regular IO delayed by swapping IO. In theory, things could be different today.

Modern SSDs and especially NVMe drives are much faster and support many more IO operations a second, especially for read IO (ie, paging things back in). Paging is still quite slow compared to simply accessing RAM, but it's not anywhere near as terrible as it used to be on spinning rust; various sources suggest that you might see page-in latencies of 100 microseconds or less on good NVMe drives, and perhaps only a few milliseconds on SSDs. Since modern SSDs and especially NVMe drives can reach and sustain very high random IO rates, this paging activity is also far less disruptive to other IO that other programs are doing.

(The figures I've seen for random access to RAM on modern machines are on the order of 100 nanoseconds. If we assume the total delay on a NVMe page-in is on the order of 100 microseconds (including kernel overheads), that means a page-in costs you around 1,000 RAM accesses. This is far better than it used to be, although it's not fast. Every additional microsecond of delay costs another 10 RAM accesses.)

Increasingly, systems also support 'local' swapping in addition to system-wide 'global' swapping, where different processes or groups of processes have different RAM limits and so one group can be pushed into swapping without affecting other groups. The affected group will still pay a real performance penalty for all of the paging it's doing, but other processes should be mostly unaffected. They shouldn't have their pages evicted from RAM any faster than they otherwise would be, so if they weren't paging before they shouldn't be paging afterward. And with SSDs and NVMe drives having high concurrent IO limits, the other processes shouldn't be particularly affected by the paging IO.

If you're using SSDs or NVMe drives with enough IO capacity (and low enough latency), even system-wide swap thrashing might not be as lethal as it used to be. If everything works well with 'local' swapping, a particular group of processes could be pushed into swap thrashing by their excessive memory usage without doing anything much to the rest of the system; of course they might not perform well and perhaps you'd rather have them terminated and restarted. If all of this works, perhaps these days systems should have a decent amount of swap, much more than the minimal swap space that we have tended to configure so far.

(All of this is more true on NVMe drives than SSDs, though, and all of our servers still use SSDs for their system drives.)

However, all of this is theoretical. I don't know if it actually works in practice, especially on SSDs (where even a one millisecond delay for a page-in is the same cost as 10,000 accesses to RAM, and that's probably fast for SSDs). System-wide swap thrashing on SSDs seems like a particularly bad case, and our most likely case on most servers. Per-user RAM limits seem like a better case for using a lot of swap, but even then we may not be doing people any real favours and they might be better off having the offending process just terminated.

(All of this was sparked by a Twitter thread.)

SwappingOnSSDUncertainty written at 23:40:42

My experience with x2go is that it's okay but not compelling

Various people's tweets and comments on earlier entries pushed me into giving x2go a bit of a try, despite having low expectations because 'seamless windows' in X are challenging for remote desktop software. The results are somewhat mixed and my view so far is that x2go isn't compelling for me. To start with, what I want from any program like this is that it work like 'ssh -X' but perform faster. I specifically don't want to run a remote desktop; I want to run X programs remotely.

On the positive side, I was genuinely surprised by how much worked. X2go properly supported multiple programs opening multiple top level X windows that work just like regular X windows, and it even arranged for their X windows on my local display to have the correct window title, X class and resource name, icon, properties, and so on. Cut and paste worked (at least in xterm). I could even suspend a session and then resume it, with all of the windows disappearing from my display and then reappearing later. In a lot of ways, the windows displayed for the remote programs acted like they were real windows created through 'ssh -X', which definitely helped the overall experience.

(My window manager cares about the X class and resource names, for example, because it treats windows from some programs specially, especially xterms. That all worked with x2go remote xterm windows.)

Performance felt somewhat better than 'ssh -X' and my monitoring suggests that x2go was using clearly less bandwidth on my DSL link. However, some of this performance was clearly achieved by skipping updates, which could leave things like the desktops of VMWare machines feeling jerky. More text-based things like GNU Emacs and xterm felt more like I was using them at work (or locally), although generally 'ssh -X' is already pretty good for them and I don't think there was much difference.

Unfortunately, x2go doesn't propagate your local X resource database into the remote X server that all of those remote X programs are creating windows on. Nor does it have a setting to scale up windows by 2x. This meant that remote programs weren't scaled properly for my HiDPI display and came out at a tiny size (this normally requires at least some X properties). I was able to fix this by manually copying my X resources over, but the need to do this (manually or with some sort of wrapper automation) makes x2go far less friendly.

As I expected, Firefox's X remote control doesn't work over x2go because there's no 'Firefox' window on x2go's hidden X server (this may also affect the XSettings system). I think that the remote X windows are also oblivious to whether or not they've been iconified on the local X server. In another glitch, my VMWare machines couldn't change the X cursor, although regular remote windows could. And I was unable to find a way to make the overall x2go client window disappear or to automatically start a session; it appears to always require some GUI interactions.

(The x2goclient program is also very chatty to standard output.)

It's clear to me that x2go is not a good replacement for 'ssh -X' for short term disposable windows; there's simply too much fiddling around by comparison. Instead I would probably need to use x2go to establish a persistent launcher program, so that I wasn't constantly fiddling around in the GUI and re-establishing X resources and so on. Once the launcher was running under x2go, I could start up additional remote X programs from it on demand with just a mouse click or whatever, which is much closer to what I'd like.

Given all of the effort required to build and use a reliable and non-annoying x2go environment, along with the limitations and glitches, I currently don't feel that I'll use x2go very much. It would be a different matter if x2go could scale up windows by 2x, because then it would be great for VMWare (where the consoles of virtual machines are unscaled and tiny, although the VMWare GUI elements can be scaled up).

(Also, I couldn't get it to work to an Ubuntu 20.04 server, only to my office Fedora 33 desktop.)

X2goMyExperienceItsOk written at 00:47:47


I wish Prometheus had some features to deal with 'missing' metrics

Prometheus has a reasonable number of features to let you determine things about changes in ongoing metrics. For example, if you want to know how many separate times your Blackbox ICMP pings have started to fail over a time range (as opposed to how frequently they failed), a starting point would be:

changes( probe_success{ probe="icmp" } [1d] )

(The changes() function is not ideal for this; what you would really like is changes_down and changes_up functions.)
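In the meantime, a brute-force approximation is possible with a subquery. This sketch counts the one-minute evaluation points over a day where the probe was failing but had been succeeding a minute earlier, which roughly counts downward transitions (transitions that happen and reverse faster than the subquery resolution will be miscounted):

```promql
count_over_time(
  (probe_success{probe="icmp"} == 0
     and probe_success{probe="icmp"} offset 1m == 1)[1d:1m]
)
```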

But this and similar things only work for metrics (more exactly, time series) that are always present and only have their values change. Many metrics come and go, and right now in Prometheus you can't do changes-like things with them as a result. You can probably get averages over time, but it's at least pretty difficult to get something as simple as a count of how many times an alert fired within a given time interval. As with timestamps for samples, the information necessary is in Prometheus' underlying time series database, but it's not exposed to us.

One starting point would be to expose information that Prometheus already has about time series going stale. As covered in the official documentation on staleness, Prometheus detects most cases of metrics disappearing and puts an explicit marker in the TSDB (although this doesn't handle all cases). But then it doesn't do anything with this marker except not answer queries. Perhaps it would be possible within the existing interfaces to the TSDB to add a count_stale() function that would return a count of how many times a time series for a metric had gone stale within the range.

The flipside is counting or detecting when time series appear. I think this is harder in the current TSDB model, because I don't think there's an explicit marker when a previously not-there time series appears. This means that to know if a time series was new at time X, Prometheus would have to look back up to five minutes (by default) to check for staleness markers and to see if the time series was there. This is possible but would involve more work.

However, I think it's worth finding a solution. It feels frankly embarrassing that Prometheus currently cannot answer basic questions like 'how many times did this alert fire over this time interval'.

(Possibly you can use very clever Prometheus queries with subqueries to get an answer. Subqueries allow you to do a lot of brute force things if you try hard enough, so I can imagine detecting some indirect sign of a just appeared ALERT metric with a subquery.)
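As a sketch of that kind of brute force, the following subquery counts the evaluation points over a day where a firing ALERTS time series exists but did not exist a minute earlier; the alert name here is a placeholder, and fast flapping or odd evaluation timing will throw the count off:

```promql
count_over_time(
  (ALERTS{alertname="SomeAlert", alertstate="firing"}
     unless ALERTS{alertname="SomeAlert", alertstate="firing"} offset 1m)[1d:1m]
)
```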

PrometheusMissingMetricsWish written at 00:48:51


Prometheus and the case of the stuck metrics

My home desktop can go down, crash, or lock up every so often (for example when it gets too cold). I run Prometheus on it for various reasons, and when this happens I not infrequently wind up looking at graphs of various things (either in Prometheus or in Grafana). Much of the time, these graphs have a weird structure around the time of the crash. The various metrics will be wiggling back and forth as usual before the crash, but then they go flat and just run on in straight lines at some level before they disappear entirely. It took me a while to work out what was going on.

These flat results happen because Prometheus will look backward a certain amount of time in order to find the most recent sample in a time series, by default five minutes. When my machine goes down, no new samples are being written in any time series, so the last pre-crash sample is returned as the 'current' sample for the next five minutes or so, resulting in flat lines (or rate-based things going to zero). Essentially the time series has become stuck at its last recorded value.
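In sufficiently recent Prometheus versions you can write this lookback out explicitly; an instant query for a plain metric name behaves roughly like the last_over_time() form:

```promql
last_over_time(node_load1[5m])
```

(This is also a way to deliberately widen or shrink the lookback window in a particular query, assuming your Prometheus is new enough to have last_over_time().)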

If you've rebooted machines you're collecting metrics from or had Prometheus collectors fail, then looked at graphs of the relevant metrics, you may have noticed that you don't see this. This is because Prometheus is smart and has an explicit concept of stale entries. In particular, it will immediately mark time series as stale under the right conditions:

If a target scrape or rule evaluation no longer returns a sample for a time series that was previously present, that time series will be marked as stale. If a target is removed, its previously returned time series will be marked as stale soon afterwards.

What this means is that if a target fails to scrape, all time series from it are immediately marked as stale. If another machine goes down or a collector fails, that target scrape will fail (possibly after a bit of a timeout), and all of its time series go away on the spot. Instead of getting stuck time series in your graphs, you get an empty void.

What's special about my home machine is that I'm running Prometheus on the machine itself, and also that the machine crashed (or at least that the Prometheus process was terminated) instead of everything shutting down in an orderly way. When the machine Prometheus is running on just stops abruptly, Prometheus doesn't see any failed targets and it doesn't have a chance to do any cleanup it might normally do in an orderly shutdown. The only way for time series to disappear is through there being no samples in the past five minutes, so for the first few minutes of my home machine being down, I get stuck time series.

(It's not entirely clear to me what Prometheus does here when the main process shuts down properly. I would probably have to pull raw TSDB data with timestamps in order to be sure, and that's too much work right now.)

PrometheusStuckMetrics written at 00:52:18


Dot-separated DNS name components aren't even necessarily subdomains, illustrated

I recently wrote an entry about my pragmatic sysadmin view on subdomains and DNS zones. At the end of the entry I mentioned that we had a case where we had DNS name components that didn't create what I thought of as a subdomain, in the form of the hostnames we assign for the IPMIs of our servers. These names are in the form '<host>.ipmi.core.sandbox' (in one of our internal sandboxes), but I said that 'ipmi.core.sandbox' is neither a separate DNS zone nor something that I consider a subdomain.

There's only one problem with this description; it's wrong. It's been so long since I actually dealt with an IPMI hostname that I mis-remembered our naming scheme for them, which I discovered when I needed to poke at one by hand the other day. Our actual IPMI naming scheme puts the 'ipmi' bit first, giving us host names of the form 'ipmi.<host>.core.sandbox' (as before, for the IPMI for <host>; the host itself doesn't have an interface on the core.sandbox subnet).

What this naming scheme creates is middle name components that clearly don't create subdomains in any meaningful sense. If we have host1, host2, and host3 with IPMIs, we get the following IPMI names:

ipmi.host1.core.sandbox
ipmi.host2.core.sandbox
ipmi.host3.core.sandbox
It's pretty obviously silly to talk about 'host1.core.sandbox' being a subdomain, much more so than 'ipmi.core.sandbox' in my first IPMI naming scheme. These names could as well be 'ipmi-<host>'; we just picked a dot instead of a dash as a separator, and dot has special meaning in host names. The 'ipmi.core.sandbox' version would at least create a namespace in core.sandbox for IPMIs, while this version has no single namespace for them, instead scattering the names all over.

(The technicality here is DNS resolver search paths. You could use 'host1.core.sandbox' as a DNS search path, although it would be silly.)

PS: Tony Finch also wrote about "What is a subdomain?" in an entry that's worth reading, especially for historical and general context.

SubdomainsAndDNSZonesII written at 22:39:58


My pragmatic sysadmin view on subdomains and DNS zones

Over on Twitter, Julia Evans had an interesting poll and comment:

computer language poll: is mail.google.com a subdomain of google.com? (not a trick question, no wrong answers, please don't argue about it in the replies, I'm just curious what different people think the word "subdomain" means :) )

the ambiguity here is that mail.google.com doesn't have its own NS/SOA record. An example of a subdomain that does have those things is alpha.canada.ca -- it has a different authoritative DNS server than canada.ca does.

This question is interesting to me because I had a completely different view of it than Julia Evans did. For me, NS and SOA DNS records are secondary things when thinking about subdomains, down at the level of the mechanical plumbing that you sometimes need. This may surprise people, so let me provide a quite vivid local example of why I say that.

Our network layout has a bunch of internal subnets using RFC 1918 private IP address space, probably like a lot of other places. We call these 'sandbox' networks, and generally each research group has one, plus there are various other ones for our internal use. All of these sandboxes have host names under an internal pseudo-TLD, .sandbox (yes, I know, this is not safe given the explosion in new TLDs). Each different sandbox has a subdomain in .sandbox and then its machines go in that subdomain, so we have machines with names like sadat.core.sandbox and lw-staff.printers.sandbox.

However, none of these subdomains are DNS zones, with their own SOA and NS records. Instead we bundle all of the sandboxes together into one super-sized 'sandbox.' zone that has everything. One of the reasons for this is that we do all of the DNS for all of these sandbox subdomains, so all of those hypothetical NS and SOA records would just point to ourselves (and possibly add pointless extra DNS queries to uncached lookups).

I think most system administrators would consider these sandbox subdomains to be real subdomains. They are different namespaces (including for DNS search domains), they're operated by different groups with different naming policies, we update them separately (each sandbox has its own DNS file), and so on. But at the mechanical level of DNS zones, they're not separate zones.

But this still leaves a question about mail.google.com: is it a subdomain or a host? For people outside of Google, this is where things get subjective. A (DNS) name like 'www.google.com' definitely feels like a host, partly because in practice it's unlikely that people would ever have a <something>.www.google.com. But mail.google.com could quite plausibly someday have names under it as <what>.mail.google.com, even if it doesn't today. So to me it feels more like a subdomain even if it's only being used as a host today.

(People inside Google probably have a much clearer view of what mail.google.com is, conceptually. Although even those views can drift over time. And something can be both a host and a subdomain at once.)

Because what I consider a subdomain depends on how I think about it, we have some even odder cases where we have (DNS) name components that I don't think of as subdomains, just as part of the names of a group of hosts. One example is our IPMIs for machines, which we typically call names like '<host>.ipmi.core.sandbox' (for the IPMI of <host>). In the DNS files, this is listed as '<host>.ipmi' in the core.sandbox file, and I don't think of 'ipmi.core.sandbox' as a subdomain. The DNS name could as well be '<host>-ipmi' or 'ipmi-<host>', but I happen to think that '<host>.ipmi' looks nicer.

(What is now our IPMI network is an interesting case of historical evolution, but that's a story for another entry.)

SubdomainsAndDNSZones written at 00:59:34


How convenience in Prometheus labels for alerts led me into a quiet mistake

In our Prometheus setup, we have a system of alerts that are in testing, not in production. As I described recently, this is implemented by attaching a special label with a special value to each alert, in our case a 'send' label with the value of 'testing'; this is set up in our Prometheus alert rules. This is perfectly sensible.

In addition to alerts that are in testing, we also have some machines that aren't in production or that I'm only monitoring on a test basis. Because these aren't production machines, I want any alerts about these machines to be 'testing' alerts, even though the alerts themselves are production alerts. When I started thinking about it, I realized that there was a convenient way to do this because alert labels are inherited from metric labels and I can attach additional labels to specific scrape targets. This means that all I need to do to make all alerts for a machine that are based on the host agent's metrics into testing alerts is the following:

- targets:
    - production:9100

- targets:
    - someday:9100
  labels:
    send: testing

I can do the same for any other checks, such as Blackbox checks. This is quite convenient, which encourages me to actually set up testing monitoring for these machines instead of letting them go unmonitored. But there's a hidden downside to it.

When we promote a machine to production, obviously we have to make alerts about it be regular alerts instead of testing alerts. Mechanically this is easy to do; I move the 'someday:9100' target up to the main section of the scrape configuration, which means it no longer gets the 'send="testing"' label on its metrics. Which is exactly the problem, because in Prometheus a time series is identified by its labels (and their values). If you drop a label or change the value of one, you get a different time series. This means that the moment we promote a machine to production, it's as if we dropped the old pre-production version of it and added a completely different machine (that coincidentally has the same name, OS version, and so on).

Some PromQL expressions will allow us to awkwardly overcome this if we remember to use 'ignoring(send)' or 'without(send)' in the appropriate place. Other expressions can't be fixed up this way; anything using 'rate()' or 'delta()', for example. A 'rate()' across the transition boundary sees two partial time series, not one complete one.
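For gauge metrics, one workaround sketch is to aggregate the send label away so that the pre- and post-promotion series join into a single continuous series in graphs (this doesn't rescue rate() or delta(), which still see two partial series underneath):

```promql
avg without (send) (node_load1{instance="someday:9100"})
```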

What this has made me realize is that I want to think carefully before putting temporary things in Prometheus metric labels. If possible, all labels (and label values) on metrics should be durable. Whether or not a machine is an external one is a durable property, and so is fine to embed in a metric label; whether or not it's in testing is not.

Of course this is not a simple binary decision. Sometimes it may be right to effectively start metrics for a machine from scratch when it goes into production (or otherwise changes state in some significant way). Sometimes its configuration may be changed around in production, and beyond that what it's experiencing may be different enough that you want a clear break in metrics.

(And if you want to compare the metrics in testing to the metrics in production, you can always do that by hand. The data isn't gone; it's merely in a different time series, just as if you'd renamed the machine when you put it into production.)

PrometheusHostLabelMistake written at 23:01:31

How (and where) Prometheus alerts get their labels

In Prometheus, you can and usually do have alerting rules that evaluate expressions to create alerts. These alerts are usually passed to Alertmanager and they are visible in Prometheus itself as a couple of metrics, ALERTS and ALERTS_FOR_STATE. These metrics can be used to do things like find out the start time of alerts or just display a count of currently active alerts on your dashboard. Alerts almost always have labels (and values for those labels), which tend to be used in Alertmanager templates to provide additional information alongside annotations, which are subtly but crucially different.

All of this is standard Prometheus knowledge and is well documented, but what doesn't seem to be well documented is where alert labels come from (or at least I couldn't find it said explicitly in any of the obvious spots in the documentation). Within Prometheus, the labels on an alert come from two places. First, you can explicitly add labels to the alert in the alert rule, which can be used for things like setting up testing alerts. Second, the basic labels for an alert are whatever labels come out of the alert expression. This can have some important consequences.

If your alert expression is a simple one that just involves basic metric operations, for example 'node_load1 > 10.0', then the basic labels on the alert are the same labels that the metric itself has; all of them will be passed through. However, if your alert expression narrows down or throws away some labels, then those labels will be missing from the end result. One of the ways to lose labels in alert expressions is to use 'by (...)', because this discards all labels other than the 'by (whatever)' label or labels. You can also deliberately pull in labels from additional metrics, perhaps as a form of database lookup (and then you can use these additional labels in your Alertmanager setup).
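The usual shape of that 'database lookup' trick is a group_left join against an info-style metric whose value is 1; here machine_owner and its owner label are hypothetical stand-ins:

```promql
(node_load1 > 10.0)
  * on (instance) group_left (owner)
    machine_owner
```

Because machine_owner's value is 1, the multiplication leaves the load value alone and just copies the owner label onto the alert expression's result.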

Prometheus itself also adds an alertname label, with the name of the alert as its value. The ALERTS metric in Prometheus also has an alertstate label, but this is not passed on to the version of the alert that Alertmanager sees. Additionally, as part of sending alerts to Alertmanager, Prometheus can relabel alerts in general to do things like canonicalize some labels. This can be done either for all Alertmanager destinations or only for a particular one, if you have more than one of them set up. This only affects alerts as seen by Alertmanager; the version in the ALERTS metric is unaffected.
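The global form of this relabeling goes under alert_relabel_configs in the Prometheus configuration; as a sketch, this copies a hypothetical 'cshost' label into a canonical 'host' label on every outgoing alert:

```yaml
alerting:
  alert_relabel_configs:
    # Only rewrite when 'cshost' is actually present and non-empty.
    - source_labels: [cshost]
      regex: '(.+)'
      target_label: host
```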

(This can be slightly annoying if you're building Grafana dashboards that display alert information using labels that your alert relabeling changes.)

PS: In practice, people who use Prometheus work out where alert labels come from almost immediately. It's both intuitive (alert rules use expressions, expression results have labels, and so on) and obvious once you have some actual alerts to look at. But if you're trying to decode Prometheus on your first attempt, it and the consequences aren't obvious.

PrometheusAlertsWhereLabels written at 00:19:23


How I set up testing alerts in our Prometheus environment

One of the things I mentioned in my entry on how our alerts are quiet most of the time is that I have some Prometheus infrastructure for 'testing' alerts. Rather than being routed to everyone (via the normal email destination), these alerts go to a special destination that only goes to interested parties (ie, me). There are a number of different ways to implement this in Prometheus, so the way I picked to do it isn't necessarily the best one (and in fact it enables a bad habit, which is for another entry).

The simplest way to implement testing alerts is to set them up purely in Alertmanager. As part of your Alertmanager routing configuration, you would have a very early rule that simply listed all of the alerts that are in testing and diverted them. This would look something like this:

- match_re:
    alertname: 'OneAlert|DubiousAlert|MaybeAlert'
  receiver: testing-email
  [any other necessary parameters]

The problem with this is that it involves more work when you set up a new testing alert. You have to set up the alert itself in your Prometheus alert rules, and then you have to remember to go off to Alertmanager and update the big list of testing alerts. If you forget or make a typo, your testing alerts go to your normal alert receivers and annoy your co-workers. I'm a lazy person, so I picked a more general approach.

My implementation is that all testing alerts have a special Prometheus label with a special value, and then the Alertmanager matches on the presence of this (Prometheus) label. In Alertmanager this looks like:

- match:
    send: testing
  receiver: testing-email

Then in each Prometheus alert rule, we explicitly add the label and the label value in each testing rule:

- alert: MaybeAlert
  expr: ....
  labels:
    send: testing

(We add some other labels for each alert, to tell us things such as whether the alert is a host-specific one or some other type of alert, like a machine room being too hot.)

This enables my laziness, because I only need to edit one file to create a new testing alert instead of two of them, and there's a lower chance of typos and omissions. It also has the bonus of keeping the testing status of an alert visible in the alert rule file, at the expense of making it harder to get a list of all alerts that are in testing. For me this is probably a net win, because I look at alert rules more often than I look at our Alertmanager configuration so I have a higher chance of seeing a still-in-testing rule in passing and deciding to promote it to production. And if I'm considering promoting a testing alert to full production status, I can re-read the entire alert in one spot while I'm thinking about it.

(Noisy testing rules get removed rapidly, but quiet testing rules can just sit there with me forgetting about them.)

PrometheusTestingAlerts written at 00:09:08


Normal situations should not be warnings (especially not repeated ones)

Every so often (or really, too often), people with good intentions build a program that looks at some things or does some things, and they decide to have that program emit warnings or report failure statuses if things are not quite perfect and as expected. This is a mistake, and it makes system administrators who have to deal with the program unhappy. An ordinary system configuration should not cause a program to raise warnings or error markers, even if it doesn't allow all of the things that a program is capable of doing (or that the program wants to do by default). In addition, every warning should be rate-limited in any situation that can plausibly emit them regularly.

That all sounds abstract, so let's make it concrete with some examples drawn from the very latest version (1.1.0) of the Prometheus host agent. The host agent gathers a bunch of information from your system, which is separated into a bunch of 'collectors' (one for each sort of information). Collectors may be enabled or disabled by default, and as part of the metrics that the host agent emits it can report if a particular collector said that it failed (what constitutes 'failure' is up to the collector to decide).

The host agent has collectors for a number of Linux filesystem types (such as XFS, Btrfs, and ZFS), for networking technologies such as Fibrechannel and Infiniband, and for network stack information such as IP filtering connection tracking ('conntrack'), among other collectors. All of the collectors I've named are enabled by default. Naturally, many systems do not actually have XFS, Btrfs, or ZFS filesystems, or Infiniband networking, or any 'conntrack' state. Unfortunately, of these enabled by default collectors, zfs, infiniband, fibrechannel, and conntrack all generate metrics reporting a collector failure on Linux servers that don't use those respective technologies. Without advance knowledge of the specific configuration of every server you monitor, this makes it impossible to tell the difference between a machine that doesn't have one of those things and a real collector failure on a machine that does have one and so should be successfully collecting information about them. But at least these failures only show up in the generated metrics. At least two collectors in 1.1.0 do worse by emitting actual warnings into the host agent's logs.
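The per-collector failure reporting shows up as the host agent's node_scrape_collector_success metric, so the collectors currently claiming to fail on a machine can be listed with:

```promql
node_scrape_collector_success == 0
```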

The first collector is for Linux's new pressure stall information. This is valuable information but of course is only supported on recent kernels, which means recent versions from Linux distributions (so, for example, both Ubuntu 18.04 and CentOS 7 use kernels without this information). However, if the host agent's 'pressure' collector can't find the /proc files it expects, it doesn't just report a collector failure, it emits an error message:

level=error ts=2021-02-08T19:42:48.048Z caller=collector.go:161 msg="collector failed" name=pressure duration_seconds=0.073142059 err="failed to retrieve pressure stats: psi_stats: unavailable for cpu"

At least you can disable this collector on older kernels, and automate that with a cover script that checks for /proc/pressure and disables the pressure collector if it's not there.

The second collector is for ZFS metrics. In addition to a large amount of regular ZFS statistics, recent versions of ZFS on Linux expose kernel information about the overall health of each ZFS pool on the system. This was introduced in ZFS on Linux version 0.8.0, which is more recent than the version of ZoL that is included in, for example, Ubuntu 18.04. Unfortunately, in version 1.1.0 the Prometheus host agent ZFS collector insists on this overall health information being present; if it isn't, the collector emits a warning:

level=warn ts=2021-02-09T01:14:09.074Z caller=zfs_linux.go:125 collector=zfs msg="Not found pool state files"

Since this is only part of the ZFS collector's activity, you can't disable just this pool state collection. Your only options are to either disable the entire collector, losing all ZFS metrics on, say, your Ubuntu 18.04 ZFS fileservers, or have frequent warnings flood your logs. Or you can take the third path of not using version 1.1.0 of the host agent.

(Neither the pressure collector nor the ZFS collector rate-limit these error and warning messages. Instead one such message will be emitted every time the host agent is polled, which is often as frequently as once every fifteen or even every ten seconds.)

NormalThingsNotWarnings written at 00:16:58
