Wandering Thoughts archives


The real world is mutable (and consequences for system design)

Every so often I see people put together systems on the Internet that are designed to be immutable and permanent (most recently Go). I generally wince, and perhaps sigh to myself, because sooner or later there are going to be problems. The reality of life is that the real world is not immutable. I mean that at two levels. The first is that sometimes people make mistakes and publish things that they very strongly wish and need to change or retract. Pretending that they do not is ignoring reality. Beyond that, things in the real world are almost always mutable and removable because lawyers can show up on your doorstep with a court order to make them so, and the court generally doesn't care about what problems your choice of technology has created for you in complying. If the court says 'stop serving that', you had better do so (or have very good lawyers).

It's my view that designing systems without considering this creates two problems, one obvious and one not obvious. The obvious one is that on the day when the lawyers show up at your front door, you're going to have a problem; unless you enjoy the varied and generally unpleasant consequences of defying a court order, you're going to have to mutate your immutable thing (or perhaps shut it down entirely). If you're having to do this from a cold start, without any advance consideration of the issue, the result may be disruptive (and obviously shutting down entirely is disruptive, even if it's only temporary while you do a very hasty hack rewrite so that you can block certain things or whatever).

The subtle problem is that by creating an immutable system and then leaving it up to the courts to force you to mutate it, you've created a two-tier system. Your system actually supports deletions and perhaps modifications, but only for people who can afford expensive lawyers who can get those court orders that force you to comply. Everyone else is out of luck; for ordinary people, any mistakes they make are not fixable, unlike for the powerful.

(A related problem is that keeping your system as immutable as possible is itself a privilege that increasingly belongs only to powerful operators of such services. Google can afford to pay expensive lawyers to object to proposed court orders calling for changes in their permanent proxy service; you probably can't.)

As a side note, there's also a moral dimension here, in that we know that people will make these mistakes and will do things that they shouldn't have, that they very much regret, and that sometimes expose them to serious consequences if not corrected (whether personal, professional, or for organizations). If people design a system without an escape hatch (other than what a court will force them to eventually provide), they're telling these people that their suffering is not important enough. Perhaps the designers want to say that. Perhaps they have strong enough reasons for it. But please don't pretend that there will never be bad consequences to real people from these design decisions.

PS: There's also the related but very relevant issue of abuse and malicious actors, leading to attacks such as the one that more or less took down the PGP Web of Trust. Immutability means that anything that makes it into the system past whatever defenses you have is a problem forever. And 'forever' can be a long time in Internet-scale systems.

tech/RealWorldIsMutable written at 22:06:11

How big our Prometheus setup is (as of January 2020)

I talked about our setup of Prometheus and Grafana, but what I didn't discuss then is how big it is on various measures: things like how much disk space our Prometheus database takes, how many endpoints we're monitoring, how many metrics we have, how much cardinality is involved, and so on. Today I feel like running down all of those numbers, for various reasons.

We started our production Prometheus setup on November 21st, 2018, and it's been up since then, although the amount of metrics we've collected has varied over time (generally going up). At the moment our metrics database is using 815 GB, including 5.7 GB of WAL. Over roughly 431 days, that averages out to about 1.9 GB a day (and over the past seven days we seem to be growing at about 1.97 GB a day, so that's more representative of our current growth rate).

At the moment we have 674 different targets that Prometheus scrapes. These range from Blackbox external probes of machines to the Prometheus host agent, so the number of metrics from each target varies considerably. Our major types of targets are Blackbox checks other than pings (260), Blackbox pings (199), and the host agent (108 hosts).
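If you want the same per-job breakdown of scrape targets for your own setup, you can count over the `up` metric, which Prometheus synthesizes for every target it scrapes (the job names will of course be whatever your own configuration uses):

```promql
# Number of scrape targets per job, highest first.
sort_desc( count by (job) (up) )

# Total number of scrape targets.
count(up)
```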

In terms of metrics, Prometheus's status information is currently reporting that we have 1,101 different metrics and 479,161 series in total. Our highest-cardinality metrics are the host agent's metrics for systemd unit states (53,470 series) and a local set of metrics for Linux's NFS mountstats that condenses the raw data down to only 27,146 series (if we used the host agent's native support for this information, there would be a lot more). Our highest-cardinality label is 'user', which we use both for per-user disk space usage information and for VPN usage (with mostly overlapping user names). Our largest source of series is the host agent, unsurprisingly, with 449,669 of our series coming from it. The second largest is Pushgateway, which is responsible for 16,047 series. If you want to find out this breakdown for your own Prometheus setup, the query you want is:

sort_desc( count({__name__!=""}) by (job) )
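Similar queries give you the per-metric series counts (which is how you find high-cardinality metrics like the systemd unit states) and the number of distinct values a given label takes. The 'user' label here is specific to our setup; substitute whatever label you suspect of being high-cardinality:

```promql
# Series count for each metric name, highest first.
sort_desc( count by (__name__) ({__name__!=""}) )

# Number of distinct values of the 'user' label across all series.
count( count by (user) ({user!=""}) )
```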

The systemd unit state reporting generates so many series because the host agent generates a metric for every unit it reports on for every systemd state the unit can be in:

node_systemd_unit_state{ ..., name="cron.service", state="activating"}   0
node_systemd_unit_state{ ..., name="cron.service", state="active"}       1
node_systemd_unit_state{ ..., name="cron.service", state="deactivating"} 0
node_systemd_unit_state{ ..., name="cron.service", state="failed"}       0
node_systemd_unit_state{ ..., name="cron.service", state="inactive"}     0

Five series for each systemd unit adds up fast, even if you only have the host agent look at systemd .service units (normally it looks at more).
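You can see this multiplication directly in Prometheus itself; since each unit contributes one series per state, counting series per unit name should report a constant five, and counting the whole metric gives you the total cardinality cost of systemd unit reporting:

```promql
# Series per systemd unit; each unit should show 5 (one per state).
count by (name) (node_systemd_unit_state)

# Total number of series generated by systemd unit states.
count(node_systemd_unit_state)
```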

At the moment Prometheus appears to be adding on average about 31,000 samples a second to its database. The Prometheus process is currently reporting about 3.4 GB of resident RAM (on a 32 GB machine), although that undoubtedly fluctuates based on how many people are looking at our Grafana dashboards at any given time, as well as things like WAL compaction. It's using about 10% to 15% of a nominal single CPU (on a four-core machine with HT enabled). Outside of periodic spikes (which are probably for WAL compaction), the server as a whole runs at about 300 KB to 400 KB a second of writes; including all activity, the long-term write bandwidth is about 561 KB/s. The incoming network bandwidth over the long term is about 345 KB/s. All of this shows that we're not exactly stressing the machine.
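The samples-per-second figure comes from Prometheus's metrics about itself; if you want the same number for your server, the usual approach is a rate() over the TSDB head's appended-samples counter:

```promql
# Average samples ingested per second over the last five minutes.
rate(prometheus_tsdb_head_samples_appended_total[5m])
```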

(The machine has 32 GB of RAM not for its ordinary needs but to deal with RAM spikes due to complex ad-hoc queries. I've run the machine out of memory before when it had 16 GB. With 32 GB, we have more headroom and have been able to raise Prometheus query limits so we can support longer time ranges in our dashboards.)

sysadmin/PrometheusOurSize-2020-01 written at 01:41:47
