Why Prometheus exporters really need fixed TCP ports

June 4, 2023

It started with this Fediverse post from Scott Laird:

Prometheus maintains a list of port numbers for exporters, so exporters don't conflict with each other. They allocated the range 9100-9999 when the project started. 900 exporters should be enough for everyone?

Yeah, that range is now 100% allocated.

This sparked a discussion that touched on having exporters randomly pick their TCP port when they start and then get registered in Prometheus through some service discovery mechanism, including low-tech ones where an external program updates files that Prometheus reads as if you'd edited them by hand. Reading this made me realize that you almost certainly don't want to do this.

One of the default labels that Prometheus adds to all metrics scraped from exporters is the 'instance' label (see Jobs and Instances), which tells you what specific instance of an exporter things come from. This label normally includes the instance's port number. There are good operational reasons to have this label and to include the port number; for example, it lets you trace back odd things to their real, assured source instead of having to guess.
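
As a hypothetical illustration (the metric, address and port here are made up), a query result for something scraped from a host's node exporter might look like this, with the exporter's port baked into the 'instance' label:

    node_load1{instance="192.0.2.10:9100", job="node"}  1.25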

However, in Prometheus two metrics are only part of the same time series if they have the same label set, ie the same labels with the same values. If even one label changes its value, you have two different time series, and then various Prometheus functions and features won't work out the way you want. So when your exporter restarts, picks a new random port, and your service discovery updates Prometheus, the 'instance' label will change from (say) '1.2.3.4:5670' to '1.2.3.4:8670' and all of the metrics being pulled from that exporter are now a new set of time series that are not contiguous with the old, pre-restart time series from it. Speaking from personal experience of doing this to myself at a much smaller and less frequent scale (cf), you probably don't want that.
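
To make the discontinuity concrete, here is a sketch (with a made-up metric name) of what you would see before and after such a restart; functions like rate() and increase() work per time series, so they treat these as two unrelated series rather than one continuous one:

    # before the exporter restarts
    myexporter_requests_total{instance="1.2.3.4:5670", job="myexporter"}
    # after the restart, on a new random port
    myexporter_requests_total{instance="1.2.3.4:8670", job="myexporter"}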

(You also have potential label cardinality issues, although you hopefully aren't restarting exporters all that often.)

You don't technically need your exporters to have a constant port (although that will make your life easier). But you definitely do want every instance of an exporter to pick one port when it's started up for the first time and then stick with it for the lifetime of that host (or container or whatever).
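
If it helps to see what 'pick one port and stick with it' looks like in practice, here is a minimal sketch of an exporter written with the Go client library (github.com/prometheus/client_golang); the port number and the metric name are made up for illustration.

    package main

    import (
        "log"
        "net/http"

        "github.com/prometheus/client_golang/prometheus"
        "github.com/prometheus/client_golang/prometheus/promhttp"
    )

    // A trivial example metric so this looks like a real exporter.
    var scrapeCount = prometheus.NewCounter(prometheus.CounterOpts{
        Name: "myexporter_scrapes_total",
        Help: "Number of times this exporter has been scraped.",
    })

    func main() {
        prometheus.MustRegister(scrapeCount)

        // Count scrapes, then hand off to the standard metrics handler.
        metricsHandler := promhttp.Handler()
        http.Handle("/metrics", http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
            scrapeCount.Inc()
            metricsHandler.ServeHTTP(w, r)
        }))

        // Listen on one fixed port (9876 is made up here), not on port 0
        // where the kernel picks a random free port; a stable port keeps
        // the 'instance' label stable across restarts.
        log.Fatal(http.ListenAndServe(":9876", nil))
    }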


Comments on this page:

By Miksa at 2023-06-05 10:00:57:

Thanks but no thanks. We already have one piece of software, our backup agent, that uses random ports and it's enough of a hassle. By default we have confined it to the port ranges 7937-7999 and 8900-9936, and most of the time it stays out of the way. But every now and then some service on a random server decides it wants to use them too, and sooner or later the backup agent grabs the port first and then we need to figure out why the service isn't working. It always takes a while to figure out the cause.

Did you ever try monitoring Kubernetes pods with Prometheus? Kubernetes service discovery puts the pod IP into the `instance` label by default. When a pod is restarted, Kubernetes usually gives it a new IP. This changes the `instance` label, which, in turn, creates a new set of time series for all the metrics exported by the pod.

On top of this, Prometheus is usually configured to add a dozen or so labels to every metric scraped from the pod. These labels include the container name, pod name, node name, Kubernetes namespace and other pod-level labels set in the deployment config. The pod name is usually a unique string generated by Kubernetes for each running pod, and it changes on pod restart. The pod can also migrate to another node during the restart, which changes the `node` label. This worsens the time series churn, especially when pods restart frequently (for instance, during new deployments or when horizontal pod auto-scaling (HPA) is enabled).
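
As a rough illustration (all of the names here are made up), a metric scraped from such a pod can end up with a label set like this, where the pod IP in `instance`, the pod name and the node can all change when the pod restarts:

    http_requests_total{instance="10.42.3.17:8080", pod="myapp-7c9f6d5b4-x2k9q", namespace="prod", node="worker-3", container="myapp", job="kubernetes-pods"}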

This is a common problem for Prometheus monitoring in Kubernetes, and it leads to e.g. "high churn rate" issues, where Prometheus has to store and index a large number of time series over time.

This is a sad story about the current state of Kubernetes monitoring with Prometheus :(
