Wandering Thoughts archives

2020-11-30

Our monitoring of our OpenBSD machines, such as it is (as of November 2020)

We have a number of OpenBSD firewalls in service (along with some other OpenBSD servers for things like VPN endpoints), and I was recently asked how we monitor PF and overall network traffic on them. I had to disappoint the person who asked with my answer, because right now we mostly don't (although this is starting to change).

Due to past problems, we've long run scripts (from cron) that look for the PF state tables getting close to full and send us email if that's happening. These scripts started out simple (just a 'there are X states' email) but have grown more elaborate over time; the current version sends us information on what look like the top traffic sources and saves a complete 'pfctl -ss' dump for us to look at later. These scripts predate our Prometheus system and so aren't hooked into it.
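As a rough illustration of the simple end of this kind of check (not our actual scripts), here is a minimal Python sketch. It assumes that 'pfctl -si' prints a 'current entries' line in its 'State Table' section and that 'pfctl -sm' prints a 'states hard limit N' line. The function names and the 80% threshold are made up for the example.

```python
import re
import subprocess

def current_states(si_text):
    """Return 'current entries' from the State Table section of 'pfctl -si' output."""
    # Only look after the 'State Table' heading, if it is present.
    section = si_text.split("State Table", 1)[-1]
    m = re.search(r"^\s*current entries\s+(\d+)", section, re.MULTILINE)
    return int(m.group(1)) if m else None

def state_limit(sm_text):
    """Return the 'states' hard limit from 'pfctl -sm' output."""
    m = re.search(r"^states\s+hard limit\s+(\d+)", sm_text, re.MULTILINE)
    return int(m.group(1)) if m else None

def near_full(current, limit, threshold=0.8):
    """True if the state table is at or above the (hypothetical) threshold."""
    return current >= threshold * limit

def check(threshold=0.8):
    """Run pfctl (needs root); return a warning string, or None if all is well."""
    si = subprocess.run(["pfctl", "-si"], capture_output=True,
                        text=True, check=True).stdout
    sm = subprocess.run(["pfctl", "-sm"], capture_output=True,
                        text=True, check=True).stdout
    cur, lim = current_states(si), state_limit(sm)
    if cur is None or lim is None:
        return "could not parse pfctl output"
    if near_full(cur, lim, threshold):
        return f"PF state table at {cur} of {lim} states"
    return None
```

A cron job could call check() and print any non-None result; cron normally mails a job's output to its owner, which is one simple way to get the 'there are X states' email.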

More recently, we've written some scripts to generate Prometheus metrics for things like VPN usage information. These work by parsing the output of standard OpenBSD tools like npppctl. As an extension of this work, we've also wound up writing a program to parse 'pfctl -ss' output to track more details about PF state table usage and publish them as Prometheus metrics. This gives us a better picture of what PF state table levels are normal, whether a problem showed up all of a sudden or slowly ramped up, and so on. It's also uncovered some odd behavior by various hosts that didn't rise to the level of filling up our state tables and provoking email from our monitoring script.
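To give a feel for what parsing 'pfctl -ss' involves, here is a small Python sketch (not our actual program). It assumes the plain six-field state line form, 'interface proto address direction address state', and simply skips anything else, including NAT states that carry an extra parenthesized address. The metric names pf_states_by_proto and pf_states_by_host are invented for the example, and how the resulting text gets to Prometheus is left open.

```python
from collections import Counter

def host_of(endpoint):
    """Strip the port from 'addr:port' (IPv4) or 'addr[port]' (IPv6)."""
    if endpoint.count(":") > 1:          # looks like an IPv6 address
        return endpoint.split("[", 1)[0]
    return endpoint.split(":", 1)[0]

def parse_state_line(line):
    """Return (proto, left_host, right_host, state), or None if not understood."""
    fields = line.split()
    if len(fields) != 6 or fields[3] not in ("->", "<-"):
        return None                      # e.g. NAT states; not handled here
    return fields[1], host_of(fields[2]), host_of(fields[4]), fields[5]

def summarize(ss_text):
    """Count states per protocol and per host seen on either side."""
    by_proto, by_host, skipped = Counter(), Counter(), 0
    for line in ss_text.splitlines():
        if not line.strip():
            continue
        parsed = parse_state_line(line)
        if parsed is None:
            skipped += 1
            continue
        proto, left, right, _state = parsed
        by_proto[proto] += 1
        by_host[left] += 1
        by_host[right] += 1
    return by_proto, by_host, skipped

def render_metrics(by_proto, by_host, top=10):
    """Format the counts in the Prometheus text exposition format."""
    lines = ["# TYPE pf_states_by_proto gauge"]
    for proto, n in sorted(by_proto.items()):
        lines.append(f'pf_states_by_proto{{proto="{proto}"}} {n}')
    lines.append("# TYPE pf_states_by_host gauge")
    for host, n in by_host.most_common(top):
        lines.append(f'pf_states_by_host{{host="{host}"}} {n}')
    return "\n".join(lines) + "\n"
```

Limiting the per-host metric to the top few hosts keeps the number of time series bounded, which matters when a busy firewall sees a very large number of distinct addresses.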

Recent versions of OpenBSD (from 6.6 onward) have a reasonably current version of the Prometheus host agent available through the OpenBSD packages collection, and we've installed it on some of our OpenBSD machines that are recent enough. This seems to work okay, although you don't get as many host metrics as on Linux and the host agent hasn't caught up to the latest OpenBSD changes yet. You do get network usage metrics, which is useful for firewalls and VPN servers, and the CPU state metrics revealed that our 6.6 SMP L2TP VPN servers spend a lot of time in kernel spinlocks.

(OpenBSD 6.6 has or had version 0.18.0, 6.7 has 0.18.1, and 6.8 has 1.0.1, the current version of the host agent as I write this. It's possible that the 6.7 or 6.8 versions have been patched to support the new 'spin' CPU state, but I suspect not.)

For network volume and traffic monitoring, my strong impression is that what you usually want is something that supports sFlow. We haven't investigated this for our OpenBSD machines or attempted to gather any sort of similar metrics (although the Prometheus host agent will give you per-interface information). One reason for our relatively low interest on our firewalls is that many of the interesting flows are inside a single internal network as they slosh around our switch infrastructure.

In the long run I think we're likely to run the Prometheus host agent on all of our OpenBSD machines as we upgrade them to modern OpenBSD versions. The host agent provides reasonably useful information and since it's available packaged, it's easy to install. We'll probably expand our PF scraping to cover more firewalls, since that's also easy (although I'll have to make it deal with NAT for our perimeter firewall). Unless a real need arises I don't see us adding more extensive PF monitoring and network volume tracking.

(Now that I've done some Internet research, I see that there's pflow(4), so maybe there are some easy-to-deploy tools out there. However, it's not a priority for us and it would add more complications to our OpenBSD machines.)

OurOpenBSDMonitoring written at 22:22:23

