The "why" problem with on-host (host-based) firewalls on your machines

October 4, 2021

I somewhat recently read j. b. crawford's host firewalls, which as I read it puts forward a core thesis:

The great thing about a host firewall, the thing that really makes it a powerful tool that can do things that your Third-Generation Smart Firewall in the network rack can't, is something of a secret weapon: a host firewall can make decisions based on not just the packet but the process that sent or will receive it.

In the old days, this was to spot and deal with malware, but today, in theory, we could use this to deal with all of the things that want to phone home to snoop on us. Unfortunately, I believe there is a problem with this nice vision, what I will call the problem of "why".

If we're asked to decide if a program should be allowed to make a network connection, often one of the things we care about is why this connection is being done, not just what is trying to connect to where. Sometimes we don't need to know why, because what and where is sufficiently good or bad that it's clear (if your Twitter client is trying to connect to api.twitter.com, or some random program is trying to connect to 'sketchy-malware.com'), but in many cases it's a lot less clear. Is your video conferencing client making a call to Facebook because it's sending telemetry, or is it some side effect of their 'log in with Facebook' option?
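As a rough illustration of where "what" and "where" stop being enough, here is a minimal sketch of a per-program decision function. Everything in it is hypothetical (the program names, the rule tables, and the `decide` function are made up for this example and don't correspond to any real host firewall's API); the point is only that a large share of real connections would land in the third bucket.

```python
from enum import Enum


class Verdict(Enum):
    ALLOW = "allow"
    DENY = "deny"
    UNCLEAR = "unclear"  # what and where alone don't tell us why


# Hypothetical rule tables, invented for this sketch.
EXPECTED_HOSTS = {
    "twitter-client": {"api.twitter.com"},
}
KNOWN_BAD_HOSTS = {"sketchy-malware.com"}


def decide(program: str, host: str) -> Verdict:
    """Decide using only which program is connecting and to where."""
    if host in KNOWN_BAD_HOSTS:
        return Verdict.DENY
    if host in EXPECTED_HOSTS.get(program, set()):
        return Verdict.ALLOW
    # Telemetry? A 'log in with ...' flow? We can't tell from here.
    return Verdict.UNCLEAR


if __name__ == "__main__":
    print(decide("twitter-client", "api.twitter.com"))
    print(decide("random-program", "sketchy-malware.com"))
    print(decide("video-conf", "www.facebook.com"))
```

The first two cases are the easy ones from the text; the video conferencing case is the one where you'd need the "why" to do better than guessing.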

(And this is before you start looking at how many connections are actually being made to opaque hostnames on CDNs. I tcpdump my outgoing network traffic every so often and it can be startling. There's also looking at about:networking in Firefox, even after you're using an adblocker.)

You could introduce host APIs that ask programs to declare the purpose of their connections and HTTP requests and so on, but you can cynically guess what would likely happen next. Some programs and code would be honest, but malware and various dubious programs and code would lie outright or at least bend the truth a lot. The information wouldn't be trustworthy enough, or at best you would be back to something much like the current situation, where your first decision would be how much you trust the program itself.
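To make the trust problem concrete, here is a small sketch of what a policy on top of such a purpose-declaring API might look like. No such API exists that I know of; `ConnectionRequest`, `declared_purpose`, and the two sets below are all assumptions invented for the illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConnectionRequest:
    program: str
    host: str
    declared_purpose: str  # self-reported by the program; not verifiable


# Hypothetical policy inputs for this sketch.
TRUSTED_PROGRAMS = {"video-conf"}
ACCEPTABLE_PURPOSES = {"login", "core-service"}


def allow(req: ConnectionRequest) -> bool:
    """Allow a connection based on who is asking and what they claim."""
    # The first question is still how much we trust the program itself;
    # a purpose claimed by an untrusted program carries no weight.
    if req.program not in TRUSTED_PROGRAMS:
        return False
    # For a trusted program, all we can do is take its word for it.
    return req.declared_purpose in ACCEPTABLE_PURPOSES


if __name__ == "__main__":
    honest = ConnectionRequest("video-conf", "www.facebook.com", "telemetry")
    bending = ConnectionRequest("video-conf", "www.facebook.com", "login")
    malware = ConnectionRequest("dubious-tool", "www.facebook.com", "login")
    print(allow(honest), allow(bending), allow(malware))
```

The honest telemetry request is refused and the dishonest one is allowed, since both come from the same trusted program and the firewall only sees the claim. The untrusted program is refused no matter what it says, which is the decision we could already make without the declared purpose.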

(There is also the related issue that programs could simply refuse to work entirely if you didn't let their telemetry phone home. But let's assume that they couldn't get away with this for one reason or another, including that they didn't want the bad publicity from failing entirely when their telemetry provider was down.)

A possible counter-argument (and a nice future world) would be that very few programs actively need to talk to many different companies as part of their normal operations. So we should expect, or at least want, that our video conferencing program talks only to the domain of its company and so on. In a world where who talks to what is more visible, in theory there could be social pressure to do this just to make your program more tractable for people to deal with. I don't think this is terribly likely, but the reasons for that need to go into another entry.


Comments on this page:

By Mike at 2021-12-21 22:51:53:

I think this gets flipped on its head once you get a VPN connection to some internal network (probably via some wireguard-based multi-tunnel-setup solution these days). Then the question becomes more like "Why would I allow anything other than work-apps (for example some accounting software) to use that connection?".

I.e. a random TikTok app or browsers (at least non-locked-down "special" browsers that aren't allowed access to random p-rn links) should not have any access to that VPN interface, nor should anything else like that. Having a whitelist of "only this one binary has access to our VPN" seems to make sense on that machine, and nowhere else in the case of modern end-to-end wg tunnels straight to (some subset of) services.

Written on 04 October 2021.


Last modified: Mon Oct 4 21:39:33 2021