The "why" problem with host-based firewalls on your machines

October 4, 2021

I somewhat recently read j. b. crawford's "host firewalls", which as I read it puts forward a core thesis:

The great thing about a host firewall, the thing that really makes it a powerful tool that can do things that your Third-Generation Smart Firewall in the network rack can't, is something of a secret weapon: a host firewall can make decisions based on not just the packet but the process that sent or will receive it.

In the old days, this was used to spot and deal with malware, but today, in theory, we could use it to deal with all of the things that want to phone home to snoop on us. Unfortunately, I believe there is a problem with this nice vision, which I will call the problem of "why".

If we're asked to decide if a program should be allowed to make a network connection, often one of the things we care about is why this connection is being made, not just what is trying to connect to where. Sometimes we don't need to know why, because the what and where are sufficiently good or bad that the answer is clear (if your Twitter client is trying to connect to api.twitter.com, or some random program is trying to connect to 'sketchy-malware.com'), but in many cases it's a lot less clear. Is your video conferencing client making a call to Facebook because it's sending telemetry, or is it some side effect of their 'log in with Facebook' option?
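To make the gap concrete, here is a minimal sketch of a decision function that only sees what and where. The program names, the rule tables, and the "unclear" outcome are all assumptions made up for illustration; no real host firewall is being described.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Connection:
    program: str  # what is connecting
    host: str     # where it is connecting to


# Hypothetical rule tables; the entries are illustrative only.
ALLOWED_PAIRS = {("twitter-client", "api.twitter.com")}
DENIED_HOSTS = {"sketchy-malware.com"}


def decide(conn: Connection) -> str:
    """Return 'allow', 'deny', or 'unclear' using only what and where."""
    if conn.host in DENIED_HOSTS:
        return "deny"
    if (conn.program, conn.host) in ALLOWED_PAIRS:
        return "allow"
    # Nothing here says *why* the connection is being made, so telemetry
    # and a 'log in with Facebook' request look exactly the same.
    return "unclear"
```

The two easy cases resolve on their own, while something like `Connection("video-client", "facebook.com")` falls through to "unclear", because the function has no input that could tell the two possible reasons apart.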

(And this is before you start looking at how many connections are actually being made to opaque hostnames on CDNs. I tcpdump my outgoing network traffic every so often and it can be startling. Looking at about:networking in Firefox can be just as startling, even when you're using an adblocker.)

You could introduce host APIs that ask programs to declare the purpose of their connections and HTTP requests and so on, but you can cynically guess what would likely happen next. Some programs and code would be honest, but malware and various dubious programs and code would lie outright or at least bend the truth a lot. The information wouldn't be trustworthy enough, or at best you would be back to something much like the current situation, where your first decision is how much you trust the program itself.
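As a rough sketch of why a declared purpose doesn't help much, suppose a hypothetical API let each connection carry a self-reported purpose string. Everything here (the field names, the trusted set, the purpose values) is invented for illustration and does not correspond to any existing host API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeclaredConnection:
    program: str
    host: str
    purpose: str  # self-reported by the program, so it cannot be verified


# Hypothetical set of programs whose declarations we choose to believe.
TRUSTED_PROGRAMS = {"honest-client"}


def decide_with_purpose(conn: DeclaredConnection) -> str:
    """Return 'allow', 'deny', or 'ask-user' for a declared connection."""
    if conn.program not in TRUSTED_PROGRAMS:
        # A dubious program can declare any purpose it likes, so the
        # declaration carries no weight and we are back to asking.
        return "ask-user"
    return "deny" if conn.purpose == "telemetry" else "allow"
```

Note that the first branch looks only at the program, not the purpose. The declared purpose matters only after you have already decided to trust whoever is declaring it.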

(There is also the related issue that programs could simply refuse to work entirely if you didn't let their telemetry phone home. But let's assume that they couldn't get away with this for one reason or another, including that they didn't want the bad publicity from failing entirely when their telemetry provider was down.)

A possible counter-argument (and a nice future world) would be that very few programs actually need to talk to many different companies as part of their normal operations. So we should expect, or at least want, our video conferencing program to talk only to the domain of its own company, and so on. In a world where who talks to what is more visible, in theory there could be social pressure to do this just to make your program more tractable for people to deal with. I don't think this is terribly likely, but the reasons for that need to go into another entry.
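In that nicer world, the rule for a program could be as simple as "only its own company's domain". A minimal sketch of such a check follows; the program name and the domain `example-video.com` are made up for illustration.

```python
# Hypothetical mapping from a program to the one domain it should talk to.
HOME_DOMAIN = {"video-client": "example-video.com"}


def in_domain(host: str, domain: str) -> bool:
    """True if host is the domain itself or a subdomain of it."""
    host = host.lower().rstrip(".")
    domain = domain.lower()
    # Require a leading dot on the suffix so that 'notexample-video.com'
    # does not match 'example-video.com'.
    return host == domain or host.endswith("." + domain)


def allowed(program: str, host: str) -> bool:
    """Allow a connection only to the program's own home domain."""
    domain = HOME_DOMAIN.get(program)
    return domain is not None and in_domain(host, domain)
```

With this table, `allowed("video-client", "media.example-video.com")` is true and `allowed("video-client", "facebook.com")` is false. The rule never needs to know why the connection is being made, which is exactly what makes it tractable.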

Written on 04 October 2021.

Last modified: Mon Oct 4 21:39:33 2021