Having ClamAV reject email using the Malwarepatrol database seems unwise

September 5, 2023

In practice, ClamAV is both a virus and malware recognition engine and a collection of malware signatures. ClamAV only comes with a limited set of signatures, so supplementing it with additional third party sources is popular (and perhaps almost essential). Often people use update tools and scripts to configure and fetch these additional signatures, such as Fangfrisch. One of the popular providers of third party signatures is Malware Patrol, who have a number of tiers of access, including a (free) tier for educational institutions. Since we are an educational institution, we signed up for this tier and added it to the configuration of the third party update script we were using at the time so that it would be part of our email anti-spam filtering (when we switched over to ClamAV from our prior solution). Well, we thought we'd added it; in fact we'd made a configuration mistake such that we were silently failing to fetch the Malware Patrol database. We only noticed and fixed this mistake when we switched to Fangfrisch for our third party updates.

Soon afterward, our logs started reporting rather a lot of Malware Patrol hits and some people here started complaining that email to them was being rejected. Investigation showed that the rejections were from Malware Patrol signatures and the ones we could decode had what I would call alarmingly broad text matches that they were looking for (Malware Patrol uses ClamAV's body-based signature content format, generally with just a string it's looking for).

(One reason we couldn't decode what some Malware Patrol signatures were matching was that the Malware Patrol data is updated frequently, with signatures regularly being removed.)

Malware Patrol is fairly open and unapologetic about these broad matches in an article called Whitelisting for Block Lists. They specifically say:

Malware Patrol’s #1 goal is to protect customers from malware and ransomware infections. These days, this can mean blocking mainstream domains. Consequently, our customers report potential false positives for sites like docs(.)google(.)com, drive(.)google(.)com, dropbox(.)com and github(.)com. Systems like Google Docs serve files from their root directories. This forces some block list formats to then block the entire domain, frustrating users.

[...]

Although Malware Patrol doesn't say this explicitly, it appears that the ClamAV database format is one such format that sometimes forces them to block entire domains like 'drive.google.com' (we observed this in one signature). They suggest filtering their database before using it, but this has a number of problems; the ClamAV format is hex-encodes the ASCII bytes, for example, and on a larger scale it would mean we'd only be excluding things after people here had run into problems and reported them to us.

I don't fault Malware Patrol for their choice. The balance between false positives and false negatives is not one with a clear single answer, and Malware Patrol seems to have come down on the side of not having false negatives, even at the cost of false positives. But it does mean that Malware Patrol's objectives and ours aren't in alignment, as we care more about avoiding (too many) false positives than we do about avoiding every last false negative.

Our resolution to this was to take Malware Patrol out of our third party ClamAV data sources. I'm sure there are situations where using their database as part of ClamAV screening makes sense, but my view is that if you're rejecting email based on ClamAV signature matches, you likely can't use Malware Patrol's data. It's too dangerous unless you have a quite high tolerance for false positives. Even in a system where a Malware Patrol signature match only contributed to a message's spam score, I think you could only really add a modest increase in the odds of the message being spam.

(As far as I know, ClamAV stops looking once it's found a signature and the order it checks signature databases isn't documented. This means there's no way to tell it to check signature databases you trust more before Malware Patrol.)

PS: I don't know how common it is to use ClamAV signature matches to reject email, but it is, for example, an obvious way to configure Exim, especially since Exim's malware scanning documentation does this in its example.

Written on 05 September 2023.
« An (Open)SSH Certificate Authority is sort of unlimited and sort of not
What I understand about two-factor/multi-factor authentication (in 2023) »

Page tools: View Source.
Search:
Login: Password:

Last modified: Tue Sep 5 23:07:32 2023
This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.