The importance of numerical literacy

August 14, 2006

Via Slashdot comes a report of a new biometric airport security system for detecting 'hostile intent' (their term). According to the Wall Street Journal article:

In the latest Israeli trial, the system caught 85% of the role-acting terrorists, meaning that 15% got through, and incorrectly identified 8% of innocent travelers as potential threats, according to corporate marketing materials.

The company's goal is to prove it can catch at least 90% of potential saboteurs -- a 10% false-negative rate -- while inconveniencing just 4% of innocent travelers.

Sounds good, doesn't it?

Actually, it's useless in practice. Ask yourself this question: if someone is flagged by the machine, what are the odds that they're a real terrorist? The answer turns out to be 'really, really low': unless there are a lot of terrorists flying, the system's high accuracy at catching real terrorists is utterly dwarfed by the flood of false positives, even at the company's goal of flagging only 4% of innocent travelers.
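
To put a number on 'really, really low', here is the Bayes' theorem calculation as a quick Python sketch (the function and its name are mine; the 90% and 4% figures are the company's goals, and the 10-terrorists-per-million base rate is the generous assumption used in the sidebar below):

    def p_terrorist_given_flag(p_detect, p_false_alarm, base_rate):
        """Bayes' theorem: P(terrorist | flagged by the system)."""
        # P(flagged) = P(flagged|terrorist)*P(terrorist)
        #            + P(flagged|innocent)*P(innocent)
        p_flagged = p_detect * base_rate + p_false_alarm * (1 - base_rate)
        return p_detect * base_rate / p_flagged

    # Goal figures: catch 90% of terrorists, flag 4% of innocents.
    print(p_terrorist_given_flag(0.90, 0.04, 10 / 1_000_000))
    # -> ~0.000225, i.e. a flagged person is a terrorist roughly 1 time in 4,400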

In summary: any time what you're testing for is rare, anything more than a microscopic false positive rate is going to swamp you with noise. (The same base rate effect is commonly seen in medical tests for rare conditions.)

This makes a nice illustration of the importance of numerical literacy. At first blush this system sounds effective; missing only 10% of the terrorists and 'inconveniencing just 4% of innocent travelers' sounds good. You have to think about the actual numbers involved in practice to see the problem, and the people putting the marketing materials together are probably hoping that you won't.

Sidebar: some actual numbers

Let's assume that 10 terrorists are flying every day in the US, and that at least a million people fly every day, again in the US. The system will flag roughly 40,000 people; of those, nine are terrorists, so a flagged person has only about a 1 in 4,400 chance of actually being a terrorist. (And this is making a generous assumption about how many bad people are flying; the actual number is likely to be much, much lower.)
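
The same arithmetic as code (a sketch; the counts follow directly from the assumptions above):

    travelers = 1_000_000
    terrorists = 10
    innocents = travelers - terrorists

    flagged_innocents = 0.04 * innocents    # ~40,000 false positives
    flagged_terrorists = 0.90 * terrorists  # 9 terrorists caught
    total_flagged = flagged_innocents + flagged_terrorists

    print(round(total_flagged))                  # ~40,009 people flagged
    print(flagged_terrorists / total_flagged)    # ~0.000225: roughly 1 in 4,400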

Do things get better if we have a second, completely independent test with the same false positive and false negative rates, and we only alarm on people who trigger both? Not really; about 1,608 people trigger both, of whom about 8 are terrorists. We've improved all the way up to roughly a 1 in 200 chance that the person is a terrorist, and we've missed nearly a fifth of the people we really want to catch.
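
And the two-test version (a sketch; with independent tests, both the detection rate and the false positive rate simply get squared):

    travelers, terrorists = 1_000_000, 10
    innocents = travelers - terrorists

    both_innocents = (0.04 ** 2) * innocents    # ~1,600 innocents flagged twice
    both_terrorists = (0.90 ** 2) * terrorists  # ~8.1 terrorists caught twice
    total_both = both_innocents + both_terrorists

    print(round(total_both))              # ~1,608 people trigger both tests
    print(both_terrorists / total_both)   # ~0.005: roughly 1 in 200
    print(1 - 0.90 ** 2)                  # ~0.19: nearly a fifth of terrorists missed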

(One million people a day is actually a bit low; see here. I am also using the company's goal figures, not their current results, to be as favorable to them as possible.)


Comments on this page:

From 139.78.115.103 at 2006-08-15 14:42:42:

Very good points you make. A little Bayesian math is good for everyone:

p(a|b) = p(b|a)p(a)/p(b)

So, let's see. We'll call (a) 'person is a terrorist' and (b) 'person is tagged by the system.' The article says p(b|a) was 0.85. It also says that p(b) is somewhat more than 0.08; strictly, you should add in the percentage of people who were tagged AND were terrorists, but that's not given (and it's tiny anyway). But even if we go with just 0.08, we've learned that the probability that someone is a terrorist given that they were tagged is 0.85/0.08 * p(a), or 10.625 p(a).

But then we discover, as you pointed out, the horror of it all -- p(a) is hideously, hideously small. If fully 1 in 1000 passengers are terrorists (I certainly hope this isn't the case), p(a) is 0.001 and p(a|b) is about 1%. So the chance that a tagged person is a terrorist -- the number they didn't give -- is 1%. Except it's not, of course; it's much, much lower, because the real p(a) is far smaller than that. (Lower yet, given that p(b) should've been more than 0.08.) The projected numbers, as you say, don't help much.

In short, I agree. The most basic of probability laws puts paid to these claims. But nobody much knows it.

-Random
