The importance of numerical literacy
Via Slashdot comes a report of a new biometric airport security system for detecting 'hostile intent' (their term). According to the Wall Street Journal article:
In the latest Israeli trial, the system caught 85% of the role-acting terrorists, meaning that 15% got through, and incorrectly identified 8% of innocent travelers as potential threats, according to corporate marketing materials.
The company's goal is to prove it can catch at least 90% of potential saboteurs -- a 10% false-negative rate -- while inconveniencing just 4% of innocent travelers.
Sounds good, doesn't it?
Actually, it's useless in practice. Ask yourself this question: if someone is flagged by the machine, what are the odds that they're a real terrorist? The answer turns out to be 'really, really low'. Unless there are a lot of terrorists flying, the real terrorists the system catches are utterly dwarfed by the innocent people it flags, even at the 4% false positive rate that is the company's goal.
In summary: any time what you're testing for is rare, anything more than a microscopic false positive rate is going to swamp you with noise. (This effect is also commonly seen in medical tests.)
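For the formula-minded, the question 'if someone is flagged, what are the odds they're a real terrorist?' is just Bayes' rule. The symbols here are my own shorthand, not anything from the article: T means 'this traveler is a terrorist' and F means 'the machine flagged this traveler'.

```latex
P(T \mid F) = \frac{P(F \mid T)\,P(T)}{P(F \mid T)\,P(T) + P(F \mid \lnot T)\,P(\lnot T)}
\qquad\text{and, when } P(T) \ll P(F \mid \lnot T),\qquad
P(T \mid F) \approx \frac{P(F \mid T)\,P(T)}{P(F \mid \lnot T)}
```

Here P(F | T) is the catch rate and P(F | not T) is the false positive rate. When P(T) is tiny, the false positive term dominates the denominator, which is the whole problem.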
This makes a nice illustration of the importance of numerical literacy. At first blush this system sounds effective; missing only 10% of the terrorists and 'inconveniencing just 4% of innocent travelers' sounds good. You have to think about the actual numbers involved in practice to see the problem, and the people putting the marketing materials together are probably hoping that you won't.
Sidebar: some actual numbers
Let's assume that 10 terrorists are flying every day in the US, and that at least a million people fly every day, again in the US. The system will flag roughly 40,000 people; of those, nine are terrorists, so a flagged person has about a 1 in 4,400 chance of being one. (And this is making a generous assumption on how many bad people are flying; the actual number is likely to be much, much lower.)
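If you want to check this arithmetic or plug in your own guesses, here is a minimal Python sketch. The function name `flag_counts` and its parameters are made up for illustration; the inputs are the sidebar's assumptions (one million travelers, 10 terrorists) and the company's goal rates.

```python
def flag_counts(travelers, terrorists, catch_rate, false_positive_rate):
    """Return (expected people flagged, expected terrorists among them) for one test."""
    innocents = travelers - terrorists
    true_hits = terrorists * catch_rate              # terrorists the test catches
    false_alarms = innocents * false_positive_rate   # innocents flagged by mistake
    return true_hits + false_alarms, true_hits


# Sidebar assumptions: 1,000,000 travelers and 10 terrorists a day,
# 90% of terrorists caught, 4% of innocent travelers flagged.
flagged, hits = flag_counts(1_000_000, 10, 0.90, 0.04)
print(f"flagged: {flagged:,.0f}, real terrorists among them: {hits:.0f}")
print(f"about 1 in {flagged / hits:,.0f} flagged people is a terrorist")
```

Try lowering the number of terrorists to something more realistic and watch the odds get even worse.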
Do things get better if we have a second, completely independent test with the same false positive and false negative rates, and we only alarm on people who trigger both? Not really; about 1,608 people trigger both, of which about 8 are terrorists. We've improved all the way up to roughly a 1 in 200 chance that the person is a terrorist, and we've missed about one fifth of the people we really want to catch.
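The two-test case can be sketched the same way. This assumes the tests really are completely independent, so the chance of triggering both is just the product of the individual chances; `both_tests_flag` is a hypothetical helper name, not anything from the article.

```python
def both_tests_flag(travelers, terrorists, catch_rate, false_positive_rate):
    """Expected counts when two independent tests must BOTH fire to raise an alarm."""
    innocents = travelers - terrorists
    # Independence assumption: probability of tripping both tests is the square.
    true_hits = terrorists * catch_rate ** 2
    false_alarms = innocents * false_positive_rate ** 2
    return true_hits + false_alarms, true_hits


terrorists = 10
flagged, hits = both_tests_flag(1_000_000, terrorists, 0.90, 0.04)
missed_fraction = (terrorists - hits) / terrorists

print(f"trigger both tests: {flagged:,.0f}, terrorists among them: {hits:.1f}")
print(f"odds for a flagged person: about 1 in {flagged / hits:,.0f}")
print(f"fraction of terrorists missed: {missed_fraction:.0%}")
```

The false alarms drop a lot, but so does the catch rate, since a terrorist now only has to slip past one of the two tests.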
(One million people a day is actually a bit low; see here. I am also using the company's goal figures, not their current results, to be as favorable to them as possible.)
Distributions: keep your hands off
Dear Linux distributions: if I want to use a freaky super-intelligent editor I am perfectly able to fire up emacs myself, or even type 'vim' instead of 'vi' (honest, the extra letter is not too much for me to type). I want vi. You know, the simple and predictable thing that system administrators like because it behaves.
As an aid to your planning, here is a minimum feature for any editor called 'vi':
When I paste things into an alleged vi in insert mode in a terminal window in X, you put them in exactly as they are. No more, no less.
If your 'vi' does anything else, you are not being helpful, you are causing me to violently throw your distribution at the wall because you have just screwed up what I was trying to do.
(I would prefer that you not mangle my typing either, but not mangling my copy and pastes in X is a minimum standard.)
(I do not mind whatever distributions choose to do with things called 'vim' and the like. They can be as freaky super-intelligent as they want. My objections are strictly for what happens when I innocently type