Friday, October 14, 2005

False positives?

When thinking about the reported Arkansas Ivory-bill sightings, it's easy to be misled by "false positives".

Let's look at a hypothetical situation:
---
Let's say in 20,000 hours of searching (much of it in an area where abnormal Pileateds have been seen), searchers get 200 fleeting glimpses of birds that might be Ivory-bills. That's one tantalizing glimpse per 100 hours of search time. Of those 200 glimpses, observers report eight "robust" sightings where they are about 96% sure that they saw an Ivory-bill. Given those eight "robust" sightings, can we be reasonably certain that someone saw an actual Ivory-bill?
---

In my opinion, the answer is "no".

Here's why I say that: if you're 96% sure, that means there is a 4% "false positive" rate. After a massive search effort, if you get 200 glimpses of birds that may be Ivory-bills, that 4% "false positive" rate might lead you to expect eight "robust" sightings, even if there are no Ivory-bills in the area.
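The arithmetic here is simple enough to sketch in a few lines of code (the numbers are the hypothetical ones above, not real survey data):

```python
# Hypothetical numbers from the scenario above.
glimpses = 200               # fleeting glimpses over 20,000 search hours
false_positive_rate = 0.04   # a "96% sure" sighting leaves 4% room for error

# Expected number of "robust" sightings even if no Ivory-bills exist
# in the area -- every one of them could be a false positive.
expected_false_robust = glimpses * false_positive_rate
print(expected_false_robust)  # 8.0
```

Eight "robust" sightings is exactly what the false-positive rate alone would predict, which is why the count of such sightings, by itself, proves nothing.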

Of course, the numbers above are truly hypothetical, and no one can actually calculate that they are "96% sure" that they glimpsed an Ivory-bill. I find it useful to think along these lines, however, because it helps me understand the vast difference between 100% proof (good videos, for example) and a number of so-called "96% sure" sightings. Quality is vastly more important than quantity here.

I think these paragraphs (from this paper) also illustrate how easily we might be misled by "false positives":

----
It is useful to begin with a handful of realistic examples that indicate how simple intuition can be misleading. Of particular relevance to college students is a (true) story about a man who received a positive outcome on a first-stage test for the virus that causes AIDS. The test that was used had a 4 percent rate of false positives, and for simplicity, it is assumed that there were virtually no false negatives. The person committed suicide before follow-up examinations, presumably not realizing that the low incidence of the virus in the male population (about 1 in 250 at that time) resulted in a posterior probability of having the virus of only about 10 percent.

To explain this point in class, it can be useful to begin with a hypothetical representative group of 1000 people, and to ask how likely it is that a person with a positive test actually carries the virus, given an infection rate of 1 in 250 for the relevant population. On average, only 4 out of the 1000 actually have the disease, and the test locates all 4 of these true positives. However, among the 996 who do not have the disease, the test will falsely identify 4 percent as having it, which is about 40 men (0.04 × 996 = 39.84). On average, the test identifies 44 of the 1000 men as carriers of the virus, 4 correctly and 40 incorrectly, which means that a positive first-stage test actually produces a less-than-ten percent chance of a true positive. This is a case in which knowing the intuition behind Bayes’ rule can save lives.
----
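The paper's worked example translates directly into code. This is just the quoted arithmetic, using the paper's own numbers (1-in-250 infection rate, 4% false positives, negligible false negatives):

```python
# Bayes' rule worked through with the paper's numbers.
population = 1000
infection_rate = 1 / 250      # prior: 4 in 1000 carry the virus
false_positive_rate = 0.04    # test wrongly flags 4% of the uninfected
# (false negatives assumed negligible, as in the paper)

true_positives = population * infection_rate                 # 4.0 real carriers
healthy = population - true_positives                        # 996 uninfected
false_positives = healthy * false_positive_rate              # 39.84 wrongly flagged

# Of everyone who tests positive, what fraction is actually infected?
posterior = true_positives / (true_positives + false_positives)
print(round(posterior, 3))  # 0.091 -- under ten percent
```

The same structure applies to the Ivory-bill scenario: when the thing being searched for is rare (or absent), even a test that is wrong only 4% of the time will generate mostly false positives.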