Wednesday, April 28, 2010

False positives

Consider an individual X who is tested for a disease.
Let D mean "X has the disease" and -D mean "X does not have the disease".
Let T mean "X has tested positive for the disease".
Let P(A) mean "the probability that A is true".
Let P(A|B) mean "the probability that A is true given that B is true".

We are given P(D), P(T|D), and P(T|-D).
(i.e.: the probability that X has the disease before any test result is known; the probability of getting a positive test if X really does have the disease; and the probability of getting a false positive if X actually doesn't have the disease, respectively).
Note that P(-D) = 1 - P(D).

What is P(D|T)?
(i.e.: what is the probability that X has the disease given that X tested positive for the disease?)

Let u = P(D) × P(T|D) be the probability that X tests positive and has the disease.
Let v = P(-D) × P(T|-D) be the probability that X tests positive and yet doesn't have the disease.

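These are the only two ways X can get a positive test result, and they cannot both happen, so the total probability of a positive test is P(T) = u + v. Since P(D|T) = P(D and T)/P(T), and P(D and T) = u, it follows (this is Bayes' theorem) that

Solution: P(D|T) = u/(u + v)
(i.e.: the probability that X has the disease given a positive test is the fraction of all positive results that are true positives).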
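A minimal Python sketch of the formula above. The function and argument names are hypothetical choices made for this illustration; the three inputs are the given probabilities P(D), P(T|D) and P(T|-D).

```python
def p_disease_given_positive(p_d, p_t_given_d, p_t_given_not_d):
    """Return P(D|T) computed as u / (u + v) from P(D), P(T|D) and P(T|-D)."""
    u = p_d * p_t_given_d              # P(D and T): has the disease and tests positive
    v = (1 - p_d) * p_t_given_not_d    # P(-D and T): healthy but tests positive
    if u + v == 0:
        # A positive result is impossible under these inputs, so P(D|T) is undefined.
        raise ValueError("P(T) = u + v is zero, so P(D|T) is undefined")
    return u / (u + v)


# Illustrative inputs only: P(D) = 0.5, P(T|D) = 0.9, P(T|-D) = 0.1
print(p_disease_given_positive(0.5, 0.9, 0.1))  # roughly 0.9
```

With even prior odds (P(D) = 0.5) the prior cancels out and the answer is simply P(T|D) / (P(T|D) + P(T|-D)) = 0.9 / 1.0.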

Practical application.

When a disease is rare, P(-D) is much larger than P(D). Therefore, unless P(T|-D) (the chance of a false positive) is very small, v = P(-D) × P(T|-D) can easily exceed u = P(D) × P(T|D), in which case P(D|T) is below 1/2 (i.e., a positive test more likely than not does not mean that X actually has the disease).
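A worked example with made-up numbers, chosen only to illustrate the argument: a disease that 1 person in 1,000 has, and a test that catches 99% of real cases but also comes back positive for 5% of healthy people. None of these figures describe any real test.

```python
# Hypothetical numbers, for illustration only
p_d = 0.001             # P(D): 1 person in 1,000 has the disease
p_t_given_d = 0.99      # P(T|D): the test catches 99% of real cases
p_t_given_not_d = 0.05  # P(T|-D): 5% of healthy people test positive

u = p_d * p_t_given_d              # probability of a true positive
v = (1 - p_d) * p_t_given_not_d    # probability of a false positive
posterior = u / (u + v)            # P(D|T)

print(f"u = {u:.5f}, v = {v:.5f}, P(D|T) = {posterior:.3f}")
```

Here v is about 50 times larger than u, so P(D|T) comes out at roughly 0.019. In terms of counts: among 100,000 people, about 100 have the disease and 99 of them test positive, while about 4,995 of the 99,900 healthy people also test positive. Only 99 of the 5,094 positive results are real, which is under 2%.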

Bottom line: make sure your tests have a false-positive rate that is low compared with how rare the disease is.
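To see the bottom line in numbers, the sketch below keeps a hypothetical disease fixed (P(D) = 0.001, P(T|D) = 0.99, both assumed values) and varies only the false-positive rate P(T|-D). The function name is again a hypothetical choice for this illustration.

```python
def p_disease_given_positive(p_d, p_t_given_d, p_t_given_not_d):
    """Return P(D|T) = u / (u + v)."""
    u = p_d * p_t_given_d              # true positives
    v = (1 - p_d) * p_t_given_not_d    # false positives
    return u / (u + v)


# Assumed disease: prevalence 0.001, test sensitivity 0.99.
# Only the false-positive rate P(T|-D) changes between rows.
for fp_rate in (0.05, 0.01, 0.001, 0.0001):
    posterior = p_disease_given_positive(0.001, 0.99, fp_rate)
    print(f"P(T|-D) = {fp_rate:g}: P(D|T) = {posterior:.3f}")
```

The printed P(D|T) climbs from about 0.02 to about 0.91. Under these assumptions, a positive result only becomes a coin flip once the false-positive rate is about as small as the prevalence itself (0.001), and it takes a rate well below the prevalence before a positive test is convincing.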
