Bayes' Theorem

Bayes' theorem (or Bayes' Law and sometimes Bayes' Rule) is a direct application of conditional probabilities. The probability P(A|B) of "A assuming B" is given by the formula

P(A|B) = P(A∩B) / P(B)

which for our purpose is better written as

P(A∩B) = P(A|B)·P(B).

The left-hand side P(A∩B) depends on A and B symmetrically, so we would obtain the same quantity had we started with P(B|A) instead:

P(B|A)·P(A) = P(A∩B) = P(A|B)·P(B).

This is actually what Bayes' theorem is about:

(1)  P(B|A) = P(A|B)·P(B) / P(A).
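
As a quick sanity check, formula (1) can be verified by direct counting on a finite sample space. The sketch below assumes a roll of two fair dice, with A = "the sum is 8" and B = "the first die shows an even number". These events and the helper name `prob` are illustrative choices, not part of the derivation above.

```python
from fractions import Fraction
from itertools import product

# Assumed example: the 36 equally likely outcomes of rolling two fair dice.
omega = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event (a set of outcomes) under the uniform measure."""
    return Fraction(len(event), len(omega))

A = {w for w in omega if w[0] + w[1] == 8}  # the sum is 8
B = {w for w in omega if w[0] % 2 == 0}     # the first die is even

p_a_given_b = prob(A & B) / prob(B)  # definition of P(A|B)
p_b_given_a = prob(A & B) / prob(A)  # definition of P(B|A)

# Right-hand side of formula (1): P(A|B)·P(B) / P(A)
bayes_rhs = p_a_given_b * prob(B) / prob(A)

print(p_b_given_a, bayes_rhs)  # both sides are 3/5
```

Exact fractions are used so that the two sides can be compared for equality without rounding concerns.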

Most often, however, the theorem appears in a somewhat different form

(1')  P(B|A) = P(A|B)·P(B) / (P(A|B)P(B) + P(A|B̄)P(B̄)),

where B̄ is the event complementary to B: B∪B̄ = Ω, the universal event. (Of course, also B∩B̄ = Φ, the empty event.)

This is because

A = A ∩ (B ∪ B̄)

= (A∩B) ∪ (A∩B̄)

and, since A∩B and A∩B̄ are mutually exclusive,

P(A) = P((A∩B) ∪ (A∩B̄))

= P(A∩B) + P(A∩B̄)

= P(A|B)P(B) + P(A|B̄)P(B̄).

More generally, for a finite number of mutually exclusive and exhaustive events H_i (i = 1, ..., n), i.e. events that satisfy

H_k ∩ H_m = Φ, for k ≠ m and
H_1 ∪ H_2 ∪ ... ∪ H_n = Ω,

Bayes' theorem states that

P(H_k|A) = P(A|H_k) P(H_k) / ∑_i P(A|H_i) P(H_i),

where the sum is taken over all i = 1, ..., n.
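
The general formula translates almost line by line into code. The following is a minimal sketch, assuming the priors P(H_i) and the likelihoods P(A|H_i) are supplied as two lists of equal length; the function name `posteriors` and the numbers in the usage lines are hypothetical and serve only as an illustration.

```python
from fractions import Fraction

def posteriors(priors, likelihoods):
    """Return the list of P(H_k|A) given priors P(H_i) and likelihoods P(A|H_i)."""
    if len(priors) != len(likelihoods):
        raise ValueError("priors and likelihoods must have the same length")
    if abs(sum(priors) - 1) > 1e-12:
        raise ValueError("priors must sum to 1 (exclusive and exhaustive events)")
    # Numerators P(A|H_i)·P(H_i), one per hypothesis.
    joint = [l * p for l, p in zip(likelihoods, priors)]
    total = sum(joint)  # the denominator, which equals P(A)
    if total == 0:
        raise ValueError("P(A) = 0, so conditioning on A is undefined")
    return [j / total for j in joint]

# Hypothetical numbers for three hypotheses.
priors = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
likelihoods = [Fraction(1, 10), Fraction(1, 5), Fraction(1, 2)]
result = posteriors(priors, likelihoods)
print(result)
```

With n = 2, H_1 = B and H_2 the complement of B, the same function reproduces the two-event form (1').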

We shall consider several examples.

Example 1. Monty Hall Problem. [Havil, pp. 61-63]

Let A, B, C denote the events "the car is behind door A (or #1)", "the car is behind door B (or #2)", "the car is behind door C (or #3)". Let also M_A denote the event of Monty opening door A, etc.

You are called on stage and point to door A, say. Then

P(M_B|A) = 1/2, because Monty has to choose between two carless doors, B and C

P(M_B|B) = 0, because Monty never opens the door with the car behind it

P(M_B|C) = 1, because Monty opens neither your door (A) nor the door with the car (C).

Since A, B, C are mutually exclusive and exhaustive,

P(M_B) = P(M_B|A)P(A) + P(M_B|B)P(B) + P(M_B|C)P(C)

= 1/2 × 1/3 + 0 × 1/3 + 1 × 1/3

= 1/2.

Now you are given a chance to switch to the other closed door, B or C, depending on which one Monty left closed. Suppose Monty opened door B. If you stick with your original selection (A),

P(A|M_B) = P(M_B|A)P(A)/P(M_B)

= (1/2 × 1/3) / (1/2)

= 1/3.

However, if you switch,

P(C|M_B) = P(M_B|C)P(C)/P(M_B)

= (1 × 1/3) / (1/2)

= 2/3.

You'd be remiss not to switch.
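
The conditional probabilities above can also be estimated empirically. The sketch below is a Monte Carlo simulation under the same assumptions as the example: you always pick door A, the car is placed uniformly at random, and Monty chooses at random when both B and C are carless. The seed and the number of rounds are arbitrary choices, and the variable names are illustrative.

```python
import random

rng = random.Random(0)  # fixed seed, chosen arbitrarily for reproducibility
rounds = 200_000

opened_b = 0  # rounds in which Monty opens door B (event M_B)
car_a = 0     # ... and the car is behind A (staying wins)
car_c = 0     # ... and the car is behind C (switching wins)

for _ in range(rounds):
    car = rng.choice("ABC")
    # You picked A, so Monty opens a carless door among B and C.
    opened = rng.choice([d for d in "BC" if d != car])
    if opened == "B":
        opened_b += 1
        if car == "A":
            car_a += 1
        elif car == "C":
            car_c += 1

p_mb = opened_b / rounds          # estimate of P(M_B)
p_a_given_mb = car_a / opened_b   # estimate of P(A|M_B)
p_c_given_mb = car_c / opened_b   # estimate of P(C|M_B)
print(f"P(M_B) ~ {p_mb:.3f}, P(A|M_B) ~ {p_a_given_mb:.3f}, P(C|M_B) ~ {p_c_given_mb:.3f}")
```

The estimates should land close to 1/2, 1/3 and 2/3 respectively, with small deviations due to sampling noise.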

Example 2. Sick Child and Doctor. [Falk, p. 48]

A doctor is called to see a sick child. The doctor has prior information that 90% of sick children in that neighborhood have the flu, while the other 10% are sick with measles. Let F stand for the event of a child being sick with flu and M for the event of a child being sick with measles. Assume for simplicity that F ∪ M = Ω, i.e., that there are no other maladies in that neighborhood.

A well-known symptom of measles is a rash; denote by R the event that the child has a rash. Then P(R|M) = 0.95. However, children with flu occasionally develop a rash as well, so that P(R|F) = 0.08.

Upon examining the child, the doctor finds a rash. What is the probability that the child has measles?

Solution: see "Sick Child and Doctor (Solution)" below.

Example 3. Incidence of Breast Cancer. [vos Savant, pp. 103-104]

In a study, physicians were asked what the odds of breast cancer would be in a woman who was initially thought to have a 1% risk of cancer but who ended up with a positive mammogram result (a mammogram accurately classifies about 80% of cancerous tumors and 90% of benign tumors). Ninety-five out of a hundred physicians estimated the probability of cancer to be about 75%. Do you agree?

Solution: see "Incidence of Breast Cancer (Solution)" below.

References

R. B. Ash, Basic Probability Theory, Dover, 2008
R. Falk, Understanding Probability and Statistics, A K Peters, 1993
J. Havil, Impossible?, Princeton University Press, 2008
M. vos Savant, The Power of Logical Thinking, St. Martin's Press, 1996

Sick Child and Doctor (Solution)

P(M|R) = P(R|M) P(M) / (P(R|M) P(M) + P(R|F) P(F))

= 0.95 × 0.10 / (0.95 × 0.10 + 0.08 × 0.90)

≈ 0.57.

This is nowhere near the 95% given by P(R|M).
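
The arithmetic is easy to reproduce. The short sketch below plugs the numbers from the example into the two-event form of Bayes' theorem; the variable names are illustrative.

```python
# Priors from the example: 10% measles, 90% flu.
p_m, p_f = 0.10, 0.90
# Likelihoods of a rash under each illness.
p_r_given_m, p_r_given_f = 0.95, 0.08

# Total probability of a rash: P(R) = P(R|M)P(M) + P(R|F)P(F).
p_r = p_r_given_m * p_m + p_r_given_f * p_f

# Posterior probabilities of each illness given a rash.
p_m_given_r = p_r_given_m * p_m / p_r
p_f_given_r = p_r_given_f * p_f / p_r

print(f"P(R) = {p_r:.3f}, P(M|R) = {p_m_given_r:.3f}, P(F|R) = {p_f_given_r:.3f}")
```

Although a rash is far more typical of measles, the posterior is only moderately above one half, because flu is nine times more common to begin with.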

Incidence of Breast Cancer (Solution)

Introduce the events:

P - the mammogram result is positive,
B - the tumor is benign,
M - the tumor is malignant.

Bayes' formula in this case is

P(M|P) = P(P|M) P(M) / (P(P|M) P(M) + P(P|B) P(B))

= 0.80 × 0.01 / (0.80 × 0.01 + 0.10 × 0.99)

≈ 0.075

= 7.5%.

A far cry from the common estimate of 75%.
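
One way to see why the answer is so small is to count cases instead of multiplying probabilities. The sketch below assumes a hypothetical cohort of 10,000 women with the rates from the example; the cohort size and variable names are illustrative, and any size that keeps the counts whole would do.

```python
women = 10_000  # hypothetical cohort size

malignant = women * 1 // 100   # 1% prior risk of cancer
benign = women - malignant     # everyone else

# 80% of malignant tumors are correctly flagged as positive.
true_positives = malignant * 80 // 100
# 90% of benign tumors are correctly cleared, so 10% test positive.
false_positives = benign * 10 // 100

# Share of positive mammograms that correspond to a malignant tumor.
share = true_positives / (true_positives + false_positives)
print(true_positives, false_positives, f"{share:.3f}")
```

Of the 1,070 positive results in this cohort, only 80 come from malignant tumors: the false positives from the much larger benign group swamp the true positives.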
