Probabilities

... it is not rational for us to believe that the probable is true.

Lord J. M. Keynes
A Treatise on Probability
Cosimo Classics (June 1, 2007)

According to the definition, probability is a function on the subsets of a sample space. Let's see how it could be defined on the simplest sample space of a single coin toss, {H, T}.

The two-element sample space {H, T} has four subsets:

Φ = {}, {H}, {T}, {H, T} = Ω.

To be a probability, a function P defined on these four sets must be non-negative and must not exceed 1. In addition, on the two fundamental sets Φ and Ω it must take on the prescribed values:

P(Φ) = 0 and P(Ω) = 1.

The values P({H}) and P({T}), which we shall write more concisely as P(H) and P(T), must lie somewhere in between. P(H) is meant to be the probability of the coin landing heads up; P(T) should be the probability of its landing tails up. It is up to us to assign those probabilities. Intuitively, those numbers should express our degree of certainty that the coin lands one way or the other. Since, for a fair coin, there is no reason to prefer one side to the other, the most natural and common choice is to make the two probabilities equal:

P(H) = P(T).   (1)

As in real life, the choices we make have consequences. Once we have decided that the two probabilities are equal, we are no longer at liberty to choose their common value. The definitions take over and dictate the result. Indeed, the two events {H} and {T} are mutually exclusive, so a probability function must satisfy the additivity requirement:

P({H}) + P({T}) = P({H} ∪ {T})
                = P({H, T})
                = P(Ω)
                = 1.   (2)

The combination of (1) and (2) leads inevitably to the conclusion that a probability function that models a toss of a fair coin is bound to satisfy P(H) = P(T) = 1/2.
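As a sanity check, the following Python sketch (our own illustration, not part of the original construction) lists all four subsets of {H, T}, assigns P(H) = P(T) = 1/2, and verifies the requirements discussed above. Exact fractions from the standard `fractions` module avoid floating-point rounding; the names `omega`, `subsets` and `P` are illustrative.

```python
from fractions import Fraction
from itertools import chain, combinations

omega = frozenset({"H", "T"})

# All subsets of the sample space: {}, {H}, {T}, {H, T}.
subsets = [frozenset(c) for c in chain.from_iterable(
    combinations(sorted(omega), r) for r in range(len(omega) + 1))]

# Fair coin: P(H) = P(T), and additivity forces the common value to be 1/2.
p_elementary = {"H": Fraction(1, 2), "T": Fraction(1, 2)}

def P(event):
    """Probability of a subset of omega, built up from elementary events by additivity."""
    return sum((p_elementary[x] for x in event), Fraction(0))

# The required properties.
assert P(frozenset()) == 0                      # P(Φ) = 0
assert P(omega) == 1                            # P(Ω) = 1
assert all(0 <= P(A) <= 1 for A in subsets)     # values lie in [0, 1]
# Additivity for the mutually exclusive events {H} and {T}.
assert P(frozenset({"H"})) + P(frozenset({"T"})) == P(frozenset({"H"}) | frozenset({"T"}))

for A in subsets:
    print(sorted(A), P(A))
```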

Two events that have equal probabilities are said to be equiprobable. A common approach, especially in introductory probability courses, is to define a probability function on a finite sample space by declaring all elementary events equiprobable and building up the function using the additivity requirement. Having a formal definition of a probability function avoids the apparent circularity of this construction, hinted at elsewhere.
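The construction just described can be sketched in a few lines of Python. The helper `equiprobable` is a hypothetical name of ours: given a finite sample space, it returns a function on its subsets that gives each elementary event the same probability and extends it by additivity, so that an event with m elements out of n receives m/n.

```python
from fractions import Fraction

def equiprobable(omega):
    """Return a probability function on the subsets of a finite sample space
    whose elementary events are all declared equally likely."""
    omega = frozenset(omega)
    n = len(omega)

    def P(event):
        event = frozenset(event)
        if not event <= omega:
            raise ValueError("event must be a subset of the sample space")
        # Additivity together with P(Ω) = 1 gives each elementary event 1/n,
        # so an event made of m elementary events gets m/n.
        return Fraction(len(event), n)

    return P

coin = equiprobable({"H", "T"})
die = equiprobable(range(1, 7))

print(coin({"H"}))        # 1/2
print(die({1, 2}))        # 1/3
print(die({4, 5, 6}))     # 1/2
```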

Let's consider the experiment of rolling a die. The sample space consists of 6 possible outcomes

{1, 2, 3, 4, 5, 6}

which, with no indication that the die used is loaded, are declared to be equiprobable. From here, the additivity requirement leads necessarily to:

P(1) = P(2) = P(3) = P(4) = P(5) = P(6) = 1/6.

Since all 6 elementary events - {1}, {2}, {3}, {4}, {5}, {6} - are mutually exclusive, we may readily apply the required additivity, for example:

P({1, 2}) = P({1}) + P({2}) = 1/6 + 1/6 = 1/3

and similarly

P({4, 5, 6}) = P({4}) + P({5}) + P({6}) = 1/6 + 1/6 + 1/6 = 1/2

Note that a 2-element event {1, 2} has the probability of 1/3 = 2·1/6, whereas a 3-element event {4, 5, 6} has the probability of 1/2 = 3·1/6.

Let X be the random variable associated with the experiment of rolling the die. The introduction of a random variable allows for naming various sets in a convenient manner, e.g.:

{1, 2} = {x: x < 3},

and, for the probability, P({1, 2}) = P(X < 3) = 1/3. Similarly,

P({4, 5, 6}) = P(X > 3) = 1/2.

Here are a few additional examples:

P({2, 4, 6}) = P(X is even) = 1/2,
P({1, 2, 4, 5}) = P(X is not divisible by 3) = 2/3,
P({2, 3, 5}) = P(X is prime) = 1/2.
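The probabilities above can be evaluated mechanically by describing each event as a condition on the value of X, just as in the notation P(X < 3), and counting the favorable outcomes. This is a minimal sketch assuming a fair die; the helper names `prob` and `is_prime` are our own.

```python
from fractions import Fraction

omega = range(1, 7)  # outcomes of one roll of a fair die

def prob(predicate):
    """P(predicate(X)) for X uniform on {1, ..., 6}: favorable / total."""
    favorable = [x for x in omega if predicate(x)]
    return Fraction(len(favorable), len(omega))

def is_prime(x):
    """Trial division; adequate for the small numbers on a die."""
    return x > 1 and all(x % d for d in range(2, x))

print(prob(lambda x: x < 3))        # P(X < 3) = 1/3
print(prob(lambda x: x > 3))        # P(X > 3) = 1/2
print(prob(lambda x: x % 2 == 0))   # P(X is even) = 1/2
print(prob(lambda x: x % 3 != 0))   # P(X is not divisible by 3) = 2/3
print(prob(is_prime))               # P(X is prime) = 1/2
```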

In general, if an event A has m favorable elementary outcomes, the additivity requirement implies P(A) = m/6. In other experiments, with n possible equiprobable elementary outcomes, we would have P(A) = m/n.

For example, under normal circumstances, drawing a particular card from a deck of 52 cards is assigned a probability of 1/52. Drawing a named (A, K, Q, J) card (of which there are 4×4 = 16) has a probability of 16/52 = 4/13. The event of drawing a black card has the probability 26/52 = 1/2, that of drawing a heart the probability 13/52 = 1/4, and the probability of drawing a 10 is 4/52 = 1/13.
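The card counts can be reproduced by building the deck explicitly and counting favorable cards. The sketch below assumes a standard 52-card deck and, following the text, treats A, K, Q and J as the named cards; the variable names and the choice of the queen of hearts as "a particular card" are our own.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "K", "Q", "J", "10", "9", "8", "7", "6", "5", "4", "3", "2"]
suits = {"spades": "black", "clubs": "black", "hearts": "red", "diamonds": "red"}
deck = [(rank, suit) for rank, suit in product(ranks, suits)]
assert len(deck) == 52

def prob(event):
    """Equiprobable draw of one card: number of favorable cards / 52."""
    return Fraction(sum(1 for card in deck if event(card)), len(deck))

print(prob(lambda c: c == ("Q", "hearts")))           # a particular card: 1/52
print(prob(lambda c: c[0] in {"A", "K", "Q", "J"}))   # named card: 16/52 = 4/13
print(prob(lambda c: suits[c[1]] == "black"))         # black card: 1/2
print(prob(lambda c: c[1] == "hearts"))               # a heart: 1/4
print(prob(lambda c: c[0] == "10"))                   # a ten: 1/13
```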

Later on, we shall have examples of sample spaces where considering the elementary events as equiprobable is unjustified. However, whenever this is possible, the evaluation of probabilities becomes a combinatorial problem that requires finding the total number n of possible outcomes and the number m of the outcomes favorable to the event at hand. It is then natural that properties of combinatorial counting have a bearing on the assignment and evaluation of probabilities.

When tossing two distinct (say, first and second) coins, there are four possible outcomes {HH, HT, TH, TT} and no reason to declare one more likely than another. Thus each elementary event is assigned the probability 1/4. Here are more examples:

P({H popped up at least once}) = P({HH, HT, TH}) = 3/4,
P(First coin came up heads) = P({HH, HT}) = 2/4 = 1/2,
P(Two outcomes were different) = P({HT, TH}) = 2/4 = 1/2.
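Enumerating the ordered outcomes of two distinguishable coins makes these counts explicit. A minimal sketch, assuming fair coins; `omega` and `prob` are illustrative names.

```python
from fractions import Fraction
from itertools import product

# Ordered outcomes of tossing a first and a second coin: HH, HT, TH, TT.
omega = ["".join(t) for t in product("HT", repeat=2)]

def prob(event):
    """Favorable ordered outcomes / 4."""
    return Fraction(sum(1 for w in omega if event(w)), len(omega))

print(omega)                          # ['HH', 'HT', 'TH', 'TT']
print(prob(lambda w: "H" in w))       # H at least once: 3/4
print(prob(lambda w: w[0] == "H"))    # first coin heads: 1/2
print(prob(lambda w: w[0] != w[1]))   # outcomes different: 1/2
```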

We regard the two coin tosses as completely independent experiments, the outcome of one having no effect on the outcome of the other. It then follows from the Sequential, or Product, Rule that the size of the combined sample space is the product of the sizes of the two sample spaces, and the same holds for the probabilities. For example,

P({HT}) = 1/4 = 1/2·1/2 = P({H})·P({T}).

More generally, consider two sample spaces S₁ and S₂ with n₁ and n₂ equiprobable outcomes, respectively, and two events E₁ (on S₁) and E₂ (on S₂) with m₁ and m₂ favorable outcomes. Then P(E₁) = m₁/n₁ and P(E₂) = m₂/n₂. The sample space of the two successive experiments has n₁n₂ outcomes. The event E₁E₂, which occurs if E₁ takes place followed by E₂, consists of m₁m₂ favorable outcomes, so that

P(E₁E₂) = m₁m₂/(n₁n₂) = (m₁/n₁)·(m₂/n₂) = P(E₁)P(E₂).
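The product formula can be checked by brute force on a concrete pair of experiments. The choice below is purely our own illustration: S₁ is a fair die with E₁ = "an even number" (m₁ = 3, n₁ = 6), and S₂ is a fair coin with E₂ = "heads" (m₂ = 1, n₂ = 2). The combined sample space is built as all ordered pairs.

```python
from fractions import Fraction
from itertools import product

S1 = range(1, 7)   # first experiment: roll a die (n1 = 6)
S2 = ["H", "T"]    # second experiment: toss a coin (n2 = 2)

def E1(x):
    return x % 2 == 0   # die shows an even number (m1 = 3)

def E2(y):
    return y == "H"     # coin shows heads (m2 = 1)

def uniform_prob(space, event):
    """Favorable / total, assuming equiprobable outcomes."""
    space = list(space)
    return Fraction(sum(1 for w in space if event(w)), len(space))

# The combined experiment has n1*n2 = 12 equiprobable ordered outcomes.
S = list(product(S1, S2))
p_joint = uniform_prob(S, lambda w: E1(w[0]) and E2(w[1]))

print(len(S))                                        # 12
print(p_joint)                                       # m1*m2/(n1*n2) = 3/12 = 1/4
print(uniform_prob(S1, E1) * uniform_prob(S2, E2))   # (3/6)*(1/2) = 1/4
```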

The two coins may be indistinguishable and, when thrown together, may produce only three possible outcomes {{H, H}, {H, T}, {T, T}}, where set notation is used to emphasize that the order of the outcomes of the two coins is irrelevant in this case. However, assigning each of these elementary events the probability 1/3 is probably a bad choice. A more reasonable assignment is

P({H, H}) = 1/4,
P({H, T}) = 1/2,
P({T, T}) = 1/4.

Why? Because the results of the experiment won't change if we imagine the two coins to be different, say, one blue and one red. But for distinguishable coins there are 4 elementary events, two of which, HT and TH, are destined to coalesce into one, {H, T}, when we back off from our fantasy. The other two, HH and TT, still have probabilities of 1/4, and the remaining total of 1/2 should be given to {H, T}.
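The "blue and red" argument translates directly into code: enumerate the four equiprobable ordered outcomes, then forget the order by sorting each pair, which merges HT and TH. This is a sketch of that bookkeeping under the assumption of fair, independent coins; the variable names are ours.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Pretend the coins are distinguishable (say, blue and red): 4 equiprobable outcomes.
ordered = list(product("HT", repeat=2))

# Forgetting which coin is which merges HT and TH into the unordered outcome {H, T}.
# Sorting each pair gives a canonical representative of the unordered outcome.
unordered = Counter(tuple(sorted(pair)) for pair in ordered)

probs = {outcome: Fraction(count, len(ordered)) for outcome, count in unordered.items()}
for outcome, p in sorted(probs.items()):
    print(outcome, p)   # ('H', 'H') 1/4, then ('H', 'T') 1/2, then ('T', 'T') 1/4
```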

When rolling two dice, the sample space consists of 36 equiprobable elementary events, each with probability 1/36. The possible sums of the two dice range from 2 through 12, and the number of outcomes favorable to each sum can be read off the table below:

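Two dice: sum of the first die (rows) and the second die (columns)

  + |  1  2  3  4  5  6
----+------------------
  1 |  2  3  4  5  6  7
  2 |  3  4  5  6  7  8
  3 |  4  5  6  7  8  9
  4 |  5  6  7  8  9 10
  5 |  6  7  8  9 10 11
  6 |  7  8  9 10 11 12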

Using S for the random variable equal to the sum of the two dice, the additivity requirement leads to the following probabilities:

P(S = 2)  = 1/36,
P(S = 3)  = 2/36 = 1/18,
P(S = 4)  = 3/36 = 1/12,
P(S = 5)  = 4/36 = 1/9,
P(S = 6)  = 5/36,
P(S = 7)  = 6/36 = 1/6,
P(S = 8)  = 5/36,
P(S = 9)  = 4/36 = 1/9,
P(S = 10) = 3/36 = 1/12,
P(S = 11) = 2/36 = 1/18,
P(S = 12) = 1/36.

Note that the events are mutually exclusive and exhaustive: their probabilities add up to 1.
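The whole table of probabilities for S can be generated by enumerating the 36 ordered rolls and counting how many give each sum. A minimal sketch, assuming two fair dice; `rolls`, `counts` and `dist` are illustrative names.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# 36 equiprobable ordered outcomes (first die, second die).
rolls = list(product(range(1, 7), repeat=2))

# Number of favorable outcomes for each possible sum S = 2, ..., 12.
counts = Counter(a + b for a, b in rolls)
dist = {s: Fraction(counts[s], len(rolls)) for s in range(2, 13)}

for s, p in dist.items():
    print(f"P(S = {s}) = {counts[s]}/{len(rolls)} =", p)

# The events S = 2, ..., 12 are mutually exclusive and exhaustive.
assert sum(dist.values()) == 1
```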

(As a curiosity, note that both sums 4 and 5 come up in two ways, viz., 4 = 1 + 3, 4 = 2 + 2, 5 = 1 + 4, and 5 = 2 + 3. However, as we just saw, P(S = 4) < P(S = 5). That this is so may be bewildering to the uninitiated. For a historical example, see the Chevalier de Méré's Problem.)
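The apparent paradox disappears once we count the equiprobable outcomes, which are ordered pairs (first die, second die), rather than unordered decompositions: 1 + 3 arises as both (1, 3) and (3, 1), while 2 + 2 arises only as (2, 2). The sketch below, with helper names of our own choosing, lists both kinds of decomposition for the sums 4 and 5.

```python
from itertools import combinations_with_replacement, product

def unordered_ways(s):
    """Ways to write s as a + b with 1 <= a <= b <= 6 (order ignored)."""
    return [(a, b) for a, b in combinations_with_replacement(range(1, 7), 2) if a + b == s]

def ordered_ways(s):
    """Equiprobable ordered outcomes (first die, second die) with sum s."""
    return [(a, b) for a, b in product(range(1, 7), repeat=2) if a + b == s]

for s in (4, 5):
    print(s, unordered_ways(s), ordered_ways(s))
# Both sums have 2 unordered decompositions, but 3 vs 4 ordered outcomes.
```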

Let's return to tossing a coin. With 3 coins, the sample space consists of 8 = 2³ possible outcomes. For 4 coins the number grows to 16 = 2⁴, and so on. We obtain a curious sample space by tossing a coin until the first tails comes up. The probability P(T) that this happens on the first toss equals 1/2. The probability P(HT) that it happens on the second toss is evaluated under the assumption that the first toss showed heads, for otherwise the experiment would have stopped right after the first toss. Since the outcome of the first toss has no effect on the outcome of the second,

P(HT) = P(H)·P(T) = 1/2 · 1/2 = 1/4.

Continuing in this way, P(HHT) = 1/2·1/2·1/2 = 1/8 is the probability of getting tails on the third toss; P(HHHT) = 1/16 is the probability of getting tails on the fourth toss, and so on. These events are mutually exclusive, and their probabilities add up to 1:

P(T) + P(HT) + P(HHT) + ... = 1/2 + 1/4 + 1/8 + ...
                            = (1/2)·1/(1 - 1/2)
                            = 1,

as the sum of a geometric series with first term 1/2 and common ratio 1/2.

This is a curiosity because one event has been left over: the one in which the outcome T never occurs. It calls for an infinite number of coin tosses, each with the outcome of heads: HHHH ... Although, abstractly, this event is complementary to getting tails within a finite number of tosses, it is practically impossible because it requires an infinite number of coin tosses. Deservedly, it is assigned the probability 0, which is exactly what additivity leaves for it, since the remaining outcomes already account for a total probability of 1.
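The probabilities of the outcomes T, HT, HHT, ... and the value of their infinite sum can be reproduced with exact arithmetic. The sketch below assumes a fair coin and independent tosses; the names `P_TOSS` and `p_sequence` are hypothetical. It multiplies per-toss probabilities by the product rule, evaluates a/(1 - r) for the geometric series with a = r = 1/2, and illustrates why the endless run of heads gets probability 0: that event is contained in "the first n tosses are all heads" for every n, and the probability 1/2ⁿ of the latter can be made as small as we like.

```python
from fractions import Fraction
from math import prod

P_TOSS = {"H": Fraction(1, 2), "T": Fraction(1, 2)}  # fair coin, independent tosses

def p_sequence(seq):
    """Probability of a specific sequence of independent tosses (product rule)."""
    return prod((P_TOSS[c] for c in seq), start=Fraction(1))

# Outcomes of "toss until the first tails": T, HT, HHT, HHHT, ...
for k in range(1, 6):
    seq = "H" * (k - 1) + "T"
    print(seq, p_sequence(seq))

# Sum of the geometric series a + a*r + a*r**2 + ... with a = r = 1/2 is a / (1 - r).
a = r = Fraction(1, 2)
print(a / (1 - r))   # 1

# The all-heads event is contained in "first n tosses are heads" for every n.
for n in (10, 20, 50):
    print(n, p_sequence("H" * n))
```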

The probability that tails will show up in four or fewer tosses equals

P(T) + P(HT) + P(HHT) + P(HHHT) = 1/2 + 1/4 + 1/8 + 1/16
                                = (1/2)·(1 - 1/2⁴)/(1 - 1/2)
                                = 15/16.

More generally, the probability that tails will show up in at most n tosses equals the sum

1/2 + 1/4 + 1/8 + ... + 1/2ⁿ = (1/2)·(1 - 1/2ⁿ)/(1 - 1/2) = 1 - 1/2ⁿ.
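The closed form for the finite geometric sum can be checked against the direct partial sums. A minimal sketch with exact fractions; `p_tail_within` and `closed_form` are names of our own.

```python
from fractions import Fraction

def p_tail_within(n):
    """P(tails shows up in at most n tosses) as the partial sum 1/2 + ... + 1/2**n."""
    return sum((Fraction(1, 2**k) for k in range(1, n + 1)), Fraction(0))

def closed_form(n):
    """Finite geometric sum a(1 - r**n) / (1 - r) with a = r = 1/2."""
    a = r = Fraction(1, 2)
    return a * (1 - r**n) / (1 - r)

for n in (1, 2, 4, 10):
    assert p_tail_within(n) == closed_form(n) == 1 - Fraction(1, 2**n)
    print(n, p_tail_within(n))   # 1/2, 3/4, 15/16, 1023/1024
```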

The infinite sum 1/2 + 1/4 + 1/8 + ... is the probability that tails shows up within a finite number of tosses. This probability is 1, so one should expect to get tails sooner or later. For this sample space, an event with probability 0 is conceivable but practically impossible. In continuous sample spaces, events with probability 0 are a regular phenomenon and far from impossible.
