# Expectation

In a sample space of equiprobable outcomes, the probability of an event is the ratio of the number of favorable outcomes to the total size of the space. This means that the probabilities of events are defined in relation to each other. If there is a finite number of exhaustive and mutually exclusive events $A_k$, $k = 1, 2, \ldots, K$, with $n_k$ being the number of favorable outcomes in $A_k$, then

 $$P(A_k) = n_k / N,$$

where $N = n_1 + n_2 + \cdots + n_K$.

Borrowing an example from the classic text by W. Feller, suppose that in a certain population $n_k$ is the number of families with $k$ children. Then $N$ is the total number of families and, assuming the most benign of circumstances, there are $2N$ adults. But how many children are there? To state the obvious, each of the $n_k$ families with $k$ kids has $k$ kids, so the number of kids in such families is $k n_k$. The total number of kids is the sum of such products over all the family sizes:

 $$T = 1\cdot n_1 + 2\cdot n_2 + \cdots + k\cdot n_k + \cdots + K\cdot n_K,$$

where $K$ is the number of children in the largest family. On average, every family has $E = T/N$ kids:

 $$E = T/N = 1\cdot n_1/N + 2\cdot n_2/N + \cdots + K\cdot n_K/N = 1\cdot p_1 + 2\cdot p_2 + \cdots + K\cdot p_K = \sum_{k=1}^{K} k\,p_k,$$

where $p_k = n_k / N$ is the probability for a family to have $k$ children. This quantity is one of the most important in the theory of probability. We shall give a more general definition.
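The equality between the plain average T/N and the probability-weighted sum can be checked numerically. The following is a minimal sketch; the family counts in `families` and the helper name `average_children` are made up for illustration and do not come from Feller's text.

```python
from fractions import Fraction

# Hypothetical data: families[k] = number of families with k children (n_k).
families = {1: 5, 2: 8, 3: 4, 4: 3}

def average_children(counts):
    """Return (T / N, sum of k * p_k) for a {k: n_k} table of family counts."""
    total_families = sum(counts.values())                    # N
    total_children = sum(k * n for k, n in counts.items())   # T
    direct = Fraction(total_children, total_families)        # T / N
    # p_k = n_k / N, so the weighted sum is the expectation of the family size
    weighted = sum(k * Fraction(n, total_families) for k, n in counts.items())
    return direct, weighted

direct, weighted = average_children(families)
print(direct, weighted)  # both print 9/4
```

Exact fractions are used instead of floats so that the two routes to the average can be compared with `==` rather than with a tolerance.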

Let $X$ be a random variable that takes values $x_k$ with probabilities $p_k$: $P(X = x_k) = p_k$. The sum

 $$E(X) = \sum_k x_k p_k$$

is known as the mathematical expectation of $X$ (often also called the expected value or the mean).

As the above example of counting kids in families of various sizes shows, the mathematical expectation of a random variable (RV) is in a sense an average value of that variable. For the die rolling experiment, let Y be the RV showing the top number of a die. Then

 P(Y = k) = 1/6, k = 1, 2, ..., 6.

By definition,

 E(Y) = 1·1/6 + 2·1/6 + ... + 6·1/6 = (1 + 2 + ... + 6)/6 = 21/6 = 3.5

which is exactly the average, i.e., the arithmetic mean, of the numbers 1, 2, ..., 6.
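The same computation can be written as a short function that applies the definition of expectation to any finite distribution. This is only a sketch: the name `expectation` and the dictionary representation of a distribution are choices made here for illustration.

```python
from fractions import Fraction

def expectation(distribution):
    """E(X) = sum of x * p over a {value: probability} mapping."""
    return sum(x * p for x, p in distribution.items())

# A fair die: each face k = 1..6 has probability 1/6.
die = {k: Fraction(1, 6) for k in range(1, 7)}
assert sum(die.values()) == 1  # the probabilities must add up to 1

e_y = expectation(die)
print(e_y)         # 7/2
print(float(e_y))  # 3.5
```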

Let's apply the notion of mathematical expectation to the example of a novice player seeking admittance to a tennis club. To be admitted, the fellow had to win two successive games in a series of three played alternately against members G (good) and T (top) of the club. With probabilities g and t (t < g) of winning against G and T, the fellow had to choose between two possible orders of games: GTG or TGT. Paradoxically, the second choice appeared to be preferable, gaining the fellow the membership with probability gt(2 - t) against the smaller gt(2 - g) for the sequence GTG.
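The two admission probabilities quoted above can be checked by direct computation. The sketch below assumes the admission rule is "win two games in a row out of three"; the function name `admission_probability` and the sample values t = 1/4, g = 3/4 are illustrative assumptions.

```python
from fractions import Fraction

def admission_probability(first, second, third):
    """Probability of two successive wins in three games.

    The arguments are the win probabilities in games 1, 2 and 3.
    Admission happens on WW (the third game is irrelevant) or on LWW.
    """
    return first * second + (1 - first) * second * third

t, g = Fraction(1, 4), Fraction(3, 4)  # hypothetical values with t < g

p_tgt = admission_probability(t, g, t)
p_gtg = admission_probability(g, t, g)

# Compare with the closed forms gt(2 - t) and gt(2 - g)
assert p_tgt == g * t * (2 - t)
assert p_gtg == g * t * (2 - g)
print(p_tgt, p_gtg)  # 21/64 15/64
```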

We shall be looking for the expected number of wins. Using L for a loss and W for a win for the aspiring novice, we shall consider two sample spaces. Following Havil, the first space consists of the 8 possible outcomes of a sequence of three games:

 LLL, LLW, LWL, LWW, WLL, WLW, WWL, WWW

Note, however, that in the sequences LLL, LLW, WLL, WLW the third game is superfluous, as the results of the first two make it impossible for the fellow to win two successive games, whereas in the sequences WWL, WWW the third game is unnecessary because the first two wins already gain the fellow admittance to the club. This makes it possible and reasonable to consider a smaller sample space:

 LL, LWL, LWW, WL, WW

For the sequence TGT we have the following probabilities:

| Win/Loss sequence | Probability |
| --- | --- |
| LLL | (1 - t)(1 - g)(1 - t) |
| LLW | (1 - t)(1 - g)t |
| LWL | (1 - t)g(1 - t) |
| LWW | (1 - t)gt |
| WLL | t(1 - g)(1 - t) |
| WLW | t(1 - g)t |
| WWL | tg(1 - t) |
| WWW | tgt |

for the first sample space and

| Win/Loss sequence | Probability |
| --- | --- |
| LL | (1 - t)(1 - g) |
| LWL | (1 - t)g(1 - t) |
| LWW | (1 - t)gt |
| WL | t(1 - g) |
| WW | tg |

for the second. In both cases, the probabilities add up to 1, as required. Choosing the easier way out, we verify this only for the latter:

 (1 - t)(1 - g) + (1 - t)g(1 - t) + (1 - t)gt + t(1 - g) + tg = (1 - t)(1 - g) + [(1 - t)g(1 - t) + (1 - t)gt] + t(1 - g) + tg = (1 - t)(1 - g) + (1 - t)g + t(1 - g) + tg = [(1 - t)(1 - g) + t(1 - g)] + [(1 - t)g + tg] = (1 - g) + g = 1.

Now we introduce the random variable N that denotes the number of wins for the candidate. In the first case, N may be 0, 1, 2, or 3; in the second case, there are only three possible values: 0, 1, 2. The expectations E1 and E2 are

 E1(N, TGT) = 0·(1 - t)(1 - g)(1 - t) + 1·(1 - t)(1 - g)t + 1·(1 - t)g(1 - t) + 2·(1 - t)gt + 1·t(1 - g)(1 - t) + 2·t(1 - g)t + 2·tg(1 - t) + 3·tgt = 2t + g

and, correspondingly,

 E2(N, TGT) = 0·(1 - t)(1 - g) + 1·(1 - t)g(1 - t) + 2·(1 - t)gt + 1·t(1 - g) + 2·tg = t + g + tg - t²g.

Similarly,

 E1(N, GTG) = t + 2g and E2(N, GTG) = t + g + tg - tg².
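The four closed forms can be verified by enumerating both sample spaces and summing wins times probabilities. What follows is a sketch under stated assumptions: the helper names `full_space`, `reduced_space` and `expected_wins`, and the sample values t = 1/4, g = 3/4, are introduced here for illustration only.

```python
from fractions import Fraction
from itertools import product

def full_space(probs):
    """All 8 win/loss sequences of three games with their probabilities.

    probs holds the win probabilities in games 1, 2 and 3.
    """
    space = {}
    for outcome in product("LW", repeat=3):
        p = Fraction(1)
        for result, win_prob in zip(outcome, probs):
            p *= win_prob if result == "W" else 1 - win_prob
        space["".join(outcome)] = p
    return space

def reduced_space(probs):
    """Merge sequences whose third game is not needed: LL, WL, WW."""
    space = {}
    for seq, p in full_space(probs).items():
        # Only after LW does the third game decide the outcome
        key = seq if seq.startswith("LW") else seq[:2]
        space[key] = space.get(key, 0) + p
    return space

def expected_wins(space):
    """Expectation of the number of W's in a {sequence: probability} table."""
    return sum(seq.count("W") * p for seq, p in space.items())

t, g = Fraction(1, 4), Fraction(3, 4)  # hypothetical values with t < g
tgt, gtg = (t, g, t), (g, t, g)

assert sum(reduced_space(tgt).values()) == 1
assert expected_wins(full_space(tgt)) == 2 * t + g
assert expected_wins(reduced_space(tgt)) == t + g + t * g - t**2 * g
assert expected_wins(full_space(gtg)) == t + 2 * g
assert expected_wins(reduced_space(gtg)) == t + g + t * g - t * g**2

for name, order in (("TGT", tgt), ("GTG", gtg)):
    print(name, expected_wins(full_space(order)), expected_wins(reduced_space(order)))
```

Changing `t` and `g` to other fractions between 0 and 1 reruns the same checks for different players.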

Since t < g, we see that

 E1(N, TGT) < E1(N, GTG),

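as expected (pun intended). In the smaller sample space, however, the inequality is reversed. Since

 E2(N, TGT) - E2(N, GTG) = tg² - t²g = tg(g - t) > 0,

we have

 E2(N, TGT) > E2(N, GTG).

Thus only the full three-game count ameliorates the paradoxical situation that arose from the pure count of probabilities: although the probability of gaining the membership is larger when playing the top member first, the expected number of wins over all three games is greater for GTG, where the top player is met only once. When play stops as soon as the outcome is decided, TGT comes out ahead on both counts.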

The expectation has several algebraic properties that make it a linear function:

 E(X + Y) = E(X) + E(Y) and E(αX) = αE(X),

where X and Y are RVs and α is a real number. For a constant random variable C that only takes on the value c, the expectation is exactly that value: E(C) = c. If, for a given RV X, Y = X - E(X), then

 E(Y) = E(X - E(X)) = E(X) - E(E(X)) = E(X) - E(X) = 0,

since E(X) is a constant, a constant RV.
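These properties can be checked on a small concrete case. The sketch below uses two fair dice as X and Y and α = 3; the choice of dice, the value of `alpha`, and the helper name `expect` are assumptions made for the illustration.

```python
from fractions import Fraction
from itertools import product

# Joint distribution of two fair dice: every pair (x, y) has probability 1/36.
joint = {(x, y): Fraction(1, 36) for x, y in product(range(1, 7), repeat=2)}

def expect(f):
    """Expectation of f(X, Y) under the joint distribution above."""
    return sum(f(x, y) * p for (x, y), p in joint.items())

e_x = expect(lambda x, y: x)
e_y = expect(lambda x, y: y)
alpha = 3

assert expect(lambda x, y: x + y) == e_x + e_y        # E(X + Y) = E(X) + E(Y)
assert expect(lambda x, y: alpha * x) == alpha * e_x  # E(aX) = aE(X)
assert expect(lambda x, y: x - e_x) == 0              # E(X - E(X)) = 0
print(e_x, e_y, expect(lambda x, y: x + y))  # 7/2 7/2 7
```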

### References

1. R. B. Ash, Basic Probability Theory, Dover, 2008
2. W. Feller, An Introduction to Probability Theory and Its Applications, Vol. 1, 2nd edition, John Wiley & Sons, 1958
3. J. Havil, Nonplussed!, Princeton University Press, 2007
