Expectation
In a sample space of equiprobable outcomes, the probability of an event is the ratio of the number of favorable outcomes to the total size of the space. This means that the probabilities of events are defined in relation to each other. If there is a finite number of exhaustive and mutually exclusive events A_{k}, k = 1, 2, ..., K, with n_{k} being the number of outcomes favorable to A_{k}, then
P(A_{k}) = n_{k} / N, 
where N = n_{1} + n_{2} + ... + n_{K}.
Borrowing an example from the classic text by W. Feller, suppose that, in a certain population, n_{k} is the number of families with k children. N is then the total number of families and, assuming the most benign of circumstances, there are 2N adults. But how many children are there? To state the obvious, each of the n_{k} families with k kids has k kids, so the number of kids in such families is k·n_{k}. The total number of kids is the sum of such products over all the family sizes:
T = 1·n_{1} + 2·n_{2} + ... + k·n_{k} + ... + K·n_{K}, 
where K is the number of children in the largest family. On average, every family has
T / N = (1·n_{1} + 2·n_{2} + ... + K·n_{K}) / N = 1·p_{1} + 2·p_{2} + ... + K·p_{K}
children, where p_{k} = n_{k} / N is the probability for a family to have k children. This quantity is one of the most important in probability theory. We shall give a more general definition.
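The family example can be sketched in a few lines of Python. The counts in `n` are hypothetical and chosen only for illustration; the point is that the plain average T / N and the probability-weighted sum of k·p_{k} give the same number.

```python
from fractions import Fraction

# Hypothetical counts: n[k] is the number of families with k children.
n = {1: 30, 2: 45, 3: 20, 4: 5}

N = sum(n.values())                        # total number of families
T = sum(k * n_k for k, n_k in n.items())   # total number of children

# p[k] = n[k] / N is the probability that a family has k children.
p = {k: Fraction(n_k, N) for k, n_k in n.items()}

average = Fraction(T, N)                             # children per family
expectation = sum(k * p_k for k, p_k in p.items())   # sum of k * p[k]

print(N, T, average, expectation)  # 100 200 2 2
```

Exact fractions are used here so that the two quantities can be compared with `==` rather than with a floating-point tolerance.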
Let X be a random variable that takes values x_{k} with probabilities p_{k}. The sum
E(X) = Σx_{k}p_{k}
is known as the mathematical expectation of X (often also called the expected value or the mean).
As the above example of counting kids in families of various sizes shows, the mathematical expectation of a random variable (RV) is in a sense an average value of that random variable. For the die rolling experiment, let Y be the RV showing the top number of a die. Then
P(Y = k) = 1/6, k = 1, 2, ..., 6. 
By definition,
E(Y) = 1·(1/6) + 2·(1/6) + ... + 6·(1/6) = 21/6 = 3.5,
which is exactly the average, i.e., the arithmetic mean, of the numbers 1, 2, ..., 6.
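The die computation can be reproduced with exact rational arithmetic. This is a minimal sketch; the helper name `expectation` and the dictionary representation of a distribution are choices made for this example only.

```python
from fractions import Fraction

def expectation(distribution):
    """Sum x * p over a dict that maps each value x to its probability p."""
    return sum(x * p for x, p in distribution.items())

# A fair die: each face k = 1, ..., 6 has probability 1/6.
die = {k: Fraction(1, 6) for k in range(1, 7)}

e_die = expectation(die)
mean_of_faces = Fraction(sum(die), len(die))  # arithmetic mean of 1..6

print(e_die, mean_of_faces)  # 7/2 7/2
```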
Let's apply the notion of mathematical expectation to the example of a novice player seeking admittance to a tennis club. To be admitted, the fellow had to win two successive games in a series of three played alternately against members G (good) and T (top) of the club, in the order TGT or GTG. His probabilities of beating G and T are g and t, respectively, with t < g.
We shall be looking for the expected number of wins. Using L for a loss and W for a win for the aspiring novice, we shall consider two sample spaces. Following Havil, the first space consists of the 8 possible outcomes of a sequence of three games:
LLL, LLW, LWL, LWW, WLL, WLW, WWL, WWW 
Note, however, that in the sequences LLL, LLW, WLL, WLW the third game is superfluous, as the results of the first two make it impossible for the fellow to win two successive games, whereas the third game is unnecessary in the last two sequences WWL, WWW because the first two wins already gain the fellow admittance to the club. This makes it possible and reasonable to consider a smaller sample space:
LL, LWL, LWW, WL, WW 
For the sequence TGT we have the following probabilities:
Win/Loss sequence  Probability

LLL  (1 − t)(1 − g)(1 − t)
LLW  (1 − t)(1 − g)t
LWL  (1 − t)g(1 − t)
LWW  (1 − t)gt
WLL  t(1 − g)(1 − t)
WLW  t(1 − g)t
WWL  tg(1 − t)
WWW  tgt
for the first sample space and
Win/Loss sequence  Probability

LL  (1 − t)(1 − g)
LWL  (1 − t)g(1 − t)
LWW  (1 − t)gt
WL  t(1 − g)
WW  tg
for the second. In both cases, the probabilities add up to 1, as required. Choosing the easier way out, we verify this only for the latter:
(1 − t)(1 − g) + (1 − t)g(1 − t) + (1 − t)gt + t(1 − g) + tg
= (1 − t)(1 − g) + [(1 − t)g(1 − t) + (1 − t)gt] + t(1 − g) + tg
= (1 − t)(1 − g) + (1 − t)g + t(1 − g) + tg
= [(1 − t)(1 − g) + t(1 − g)] + [(1 − t)g + tg]
= (1 − g) + g
= 1.
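Both sample spaces can also be built and checked mechanically. The sketch below assumes illustrative values t = 1/5 and g = 4/5 (any values with t < g would do); the function names `full_space` and `reduced_space` are introduced here for the example.

```python
from fractions import Fraction
from itertools import product

def full_space(order, win_prob):
    """Map each W/L string for the games in `order` to its probability."""
    space = {}
    for outcome in product("LW", repeat=len(order)):
        prob = Fraction(1)
        for opponent, result in zip(order, outcome):
            p = win_prob[opponent]
            prob *= p if result == "W" else 1 - p
        space["".join(outcome)] = prob
    return space

def reduced_space(full):
    """Merge outcomes whose first two games already decide admission."""
    reduced = {}
    for outcome, prob in full.items():
        # After LL, WL or WW the third game cannot change the result.
        key = outcome[:2] if outcome[:2] in ("LL", "WL", "WW") else outcome
        reduced[key] = reduced.get(key, 0) + prob
    return reduced

# Assumed illustrative probabilities of beating T and G, with t < g.
win_prob = {"T": Fraction(1, 5), "G": Fraction(4, 5)}

full = full_space("TGT", win_prob)
reduced = reduced_space(full)

print(sorted(reduced))                            # ['LL', 'LWL', 'LWW', 'WL', 'WW']
print(sum(full.values()), sum(reduced.values()))  # 1 1
```

Merging, for instance, LLL and LLW into LL adds their probabilities, which is the same grouping used in the algebraic verification.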
Now we introduce the random variable N that denotes the number of wins for the candidate. In the first case, N may be 0, 1, 2, or 3; in the second case, there are only three possible values: 0, 1, 2. The expectations E_{1} and E_{2} are
E_{1}(N, TGT)  = 0·(1 − t)(1 − g)(1 − t)
+ 1·(1 − t)(1 − g)t
+ 1·(1 − t)g(1 − t)
+ 2·(1 − t)gt
+ 1·t(1 − g)(1 − t)
+ 2·t(1 − g)t
+ 2·tg(1 − t)
+ 3·tgt
= 2t + g
and, correspondingly,
E_{2}(N, TGT)  = 0·(1 − t)(1 − g)
+ 1·(1 − t)g(1 − t)
+ 2·(1 − t)gt
+ 1·t(1 − g)
+ 2·tg
= t + g + tg − t^{2}g.
Similarly,
E_{1}(N, GTG) = t + 2g and E_{2}(N, GTG) = t + g + tg − tg^{2}.
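Since t < g, we see that
E_{1}(N, TGT) < E_{1}(N, GTG),
as expected (pun intended). In the reduced sample space the comparison goes the other way, because E_{2}(N, TGT) − E_{2}(N, GTG) = tg^{2} − t^{2}g = tg(g − t) > 0:
E_{2}(N, TGT) > E_{2}(N, GTG).
The first inequality ameliorates the paradoxical situation that arose from the pure count of probabilities: although the probability of gaining membership is larger when playing the top member first than when playing the good member first, the expected number of wins over all three games is greater when the confrontation with the top player is postponed. If play stops as soon as the outcome is decided, the expected number of wins agrees with the probabilities and favors TGT.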
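The closed forms for E_{1} and E_{2} can be checked by enumerating the outcomes directly. This is a sketch under the assumption t = 1/5, g = 4/5; the function `expected_wins` and its `stop_when_decided` flag are names invented for the example.

```python
from fractions import Fraction
from itertools import product

def expected_wins(order, win_prob, stop_when_decided):
    """Expected number of wins over the W/L outcomes of the games in `order`."""
    total = Fraction(0)
    for outcome in product("LW", repeat=len(order)):
        prob = Fraction(1)
        for opponent, result in zip(order, outcome):
            p = win_prob[opponent]
            prob *= p if result == "W" else 1 - p
        played = "".join(outcome)
        # In the reduced space the third game is dropped unless the first
        # two games were L then W, the only case that is still undecided.
        if stop_when_decided and played[:2] != "LW":
            played = played[:2]
        total += played.count("W") * prob
    return total

t, g = Fraction(1, 5), Fraction(4, 5)  # assumed illustrative values, t < g
win_prob = {"T": t, "G": g}

e1_tgt = expected_wins("TGT", win_prob, stop_when_decided=False)
e1_gtg = expected_wins("GTG", win_prob, stop_when_decided=False)
e2_tgt = expected_wins("TGT", win_prob, stop_when_decided=True)
e2_gtg = expected_wins("GTG", win_prob, stop_when_decided=True)

# Compare the enumeration with the closed forms derived in the text.
assert e1_tgt == 2 * t + g
assert e1_gtg == t + 2 * g
assert e2_tgt == t + g + t * g - t**2 * g
assert e2_gtg == t + g + t * g - t * g**2

print(e1_tgt, e1_gtg)
print(e2_tgt, e2_gtg)
```

Printing the four values lets one compare the two playing orders directly for any chosen pair t < g.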
The expectation has several algebraic properties that make it a linear function:
E(X + Y) = E(X) + E(Y) and E(αX) = αE(X), 
where X and Y are RVs and α is a real number. For a constant random variable C that only takes on the value c, the expectation is exactly that value: E(C) = c. In particular, for the centered random variable Y = X − E(X),
E(Y) = E(X − E(X)) = E(X) − E(E(X)) = E(X) − E(X) = 0,
since E(X) is a constant, i.e., the value of a constant RV.
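Linearity and the vanishing expectation of the centered variable X − E(X) can be illustrated with two fair dice. This is only a numerical check on one example, not a proof; the independence of the dice is an assumption of the example and is not needed for linearity itself.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)
# Joint distribution of two fair dice X and Y (assumed independent here).
joint = {(x, y): Fraction(1, 36) for x, y in product(faces, faces)}

def expect(f):
    """Expectation of f(X, Y) under the joint distribution."""
    return sum(f(x, y) * p for (x, y), p in joint.items())

e_x = expect(lambda x, y: x)
e_y = expect(lambda x, y: y)
alpha = 3  # arbitrary constant for the scaling check

print(expect(lambda x, y: x + y) == e_x + e_y)        # True
print(expect(lambda x, y: alpha * x) == alpha * e_x)  # True
print(expect(lambda x, y: x - e_x))                   # 0
```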
References
 R. B. Ash, Basic Probability Theory, Dover, 2008
 W. Feller, An Introduction to Probability Theory and Its Applications, Vol. 1, 2nd edition, John Wiley & Sons, 1958
 J. Havil, Nonplussed!, Princeton University Press, 2007
Copyright © 1996-2018 Alexander Bogomolny