Sample Spaces and Random Variables

Johanna Davidson's fascination with randomness dated back to her first course in probability and statistics. What she found most intriguing was the fact that the teacher could not provide a satisfactory definition of "random" (or of "probability," for that matter), even though the notions such as "random variable" and "random sample" lie at the heart of the theory.

Arturo Sangalli
Pythagoras' Revenge
Princeton University Press, 2009, p. 69

A sample space is a collection of all possible outcomes of a random experiment. A random variable is a function defined on a sample space. We shall consider several examples shortly. Later on we shall introduce probability functions on the sample spaces. A sample space may be finite or infinite. Infinite sample spaces may be discrete or continuous.

Finite Sample Spaces

Tossing a coin. The experiment is tossing a coin (or any other object with two distinct sides). The coin may land and stay on its edge, but this event is so unlikely that it is treated as impossible and disregarded. So the coin lands on one or the other of its two sides. One is usually called head, the other tail. These are the two possible outcomes of a toss of a coin. In the case of a single toss, the sample space has two elements that may be denoted interchangeably as, say,

{Head, Tail}, or {H, T}, or {0, 1}, ...

Rolling a die. The experiment is rolling a die. A common die is a small cube whose faces show the numbers 1, 2, 3, 4, 5, 6 in one way or another. These may be actual digits or arrangements of the appropriate number of dots.

There are six possible outcomes and the sample space consists of six elements:

{1, 2, 3, 4, 5, 6}.

Many random variables may be associated with this experiment: the square of the outcome f(x) = x², with values from

{1, 4, 9, 16, 25, 36},

centered values from

{-2.5, -1.5, -0.5, 0.5, 1.5, 2.5},

with the variable defined by f(x) = x - 3.5, etc.
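
The two random variables above can be tabulated directly from the sample space. The following is a minimal Python sketch; the names sample_space, square and centered are illustrative choices, not notation from the text.

```python
# Sample space for one roll of a common die.
sample_space = [1, 2, 3, 4, 5, 6]


def square(x):
    """Random variable f(x) = x^2."""
    return x ** 2


def centered(x):
    """Random variable f(x) = x - 3.5."""
    return x - 3.5


# Apply each random variable to every outcome in the sample space.
square_values = [square(x) for x in sample_space]
centered_values = [centered(x) for x in sample_space]

print(square_values)    # [1, 4, 9, 16, 25, 36]
print(centered_values)  # [-2.5, -1.5, -0.5, 0.5, 1.5, 2.5]
```

Both functions share the same domain, the six-element sample space; only their sets of values differ.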

Drawing a card. The experiment is drawing a card from a standard deck of 52 cards. The cards come in two colors: black (spades and clubs) and red (diamonds and hearts); four suits (spades, clubs, diamonds, hearts); and 13 values (2, 3, 4, 5, 6, 7, 8, 9, 10, Jack, Queen, King, Ace). (Some decks use 4 colors, others use different names. For example, a Jack may be called a Knave. We shall abbreviate the named designations as J, Q, K, A.) There are 52 possible outcomes, with the sample space

{2♠, 2♣, 2♦, 2♥, 3♠, 3♣, 3♦, 3♥, ..., A♠, A♣, A♦, A♥}.

Of course, if we are only interested in the color of a drawn card, or its suit, or perhaps its value, then it would be just as natural to consider other sample spaces:

{b, r},
{♠, ♣, ♦, ♥} or
{2, 3, 4, 5, 6, 7, 8, 9, 10, J, Q, K, A}.
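
One way to relate these spaces, assumed here for illustration rather than stated in the text, is to treat color, suit and value as functions on the 52-card space; the smaller spaces are then the sets of values those functions take. A minimal Python sketch, with card strings such as "10♥" and all function names being hypothetical choices:

```python
from itertools import product

values = ["2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A"]
suits = ["♠", "♣", "♦", "♥"]

# Full sample space in the order 2♠, 2♣, 2♦, 2♥, 3♠, ..., A♥.
deck = [v + s for v, s in product(values, suits)]


def suit(card):
    """The suit is the last character of the card string."""
    return card[-1]


def value(card):
    """The value is everything before the suit symbol."""
    return card[:-1]


def color(card):
    """'b' for spades and clubs, 'r' for diamonds and hearts."""
    return "b" if suit(card) in "♠♣" else "r"


print(len(deck))                       # 52
print(sorted({color(c) for c in deck}))  # ['b', 'r']
print(len({suit(c) for c in deck}))    # 4
print(len({value(c) for c in deck}))   # 13
```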

Choosing a birthday. The experiment is to select a single date during a given year. This can be done, for example, by picking a random person and asking for his or her birthday. Disregarding leap years for the sake of simplicity, there are 365 possible birthdays, which may be enumerated as

{1, 2, 3, 4, ..., 365}.

Tossing two coins. The experiment is tossing two coins. One may toss two coins simultaneously, or one after the other. The difference is that in the second case we can easily differentiate between the coins: one is the first, the other the second. If two indistinguishable coins are tossed simultaneously, there are just three possible outcomes, {H, H}, {H, T}, and {T, T}. If the coins are different, or if they are thrown one after the other, there are four distinct outcomes: (H, H), (H, T), (T, H), (T, T), which are often presented in a more concise form: HH, HT, TH, TT. Thus, depending on the nature of the experiment, there are 3 or 4 outcomes, with the sample spaces

Indistinguishable coins
	{{H, H}, {H, T}, {T, T}}.

Distinct coins
	{HH, HT, TH, TT}
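
The two spaces can be enumerated mechanically. The sketch below uses Python's itertools; representing an unordered pair {H, T} as a tuple listed in a fixed order is a convention adopted only for this illustration.

```python
from itertools import combinations_with_replacement, product

faces = ["H", "T"]

# Distinct coins: ordered pairs, so HT and TH are different outcomes.
distinct = ["".join(pair) for pair in product(faces, repeat=2)]

# Indistinguishable coins: unordered pairs, so HT and TH coincide.
indistinguishable = list(combinations_with_replacement(faces, 2))

print(distinct)           # ['HH', 'HT', 'TH', 'TT']
print(indistinguishable)  # [('H', 'H'), ('H', 'T'), ('T', 'T')]
```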

Rolling two dice. The experiment is rolling two dice. If the dice are distinct or if they are rolled successively, there are 36 possible outcomes: 11, 12, ..., 16, 21, 22, ..., 66. If they are indistinguishable, then some outcomes, like 12 and 21, fold into one. There are 6×5/2 = 15 such pairs (one for each choice of two different faces), so the total number of possible outcomes is 36 - 15 = 21. In the first case, the sample space is

{11, 12, ..., 16, 21, 22, ..., 66}.

When we throw two dice we are often interested not in the individual numbers that show up, but in their sum. The sum of the two top numbers is an example of a random variable, say Y(ab) = a + b (where a, b range from 1 through 6), that takes values from the set {2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}. It is also possible to think of this set as a sample space of a random experiment. However, there is a point in working with random variables. It is often convenient to be able to consider several random variables related to the same experiment, i.e., to the same sample space. For example, besides Y, we may be interested in the product (or some other function) of the two numbers.
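
The outcome counts and the values of Y can be checked by enumeration. In this Python sketch an outcome ab is stored as a tuple (a, b); the name Z for the product is hypothetical, introduced only to show a second random variable on the same sample space.

```python
from itertools import combinations_with_replacement, product

faces = range(1, 7)

# Distinct dice: ordered pairs (a, b).
ordered = list(product(faces, repeat=2))

# Indistinguishable dice: unordered pairs, stored with a <= b.
unordered = list(combinations_with_replacement(faces, 2))


def Y(outcome):
    """Random variable: sum of the two top numbers."""
    a, b = outcome
    return a + b


def Z(outcome):
    """Random variable: product of the two top numbers (hypothetical name)."""
    a, b = outcome
    return a * b


sum_values = sorted({Y(w) for w in ordered})
product_values = sorted({Z(w) for w in ordered})

print(len(ordered), len(unordered))  # 36 21
print(sum_values)                    # [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
print(len(product_values))           # 18
```

Both Y and Z are defined on the same 36-element space, which is the convenience the text points to: one experiment, several random variables.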

Infinite Discrete Sample Spaces

First tail. The experiment is to repeatedly toss a coin until the first tail shows up. The possible outcomes are finite sequences of H that end with a single T, together with the infinite sequence of H:

{T, HT, HHT, HHHT, ..., {HHH...}}.

As we shall see elsewhere, this is a remarkable space: it contains an event that is not impossible yet has probability 0. One random variable is defined most naturally as the length of an outcome. It draws values from the set of positive integers augmented by the symbol of infinity:

{1, 2, 3, 4, ..., ∞}.
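
Since the space is infinite, it can only be enumerated lazily. The following Python sketch generates the finite outcomes one at a time and evaluates the length variable on them; treating the length of the all-heads sequence as math.inf is an assumption of this illustration, as are the names finite_outcomes and length.

```python
import math
from itertools import count, islice


def finite_outcomes():
    """Yield T, HT, HHT, ... in order of increasing length, without end."""
    for heads in count(0):
        yield "H" * heads + "T"


def length(outcome):
    """Random variable: number of tosses in a finite outcome."""
    return len(outcome)


# The infinite all-heads outcome cannot be written out as a string,
# so its length is represented separately (an assumption of this sketch).
ALL_HEADS_LENGTH = math.inf

first_five = list(islice(finite_outcomes(), 5))
lengths = [length(w) for w in first_five]

print(first_five)  # ['T', 'HT', 'HHT', 'HHHT', 'HHHHT']
print(lengths)     # [1, 2, 3, 4, 5]
```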

Continuous Sample Spaces

Arrival time. The experimental setting is a metro (underground) station where trains pass (ideally) at equal intervals. A person enters the station. The experiment is to note the time elapsed between the departure of the last train and the person's arrival. If T is the interval between two consecutive trains, then the sample space for the experiment is the interval [0, T], or

[0, T] = {t: 0 ≤ t ≤ T}.

Chord length. Given a circle of radius R, the experiment is to randomly select a chord in that circle. There are many ways to accomplish such a selection. However, the sample space is always the same:

{AB: A and B are points on a given circle}.

One natural random variable defined on this space is the length of the chord.
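
If the endpoints A and B are described by their angles on the circle, the chord length is 2R|sin((θ_A - θ_B)/2)|. The Python sketch below evaluates this random variable and draws one chord by choosing both angles uniformly at random. That is only one of the many possible selection methods mentioned above, picked here as an assumption; the function names are illustrative.

```python
import math
import random


def chord_length(theta_a, theta_b, R=1.0):
    """Length of the chord whose endpoints sit at angles theta_a, theta_b."""
    return 2 * R * abs(math.sin((theta_a - theta_b) / 2))


def random_chord_length(R=1.0, rng=random):
    """Draw a chord by picking both endpoint angles uniformly (one method
    among many) and return its length."""
    theta_a = rng.uniform(0, 2 * math.pi)
    theta_b = rng.uniform(0, 2 * math.pi)
    return chord_length(theta_a, theta_b, R)


# Opposite endpoints give a diameter of length 2R.
print(chord_length(0.0, math.pi, R=2.0))

# A randomly selected chord always has length between 0 and 2R.
print(random_chord_length(R=2.0))
```

Other selection methods define the same random variable on the same sample space; they differ only in how likely each chord is to be chosen.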

Human height. The experiment is to randomly select a human and measure his or her height. Depending on how far-reaching our means of selection are, it is possible to consider a sample space of about 6.6 billion humans inhabiting the planet Earth. In this case, the height of the selected person becomes a random variable. However, it is also possible to consider the sample space consisting of all possible values of height measurements of the world population. The tallest man ever measured lived in the United States and had a height of 272 cm (8'11''). The height of the shortest person is more difficult to determine. Zero is clearly a lower bound, but, for a living adult, it may be safely raised to, say, 40 cm. This suggests a sample space which is a line segment [40, 272] in centimeters. While at all times the human population is discrete, we may assume that in some height range near the average, all possible heights are realized, making a continuous classification. Still, almost certainly at the top end there are gaps and unique measurements, making the upper part of the range rather discrete.

(More examples of continuous sample spaces can be found elsewhere.)
