Quite often the word experiment describes an experimental setup, while the word trial applies to actually executing the experiment and obtaining an outcome.
A formal theory of probability was developed in the 1930s by the Russian mathematician A. N. Kolmogorov.
The starting point is the sample (or probability) space - a set of all possible outcomes. Let's call it Ω. For the set Ω, a probability is a real-valued function P defined on the subsets of Ω:
This says in particular that both the empty set Φ and Ω are events. The event Φ never happens: it is impossible and has probability 0. The event Ω is certain (or necessary) and has probability 1.
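To make the definitions concrete, here is a minimal sketch for one hypothetical experiment: a single roll of a fair six-sided die. The names `OMEGA` and `prob`, and the choice of a uniform probability, are assumptions made for illustration only.

```python
from fractions import Fraction

# Hypothetical sample space: the six faces of a die.
OMEGA = frozenset(range(1, 7))

def prob(event):
    """Uniform probability of an event, i.e. of a subset of OMEGA."""
    event = frozenset(event)
    if not event <= OMEGA:
        raise ValueError("an event must be a subset of the sample space")
    # Exact arithmetic: favorable outcomes over all outcomes.
    return Fraction(len(event), len(OMEGA))

p_impossible = prob(frozenset())  # the empty event
p_certain = prob(OMEGA)           # the whole sample space
p_even = prob({2, 4, 6})          # "the roll is even"

print(p_impossible, p_certain, p_even)  # prints: 0 1 1/2
```

Exact fractions are used instead of floating-point numbers so that the equalities discussed in the text can be checked without rounding error.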
If Ω is a finite set then usually the notions of an impossible event and an event with probability 0 coincide, although it may not be so. If Ω is infinite then the two notions practically never coincide. A similar dichotomy exists for the notions of a certain event and that with probability 1. Examples will be given shortly.
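As a hedged illustration of how the two notions can differ even on a finite set, the sketch below assigns weight 0 to one outcome of a hypothetical three-element sample space. The outcomes `"a"`, `"b"`, `"c"` and their weights are invented for the example.

```python
from fractions import Fraction

# Hypothetical weights on a three-element sample space; "c" gets weight 0.
WEIGHTS = {"a": Fraction(1, 2), "b": Fraction(1, 2), "c": Fraction(0)}
OMEGA = frozenset(WEIGHTS)

def prob(event):
    """Probability of an event as the sum of the weights of its outcomes."""
    return sum((WEIGHTS[outcome] for outcome in event), Fraction(0))

null_event = frozenset({"c"})        # non-empty, yet has probability 0
almost_sure = frozenset({"a", "b"})  # not all of OMEGA, yet has probability 1

print(prob(null_event), prob(almost_sure))  # prints: 0 1
```

Here `null_event` is not the empty set Φ, and `almost_sure` is not Ω, but their probabilities are 0 and 1 respectively.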
which is a consequence of a seemingly more general rule: for any two events A and B, their union A∪B and intersection A∩B are events and
(2')
P(A∪B) = P(A) + P(B) - P(A ∩ B).
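Rule (2') can be checked numerically on a small example. The sketch below assumes a fair die with uniform probability; the events `A` and `B` are arbitrary choices that happen to overlap.

```python
from fractions import Fraction

# Hypothetical sample space: one roll of a fair die.
OMEGA = frozenset(range(1, 7))

def prob(event):
    """Uniform probability of a subset of OMEGA."""
    return Fraction(len(frozenset(event) & OMEGA), len(OMEGA))

A = frozenset({2, 4, 6})  # the roll is even
B = frozenset({4, 5, 6})  # the roll is greater than 3

lhs = prob(A | B)                      # P(A ∪ B)
rhs = prob(A) + prob(B) - prob(A & B)  # P(A) + P(B) - P(A ∩ B)
naive = prob(A) + prob(B)              # counts the overlap {4, 6} twice

print(lhs, rhs, naive)  # prints: 2/3 2/3 1
```

The naive sum overshoots because the outcomes in A ∩ B are counted twice; subtracting P(A ∩ B) removes the duplicate.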
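Note, however, that (2') can be derived from (2). Indeed, assuming that all the sets involved are events, the events A - B and A ∩ B are disjoint, as are B - A and A ∩ B. In fact, the three events A - B, B - A, and A ∩ B are pairwise disjoint, and the union of the three is exactly A∪B. We have,
P(A∪B) = P(A - B) + P(B - A) + P(A ∩ B)
= [P(A - B) + P(A ∩ B)] + [P(B - A) + P(A ∩ B)] - P(A ∩ B)
= P(A) + P(B) - P(A ∩ B).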
In general, the collection of events is assumed to be a σ-algebra, which means that the complements of events are events and so are the countable unions and intersections.
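For a finite sample space the closure conditions can be tested directly. The following sketch is only an illustration: `power_set` and `is_closed` are hypothetical helper names, and checking pairwise unions and intersections suffices here only because a finite collection has no genuinely infinite unions.

```python
from itertools import chain, combinations

def power_set(omega):
    """All subsets of omega, as a set of frozensets."""
    items = list(omega)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return {frozenset(s) for s in subsets}

def is_closed(events, omega):
    """Check closure under complement, pairwise union and intersection."""
    omega = frozenset(omega)
    if omega not in events:
        return False
    for a in events:
        if omega - a not in events:  # the complement must be an event
            return False
        for b in events:
            if (a | b) not in events or (a & b) not in events:
                return False
    return True

OMEGA = frozenset({1, 2, 3})
all_events = power_set(OMEGA)  # every subset is an event
coarse_events = {frozenset(), frozenset({1}), frozenset({2, 3}), OMEGA}
broken_events = {frozenset(), frozenset({1}), OMEGA}  # lacks the complement {2, 3}

print(is_closed(all_events, OMEGA),
      is_closed(coarse_events, OMEGA),
      is_closed(broken_events, OMEGA))  # prints: True True False
```

The second collection shows that the events need not include every subset of Ω; the third fails because the complement of {1} is missing.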
Also from (2) and (*), if B = A∪C for disjoint A and C, then
P(B) = P(A∪C) = P(A) + P(C) ≥ P(A).
In other words, if A is a subset of B, A ⊂ B, then
(5)
P(B) ≥ P(A).
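Property (5) can be verified exhaustively on a small sample space. The sketch below again assumes a fair die with uniform probability and checks every pair of events with A contained in B, together with the decomposition B = A∪C where C = B - A; the helper `subsets` is a hypothetical name.

```python
from fractions import Fraction
from itertools import chain, combinations

# Hypothetical sample space: one roll of a fair die.
OMEGA = frozenset(range(1, 7))

def prob(event):
    """Uniform probability of a subset of OMEGA."""
    return Fraction(len(event), len(OMEGA))

def subsets(s):
    """All subsets of s as frozensets."""
    items = list(s)
    return [
        frozenset(c)
        for c in chain.from_iterable(
            combinations(items, r) for r in range(len(items) + 1)
        )
    ]

events = subsets(OMEGA)

# Every pair (A, B) with A a subset of B.
nested_pairs = [(a, b) for b in events for a in events if a <= b]

# B is the disjoint union of A and C = B - A, so P(B) = P(A) + P(C).
additive = all(prob(b) == prob(a) + prob(b - a) for a, b in nested_pairs)

# Since P(C) >= 0, monotonicity follows.
monotone = all(prob(a) <= prob(b) for a, b in nested_pairs)

print(additive, monotone)  # prints: True True
```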
Probability is a monotone function, a fact that agrees with our intuition that a larger event, i.e. an event with a greater number of favorable outcomes, is at least as likely to occur as a smaller one.