Phase 10: Probability Theory
Subject 10-01: Probability Foundations
Prerequisites: Phases 1-9 (through Advanced Linear Algebra) — set theory, basic combinatorics, measure-theoretic intuition
Learning Objectives
- Define sample spaces, events as subsets, and the σ-algebra structure for countable settings
- State and apply Kolmogorov's three axioms of probability and prove basic consequences (complement rule, monotonicity, bounds)
- Compute probabilities using the addition rule for disjoint events and the inclusion-exclusion principle for overlapping events
- Interpret probability as a measure on a measurable space and recognize limits of countable additivity
- Distinguish between equally-likely-outcome counting, relative frequency, and subjective interpretations of probability
Core Content
1. Sample Spaces and Events
A probability experiment is any process with an uncertain outcome. The sample space Ω is the set of all possible outcomes.
Examples:
- Coin toss: Ω = {H, T}
- Two coin tosses: Ω = {HH, HT, TH, TT}
- Roll of a die: Ω = {1, 2, 3, 4, 5, 6}
- Lifetime of a lightbulb (continuous): Ω = [0, ∞)
An event is a subset of the sample space: A ⊆ Ω. An event "occurs" if the actual outcome ω ∈ A.
Special events:
- Certain event: Ω itself (always occurs)
- Impossible event: ∅ (never occurs)
- Elementary event: {ω} for a single outcome ω
For a finite or countably infinite Ω, we typically take the σ-algebra F to be the power set ℘(Ω) — every subset is an event.
Set operations on events:
- Union A ∪ B: "A or B occurs" (or both)
- Intersection A ∩ B: "A and B both occur"
- Complement Aᶜ = Ω \ A: "A does not occur"
- Difference A \ B = A ∩ Bᶜ: "A occurs but B does not"
De Morgan's Laws:
$$
(A \cup B)^c = A^c \cap B^c, \qquad (A \cap B)^c = A^c \cup B^c
$$
Events A and B are mutually exclusive (disjoint) if A ∩ B = ∅ — they cannot occur simultaneously.
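The set operations above map directly onto Python's built-in `set` type. The following is a minimal sketch for a single die roll; the events `A`, `B`, `C` and the helper `complement` are illustrative choices, not part of the lesson's notation.

```python
# Sample space for one roll of a die; events are plain Python sets.
omega = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}   # "the roll is even" (illustrative event)
B = {1, 2, 3}   # "the roll is at most 3" (illustrative event)
C = {5}         # "the roll is 5" (illustrative event)

def complement(event, space=omega):
    """Return the complement of an event relative to the sample space."""
    return space - event

union = A | B          # A or B occurs
intersection = A & B   # A and B both occur
difference = A - B     # A occurs but B does not

# De Morgan's laws, checked on this particular pair of events.
de_morgan_1 = complement(A | B) == complement(A) & complement(B)
de_morgan_2 = complement(A & B) == complement(A) | complement(B)

# Two events are disjoint exactly when their intersection is empty.
disjoint_B_C = B.isdisjoint(C)

print(union, intersection, difference)
print(de_morgan_1, de_morgan_2, disjoint_B_C)
```

A check on one pair of events is not a proof of De Morgan's laws, but it is a quick way to catch a misremembered identity.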
2. Kolmogorov's Axioms (1933)
A probability measure P is a function P: F → [0, 1] satisfying:
Axiom 1 (Non-negativity): P(A) ≥ 0 for every event A.
Axiom 2 (Normalization): P(Ω) = 1.
Axiom 3 (Countable additivity): If A₁, A₂, ... are pairwise disjoint events (Aᵢ ∩ Aⱼ = ∅ for i ≠ j), then:
$$
P\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} P(A_i)
$$
These three axioms are the foundation of all probability theory. Everything else is derived from them.
Immediate consequences (theorems, not axioms):
Theorem 1 (Complement rule): P(Aᶜ) = 1 − P(A)
Proof: A and Aᶜ are disjoint, and A ∪ Aᶜ = Ω. By Axiom 3 (finite additivity as a special case): P(A) + P(Aᶜ) = P(Ω) = 1. Hence P(Aᶜ) = 1 − P(A). ∎
Theorem 2 (Probability of impossible event): P(∅) = 0
Proof: ∅ = Ωᶜ, so P(∅) = 1 − P(Ω) = 1 − 1 = 0. ∎
Theorem 3 (Monotonicity): If A ⊆ B, then P(A) ≤ P(B)
Proof: B = A ∪ (B \ A), and A ∩ (B \ A) = ∅. So P(B) = P(A) + P(B \ A) ≥ P(A) since P(B \ A) ≥ 0. ∎
Theorem 4 (Bounds): 0 ≤ P(A) ≤ 1 for all A
Proof: ∅ ⊆ A ⊆ Ω, so by monotonicity, 0 = P(∅) ≤ P(A) ≤ P(Ω) = 1. ∎
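The four theorems can be sanity-checked numerically on a small finite probability space. The sketch below assumes a hypothetical biased experiment with three outcomes; the weights and the names `weights` and `prob` are illustrative. Exact fractions are used so that the equalities hold without rounding error.

```python
from fractions import Fraction

# A hypothetical biased three-outcome experiment; the weights are illustrative.
weights = {"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(1, 6)}
omega = set(weights)

def prob(event):
    """P(event) as the sum of the weights of the outcomes it contains."""
    return sum((weights[w] for w in event), Fraction(0))

A = {"a"}
B = {"a", "b"}   # A is a subset of B

normalization = prob(omega) == 1                      # Axiom 2
complement_rule = prob(omega - A) == 1 - prob(A)      # Theorem 1
empty_event = prob(set()) == 0                        # Theorem 2
monotone = A <= B and prob(A) <= prob(B)              # Theorem 3
bounded = all(0 <= prob(E) <= 1 for E in (set(), A, B, omega))  # Theorem 4

print(normalization, complement_rule, empty_event, monotone, bounded)
```

Because `prob` is defined by summing non-negative weights that total 1, it satisfies the axioms by construction, so every check should print `True`.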
3. Addition Rule and Inclusion-Exclusion
For disjoint events A and B:
P(A ∪ B) = P(A) + P(B) (finite additivity from Axiom 3)
For arbitrary events (not necessarily disjoint), we must avoid double-counting A ∩ B:
Addition rule (two events):
$$
P(A \cup B) = P(A) + P(B) - P(A \cap B)
$$
Derivation: Write A ∪ B as the disjoint union (A \ B) ∪ (B \ A) ∪ (A ∩ B). Then:
$$
\begin{aligned}
P(A \cup B) &= P(A \setminus B) + P(B \setminus A) + P(A \cap B) \\
&= [P(A) - P(A \cap B)] + [P(B) - P(A \cap B)] + P(A \cap B) \\
&= P(A) + P(B) - P(A \cap B)
\end{aligned}
$$
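The two-event addition rule and the three-piece decomposition used in its derivation can be checked by brute-force enumeration. This sketch assumes two fair dice with all 36 ordered pairs equally likely; the events `A` and `B` are illustrative.

```python
from fractions import Fraction
from itertools import product

# All ordered pairs from two fair dice, assumed equally likely.
omega = set(product(range(1, 7), repeat=2))

def prob(event):
    """Uniform probability: favorable outcomes over total outcomes."""
    return Fraction(len(event), len(omega))

A = {w for w in omega if w[0] == 6}      # first die shows 6
B = {w for w in omega if sum(w) >= 10}   # total is at least 10

lhs = prob(A | B)                                   # direct computation
rhs = prob(A) + prob(B) - prob(A & B)               # addition rule
pieces = prob(A - B) + prob(B - A) + prob(A & B)    # disjoint decomposition

print(lhs, rhs, pieces)
```

Here A and B overlap in three outcomes, (6,4), (6,5), and (6,6), so adding P(A) and P(B) without the correction term would overstate the union.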
Inclusion-exclusion (three events):
$$
\begin{aligned}
P(A \cup B \cup C) = {} & P(A) + P(B) + P(C) \\
& - P(A \cap B) - P(A \cap C) - P(B \cap C) \\
& + P(A \cap B \cap C)
\end{aligned}
$$
General inclusion-exclusion (n events):
$$
P\left(\bigcup_{i=1}^{n} A_i\right) = \sum_{i} P(A_i) - \sum_{i<j} P(A_i \cap A_j) + \sum_{i<j<k} P(A_i \cap A_j \cap A_k) - \cdots + (-1)^{n+1} P(A_1 \cap \cdots \cap A_n)
$$
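The general formula translates into a short loop over all k-fold intersections. The following is a sketch, assuming a fair 12-sided die and four illustrative events; the function name `inclusion_exclusion` is a hypothetical helper. It compares the alternating sum with the probability of the union computed directly.

```python
from fractions import Fraction
from itertools import combinations

def inclusion_exclusion(events, prob):
    """P(union of events) via the alternating sum over k-fold intersections."""
    total = Fraction(0)
    for k in range(1, len(events) + 1):
        sign = (-1) ** (k + 1)   # +1 for odd k, -1 for even k
        for group in combinations(events, k):
            total += sign * prob(set.intersection(*group))
    return total

# Illustrative setting: a fair 12-sided die with equally likely faces.
omega = set(range(1, 13))

def prob(event):
    return Fraction(len(event), len(omega))

events = [
    {w for w in omega if w % 2 == 0},   # even
    {w for w in omega if w % 3 == 0},   # multiple of 3
    {w for w in omega if w > 9},        # greater than 9
    {1, 2},                             # an arbitrary small event
]

via_formula = inclusion_exclusion(events, prob)
direct = prob(set.union(*events))
print(via_formula, direct)
```

The loop visits 2ⁿ − 1 intersections, so this approach is only practical for a small number of events.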
4. Probability as Measure
Probability is a special case of a measure: a normalized measure where the total measure of the space is 1. This connection to measure theory unifies discrete and continuous probability.
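- Finite additivity follows from countable additivity: set Aₙ = ∅ for n > N and use P(∅) = 0, which itself follows from Axiom 3 by taking every Aᵢ = ∅ (the sum Σ P(∅) is finite only if P(∅) = 0)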
- Continuity of probability: If A₁ ⊆ A₂ ⊆ A₃ ⊆ ... (increasing sequence), then P(⋃Aₙ) = lim_{n→∞} P(Aₙ)
- If A₁ ⊇ A₂ ⊇ A₃ ⊇ ... (decreasing sequence), then P(⋂Aₙ) = lim_{n→∞} P(Aₙ)
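Continuity can be illustrated on a countably infinite sample space. The sketch below assumes Ω = {1, 2, 3, ...} with P({k}) = (1/2)ᵏ, an illustrative choice of weights that sum to 1. The increasing events Aₙ = {1, ..., n} have union Ω, and the decreasing events Bₙ = {n+1, n+2, ...} have empty intersection.

```python
from fractions import Fraction

def p_outcome(k):
    """P({k}) = (1/2)**k for k = 1, 2, 3, ... (illustrative weights)."""
    return Fraction(1, 2 ** k)

def p_increasing(n):
    """P(A_n) for A_n = {1, ..., n}, computed by finite additivity."""
    return sum((p_outcome(k) for k in range(1, n + 1)), Fraction(0))

def p_decreasing(n):
    """P(B_n) for B_n = {n+1, n+2, ...}, computed via the complement rule."""
    return 1 - p_increasing(n)

# P(A_n) climbs toward P(Omega) = 1 and P(B_n) falls toward P(empty set) = 0.
for n in (1, 5, 10, 20):
    print(n, float(p_increasing(n)), float(p_decreasing(n)))
```

In closed form P(Aₙ) = 1 − 2⁻ⁿ and P(Bₙ) = 2⁻ⁿ, which makes both limits explicit.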
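Common Pitfall: Countable additivity does NOT imply uncountable additivity. You cannot sum probabilities over an uncountable collection of disjoint events. For example, under the uniform distribution on [0, 1], every singleton {x} has probability 0, yet the union of all singletons is [0, 1], which has probability 1.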
5. Equally Likely Outcomes (Classical Probability)
When all outcomes are equally likely and Ω is finite:
$$
P(A) = \frac{|A|}{|\Omega|} = \frac{\text{number of favorable outcomes}}{\text{total number of outcomes}}
$$
This reduces probability to counting. Used extensively in combinatorics problems (cards, dice, lotteries).
Example: Rolling two fair dice. Ω has 36 equally likely ordered pairs. Event "sum = 7" has 6 favorable outcomes: {(1,6),(2,5),(3,4),(4,3),(5,2),(6,1)}. So P(sum = 7) = 6/36 = 1/6.
Edge case: The "equally likely" assumption must be justified. It fails for biased coins, weighted dice, or non-uniform distributions.
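The two-dice example can be reproduced by enumerating the sample space and counting. This is a minimal sketch assuming fair dice; the helper name `classical_prob` is illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# The 36 ordered pairs from two fair dice, assumed equally likely.
omega = list(product(range(1, 7), repeat=2))

def classical_prob(predicate):
    """|A| / |Omega| where A is the set of outcomes satisfying the predicate."""
    favorable = sum(1 for w in omega if predicate(w))
    return Fraction(favorable, len(omega))

p_sum_7 = classical_prob(lambda w: w[0] + w[1] == 7)

# How many ordered pairs produce each total from 2 to 12.
sum_counts = Counter(a + b for a, b in omega)

print(p_sum_7)
print(sorted(sum_counts.items()))
```

The counts in `sum_counts` also illustrate the edge case: the ordered pairs are equally likely, but the totals 2 through 12 are not, so treating the eleven totals as a uniform sample space would give wrong answers.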
Key Terms
- Finite additivity
Worked Examples
Example 1: Applying the Axioms
In a sample space Ω, P(A) = 0.4, P(B) = 0.3, P(A ∩ B) = 0.1. Find: (a) P(Aᶜ) (b) P(A ∪ B) (c) P(A ∩ Bᶜ) (d) P(Aᶜ ∩ Bᶜ)
Solution:
(a) P(Aᶜ) = 1 − P(A) = 1 − 0.4 = 0.6
(b) P(A ∪ B) = P(A) + P(B) − P(A ∩ B) = 0.4 + 0.3 − 0.1 = 0.6
(c) A ∩ Bᶜ = A \ B, and A = (A ∩ B) ∪ (A ∩ Bᶜ) with disjoint union. So P(A) = P(A ∩ B) + P(A ∩ Bᶜ) → P(A ∩ Bᶜ) = 0.4 − 0.1 = 0.3
(d) By De Morgan: Aᶜ ∩ Bᶜ = (A ∪ B)ᶜ, so P(Aᶜ ∩ Bᶜ) = 1 − P(A ∪ B) = 1 − 0.6 = 0.4
Verify: P(A) + P(B) − P(A ∩ B) + P(Aᶜ ∩ Bᶜ) = 0.4 + 0.3 − 0.1 + 0.4 = 1.0. Total probability sums to 1. ✓
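The arithmetic in Example 1 can be replayed with exact fractions, which avoids floating-point noise in the final check. The variable names below are illustrative; the sketch also splits Ω into its four disjoint regions to confirm they sum to 1.

```python
from fractions import Fraction

# Given values from Example 1, written as exact fractions.
p_a, p_b, p_ab = Fraction(4, 10), Fraction(3, 10), Fraction(1, 10)

p_not_a = 1 - p_a                 # (a) complement rule
p_a_or_b = p_a + p_b - p_ab       # (b) addition rule
p_a_not_b = p_a - p_ab            # (c) A = (A and B) or (A and not B)
p_neither = 1 - p_a_or_b          # (d) De Morgan plus complement rule

# The four disjoint regions that partition the sample space.
regions = [p_ab, p_a_not_b, p_b - p_ab, p_neither]

print(p_not_a, p_a_or_b, p_a_not_b, p_neither, sum(regions))
```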
Example 2: Inclusion-Exclusion with Three Events
In a survey of 100 students:
- 60 study math (M)
- 45 study physics (P)
- 35 study chemistry (C)
- 25 study both math and physics
- 20 study both math and chemistry
- 15 study both physics and chemistry
- 8 study all three
How many students study at least one subject? How many study exactly two subjects?
Solution:
P(at least one) = P(M) + P(P) + P(C) − P(M∩P) − P(M∩C) − P(P∩C) + P(M∩P∩C) = 0.60 + 0.45 + 0.35 − 0.25 − 0.20 − 0.15 + 0.08 = 0.88
So 88 students study at least one subject.
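For exactly two subjects, work with counts rather than probabilities. Students taking all three subjects are included in every pairwise count, so subtract them once from each pair: |M∩P| + |M∩C| + |P∩C| − 3·|M∩P∩C| = 25 + 20 + 15 − 3(8) = 60 − 24 = 36 students.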
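The counts in Example 2 can be cross-checked with a few lines of arithmetic. The variable names are illustrative. The sketch also derives the number of students taking exactly one subject and none, which the example does not ask for but which gives a useful consistency check: all the regions must add up to 100.

```python
# Survey counts from Example 2.
total = 100
m, p, c = 60, 45, 35        # math, physics, chemistry
mp, mc, pc = 25, 20, 15     # pairwise overlaps
mpc = 8                     # all three subjects

# Inclusion-exclusion on counts.
at_least_one = m + p + c - mp - mc - pc + mpc

# Each pairwise count includes the students taking all three; remove them.
exactly_two = (mp - mpc) + (mc - mpc) + (pc - mpc)
exactly_three = mpc
exactly_one = at_least_one - exactly_two - exactly_three
no_subject = total - at_least_one

print(at_least_one, exactly_two, exactly_one, no_subject)
```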
Example 3: Showing P(A \ B) = P(A) − P(A ∩ B)
Proof using axioms:
Write A as the disjoint union A = (A ∩ B) ∪ (A \ B). These are disjoint because (A ∩ B) ∩ (A \ B) = A ∩ B ∩ A ∩ Bᶜ = A ∩ (B ∩ Bᶜ) = ∅.
By finite additivity: P(A) = P(A ∩ B) + P(A \ B). Therefore P(A \ B) = P(A) − P(A ∩ B). ∎
Quiz
Q1: Which of the following is NOT one of Kolmogorov's three axioms of probability?
A) P(A) ≥ 0 for all events A B) P(Ω) = 1 C) P(Aᶜ) = 1 − P(A) D) Countable additivity for disjoint events
Correct: C)
- If you chose C: Correct! The complement rule P(Aᶜ) = 1 − P(A) is a theorem proven FROM the axioms, not an axiom itself.
- If you chose A: This IS Axiom 1 (non-negativity). It's fundamental — probability can never be negative.
- If you chose B: This IS Axiom 2 (normalization). The probability of the entire sample space must be 1.
- If you chose D: This IS Axiom 3 (countable additivity). It's the key axiom that makes probability a measure.
Q2: If P(A) = 0.5 and P(B) = 0.3, and A and B are mutually exclusive (disjoint), what is P(A ∪ B)?
A) 0.2 B) 0.65 C) 0.8 D) 0.15
Correct: C)
- If you chose C: Correct! For disjoint events, P(A ∪ B) = P(A) + P(B) = 0.5 + 0.3 = 0.8. Note that disjoint events must satisfy P(A) + P(B) ≤ 1, so values such as P(A) = 0.7 and P(B) = 0.5 could never belong to disjoint events.
- If you chose A: This is P(A) − P(B), not the union formula.
- If you chose B: This is P(A) + P(B) − P(A)P(B), which would apply for independent events, not disjoint ones.
- If you chose D: This is P(A)P(B), the probability of intersection for independent events.
Q3: By De Morgan's Law, (A ∪ B)ᶜ is equal to:
A) Aᶜ ∪ Bᶜ B) Aᶜ ∩ Bᶜ C) A ∩ B D) (A ∩ B) ∪ (Aᶜ ∩ Bᶜ)
Correct: B)
- If you chose B: Correct! De Morgan's Law: complement of a union is the intersection of complements. "Neither A nor B" = "Not A AND Not B."
- If you chose A: This would mean "not A or not B," which is the complement of A ∩ B, not A ∪ B.
- If you chose C: This is the intersection, not the complement of the union.
- If you chose D: This is a partition of Ω, not the complement of the union.
Q4: In the inclusion-exclusion formula for three events A, B, C, what is the sign of the P(A ∩ B ∩ C) term?
A) Positive B) Negative C) It depends on whether the events are disjoint D) Zero
Correct: A)
- If you chose A: Correct! The sign pattern for n events is (−1)^{k+1} for the k-fold intersection. For k=1: positive. k=2: negative. k=3: positive. So the triple intersection is added back.
- If you chose B: This would be the sign for k=2 (pairwise intersections).
- If you chose C: The formula is general and the sign pattern is fixed regardless of event relationships.
- If you chose D: The triple intersection term is only zero if the three events have empty intersection.
Q5: If P(A) = 0.6 and P(B) = 0.5, what is the maximum possible value of P(A ∩ B)?
A) 0.1 B) 0.5 C) 0.6 D) 1.0
Correct: B)
- If you chose B: Correct! Since A ∩ B ⊆ B, we must have P(A ∩ B) ≤ P(B) = 0.5. The maximum is 0.5, achieved when B ⊆ A.
- If you chose A: This would be the minimum possible value: P(A ∩ B) ≥ P(A) + P(B) − 1 = 0.1 (Bonferroni's inequality).
- If you chose C: This would require A ∩ B = A, meaning A ⊆ B, which is impossible since P(A) > P(B).
- If you chose D: Probabilities cannot exceed 1, and this intersection is bounded by both P(A) and P(B).
Q6: Which of the following is a consequence of Kolmogorov's axioms?
A) P(A ∪ B) = P(A) + P(B) for all events B) If A ⊆ B, then P(A) ≤ P(B) C) P(A ∩ B) = P(A)P(B) D) P(A | B) = P(B | A)
Correct: B)
- If you chose B: Correct! Monotonicity is proved from the axioms: if A ⊆ B, then B = A ∪ (B\A) with disjoint union, so P(B) = P(A) + P(B\A) ≥ P(A).
- If you chose A: This only holds when A and B are disjoint. The general formula is P(A ∪ B) = P(A) + P(B) − P(A ∩ B).
- If you chose C: This holds only when A and B are independent — it's a definition, not a theorem.
- If you chose D: This is generally false. Bayes' theorem relates them: P(A|B) = P(B|A)P(A)/P(B).
Q7: A fair coin is tossed 3 times. What is P(exactly 2 heads)?
A) 1/8 B) 3/8 C) 1/2 D) 3/4
Correct: B)
- If you chose B: Correct! Ω has 8 equally likely outcomes. Exactly 2 heads occurs for {HHT, HTH, THH} — 3 outcomes. So P = 3/8.
- If you chose A: This is P(HHH), the probability of all three heads.
- If you chose C: This is P(at least 2 heads) = 4/8 = 1/2, which also counts HHH; the question asks for exactly two.
- If you chose D: This is P(at least one head and at least one tail) = 6/8 = 3/4, which counts every outcome except HHH and TTT.
Q8: P(∅) = 0 is:
A) An axiom of probability B) A theorem derived from the axioms C) Only true for finite sample spaces D) True only when ∅ is the empty set
Correct: B)
- If you chose B: Correct! P(∅) = 0 is derived: ∅ = Ωᶜ, so P(∅) = 1 − P(Ω) = 1 − 1 = 0. It follows from Axioms 2 and 3.
- If you chose A: The axioms only state non-negativity, normalization, and countable additivity. P(∅) = 0 is not itself an axiom.
- If you chose C: It holds for all sample spaces, finite or infinite.
- If you chose D: ∅ IS the empty set by definition; this is just playing with words.
Practice Problems
1. If P(A) = 0.5, P(B) = 0.4, and P(A ∩ B) = 0.2, compute P(A ∪ B), P(Aᶜ), P(Bᶜ), and P(Aᶜ ∩ B).
2. Prove that P(A ∩ B) ≥ P(A) + P(B) − 1. (This is Bonferroni's inequality.)
3. A fair coin is tossed 3 times. List the sample space Ω. Find the probability of: (a) exactly 2 heads, (b) at least 1 head, (c) no heads.
4. For three events A, B, C, derive the formula for P(A ∪ B ∪ C) by applying the two-event addition rule twice.
5. A card is drawn from a standard 52-card deck. Find: (a) P(heart or king), (b) P(face card or ace), (c) P(red or spade).
6. Show that if P(A) = 0, then P(A ∩ B) = 0 for any event B.
7. Use inclusion-exclusion to find the probability that a randomly chosen integer from 1 to 100 is divisible by 2, 3, or 5.
Answers
1. P(A ∪ B) = 0.5 + 0.4 − 0.2 = 0.7; P(Aᶜ) = 0.5; P(Bᶜ) = 0.6; P(Aᶜ ∩ B) = P(B) − P(A ∩ B) = 0.4 − 0.2 = 0.2.
2. P(A ∪ B) = P(A) + P(B) − P(A ∩ B) ≤ 1. Rearranging: P(A ∩ B) ≥ P(A) + P(B) − 1.
3. Ω = {HHH, HHT, HTH, THH, HTT, THT, TTH, TTT} (8 outcomes). (a) 3/8, (b) 7/8, (c) 1/8.
4. P(A ∪ B ∪ C) = P((A ∪ B) ∪ C) = P(A ∪ B) + P(C) − P((A ∪ B) ∩ C) = P(A)+P(B)−P(A∩B)+P(C)−P((A∩C)∪(B∩C)) = P(A)+P(B)+P(C)−P(A∩B)−P(A∩C)−P(B∩C)+P(A∩B∩C).
5. (a) Hearts (13) + Kings (4) − King of hearts (1) = 16/52 = 4/13. (b) Face cards (12) + Aces (4) = 16/52 = 4/13 (disjoint). (c) Red (26) + Spades (13) = 39/52 = 3/4 (disjoint: spades are black).
6. Since A ∩ B ⊆ A, by monotonicity P(A ∩ B) ≤ P(A) = 0, so P(A ∩ B) = 0.
7. Let A={div by 2}, B={div by 3}, C={div by 5}. |A|=50, |B|=33, |C|=20. |A∩B|=16 (div by 6), |A∩C|=10 (div by 10), |B∩C|=6 (div by 15), |A∩B∩C|=3 (div by 30). Inclusion-exclusion: 50+33+20−16−10−6+3 = 74. P = 74/100 = 0.74.
Summary
- A sample space Ω contains all possible outcomes; events are subsets of Ω. Probability is a function P: F → [0, 1] satisfying non-negativity, normalization, and countable additivity
- Kolmogorov's three axioms imply the complement rule P(Aᶜ) = 1 − P(A), monotonicity, and the bound 0 ≤ P(A) ≤ 1
- For any events, P(A ∪ B) = P(A) + P(B) − P(A ∩ B). Inclusion-exclusion generalizes to n events with alternating signs of intersections
- Probability is a finite measure with total mass 1; this measure-theoretic view unifies discrete and continuous settings
- In classical equally-likely settings, P(A) = |A|/|Ω|, reducing probability to combinatorial counting
Pitfalls
- Confusing the complement rule for an axiom. P(Aᶜ) = 1 − P(A) is a theorem derived from the axioms, not an axiom itself. Kolmogorov's three axioms are: non-negativity, normalization, and countable additivity.
- Treating countable additivity as implying uncountable additivity. You cannot sum probabilities over an uncountable collection of disjoint events. This distinction is why measure theory is needed for continuous probability.
- Assuming P(A) = 0 means A = ∅. In continuous probability spaces, events with probability zero can still occur (e.g., a continuous random variable taking any specific value). Probability zero does not mean impossibility.
- Forgetting the subtraction term in the general addition rule. P(A ∪ B) = P(A) + P(B) only when A and B are disjoint. The general formula is P(A) + P(B) - P(A ∩ B). Applying the disjoint formula to overlapping events double-counts the intersection.
- Getting inclusion-exclusion signs wrong. The sign pattern alternates: add single events, subtract pairwise intersections, add triple intersections, and so on. For n events, the sign of the k-fold intersection term is (-1)^{k+1}.
Quiz
1. Which of the following is NOT one of Kolmogorov's axioms?
a) P(Ω) = 1 b) P(Aᶜ) = 1 − P(A) c) P(A) ≥ 0 for all events A d) Countable additivity for disjoint events
Answer: b. The complement rule is a theorem, not an axiom.
2. If A and B are disjoint and P(A) = 0.3, P(B) = 0.5, what is P(A ∪ B)?
a) 0.8 b) 0.65 c) 0.15 d) Cannot be determined
Answer: a. For disjoint events, P(A ∪ B) = P(A) + P(B) = 0.8.
3. If P(A) = 0.6 and P(B) = 0.5, what is the maximum possible value of P(A ∩ B)?
a) 0.1 b) 0.5 c) 0.6 d) 1.0
Answer: b. A ∩ B ⊆ B, so P(A ∩ B) ≤ P(B) = 0.5. The maximum is 0.5 (when B ⊆ A).
4. De Morgan's law states that (A ∪ B)ᶜ equals:
a) Aᶜ ∪ Bᶜ b) Aᶜ ∩ Bᶜ c) A ∩ B d) (A ∩ B)ᶜ
Answer: b. (A ∪ B)ᶜ = Aᶜ ∩ Bᶜ.
5. In inclusion-exclusion for n events, the sign of the k-fold intersection term is:
a) Always positive b) (−1)^{k+1} c) (−1)^{k} d) Positive for even k, negative for odd k
Answer: b. The general term for a k-way intersection has sign (−1)^{k+1}: positive for k=1, negative for k=2, and so on.
6. If P(A) = 0, which must be true?
a) A = ∅ b) P(A ∪ B) = P(B) for any B c) A is impossible d) Both b and c
Answer: b. P(A) = 0 does not imply A = ∅ (consider a continuous random variable equaling a specific value). But P(A ∪ B) = P(A) + P(B) − P(A ∩ B) = 0 + P(B) − 0 = P(B), where P(A ∩ B) = 0 by monotonicity.
7. For a fair die roll, what is P(outcome ≤ 3 or even)?
a) 1/2 b) 2/3 c) 5/6 d) 1
Answer: c. A = {1,2,3}, B = {2,4,6}. A ∩ B = {2}. P = 3/6 + 3/6 − 1/6 = 5/6.
8. True or False: If P(A ∪ B) = P(A) + P(B), then A and B must be disjoint.
a) True b) False: they could overlap with P(A ∩ B) = 0
Answer: b. P(A ∪ B) = P(A) + P(B) − P(A ∩ B). So P(A ∪ B) = P(A) + P(B) iff P(A ∩ B) = 0, which does not require A ∩ B = ∅.
Next Steps
Continue to 10-02 Conditional Probability to learn about P(A|B), the multiplication rule, the law of total probability, and Bayes' theorem.