Phase 10: Probability Theory
Subject 10-02: Conditional Probability
Prerequisites: 10-01 (Probability Foundations) — sample spaces, axioms, addition rule, set operations
Learning Objectives
- Define conditional probability P(A|B) and explain when and why it differs from unconditional probability
- Apply the multiplication rule P(A ∩ B) = P(B) · P(A|B) in sequential experiments and tree diagrams
- Use the Law of Total Probability to decompose complex events over a partition of the sample space
- State, prove, and apply Bayes' theorem to invert conditional probabilities, distinguishing prior, likelihood, and posterior
- Solve classic conditional probability puzzles including the Monty Hall problem
Core Content
1. Definition of Conditional Probability
Definition: For events A and B with P(B) > 0, the conditional probability of A given B is:
$P(A|B) = P(A ∩ B) / P(B) $
This re-normalizes the sample space to B: among outcomes where B occurs, what fraction also have A occur?
Interpretation: P(A|B) answers "if we know B happened, how likely is A?"
Key properties:
- P(Ω|B) = P(Ω ∩ B)/P(B) = P(B)/P(B) = 1
- P(∅|B) = 0
- For fixed B with P(B) > 0, P(·|B) is a valid probability measure on the same sample space
- P(A|B) + P(Aᶜ|B) = 1
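Edge case: If P(B) = 0, P(A|B) is undefined, because the definition would divide by zero. In elementary probability you cannot condition on an event of probability zero.
Common misconception: P(A|B) = P(B|A) is false in general. The two share the numerator P(A ∩ B) but have different denominators, so they are equal only when P(A) = P(B) or when P(A ∩ B) = 0.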
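The definition can be checked by brute-force enumeration on a small sample space. The sketch below assumes two fair six-sided dice and two illustrative events chosen for this example (they are not from the lesson): A = "the sum is 8" and B = "the first die is even". The helper names `prob` and `cond_prob` are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 ordered outcomes of two fair dice, equally likely.
omega = list(product(range(1, 7), repeat=2))

def prob(event):
    """P(event) under the uniform measure; event is a predicate on outcomes."""
    return Fraction(sum(1 for w in omega if event(w)), len(omega))

def cond_prob(a, b):
    """P(A|B) = P(A ∩ B) / P(B). Raises if P(B) = 0, where it is undefined."""
    p_b = prob(b)
    if p_b == 0:
        raise ValueError("P(B) = 0: conditional probability is undefined")
    return prob(lambda w: a(w) and b(w)) / p_b

def sum_is_8(w):
    return w[0] + w[1] == 8

def first_is_even(w):
    return w[0] % 2 == 0

p_a_given_b = cond_prob(sum_is_8, first_is_even)   # 3 of the 18 outcomes in B
p_b_given_a = cond_prob(first_is_even, sum_is_8)   # 3 of the 5 outcomes in A
print(p_a_given_b, p_b_given_a)
```

Here P(A|B) = 1/6 while P(B|A) = 3/5, a concrete case where swapping the direction of conditioning changes the answer.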
2. The Multiplication Rule
From the definition, multiplying both sides by P(B):
$P(A ∩ B) = P(B) · P(A|B) $
By symmetry: P(A ∩ B) = P(A) · P(B|A) (provided P(A) > 0).
General multiplication rule (chain rule):
For events A₁, A₂, ..., Aₙ with P(A₁ ∩ ... ∩ A_{n-1}) > 0:
$P(A₁ ∩ A₂ ∩ ... ∩ Aₙ) = P(A₁) · P(A₂|A₁) · P(A₃|A₁ ∩ A₂) · ... · P(Aₙ|A₁ ∩ ... ∩ A_{n-1})$
Proof by induction: The case n=2 is the two-event multiplication rule. Assume the claim holds for n-1 events. Applying the two-event rule to ∩_{i=1}^{n-1} Aᵢ and Aₙ gives P(∩_{i=1}^{n} Aᵢ) = P(∩_{i=1}^{n-1} Aᵢ) · P(Aₙ | ∩_{i=1}^{n-1} Aᵢ). By the induction hypothesis, the first factor is the chain product up to n-1. ∎
Tree diagrams: The chain rule is naturally visualized with probability trees — branching at each stage, multiplying probabilities along a path gives the probability of that path.
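Multiplying along one path of a tree is a short loop in code. The sketch below assumes a standard 52-card deck and asks for the probability that the first three cards drawn without replacement are all aces; the function name `chain_rule` is a hypothetical helper, and the counting cross-check uses `math.comb`.

```python
from fractions import Fraction
from math import comb

def chain_rule(conditionals):
    """Multiply P(A1), P(A2|A1), P(A3|A1 ∩ A2), ... along one tree path."""
    result = Fraction(1)
    for p in conditionals:
        result *= p
    return result

# Path: ace, then ace given one ace gone, then ace given two aces gone.
p_three_aces = chain_rule([Fraction(4, 52), Fraction(3, 51), Fraction(2, 50)])

# Cross-check by counting: choose 3 of the 4 aces out of all 3-card hands.
p_by_counting = Fraction(comb(4, 3), comb(52, 3))
print(p_three_aces, p_by_counting)
```

Both routes give 1/5525, which is a useful sanity check that each conditional factor was formed against the correctly reduced deck.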
3. Law of Total Probability (LOTP)
If B₁, B₂, ..., Bₙ form a partition of Ω (they are pairwise disjoint and their union is Ω, with each P(Bᵢ) > 0), then for any event A:
$P(A) = Σ_{i=1}^{n} P(A|Bᵢ) · P(Bᵢ)$
Proof: A = A ∩ Ω = A ∩ (⋃Bᵢ) = ⋃(A ∩ Bᵢ). Since the Bᵢ are disjoint, the A ∩ Bᵢ are also disjoint. By finite additivity: P(A) = Σ P(A ∩ Bᵢ) = Σ P(A|Bᵢ) P(Bᵢ). ∎
Why this matters: LOTP lets you compute P(A) by "averaging" over all the mutually exclusive ways A could happen. It breaks a hard problem into easier conditional pieces.
Special case (binary partition):
$P(A) = P(A|B) · P(B) + P(A|Bᶜ) · P(Bᶜ) $
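The identity can be verified numerically on a small example. This sketch assumes one fair six-sided die, the event A = "the roll is prime", and an arbitrary three-block partition chosen only for illustration; none of these come from the lesson.

```python
from fractions import Fraction

omega = set(range(1, 7))  # outcomes of one fair six-sided die

def prob(event):
    """P(event) for a subset of omega under the uniform measure."""
    return Fraction(len(event & omega), len(omega))

A = {2, 3, 5}                          # "the roll is prime"
partition = [{1}, {2, 3}, {4, 5, 6}]   # hypothetical partition of omega

# Sanity-check the partition: blocks cover omega and do not overlap.
assert set().union(*partition) == omega
assert sum(len(b) for b in partition) == len(omega)

# P(A|B_i) for each block, then the weighted sum from the LOTP.
conditionals = [prob(A & b) / prob(b) for b in partition]
lotp_value = sum(c * prob(b) for c, b in zip(conditionals, partition))
print(conditionals, lotp_value, prob(A))
```

The conditional probabilities are 0, 1, and 1/3, yet their weighted average recovers P(A) = 1/2 exactly. Dropping a block or letting blocks overlap would trip the two sanity checks.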
4. Bayes' Theorem
For a partition B₁, ..., Bₙ and any event A with P(A) > 0:
$P(Bⱼ|A) = P(A|Bⱼ) · P(Bⱼ) / P(A) = P(A|Bⱼ) · P(Bⱼ) / [Σ_{i} P(A|Bᵢ) · P(Bᵢ)]$
Proof: By definition, P(Bⱼ|A) = P(Bⱼ ∩ A) / P(A) = P(A|Bⱼ) P(Bⱼ) / P(A). Then substitute LOTP for P(A). ∎
Bayesian vocabulary:
- Prior: P(Bⱼ), our belief about Bⱼ before observing A
- Likelihood: P(A|Bⱼ), how likely A is under hypothesis Bⱼ
- Posterior: P(Bⱼ|A), our updated belief about Bⱼ after observing A
- Evidence/Normalizing constant: P(A) = Σ P(A|Bᵢ) P(Bᵢ)
Bayes' theorem updates beliefs: Posterior ∝ Likelihood × Prior.
Intuition: Bayes' theorem "inverts" the direction of conditioning. We know P(observation | hypothesis) and want P(hypothesis | observation).
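The update "posterior ∝ likelihood × prior" can be written as a small function over a partition of hypotheses. The sketch below is a minimal illustration, not a library API: the function name `posterior` and the coin example (one fair coin, one two-headed coin, pick one at random, observe heads) are assumptions made for this example.

```python
from fractions import Fraction

def posterior(priors, likelihoods):
    """Return P(B_j | A) for every hypothesis B_j in a partition.

    priors[h] = P(B_h); likelihoods[h] = P(A | B_h).
    """
    if sum(priors.values()) != 1:
        raise ValueError("priors must sum to 1 over the partition")
    # Evidence P(A), computed with the Law of Total Probability.
    evidence = sum(likelihoods[h] * priors[h] for h in priors)
    if evidence == 0:
        raise ValueError("P(A) = 0: the posterior is undefined")
    return {h: likelihoods[h] * priors[h] / evidence for h in priors}

# Hypothetical setup: a coin is picked at random, flipped once, shows heads.
priors = {"fair": Fraction(1, 2), "two_headed": Fraction(1, 2)}
likelihoods = {"fair": Fraction(1, 2), "two_headed": Fraction(1)}
post = posterior(priors, likelihoods)
print(post)
```

One observed head moves the belief in the two-headed coin from 1/2 to 2/3. Using `Fraction` keeps the arithmetic exact, so the posteriors sum to exactly 1.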
5. The Monty Hall Problem
This classic problem illustrates the importance of conditional probability:
Problem: You're on a game show with 3 doors. Behind 1 door is a car (prize); behind the other 2 are goats. You pick a door. The host (Monty), who knows what's behind all doors, opens one of the remaining doors revealing a goat. He then offers you the chance to switch to the other unopened door. Should you switch?
Solution (with conditional probability):
Let C be the door with the car (1, 2, or 3), X be your initial pick, M be the door Monty opens.
Assume without loss of generality that you pick door 1 (X=1). Monty then opens door 2 or 3, always revealing a goat. By symmetry, it is enough to analyze the case where he opens door 3.
We need P(C=2 | M=3) — probability the car is behind door 2 given Monty opened door 3.
By Bayes:
$P(C=2 | M=3) = P(M=3 | C=2) · P(C=2) / P(M=3) $
- Prior: P(C=2) = 1/3
- Likelihood: If C=2 (car behind door 2), Monty MUST open door 3 (can't open your door 1, can't open car door 2). So P(M=3 | C=2) = 1.
- If C=1 (car behind your door), Monty can open either door 2 or 3. Assuming he chooses randomly: P(M=3 | C=1) = 1/2.
- If C=3 (car behind door 3), Monty cannot open door 3. P(M=3 | C=3) = 0.
By LOTP:
$P(M=3) = P(M=3|C=1)·P(C=1) + P(M=3|C=2)·P(C=2) + P(M=3|C=3)·P(C=3)
= (1/2)(1/3) + (1)(1/3) + (0)(1/3) = 1/6 + 1/3 = 1/2$
Therefore:
$P(C=2 | M=3) = (1)(1/3) / (1/2) = (1/3)/(1/2) = 2/3 $
Switching gives probability 2/3 of winning. Staying gives 1/3. Always switch.
Why this is counterintuitive: Many people think "two doors left, so 50-50." They forget that Monty's action is informative — he deliberately avoids opening the car door, so his choice leaks information.
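The Bayes calculation can be reproduced by building the joint distribution of the car's door C and Monty's door M explicitly. This sketch uses the same assumptions as the solution: you pick door 1, the car is equally likely behind each door, and Monty chooses uniformly at random when he has two goat doors available. Variable names are illustrative.

```python
from fractions import Fraction

doors = (1, 2, 3)
pick = 1  # contestant's initial choice, as in the solution

# joint[(c, m)] = P(C = c, M = m)
joint = {}
for c in doors:
    # Monty may not open the contestant's door or the car's door.
    allowed = [d for d in doors if d != pick and d != c]
    for m in allowed:
        joint[(c, m)] = Fraction(1, 3) * Fraction(1, len(allowed))

# Condition on Monty opening door 3.
p_m3 = sum(p for (c, m), p in joint.items() if m == 3)
p_switch_wins = joint[(2, 3)] / p_m3   # car behind the other closed door
p_stay_wins = joint[(1, 3)] / p_m3     # car behind the original pick
print(p_m3, p_switch_wins, p_stay_wins)
```

The joint table has only four nonzero entries, and the entry for (C=2, M=3) carries twice the mass of (C=1, M=3). That 2:1 ratio is exactly the 2/3 versus 1/3 split.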
Key Terms
- partition
Worked Examples
Example 1: Diagnostic Testing
A disease affects 1% of the population. A test is 95% sensitive (P(positive | disease) = 0.95) and 90% specific (P(negative | no disease) = 0.90). If a person tests positive, what is the probability they actually have the disease?
Solution:
Let D = "has disease", T+ = "tests positive".
Given: P(D) = 0.01, P(T+|D) = 0.95, P(T−|Dᶜ) = 0.90 → P(T+|Dᶜ) = 0.10.
We want P(D|T+).
By Bayes:
$P(D|T+) = P(T+|D) P(D) / [P(T+|D) P(D) + P(T+|Dᶜ) P(Dᶜ)]
= (0.95)(0.01) / [(0.95)(0.01) + (0.10)(0.99)]
= 0.0095 / (0.0095 + 0.099)
= 0.0095 / 0.1085 ≈ 0.0876$
Only about 8.8%! Despite the test seeming accurate, the disease is so rare that most positives are false positives. This is the "base rate fallacy."
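Counting people instead of multiplying probabilities often makes this result easier to believe. The sketch below applies the example's rates to a hypothetical population of 100,000; the population size is an assumption chosen only to give whole numbers.

```python
population = 100_000          # hypothetical cohort size
prevalence = 0.01             # P(D)
sensitivity = 0.95            # P(T+ | D)
specificity = 0.90            # P(T- | no D)

sick = round(population * prevalence)                  # people with the disease
healthy = population - sick                            # people without it
true_positives = round(sick * sensitivity)             # sick and test positive
false_positives = round(healthy * (1 - specificity))   # healthy but test positive

# P(D | T+): the share of all positives who are actually sick.
ppv = true_positives / (true_positives + false_positives)
print(true_positives, false_positives, round(ppv, 4))
```

Of the 10,850 positive results, only 950 come from people who have the disease. The 9,900 false positives from the large healthy group swamp them.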
Example 2: Two-Stage Experiment
Urn 1 contains 3 red and 5 blue balls. Urn 2 contains 4 red and 2 blue balls. A fair coin is flipped: if heads, draw from Urn 1; if tails, draw from Urn 2. What is P(red)?
Solution:
Partition B₁ = {draw from Urn 1}, B₂ = {draw from Urn 2}.
P(B₁) = P(B₂) = 1/2.
P(red|B₁) = 3/8, P(red|B₂) = 4/6 = 2/3.
By LOTP:
$P(red) = P(red|B₁) P(B₁) + P(red|B₂) P(B₂)
= (3/8)(1/2) + (2/3)(1/2)
= 3/16 + 1/3 = (9 + 16)/48 = 25/48 ≈ 0.521$
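A quick Monte Carlo run is one way to sanity-check a Law of Total Probability answer. The sketch below simulates the two-stage experiment directly (coin flip, then a draw from the chosen urn); the seed and trial count are arbitrary choices, and the function name `draw_is_red` is hypothetical.

```python
import random

def draw_is_red(rng):
    """Simulate one run: flip a fair coin, then draw one ball from that urn."""
    if rng.random() < 0.5:            # heads: Urn 1 has 3 red, 5 blue
        return rng.random() < 3 / 8
    return rng.random() < 4 / 6       # tails: Urn 2 has 4 red, 2 blue

rng = random.Random(0)   # fixed seed (arbitrary) so runs are repeatable
trials = 200_000
estimate = sum(draw_is_red(rng) for _ in range(trials)) / trials
exact = 25 / 48
print(round(estimate, 3), round(exact, 3))
```

The estimate should land close to 25/48 ≈ 0.521, with the gap shrinking roughly like 1/√trials as the number of trials grows.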
Example 3: Bayes with Three Hypotheses
A factory has three machines producing the same item. Machine A produces 30% of items with 2% defect rate, Machine B produces 45% with 3% defect rate, Machine C produces 25% with 1% defect rate. A randomly selected item is defective. Which machine most likely produced it?
Solution:
Let D = defective. We want P(A|D), P(B|D), P(C|D).
P(A) = 0.30, P(B) = 0.45, P(C) = 0.25. P(D|A) = 0.02, P(D|B) = 0.03, P(D|C) = 0.01.
P(D) = (0.02)(0.30) + (0.03)(0.45) + (0.01)(0.25) = 0.006 + 0.0135 + 0.0025 = 0.022.
P(A|D) = (0.02)(0.30) / 0.022 = 0.006/0.022 ≈ 0.273
P(B|D) = (0.03)(0.45) / 0.022 = 0.0135/0.022 ≈ 0.614
P(C|D) = (0.01)(0.25) / 0.022 = 0.0025/0.022 ≈ 0.114
Machine B is most likely: it has both the highest defect rate and the largest share of production. The three posteriors sum to 1 (up to rounding), as they must over a partition.
Quiz
Q1: Which of the following best describes conditional probability P(A|B)?
A) The probability of A and B both occurring B) The probability of A occurring, re-normalized to the subspace where B occurs C) The probability of B occurring given A D) P(A) × P(B)
Correct: B)
- If you chose B: Correct! P(A|B) = P(A ∩ B)/P(B) restricts the sample space to B and measures A's proportion within it.
- If you chose A: That's P(A ∩ B), the joint probability, not conditional.
- If you chose C: That's P(B|A), the reverse conditional. These are not generally equal.
- If you chose D: That's P(A)P(B), which equals P(A ∩ B) only when A and B are independent.
Q2: A disease affects 1% of the population. A test is 95% sensitive and 90% specific. If a person tests positive, the probability they actually have the disease is approximately:
A) 95% B) 90% C) 50% D) 8.8%
Correct: D)
- If you chose D: Correct! By Bayes: P(D|+) = (0.95 × 0.01)/(0.95 × 0.01 + 0.10 × 0.99) ≈ 0.0095/0.1085 ≈ 8.8%. This is the base rate fallacy — the disease is so rare that most positives are false positives.
- If you chose A: This is the sensitivity P(+|D), not the posterior P(D|+).
- If you chose B: This is the specificity P(−|Dᶜ). It enters the calculation through the false-positive rate, but it is not the posterior P(D|+).
- If you chose C: This is a common intuitive guess (50-50), but Bayes' theorem reveals it's much lower.
Q3: The Law of Total Probability states that if {B₁, B₂, ..., Bₙ} form a partition of Ω, then:
A) P(A) = max(P(A|B₁), ..., P(A|Bₙ)) B) P(A) = Σ P(A|Bᵢ) P(Bᵢ) C) P(A) = P(A|B₁) + P(A|B₂) + ... + P(A|Bₙ) D) P(A) = Π P(A|Bᵢ)
Correct: B)
- If you chose B: Correct! P(A) is the weighted average of conditional probabilities, weighted by P(Bᵢ). This decomposition breaks complex problems into manageable pieces.
- If you chose A: The law of total probability uses averaging, not taking the maximum.
- If you chose C: This ignores the weights P(Bᵢ); the formula requires multiplying by P(Bᵢ).
- If you chose D: This would be a product, not a sum — the law uses addition.
Q4: In the Monty Hall problem, why does switching doors give a 2/3 chance of winning?
A) There are two doors left, so it's 50-50 B) The car is more likely behind door 2 C) Monty's constrained choice reveals information: he must avoid the car and your initial pick D) The probability resets after Monty opens a door
Correct: C)
- If you chose C: Correct! Monty cannot open the door with the car or your initially chosen door. If your initial pick was wrong (probability 2/3), Monty's remaining door must have the car.
- If you chose A: This is the common fallacy. Monty's action is not random — it provides information.
- If you chose B: The problem is symmetric; all doors initially have equal probability.
- If you chose D: Probabilities update with new information (Bayes' theorem), they don't "reset."
Q5: If P(B) = 0, what is P(A|B)?
A) 0 B) 1 C) Undefined D) P(A)
Correct: C)
- If you chose C: Correct! P(A|B) = P(A ∩ B)/P(B) requires division by P(B). When P(B) = 0, the expression is undefined — you cannot condition on a probability-zero event in elementary probability.
- If you chose A: P(A ∩ B) = 0 when P(B) = 0, but the conditional probability itself is undefined, not zero.
- If you chose B: Division by zero does not yield 1.
- If you chose D: This would only be true if A and B are independent AND P(B) > 0.
Q6: In Bayes' theorem, what term does the denominator P(A) represent?
A) Prior belief B) Likelihood of the evidence C) The evidence (marginal probability) D) Posterior probability
Correct: C)
- If you chose C: Correct! P(A) is the marginal probability of the observed evidence, also called the normalizing constant or model evidence.
- If you chose A: The prior is P(B), our belief before observing evidence.
- If you chose B: The likelihood is P(A|B), how probable the evidence is under hypothesis B.
- If you chose D: The posterior is P(B|A), our updated belief after observing evidence.
Q7: The chain rule P(A₁ ∩ A₂ ∩ A₃) equals:
A) P(A₁)P(A₂)P(A₃) B) P(A₁)P(A₂|A₁)P(A₃|A₁ ∩ A₂) C) P(A₁|A₂)P(A₂|A₃)P(A₃) D) P(A₁ ∪ A₂ ∪ A₃)
Correct: B)
- If you chose B: Correct! The chain rule multiplies sequentially: P(A₁) × P(A₂|A₁) × P(A₃|A₁ ∩ A₂). Each term conditions on all previous events.
- If you chose A: This product formula is guaranteed when the three events are mutually independent, but it does not hold in general.
- If you chose C: The conditioning goes in the wrong direction.
- If you chose D: This is the probability of the union, which is a different concept.
Practice Problems
1. If P(A) = 0.4, P(B) = 0.3, P(A ∩ B) = 0.15, find P(A|B), P(B|A), P(A|Bᶜ), and P(Aᶜ|Bᶜ).
2. Two cards are drawn without replacement from a 52-card deck. Find: (a) P(both aces), (b) P(second is an ace | first is an ace), (c) P(at least one ace).
3. A family has two children. Given that at least one is a boy, what is the probability both are boys? (Assume P(boy) = 1/2 and independence.)
4. Prove Bayes' theorem for the simple two-event case: P(A|B) = P(B|A) P(A) / P(B).
5. 60% of emails are spam. A spam filter correctly flags 98% of spam but incorrectly flags 5% of legitimate emails. If an email is flagged, what is the probability it is actually spam?
6. A box has 5 good and 3 defective items. Two items are drawn without replacement. Use the chain rule to find P(both are good).
7. Show that for a partition {B₁, B₂, B₃} of Ω, P(A) = P(A|B₁)P(B₁) + P(A|B₂)P(B₂) + P(A|B₃)P(B₃). Then verify that Σ_{i} P(Bᵢ|A) = 1.
Answers
1. P(A|B) = 0.15/0.3 = 0.5; P(B|A) = 0.15/0.4 = 0.375; P(A|Bᶜ) = P(A ∩ Bᶜ)/P(Bᶜ) = (0.4−0.15)/0.7 = 0.25/0.7 ≈ 0.357; P(Aᶜ|Bᶜ) = P(Aᶜ ∩ Bᶜ)/P(Bᶜ) where P(Aᶜ ∩ Bᶜ) = 1 − P(A ∪ B) = 1 − (0.4+0.3−0.15) = 0.45, so 0.45/0.7 ≈ 0.643.
2. (a) P(both aces) = (4/52)(3/51) = 1/221. (b) P(2nd ace | 1st ace) = 3/51 = 1/17. (c) P(at least one ace) = 1 − P(no aces) = 1 − (48/52)(47/51) = 1 − 188/221 = 33/221.
3. Sample space: {BB, BG, GB, GG}, each 1/4. Given at least one boy: {BB, BG, GB}. Among these, P(BB) = (1/4)/(3/4) = 1/3. The answer is 1/3, not 1/2.
4. P(A|B) = P(A ∩ B)/P(B) = [P(B|A) P(A)]/P(B), using the multiplication rule P(A ∩ B) = P(A) P(B|A) for the numerator. ∎
5. S = spam, F = flagged. P(S) = 0.6, P(F|S) = 0.98, P(F|Sᶜ) = 0.05. P(S|F) = (0.98)(0.6) / [(0.98)(0.6) + (0.05)(0.4)] = 0.588/0.608 ≈ 0.967.
6. Let Gᵢ = "i-th item is good". P(G₁ ∩ G₂) = P(G₁) P(G₂|G₁) = (5/8)(4/7) = 20/56 = 5/14.
7. As in the core content proof for the general case. For verification: Σᵢ P(Bᵢ|A) = Σᵢ [P(A|Bᵢ)P(Bᵢ)/P(A)] = [Σᵢ P(A|Bᵢ)P(Bᵢ)] / P(A) = P(A)/P(A) = 1.
Summary
- P(A|B) = P(A ∩ B)/P(B) re-normalizes the sample space to B; it is a valid probability measure for fixed B
- The multiplication rule P(A ∩ B) = P(B) P(A|B) extends to the chain rule for sequential events
- Law of Total Probability P(A) = Σ P(A|Bᵢ) P(Bᵢ) computes unconditional probability by averaging over a partition
- Bayes' theorem P(B|A) = P(A|B) P(B)/P(A) inverts conditional probabilities, updating prior beliefs with observed evidence
- Conditional probability problems that seem counterintuitive (Monty Hall, diagnostic testing) are resolved by correctly identifying what is conditioned on and recognizing that apparently "random" actions may carry information
Pitfalls
- Assuming P(A|B) = P(B|A). These are generally different: both have numerator P(A ∩ B), but P(A|B) divides by P(B) while P(B|A) divides by P(A). They coincide only when P(A) = P(B) or when P(A ∩ B) = 0.
- Conditioning on probability-zero events. P(A|B) is undefined when P(B) = 0 within elementary probability theory, because the definition divides by P(B).
- Forgetting that the Law of Total Probability requires a partition. The conditioning events must be mutually exclusive AND exhaustive with positive probability. Applying LOTP to non-partitions produces incorrect results.
- Applying unconditional multiplication in sequential settings. The chain rule gives P(A ∩ B ∩ C) = P(A) P(B|A) P(C|A∩B). Multiplying unconditionally (P(A)P(B)P(C)) is justified when the events are mutually independent, not in general.
- Falling for the base rate fallacy. A test with 95% sensitivity and 90% specificity for a disease affecting 1% of the population yields only about 8.8% P(disease|positive). Always incorporate the base rate via Bayes' theorem — diagnostic accuracy alone is misleading.
Quiz
1. If P(B) = 0, what is P(A|B)?
a) 0 b) 1 c) Undefined d) P(A)
Answer: c. Division by zero is undefined. You cannot condition on an event of probability zero within elementary probability.
2. For any A with P(A) > 0, P(A|A) equals:
a) P(A)² b) 1 c) P(A) d) 0
Answer: b. P(A|A) = P(A ∩ A)/P(A) = P(A)/P(A) = 1.
3. If P(A|B) > P(A), what can we conclude?
a) P(B|A) > P(B) b) A and B are independent c) P(B) > P(A) d) A and B are mutually exclusive
Answer: a. P(A|B) > P(A) ⇔ P(A ∩ B)/P(B) > P(A) ⇔ P(A ∩ B) > P(A)P(B) ⇔ P(B|A) = P(A ∩ B)/P(A) > P(B). This is a symmetric relationship.
4. In Bayes' theorem, the denominator P(A) is called the:
a) Prior b) Likelihood c) Evidence d) Posterior
Answer: c. P(A) acts as a normalizing constant, also called the marginal likelihood or evidence.
5. Two fair coins are flipped. Given that at least one is heads, P(both heads) = ?
a) 1/4 b) 1/3 c) 1/2 d) 2/3
Answer: b. Ω = {HH, HT, TH, TT}. "At least one heads" = {HH, HT, TH} (3 outcomes). Among these, 1 is HH. P = 1/3.
6. In the Monty Hall problem, why is the probability of winning by switching 2/3 and not 1/2?
a) Because there are more goats than cars b) Because Monty's choice of door to open is constrained by the car's location and provides information c) Because the problem is symmetrical d) Because the car is placed after you pick
Answer: b. Monty cannot open the car door or your initially chosen door, so his action reveals information about the car's location.
7. A test is 99% accurate (99% sensitivity and 99% specificity) for a disease that affects 1 in 1000 people. The probability you have the disease given a positive test is approximately:
a) 99% b) 50% c) 9% d) 1%
Answer: c. P(D|+) = (0.99)(0.001) / [(0.99)(0.001) + (0.01)(0.999)] ≈ 0.00099/0.01098 ≈ 0.09. Despite the high accuracy, the rarity of the disease makes most positives false.
8. The Law of Total Probability requires the conditioning events to:
a) Be independent b) Form a partition of Ω c) Have equal probability d) Be mutually exclusive but need not be exhaustive
Answer: b. They must be pairwise disjoint AND exhaustive (union = Ω) with positive probability.
Next Steps
Continue to 10-03 Independence to learn about the definition of independent events, pairwise vs. mutual independence, and conditional independence.