
Phase 10: Probability Theory

Subject 10-02: Conditional Probability

Prerequisites: 10-01 (Probability Foundations) — sample spaces, axioms, addition rule, set operations


Learning Objectives

  1. Define conditional probability P(A|B) and explain when and why it differs from unconditional probability
  2. Apply the multiplication rule P(A ∩ B) = P(B) · P(A|B) in sequential experiments and tree diagrams
  3. Use the Law of Total Probability to decompose complex events over a partition of the sample space
  4. State, prove, and apply Bayes' theorem to invert conditional probabilities, distinguishing prior, likelihood, and posterior
  5. Solve classic conditional probability puzzles including the Monty Hall problem

Core Content

1. Definition of Conditional Probability

Definition: For events A and B with P(B) > 0, the conditional probability of A given B is:

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$

This re-normalizes the sample space to B: among outcomes where B occurs, what fraction also have A occur?

Interpretation: P(A|B) answers "if we know B happened, how likely is A?"

Key properties:

- P(Ω|B) = P(Ω ∩ B)/P(B) = P(B)/P(B) = 1
- P(∅|B) = 0
- For fixed B with P(B) > 0, P(·|B) is a valid probability measure on the same sample space
- P(A|B) + P(Aᶜ|B) = 1

Edge case: If P(B) = 0, P(A|B) is undefined. You cannot condition on an impossible event.

Common misconception: P(A|B) = P(B|A) is false in general. Since P(A|B) · P(B) = P(A ∩ B) = P(B|A) · P(A), the two conditionals agree only when P(A) = P(B) or P(A ∩ B) = 0.
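The definition can be checked by direct counting on a small, equally likely sample space. The sketch below uses two fair dice; the events "sum is 8" and "first die is even" are illustrative choices, not taken from the lesson, and the helper names `prob` and `cond_prob` are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 equally likely outcomes of two fair six-sided dice.
omega = list(product(range(1, 7), repeat=2))

def prob(event):
    """P(event) under the uniform measure on omega."""
    return Fraction(sum(1 for w in omega if event(w)), len(omega))

def cond_prob(a, b):
    """P(a | b) = P(a and b) / P(b); undefined when P(b) = 0."""
    p_b = prob(b)
    if p_b == 0:
        raise ValueError("P(B) = 0, so P(A|B) is undefined")
    return prob(lambda w: a(w) and b(w)) / p_b

# Illustrative events (chosen for this sketch).
def sum_is_8(w):
    return w[0] + w[1] == 8

def first_is_even(w):
    return w[0] % 2 == 0

p_a_given_b = cond_prob(sum_is_8, first_is_even)  # 3 of the 18 "even first" outcomes
p_b_given_a = cond_prob(first_is_even, sum_is_8)  # 3 of the 5 "sum is 8" outcomes
print(p_a_given_b, p_b_given_a)  # 1/6 3/5
```

Here P(A) = 5/36 and P(B) = 1/2 differ, and so do the two conditional probabilities.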

2. The Multiplication Rule

From the definition, multiplying both sides by P(B):

$$P(A \cap B) = P(B) \cdot P(A \mid B)$$

By symmetry: P(A ∩ B) = P(A) · P(B|A) (provided P(A) > 0).

General multiplication rule (chain rule):

For events A₁, A₂, ..., Aₙ with P(A₁ ∩ ... ∩ A_{n-1}) > 0:

$$P(A_1 \cap A_2 \cap \dots \cap A_n) = P(A_1) \cdot P(A_2 \mid A_1) \cdot P(A_3 \mid A_1 \cap A_2) \cdots P(A_n \mid A_1 \cap \dots \cap A_{n-1})$$

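Proof by induction: The case n = 2 is the two-event multiplication rule. Assume the formula holds for n-1 events. Applying the two-event rule to the events ∩_{i=1}^{n-1} Aᵢ and Aₙ gives P(∩_{i=1}^{n} Aᵢ) = P(∩_{i=1}^{n-1} Aᵢ) · P(Aₙ | ∩_{i=1}^{n-1} Aᵢ). By the induction hypothesis, the first factor is the chain product up to n-1. ∎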

Tree diagrams: The chain rule is naturally visualized with probability trees — branching at each stage, multiplying probabilities along a path gives the probability of that path.
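Multiplying along one path of a tree can be sketched in a few lines. The example below is an illustration under stated assumptions: it draws without replacement from a standard 52-card deck (a scenario chosen for this sketch), and the function name `all_successes_without_replacement` is hypothetical. It assumes `draws` does not exceed `total`.

```python
from fractions import Fraction
from math import comb

def all_successes_without_replacement(successes, total, draws):
    """Chain rule: P(S1 ∩ ... ∩ Sk) = P(S1) · P(S2|S1) · ... for k draws.

    At stage i (0-based), i successes have already been removed, so the
    conditional probability of another success is (successes - i) / (total - i).
    """
    p = Fraction(1)
    for i in range(draws):
        p *= Fraction(successes - i, total - i)
    return p

# Three aces in a row from a 52-card deck: (4/52)(3/51)(2/50).
p_three_aces = all_successes_without_replacement(4, 52, 3)

# Cross-check by counting unordered hands: C(4,3) / C(52,3).
p_by_counting = Fraction(comb(4, 3), comb(52, 3))
print(p_three_aces, p_by_counting)  # 1/5525 1/5525
```

The agreement with the counting argument is a useful sanity check: the chain rule orders the draws, the binomial coefficients do not, and both describe the same event.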

3. Law of Total Probability (LOTP)

If B₁, B₂, ..., Bₙ form a partition of Ω (they are pairwise disjoint and their union is Ω, with each P(Bᵢ) > 0), then for any event A:

$$P(A) = \sum_{i=1}^{n} P(A \mid B_i) \cdot P(B_i)$$

Proof: A = A ∩ Ω = A ∩ (⋃Bᵢ) = ⋃(A ∩ Bᵢ). Since the Bᵢ are pairwise disjoint, the sets A ∩ Bᵢ are also pairwise disjoint. By finite additivity and then the multiplication rule: P(A) = Σ P(A ∩ Bᵢ) = Σ P(A|Bᵢ) P(Bᵢ). ∎

Why this matters: LOTP lets you compute P(A) by "averaging" over all the mutually exclusive ways A could happen. It breaks a hard problem into easier conditional pieces.

Special case (binary partition):

$$P(A) = P(A \mid B) \cdot P(B) + P(A \mid B^c) \cdot P(B^c)$$
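As a minimal sketch, the weighted sum in the Law of Total Probability can be written as one small function. The name `total_probability` and the rain/lateness numbers are hypothetical, chosen only to exercise the binary-partition case.

```python
from fractions import Fraction

def total_probability(priors, likelihoods):
    """Return P(A) = sum_i P(A|B_i) * P(B_i) for a partition B_1..B_n.

    priors[i] is P(B_i); likelihoods[i] is P(A|B_i).
    """
    if len(priors) != len(likelihoods):
        raise ValueError("need one likelihood per partition block")
    if abs(sum(priors) - 1) > 1e-9:
        raise ValueError("partition probabilities must sum to 1")
    return sum(l * p for l, p in zip(likelihoods, priors))

# Hypothetical numbers: B = "it rains" with P(B) = 3/10,
# P(late | rain) = 1/2 and P(late | no rain) = 1/10.
p_late = total_probability(
    priors=[Fraction(3, 10), Fraction(7, 10)],
    likelihoods=[Fraction(1, 2), Fraction(1, 10)],
)
print(p_late)  # 11/50
```

The check that the priors sum to 1 guards against the most common misuse: conditioning on events that are disjoint but not exhaustive.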

4. Bayes' Theorem

For a partition B₁, ..., Bₙ and any event A with P(A) > 0:

$$P(B_j \mid A) = \frac{P(A \mid B_j)\,P(B_j)}{P(A)} = \frac{P(A \mid B_j)\,P(B_j)}{\sum_{i} P(A \mid B_i)\,P(B_i)}$$

Proof: By definition, P(Bⱼ|A) = P(Bⱼ ∩ A) / P(A) = P(A|Bⱼ) P(Bⱼ) / P(A). Then substitute LOTP for P(A). ∎

Bayesian vocabulary:

- Prior: P(Bⱼ), our belief about Bⱼ before observing A
- Likelihood: P(A|Bⱼ), how likely A is under hypothesis Bⱼ
- Posterior: P(Bⱼ|A), our updated belief about Bⱼ after observing A
- Evidence (normalizing constant): P(A) = Σ P(A|Bᵢ) P(Bᵢ)

Bayes' theorem updates beliefs: Posterior ∝ Likelihood × Prior.

Intuition: Bayes' theorem "inverts" the direction of conditioning. We know P(observation | hypothesis) and want P(hypothesis | observation).
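The update "posterior ∝ likelihood × prior" can be sketched as a short function that multiplies, sums, and normalizes. The function name `posterior` and the coin scenario (a bag in which 1 coin in 10 is two-headed) are hypothetical, used only to illustrate the mechanics.

```python
from fractions import Fraction

def posterior(priors, likelihoods):
    """Map each hypothesis to P(B_j | A), given P(B_j) and P(A | B_j).

    priors and likelihoods are dicts keyed by hypothesis name.
    """
    # Unnormalized posterior: likelihood × prior for each hypothesis.
    joint = {h: likelihoods[h] * priors[h] for h in priors}
    evidence = sum(joint.values())  # P(A), by the Law of Total Probability
    if evidence == 0:
        raise ValueError("P(A) = 0, so the posterior is undefined")
    return {h: j / evidence for h, j in joint.items()}

# Hypothetical scenario: a coin is drawn from a bag and lands heads once.
priors = {"fair": Fraction(9, 10), "two_headed": Fraction(1, 10)}
likelihoods = {"fair": Fraction(1, 2), "two_headed": Fraction(1)}  # P(heads | coin)

post = posterior(priors, likelihoods)
print(post["fair"], post["two_headed"])  # 9/11 2/11
```

One head raises the probability of the two-headed coin from 1/10 to 2/11. Feeding `post` back in as the prior for a second observed head would repeat the update.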

5. The Monty Hall Problem

This classic problem illustrates the importance of conditional probability:

Problem: You're on a game show with 3 doors. Behind 1 door is a car (prize); behind the other 2 are goats. You pick a door. The host (Monty), who knows what's behind all doors, opens one of the remaining doors revealing a goat. He then offers you the chance to switch to the other unopened door. Should you switch?

Solution (with conditional probability):

Let C be the door with the car (1, 2, or 3), X be your initial pick, M be the door Monty opens.

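Assume without loss of generality that you pick door 1. Monty then opens door 2 or door 3, always revealing a goat. When both are goats (the car is behind door 1), assume he chooses between them with equal probability.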

We need P(C=2 | M=3) — probability the car is behind door 2 given Monty opened door 3.

By Bayes:

$$P(C=2 \mid M=3) = \frac{P(M=3 \mid C=2) \cdot P(C=2)}{P(M=3)}$$

By LOTP:

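$$
\begin{aligned}
P(M=3) &= P(M=3 \mid C=1)\,P(C=1) + P(M=3 \mid C=2)\,P(C=2) + P(M=3 \mid C=3)\,P(C=3) \\
&= \tfrac{1}{2} \cdot \tfrac{1}{3} + 1 \cdot \tfrac{1}{3} + 0 \cdot \tfrac{1}{3} = \tfrac{1}{6} + \tfrac{1}{3} = \tfrac{1}{2}
\end{aligned}
$$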

Therefore:

$$P(C=2 \mid M=3) = \frac{(1)(1/3)}{1/2} = \frac{1/3}{1/2} = \frac{2}{3}$$

Switching gives probability 2/3 of winning. Staying gives 1/3. Always switch.

Why this is counterintuitive: Many people think "two doors left, so 50-50." They forget that Monty's action is informative — he deliberately avoids opening the car door, so his choice leaks information.
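The 2/3 answer can be confirmed by building the joint distribution of the car's door and Monty's door explicitly, rather than by simulation. This is a sketch under the same assumptions as the solution above: you pick door 1, the car is placed uniformly at random, and Monty chooses uniformly when he has two goat doors available. The names `joint` and `p_car_given_opened` are hypothetical.

```python
from fractions import Fraction

DOORS = (1, 2, 3)
PICK = 1  # the contestant's initial pick

# Build the joint distribution P(C = car, M = opened), given the pick.
joint = {}
for car in DOORS:
    # Monty may open any door that is neither the pick nor the car.
    options = [d for d in DOORS if d != PICK and d != car]
    for opened in options:
        # Car is uniform over doors; Monty picks uniformly among his options
        # (an assumption that matters only when he has two choices).
        joint[(car, opened)] = Fraction(1, 3) * Fraction(1, len(options))

def p_car_given_opened(car, opened):
    """P(C = car | M = opened) = P(C = car, M = opened) / P(M = opened)."""
    p_opened = sum(p for (c, m), p in joint.items() if m == opened)
    return joint.get((car, opened), Fraction(0)) / p_opened

p_switch_wins = p_car_given_opened(2, 3)  # switch to door 2
p_stay_wins = p_car_given_opened(1, 3)    # stay with door 1
print(p_switch_wins, p_stay_wins)  # 2/3 1/3
```

The table has only four nonzero entries. The entry (car=2, opened=3) carries probability 1/3 because Monty is forced, while (car=1, opened=3) carries only 1/6 because he had a choice. That asymmetry is the information his action leaks.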



Key Terms

Worked Examples

Example 1: Diagnostic Testing

A disease affects 1% of the population. A test is 95% sensitive (P(positive | disease) = 0.95) and 90% specific (P(negative | no disease) = 0.90). If a person tests positive, what is the probability they actually have the disease?

Solution:

Let D = "has disease", T+ = "tests positive".

Given: P(D) = 0.01, P(T+|D) = 0.95, P(T−|Dᶜ) = 0.90 → P(T+|Dᶜ) = 0.10.

We want P(D|T+).

By Bayes:

$$
\begin{aligned}
P(D \mid T^+) &= \frac{P(T^+ \mid D)\,P(D)}{P(T^+ \mid D)\,P(D) + P(T^+ \mid D^c)\,P(D^c)} \\
&= \frac{(0.95)(0.01)}{(0.95)(0.01) + (0.10)(0.99)} \\
&= \frac{0.0095}{0.0095 + 0.099} \\
&= \frac{0.0095}{0.1085} \approx 0.0876
\end{aligned}
$$

Only about 8.8%! Despite the test seeming accurate, the disease is so rare that most positives are false positives. Ignoring the low prior and reading the 95% sensitivity as if it were P(D|T+) is the "base rate fallacy."
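To see how strongly the answer depends on the base rate, the same calculation can be repeated for several prevalence values. In this sketch the sensitivity and specificity come from Example 1; the prevalence values other than 1% are hypothetical, added only to show the trend, and the function name is an assumption.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """P(disease | positive test) via Bayes' theorem with a binary partition."""
    true_pos = sensitivity * prevalence               # P(T+|D) P(D)
    false_pos = (1 - specificity) * (1 - prevalence)  # P(T+|Dc) P(Dc)
    return true_pos / (true_pos + false_pos)

# Sensitivity and specificity from Example 1; extra prevalences are illustrative.
for prevalence in (0.001, 0.01, 0.1, 0.5):
    ppv = positive_predictive_value(prevalence, sensitivity=0.95, specificity=0.90)
    print(f"prevalence {prevalence:>5}: P(D|T+) = {ppv:.3f}")
```

The same test gives a posterior below 1% when the disease affects 1 in 1000 people and above 90% when it affects half the population, which is why a positive result cannot be interpreted without the prior.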


Example 2: Two-Stage Experiment

Urn 1 contains 3 red and 5 blue balls. Urn 2 contains 4 red and 2 blue balls. A fair coin is flipped: if heads, draw from Urn 1; if tails, draw from Urn 2. What is P(red)?

Solution:

Partition B₁ = {draw from Urn 1}, B₂ = {draw from Urn 2}.

P(B₁) = P(B₂) = 1/2.

P(red|B₁) = 3/8, P(red|B₂) = 4/6 = 2/3.

By LOTP:

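$$
\begin{aligned}
P(\text{red}) &= P(\text{red} \mid B_1)\,P(B_1) + P(\text{red} \mid B_2)\,P(B_2) \\
&= \frac{3}{8} \cdot \frac{1}{2} + \frac{2}{3} \cdot \frac{1}{2} \\
&= \frac{3}{16} + \frac{1}{3} = \frac{9 + 16}{48} = \frac{25}{48} \approx 0.521
\end{aligned}
$$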
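A two-stage experiment like this one is easy to simulate, which gives an independent check on the 25/48 figure. The sketch below is a plain Monte Carlo estimate; the trial count and the seed are arbitrary choices, and the function names are hypothetical.

```python
import random

def draw_is_red(rng):
    """Simulate one run: flip a fair coin, then draw one ball from the chosen urn."""
    if rng.random() < 0.5:          # heads: Urn 1 (3 red, 5 blue)
        urn = ["red"] * 3 + ["blue"] * 5
    else:                           # tails: Urn 2 (4 red, 2 blue)
        urn = ["red"] * 4 + ["blue"] * 2
    return rng.choice(urn) == "red"

def estimate_p_red(trials, seed=0):
    """Estimate P(red) as the fraction of simulated runs that draw red."""
    rng = random.Random(seed)  # fixed seed (arbitrary) for reproducibility
    return sum(draw_is_red(rng) for _ in range(trials)) / trials

estimate = estimate_p_red(200_000)
print(f"simulated {estimate:.4f} vs exact {25 / 48:.4f}")
```

With 200,000 trials the standard error is roughly 0.001, so the estimate should agree with 25/48 ≈ 0.5208 to about two decimal places.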

Example 3: Bayes with Three Hypotheses

A factory has three machines producing the same item. Machine A produces 30% of items with 2% defect rate, Machine B produces 45% with 3% defect rate, Machine C produces 25% with 1% defect rate. A randomly selected item is defective. Which machine most likely produced it?

Solution:

Let D = defective. We want P(A|D), P(B|D), P(C|D).

P(A) = 0.30, P(B) = 0.45, P(C) = 0.25. P(D|A) = 0.02, P(D|B) = 0.03, P(D|C) = 0.01.

P(D) = (0.02)(0.30) + (0.03)(0.45) + (0.01)(0.25) = 0.006 + 0.0135 + 0.0025 = 0.022.

P(A|D) = (0.02)(0.30) / 0.022 = 0.006/0.022 ≈ 0.273

P(B|D) = (0.03)(0.45) / 0.022 = 0.0135/0.022 ≈ 0.614

P(C|D) = (0.01)(0.25) / 0.022 = 0.0025/0.022 ≈ 0.114

Machine B is most likely: it has both the highest defect rate (3%) and the largest share of production (45%), so its product P(D|B) · P(B) dominates the other two.
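When the question is only "which hypothesis is most likely?", the normalizing constant P(D) is not needed, because dividing every term by the same number cannot change their order. The sketch below uses the shares and defect rates from Example 3; the variable names are hypothetical.

```python
# Production shares and defect rates from Example 3.
share = {"A": 0.30, "B": 0.45, "C": 0.25}
defect_rate = {"A": 0.02, "B": 0.03, "C": 0.01}

# Unnormalized posterior weights P(D | machine) * P(machine).
weight = {m: defect_rate[m] * share[m] for m in share}

# The largest weight already identifies the most probable machine.
most_likely = max(weight, key=weight.get)

# Normalizing by P(D) = sum of weights recovers the posterior probabilities.
total = sum(weight.values())
posterior = {m: w / total for m, w in weight.items()}
print(most_likely, {m: round(p, 3) for m, p in posterior.items()})
```

Comparing unnormalized weights is the usual shortcut when there are many hypotheses and only the ranking matters.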


Quiz

Q1: Which of the following best describes conditional probability P(A|B)?

A) The probability of A and B both occurring
B) The probability of A occurring, re-normalized to the subspace where B occurs
C) The probability of B occurring given A
D) P(A) × P(B)

Correct: B)


Q2: A disease affects 1% of the population. A test is 95% sensitive and 90% specific. If a person tests positive, the probability they actually have the disease is approximately:

A) 95%
B) 90%
C) 50%
D) 8.8%

Correct: D)


Q3: The Law of Total Probability states that if {B₁, B₂, ..., Bₙ} form a partition of Ω, then:

A) P(A) = max(P(A|B₁), ..., P(A|Bₙ))
B) P(A) = Σ P(A|Bᵢ) P(Bᵢ)
C) P(A) = P(A|B₁) + P(A|B₂) + ... + P(A|Bₙ)
D) P(A) = Π P(A|Bᵢ)

Correct: B)


Q4: In the Monty Hall problem, why does switching doors give a 2/3 chance of winning?

A) There are two doors left, so it's 50-50
B) The car is more likely behind door 2
C) Monty's constrained choice reveals information: he must avoid the car and your initial pick
D) The probability resets after Monty opens a door

Correct: C)


Q5: If P(B) = 0, what is P(A|B)?

A) 0
B) 1
C) Undefined
D) P(A)

Correct: C)


Q6: In Bayes' theorem, what term does the denominator P(A) represent?

A) Prior belief
B) Likelihood of the evidence
C) The evidence (marginal probability)
D) Posterior probability

Correct: C)


Q7: The chain rule P(A₁ ∩ A₂ ∩ A₃) equals:

A) P(A₁)P(A₂)P(A₃)
B) P(A₁)P(A₂|A₁)P(A₃|A₁ ∩ A₂)
C) P(A₁|A₂)P(A₂|A₃)P(A₃)
D) P(A₁ ∪ A₂ ∪ A₃)

Correct: B)


Practice Problems

  1. If P(A) = 0.4, P(B) = 0.3, P(A ∩ B) = 0.15, find P(A|B), P(B|A), P(A|Bᶜ), and P(Aᶜ|Bᶜ).

  2. Two cards are drawn without replacement from a 52-card deck. Find: (a) P(both aces), (b) P(second is an ace | first is an ace), (c) P(at least one ace).

  3. A family has two children. Given that at least one is a boy, what is the probability both are boys? (Assume P(boy) = 1/2 and independence.)

  4. Prove Bayes' theorem for the simple two-event case: P(A|B) = P(B|A) P(A) / P(B).

  5. 60% of emails are spam. A spam filter correctly flags 98% of spam but incorrectly flags 5% of legitimate emails. If an email is flagged, what is the probability it is actually spam?

  6. A box has 5 good and 3 defective items. Two items are drawn without replacement. Use the chain rule to find P(both are good).

  7. Show that for a partition {B₁, B₂, B₃} of Ω, P(A) = P(A|B₁)P(B₁) + P(A|B₂)P(B₂) + P(A|B₃)P(B₃). Then verify that Σ_{i} P(Bᵢ|A) = 1.

Answers

  1. P(A|B) = 0.15/0.3 = 0.5; P(B|A) = 0.15/0.4 = 0.375; P(A|Bᶜ) = P(A ∩ Bᶜ)/P(Bᶜ) = (0.4−0.15)/0.7 = 0.25/0.7 ≈ 0.357; P(Aᶜ|Bᶜ) = P(Aᶜ ∩ Bᶜ)/P(Bᶜ) where P(Aᶜ ∩ Bᶜ) = 1 − P(A ∪ B) = 1 − (0.4+0.3−0.15) = 0.45, so 0.45/0.7 ≈ 0.643.

  2. (a) P(both aces) = (4/52)(3/51) = 1/221. (b) P(2nd ace | 1st ace) = 3/51 = 1/17. (c) P(at least one ace) = 1 − P(no aces) = 1 − (48/52)(47/51) = 1 − 188/221 = 33/221.

  3. Sample space: {BB, BG, GB, GG}, each 1/4. Given at least one boy: {BB, BG, GB}. Among these, P(BB) = (1/4)/(3/4) = 1/3. The answer is 1/3, not 1/2.

  4. P(A|B) = P(A ∩ B)/P(B) = [P(B|A) P(A)]/P(B). Done.

  5. S = spam, F = flagged. P(S) = 0.6, P(F|S) = 0.98, P(F|Sᶜ) = 0.05. P(S|F) = (0.98)(0.6) / [(0.98)(0.6) + (0.05)(0.4)] = 0.588/0.608 ≈ 0.967.

  6. Let Gᵢ = "i-th item is good". P(G₁ ∩ G₂) = P(G₁) P(G₂|G₁) = (5/8)(4/7) = 20/56 = 5/14.

  7. As in the core content proof for the general case. For verification: Σᵢ P(Bᵢ|A) = Σᵢ [P(A|Bᵢ)P(Bᵢ)/P(A)] = [Σᵢ P(A|Bᵢ)P(Bᵢ)] / P(A) = P(A)/P(A) = 1.
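Answers like those to Problem 2 can be double-checked by brute force, since the sample space of ordered two-card draws has only 52 × 51 = 2652 outcomes. In this sketch the deck is modeled as the integers 0 to 51, with cards 0 to 3 standing in for the aces; that encoding and the helper names are arbitrary assumptions.

```python
from fractions import Fraction
from itertools import permutations

# Model the deck as 52 labels; cards 0-3 are the aces (an arbitrary encoding).
deck = range(52)

def is_ace(card):
    return card < 4

# All ordered pairs of distinct cards, each equally likely.
pairs = list(permutations(deck, 2))

def prob(event):
    """P(event) under the uniform measure on ordered two-card draws."""
    return Fraction(sum(1 for pair in pairs if event(pair)), len(pairs))

p_both = prob(lambda p: is_ace(p[0]) and is_ace(p[1]))
p_first = prob(lambda p: is_ace(p[0]))
p_second_given_first = p_both / p_first  # definition of conditional probability
p_at_least_one = prob(lambda p: is_ace(p[0]) or is_ace(p[1]))
print(p_both, p_second_given_first, p_at_least_one)  # 1/221 1/17 33/221
```

The same pattern (enumerate, filter, divide) works for Problems 3 and 6 with a different sample space.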

Summary


Pitfalls


Quiz

  1. If P(B) = 0, what is P(A|B)? a) 0 b) 1 c) Undefined d) P(A) Answer: c. Division by zero is undefined. You cannot condition on an event of probability zero within elementary probability.

  2. For any A with P(A) > 0, P(A|A) equals: a) P(A)² b) 1 c) P(A) d) 0 Answer: b. P(A|A) = P(A ∩ A)/P(A) = P(A)/P(A) = 1.

  3. If P(A|B) > P(A), what can we conclude? a) P(B|A) > P(B) b) A and B are independent c) P(B) > P(A) d) A and B are mutually exclusive Answer: a. P(A|B) > P(A) ⇔ P(A ∩ B)/P(B) > P(A) ⇔ P(A ∩ B) > P(A)P(B) ⇔ P(B|A) = P(A ∩ B)/P(A) > P(B). This is a symmetric relationship.

  4. In Bayes' theorem, the denominator P(A) is called the: a) Prior b) Likelihood c) Evidence d) Posterior Answer: c. P(A) acts as a normalizing constant, also called the marginal likelihood or evidence.

  5. Two fair coins are flipped. Given that at least one is heads, P(both heads) = ? a) 1/4 b) 1/3 c) 1/2 d) 2/3 Answer: b. Ω = {HH, HT, TH, TT}. "At least one heads" = {HH, HT, TH} (3 outcomes). Among these, 1 is HH. P = 1/3.

  6. In the Monty Hall problem, why is the probability of winning by switching 2/3 and not 1/2? a) Because there are more goats than cars b) Because Monty's choice of door to open is constrained by the car's location and provides information c) Because the problem is symmetrical d) Because the car is placed after you pick Answer: b. Monty cannot open the car door or your initially chosen door, so his action reveals information about the car's location.

  7. A test is 99% accurate (99% sensitivity and 99% specificity) for a disease that affects 1 in 1000 people. The probability you have the disease given a positive test is approximately: a) 99% b) 50% c) 9% d) 1% Answer: c. P(D|+) = (0.99)(0.001) / [(0.99)(0.001) + (0.01)(0.999)] = 0.00099/0.01098 ≈ 0.09. Despite the high accuracy, the rarity of the disease makes most positives false.

  8. The Law of Total Probability requires the conditioning events to: a) Be independent b) Form a partition of Ω c) Have equal probability d) Be mutually exclusive but need not be exhaustive Answer: b. They must be pairwise disjoint AND exhaustive (union = Ω) with positive probability.


Next Steps

Continue to 10-03 Independence to learn about the definition of independent events, pairwise vs. mutual independence, and conditional independence.