Phase 10: Probability Theory
Subject 10-09: More Continuous Distributions
Prerequisites: 10-07 (Continuous Random Variables), 10-08 (Normal Distribution), basic integration, Gamma function familiarity
Learning Objectives
- State the PDF, mean, and variance of the Gamma distribution and its special cases (exponential, chi-squared)
- Define the chi-squared distribution as a sum of squared standard normals and relate degrees of freedom to the number of summed terms
- Describe Student's t-distribution, its derivation from normal/chi-squared, and its convergence to the standard normal as df → ∞
- Apply the Beta distribution to model proportions and probabilities on [0, 1]
- Explain why the Cauchy distribution has no finite mean or variance and recognize its pathological properties
Core Content
1. Gamma Distribution
The Gamma distribution generalizes the exponential. It has two parameters: shape α > 0 and rate β > 0 (alternatively, scale θ = 1/β).
Notation: X ~ Gamma(α, β) — using rate parameterization
PDF:
$$f(x) = \frac{\beta^{\alpha}}{\Gamma(\alpha)} \, x^{\alpha-1} e^{-\beta x}, \quad x > 0$$
where Γ(α) = ∫₀^{∞} t^{α−1} e^{−t} dt is the Gamma function, with Γ(n) = (n−1)! for every positive integer n.
Mean and Variance:
$$E[X] = \frac{\alpha}{\beta}, \qquad \mathrm{Var}(X) = \frac{\alpha}{\beta^{2}}$$
MGF: M(t) = (1 − t/β)^{−α} for t < β.
Additivity: If X₁ ~ Gamma(α₁, β) and X₂ ~ Gamma(α₂, β) are independent (same rate!), then X₁ + X₂ ~ Gamma(α₁ + α₂, β).
Shape characteristics:
- α < 1: PDF goes to ∞ as x → 0⁺ and is decreasing
- α = 1: Exponential(β)
- α > 1: Unimodal, mode at (α−1)/β
- As α → ∞: Approaches normal (by CLT)
Special cases:
- Exponential(λ) = Gamma(1, λ)
- Chi-squared(k) = Gamma(k/2, 1/2)
- Waiting time until the α-th event in a Poisson process of rate β (α a positive integer): Gamma(α, β)
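The mean and variance formulas above can be checked numerically. The following is a minimal sketch using only the Python standard library; the helper names (`gamma_pdf`, `integrate`), the parameter values α = 4, β = 2, and the integration cutoff are illustrative assumptions, not part of the lesson.

```python
import math

def gamma_pdf(x, alpha, beta):
    """Density of Gamma(alpha, beta) in the rate parameterization, for x > 0."""
    if x <= 0:
        return 0.0
    # Work in log space so a large alpha does not overflow the Gamma function.
    log_pdf = (alpha * math.log(beta) - math.lgamma(alpha)
               + (alpha - 1) * math.log(x) - beta * x)
    return math.exp(log_pdf)

def integrate(f, a, b, n=20_000):
    """Composite midpoint rule on [a, b] with n equal subintervals."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

alpha, beta = 4.0, 2.0   # illustrative values (assumption)
upper = 40.0             # cutoff; the tail beyond it is negligible for these values

total = integrate(lambda x: gamma_pdf(x, alpha, beta), 0.0, upper)
mean = integrate(lambda x: x * gamma_pdf(x, alpha, beta), 0.0, upper)
second_moment = integrate(lambda x: x * x * gamma_pdf(x, alpha, beta), 0.0, upper)
variance = second_moment - mean ** 2

print("integral of pdf:", total)
print("mean:", mean, "vs alpha/beta =", alpha / beta)
print("variance:", variance, "vs alpha/beta^2 =", alpha / beta ** 2)
```

The numerical mean and variance should agree with α/β and α/β² up to integration error. A fixed cutoff is a shortcut; for a very small rate β the upper limit would need to grow.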
2. Chi-Squared Distribution
If Z₁, Z₂, ..., Z_k are i.i.d. N(0, 1), then:
$$\chi^2_k = \sum_{i=1}^{k} Z_i^2 \sim \chi^2(k)$$
Notation: X ~ χ²(k) where k = degrees of freedom
Relation to Gamma: χ²(k) = Gamma(k/2, 1/2)
PDF:
$$f(x) = \frac{1}{2^{k/2}\,\Gamma(k/2)} \, x^{k/2-1} e^{-x/2}, \quad x > 0$$
Mean and Variance:
$$E[\chi^2_k] = k, \qquad \mathrm{Var}(\chi^2_k) = 2k$$
Key properties:
- If X₁ ~ χ²(k₁) and X₂ ~ χ²(k₂) are independent, X₁ + X₂ ~ χ²(k₁ + k₂)
- For normal samples: (n−1)S²/σ² ~ χ²(n−1), where S² is the sample variance (this is foundational for confidence intervals on σ²)
- As k → ∞, χ²(k) is approximately N(k, 2k)
Applications: Goodness-of-fit tests, tests of independence, variance estimation.
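The definition as a sum of squared standard normals can be illustrated by simulation. This is a rough sketch, assuming k = 5, a fixed seed, and 100,000 replications (all arbitrary choices); the function name `chi_squared_draw` is hypothetical.

```python
import random
import statistics

def chi_squared_draw(k, rng):
    """One draw from chi-squared(k), built as a sum of k squared N(0, 1) values."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k))

rng = random.Random(12345)   # fixed seed so the run is repeatable (assumption)
k = 5                        # degrees of freedom (illustrative)
draws = [chi_squared_draw(k, rng) for _ in range(100_000)]

sample_mean = statistics.fmean(draws)
sample_var = statistics.variance(draws)

print("sample mean:", sample_mean, "theory:", k)
print("sample variance:", sample_var, "theory:", 2 * k)
```

The sample mean and variance should land near k and 2k, with small deviations from Monte Carlo noise.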
3. Student's t-Distribution
If Z ~ N(0, 1) and V ~ χ²(ν) are independent, then:
$$T = \frac{Z}{\sqrt{V/\nu}} \sim t(\nu)$$
where ν = degrees of freedom.
PDF:
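$$f(t) = \frac{\Gamma\!\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\!\left(\frac{\nu}{2}\right)} \left(1 + \frac{t^{2}}{\nu}\right)^{-(\nu+1)/2}, \quad -\infty < t < \infty$$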
Mean and Variance:
- E[T] = 0 for ν > 1 (undefined for ν ≤ 1)
- Var(T) = ν/(ν−2) for ν > 2 (infinite for 1 < ν ≤ 2, undefined for ν ≤ 1)
Key properties:
- Symmetric about 0, bell-shaped but with heavier tails than normal
- As ν → ∞, t(ν) converges to N(0, 1)
- t(1) = Cauchy(0, 1)
- Heavier tails mean extreme values are more likely than under normality
Applications: The t-distribution is used for inference about the mean when σ is unknown:
$$\frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t(n-1)$$
This is the basis of one-sample and two-sample t-tests and confidence intervals.
Critical values: t_{α, ν} denotes the upper α critical value. As ν increases, t-critical values approach Z-critical values. For ν = 30, t_{0.025} ≈ 2.042 vs. Z_{0.025} = 1.96 — the difference is modest but real.
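Evaluating the t density directly shows both the heavier tails and the convergence to the standard normal. The sketch below implements the PDF given above with the standard library; the function names and the evaluation point x = 3 are illustrative assumptions.

```python
import math

def t_pdf(t, nu):
    """Density of Student's t with nu degrees of freedom."""
    # Log of the normalizing constant Gamma((nu+1)/2) / (sqrt(nu*pi) * Gamma(nu/2)).
    log_norm = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
                - 0.5 * math.log(nu * math.pi))
    return math.exp(log_norm - (nu + 1) / 2 * math.log1p(t * t / nu))

def normal_pdf(z):
    """Density of N(0, 1)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)

def cauchy_pdf(x):
    """Density of the standard Cauchy, which should match t_pdf(x, 1)."""
    return 1.0 / (math.pi * (1.0 + x * x))

x = 3.0  # a point in the tail (illustrative)
for nu in (1, 5, 30, 1000):
    ratio = t_pdf(x, nu) / normal_pdf(x)
    print(f"nu={nu:5d}  t_pdf={t_pdf(x, nu):.6f}  ratio to normal={ratio:.3f}")
```

The ratio to the normal density should shrink toward 1 as ν grows, and the ν = 1 row should coincide with `cauchy_pdf(x)`.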
4. Beta Distribution
The Beta distribution models random variables restricted to [0, 1], making it natural for proportions, probabilities, and Bayesian priors.
Notation: X ~ Beta(α, β) with shape parameters α > 0, β > 0
PDF:
$$f(x) = \frac{1}{B(\alpha, \beta)} \, x^{\alpha-1} (1-x)^{\beta-1}, \quad 0 < x < 1$$
where B(α, β) = Γ(α)Γ(β)/Γ(α+β) is the Beta function.
Mean and Variance:
$$E[X] = \frac{\alpha}{\alpha+\beta}, \qquad \mathrm{Var}(X) = \frac{\alpha\beta}{(\alpha+\beta)^{2}(\alpha+\beta+1)}$$
Shape characteristics:
- α = β = 1: Uniform(0, 1)
- α, β > 1: Unimodal, mode at (α−1)/(α+β−2)
- α < 1, β < 1: U-shaped (bimodal at 0 and 1)
- α = β: Symmetric about 0.5
- As α, β increase, the distribution concentrates around the mean
Bayesian interpretation: Beta is the conjugate prior for the binomial. If prior is Beta(α, β) and we observe k successes in n trials, posterior is Beta(α+k, β+n−k).
Relationship to Gamma: If X ~ Gamma(α, θ) and Y ~ Gamma(β, θ) are independent and share the same rate θ (here α and β are the two shape parameters), then X/(X+Y) ~ Beta(α, β).
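The Gamma-to-Beta relationship can be checked by simulation. This is a sketch under stated assumptions: the shapes (2, 5), the common rate 3, the seed, and the helper name `beta_via_gammas` are all illustrative. Note that the standard library's `random.gammavariate` takes a shape and a scale, so the rate is converted with scale = 1/rate.

```python
import random
import statistics

def beta_via_gammas(a, b, rate, rng):
    """Draw X / (X + Y) with X ~ Gamma(a, rate) and Y ~ Gamma(b, rate) independent."""
    # random.gammavariate expects (shape, SCALE), so pass scale = 1 / rate.
    x = rng.gammavariate(a, 1.0 / rate)
    y = rng.gammavariate(b, 1.0 / rate)
    return x / (x + y)

rng = random.Random(2024)   # fixed seed for repeatability (assumption)
a, b = 2.0, 5.0             # Beta shape parameters (illustrative)
rate = 3.0                  # common Gamma rate (illustrative)

draws = [beta_via_gammas(a, b, rate, rng) for _ in range(100_000)]

mean_theory = a / (a + b)
var_theory = a * b / ((a + b) ** 2 * (a + b + 1))

print("mean:", statistics.fmean(draws), "theory:", mean_theory)
print("variance:", statistics.variance(draws), "theory:", var_theory)
```

Changing `rate` should leave the results essentially unchanged, since the common rate cancels in the ratio.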
5. Cauchy Distribution
The Cauchy is a pathological distribution with no finite mean or variance, but it arises naturally as the ratio of two independent standard normals.
Notation: X ~ Cauchy(x₀, γ) where x₀ = location, γ = scale
Standard Cauchy: Cauchy(0, 1)
PDF:
$$f(x) = \frac{1}{\pi(1+x^{2})}, \quad -\infty < x < \infty$$
CDF:
$$F(x) = \frac{1}{2} + \frac{1}{\pi}\arctan(x)$$
Key properties:
- E[|X|] = ∞, so the mean does not exist
- Var(X) is undefined: the second moment E[X²] is infinite and there is no mean to center on
- The sample mean of i.i.d. Cauchy variables does NOT converge to a constant; it has the same Cauchy distribution as a single observation. The law of large numbers fails because the mean does not exist, and the CLT fails because the variance is not finite.
- The Cauchy is stable: the sum of n independent Cauchy(0, γ) variables is Cauchy(0, nγ), so X̄ ~ Cauchy(0, γ). Averaging gives no concentration.
- t(1) = Cauchy(0, 1)
Why does the mean not exist?
$$E[|X|] = 2\int_{0}^{\infty} \frac{x}{\pi(1+x^{2})}\,dx = \frac{1}{\pi}\Big[\ln(1+x^{2})\Big]_{0}^{\infty} = \infty$$
The integral diverges logarithmically. Both the positive and negative parts have infinite expectation.
Pathology: If you sample 1000 Cauchy(0,1) values and compute the mean, you might get something like −452.7, then 27.3, then 853.1 — it never settles. The Cauchy is a sobering reminder that "mean" and "variance" are not guaranteed to exist.
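The erratic behavior of the sample mean is easy to see in a short simulation. The sketch below draws standard Cauchy values by inverting the CDF F(x) = 1/2 + arctan(x)/π, which gives x = tan(π(u − 1/2)) for u uniform on (0, 1). The seed, sample size, checkpoints, and the name `cauchy_draw` are illustrative assumptions.

```python
import math
import random
import statistics

def cauchy_draw(rng):
    """Standard Cauchy draw by inverse-CDF sampling."""
    return math.tan(math.pi * (rng.random() - 0.5))

rng = random.Random(7)   # fixed seed for repeatability (assumption)
n = 10_001
draws = [cauchy_draw(rng) for _ in range(n)]

# Record the running mean at a few checkpoints to see whether it stabilizes.
checkpoints = (10, 100, 1_000, 10_001)
running_means = []
total = 0.0
for i, x in enumerate(draws, start=1):
    total += x
    if i in checkpoints:
        running_means.append((i, total / i))

for i, m in running_means:
    print(f"n={i:6d}  running mean={m:.3f}")
print("sample median:", statistics.median(draws))
```

Unlike the running mean, the sample median is a consistent estimate of the location x₀ = 0, which is one reason the median is the usual summary for Cauchy-like data.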
Key Terms
- Beta distribution
- Cauchy
Worked Examples
Example 1: Gamma Distribution — Waiting Times
Calls arrive as a Poisson process with rate λ = 3 per hour. Find: (a) the probability that the 5th call takes more than 2 hours, (b) the expected time until the 5th call.
Solution:
The waiting time for the 5th event is T ~ Gamma(α=5, β=3).
(a) The CDF of Gamma(5, 3) is related to the Poisson: P(T ≤ t) = P(N(t) ≥ 5) = 1 − Σ_{k=0}^{4} (3t)ᵏ e^{−3t}/k!. Equivalently, the 5th call arrives after time t exactly when at most 4 calls have arrived by time t, so P(T > t) = P(N(t) ≤ 4). At t = 2: 3t = 6, and Σ_{k=0}^{4} 6ᵏ e^{−6}/k! = e^{−6}[1 + 6 + 18 + 36 + 54] = 115·e^{−6} ≈ 0.285. So P(T > 2) = P(N(2) ≤ 4) ≈ 0.285, and P(T ≤ 2) ≈ 1 − 0.285 = 0.715.
(b) E[T] = α/β = 5/3 ≈ 1.667 hours.
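The arithmetic in this example can be reproduced in a few lines. Since T > 2 exactly when at most four calls arrive in the first two hours, P(T > 2) = P(N(2) ≤ 4) with N(2) ~ Poisson(6). The function names below are hypothetical helpers, and the approach only applies when the Gamma shape is a positive integer.

```python
import math

def poisson_cdf(m, mu):
    """P(N <= m) for N ~ Poisson(mu)."""
    return sum(math.exp(-mu) * mu ** k / math.factorial(k) for k in range(m + 1))

def gamma_survival_integer_shape(t, shape, rate):
    """P(T > t) for T ~ Gamma(shape, rate) with a positive integer shape.

    The shape-th event occurs after time t exactly when at most
    shape - 1 events have occurred by time t.
    """
    return poisson_cdf(shape - 1, rate * t)

p_late = gamma_survival_integer_shape(2.0, shape=5, rate=3.0)
expected_wait = 5 / 3.0

print(round(p_late, 3))         # 0.285
print(round(expected_wait, 3))  # 1.667
```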
Example 2: Chi-Squared and Sample Variance
Sample variance S² is computed from n=10 observations from N(μ, σ²). Find P(S² > 2σ²).
Solution:
(n−1)S²/σ² ~ χ²(9). So: P(S² > 2σ²) = P(9·S²/σ² > 18) = P(χ²₉ > 18).
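From chi-squared tables with 9 degrees of freedom, 18 lies between the upper-tail critical values 16.92 (p = 0.05) and 19.02 (p = 0.025), so 0.025 < P(χ²₉ > 18) < 0.05. Linear interpolation gives roughly 0.037, and the exact tail probability is about 0.035. So there is about a 3.5% chance.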
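Without a table, the tail probability can be approximated by integrating the chi-squared PDF numerically. This is a sketch under stated assumptions: it uses a hand-written composite Simpson rule rather than a statistics library, and the helper names are illustrative. The density formula is the one given in the Chi-Squared section.

```python
import math

def chi2_pdf(x, k):
    """Density of chi-squared(k); this sketch assumes k > 2 so the density is finite at 0."""
    if x <= 0:
        return 0.0
    log_pdf = (-(k / 2) * math.log(2) - math.lgamma(k / 2)
               + (k / 2 - 1) * math.log(x) - x / 2)
    return math.exp(log_pdf)

def simpson(f, a, b, n=2_000):
    """Composite Simpson rule on [a, b]; n must be even."""
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3

k, threshold = 9, 18.0
# P(chi2_9 > 18) = 1 - P(chi2_9 <= 18)
tail = 1.0 - simpson(lambda x: chi2_pdf(x, k), 0.0, threshold)
print("P(chi2_9 > 18) is approximately", tail)
```

The result should fall between the tabulated bounds 0.025 and 0.05, close to 0.035.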
Example 3: Beta as a Conjugate Prior
Your prior belief about a coin's bias p is Beta(2, 2) — symmetric around 0.5. You flip it 10 times and get 8 heads. What is your posterior distribution for p?
Solution:
Prior: Beta(α=2, β=2). Data: 8 heads, 2 tails. Posterior: Beta(α+k, β+n−k) = Beta(2+8, 2+2) = Beta(10, 4).
Posterior mean: E[p | data] = 10/(10+4) = 10/14 ≈ 0.714. Posterior variance: (10·4)/(14²·15) = 40/(196·15) ≈ 0.0136.
Prior mean was 0.5; after observing 8/10 heads, our estimate shifts to ~0.714, with substantially reduced uncertainty.
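The conjugate update in this example amounts to adding counts to the prior parameters. A minimal sketch follows; the function names are hypothetical, and the mean and variance formulas are the ones stated in the Beta section.

```python
def beta_posterior(alpha, beta, successes, trials):
    """Conjugate update: Beta(alpha, beta) prior combined with binomial data."""
    return alpha + successes, beta + (trials - successes)

def beta_mean(alpha, beta):
    return alpha / (alpha + beta)

def beta_variance(alpha, beta):
    return alpha * beta / ((alpha + beta) ** 2 * (alpha + beta + 1))

prior = (2, 2)
posterior = beta_posterior(*prior, successes=8, trials=10)

print("posterior parameters:", posterior)  # (10, 4)
print("prior mean, variance:", beta_mean(*prior), beta_variance(*prior))
print("posterior mean, variance:", beta_mean(*posterior), beta_variance(*posterior))
```

Comparing the two variances shows how much the ten flips reduce uncertainty: the prior variance is 0.05, while the posterior variance is about 0.0136.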
Quiz
Q1: The Gamma(α, β) distribution with α = 1 is equivalent to:
A) Chi-squared(1) B) Exponential(β) C) Normal(1, β) D) Uniform(0, β)
Correct: B)
- If you chose B: Correct! Gamma(1, β) has PDF βe^{−βx} for x > 0, which is exactly the Exponential(β) distribution.
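- If you chose A: Chi-squared(k) = Gamma(k/2, 1/2), so Chi-squared(1) = Gamma(1/2, 1/2), which has shape 1/2 rather than 1. (Chi-squared(2) = Gamma(1, 1/2) is exponential, but only with the specific rate 1/2.)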
- If you chose C: The Gamma and Normal are completely different families.
- If you chose D: Uniform has finite support and constant PDF; Gamma has infinite support.
Q2: If Z₁, Z₂, ..., Zₖ are i.i.d. N(0, 1), then Σ Zᵢ² follows:
A) Chi-squared(k) = Gamma(k/2, 1/2) B) Normal(0, k) C) Exponential(k/2) D) Gamma(k/2, k/2)
Correct: A)
- If you chose A: Correct! The sum of squared standard normals is chi-squared with k degrees of freedom, equivalent to Gamma(k/2, 1/2).
- If you chose B: The sum Z₁+...+Zₖ ~ N(0, k), but the sum of SQUARES follows chi-squared.
- If you chose C: Only when k = 2 does chi-squared reduce to Exponential(1/2).
- If you chose D: The rate parameter should be 1/2, not k/2.
Q3: For X ~ χ²(k), E[X] and Var(X) are:
A) E[X] = k/2, Var(X) = k/4 B) E[X] = k, Var(X) = 2k C) E[X] = k, Var(X) = k D) E[X] = 2k, Var(X) = k
Correct: B)
- If you chose B: Correct! Using the Gamma(k/2, 1/2) representation: E[X] = (k/2)/(1/2) = k, Var(X) = (k/2)/(1/2)² = 2k.
- If you chose A: These are the mean and variance of Gamma(k, 2), not chi-squared. Chi-squared(k) = Gamma(k/2, 1/2) has shape k/2 and rate 1/2.
- If you chose C: This would be Poisson-like, but chi-squared variance is 2k.
- If you chose D: This swaps mean and variance.
Q4: Student's t-distribution with ν degrees of freedom converges to which distribution as ν → ∞?
A) Chi-squared(ν) B) N(0, 1) C) Cauchy D) Exponential
Correct: B)
- If you chose B: Correct! As ν → ∞, the t-distribution approaches the standard normal. For ν ≥ 30, the approximation is already quite good. This is because V/ν, the quantity under the square root in the denominator, converges to its mean of 1, so T behaves like Z.
- If you chose A: The t is derived from normal/√(chi-squared/ν), not chi-squared alone.
- If you chose C: The Cauchy is t with ν = 1 — it has much heavier tails and no finite mean.
- If you chose D: The t is symmetric and bell-shaped, not one-sided like exponential.
Q5: The Beta(α, β) distribution is defined on which interval?
A) (−∞, ∞) B) [0, ∞) C) [0, 1] D) [−1, 1]
Correct: C)
- If you chose C: Correct! The Beta distribution models proportions and probabilities, so its support is naturally [0, 1].
- If you chose A: This would be the support of Normal or Cauchy.
- If you chose B: This is the support of Gamma, Exponential, Chi-squared.
- If you chose D: This is the support of correlation coefficient-related distributions.
Q6: The Cauchy distribution is notable because:
A) It has the memoryless property B) It has no finite mean or variance C) It is the only symmetric continuous distribution D) It converges to normal for large samples
Correct: B)
- If you chose B: Correct! The Cauchy PDF f(x) = 1/(π(1+x²)) decays slowly enough that ∫ |x| f(x) dx diverges. Neither mean nor variance exists. The sample mean of Cauchy variables does NOT converge to a constant — it remains Cauchy!
- If you chose A: The memoryless property is unique to Exponential (continuous) and Geometric (discrete).
- If you chose C: Many continuous distributions are symmetric (Normal, Uniform, t, Laplace).
- If you chose D: The Cauchy does NOT satisfy CLT — it's a counterexample because it lacks finite variance.
Q7: If X ~ Gamma(α₁, β) and Y ~ Gamma(α₂, β) are independent with the SAME rate β, then X + Y ~:
A) Gamma(α₁α₂, β) B) Gamma(α₁ + α₂, β) C) Gamma(α₁, 2β) D) Not Gamma
Correct: B)
- If you chose B: Correct! Gamma distributions with the same rate parameter are additive in the shape parameter. For integer shapes this matches the Poisson-process picture: the waiting time until the α-th event at rate β is a sum of α independent Exponential(β) gaps and has a Gamma(α, β) distribution.
- If you chose A: Shape parameters add, not multiply.
- If you chose C: The rate stays the same; only the shape adds.
- If you chose D: It IS Gamma when rates match.
Practice Problems
- Let X ~ Gamma(3, 2). Find the PDF, E[X], Var(X), and the mode.
- If Z₁, Z₂, Z₃ are i.i.d. N(0, 1), find the probability that their sum of squares exceeds 7.815.
- Let T ~ t(10). Find P(T > 2.228) and P(|T| > 1.812).
- Show that Beta(1, 1) = Uniform(0, 1) by writing out the PDF.
- For X ~ Beta(5, 3), find E[X], Var(X), and the mode.
- Prove that the mean of the Cauchy distribution does not exist by showing ∫₀^{∞} x f(x) dx diverges.
- If X ~ Gamma(α, β), use the MGF to verify that E[X] = α/β and Var(X) = α/β².