Phase 10: Probability Theory

Subject 10-09: More Continuous Distributions

Prerequisites: 10-07 (Continuous Random Variables), 10-08 (Normal Distribution), basic integration, Gamma function familiarity


Learning Objectives

  1. State the PDF, mean, and variance of the Gamma distribution and its special cases (exponential, chi-squared)
  2. Define the chi-squared distribution as a sum of squared standard normals and relate degrees of freedom to the number of summed terms
  3. Describe Student's t-distribution, its derivation from normal/chi-squared, and its convergence to the standard normal as df → ∞
  4. Apply the Beta distribution to model proportions and probabilities on [0, 1]
  5. Explain why the Cauchy distribution has no finite mean or variance and recognize its pathological properties

Core Content

1. Gamma Distribution

The Gamma distribution generalizes the exponential. It has two parameters: shape α > 0 and rate β > 0 (alternatively, scale θ = 1/β).

Notation: X ~ Gamma(α, β) — using rate parameterization

PDF:

$$
f(x) = \frac{\beta^{\alpha}}{\Gamma(\alpha)}\, x^{\alpha-1} e^{-\beta x}, \qquad x > 0
$$

where Γ(α) = ∫₀^{∞} t^{α−1} e^{−t} dt is the Gamma function, with Γ(n) = (n−1)! for integer n.

Mean and Variance:

$$
E[X] = \frac{\alpha}{\beta}, \qquad \operatorname{Var}(X) = \frac{\alpha}{\beta^{2}}
$$

MGF: M(t) = (1 − t/β)^{−α} for t < β.

Additivity: If X₁ ~ Gamma(α₁, β) and X₂ ~ Gamma(α₂, β) are independent (same rate!), then X₁ + X₂ ~ Gamma(α₁ + α₂, β).

Shape characteristics:
- α < 1: PDF goes to ∞ as x → 0⁺ (decreasing)
- α = 1: Exponential(β)
- α > 1: Unimodal, mode at (α−1)/β
- As α → ∞: Approaches normal (by CLT)

Special cases:
- Exponential(λ) = Gamma(1, λ)
- Chi-squared(k) = Gamma(k/2, 1/2)
- Waiting time until the α-th event in a Poisson process of rate β (α a positive integer): ~ Gamma(α, β)
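As a rough numerical check of the PDF and moment formulas above, the sketch below evaluates the rate-parameterized density and integrates it with a simple midpoint rule. The helper names, the parameter values α = 3, β = 2, and the truncation point 40 are illustrative assumptions, not part of the lesson.

```python
import math

def gamma_pdf(x, alpha, beta):
    """Gamma(alpha, beta) density in the rate parameterization, for x > 0."""
    if x <= 0:
        return 0.0
    # Work in log space so a large alpha does not overflow Gamma(alpha).
    log_f = (alpha * math.log(beta) - math.lgamma(alpha)
             + (alpha - 1) * math.log(x) - beta * x)
    return math.exp(log_f)

def midpoint_integral(g, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

alpha, beta = 3.0, 2.0   # illustrative values (assumption)
upper = 40.0             # truncation point; the tail beyond it is negligible here

total = midpoint_integral(lambda x: gamma_pdf(x, alpha, beta), 0.0, upper)
mean = midpoint_integral(lambda x: x * gamma_pdf(x, alpha, beta), 0.0, upper)
second = midpoint_integral(lambda x: x * x * gamma_pdf(x, alpha, beta), 0.0, upper)
variance = second - mean ** 2

# These should land near 1, alpha / beta, and alpha / beta**2 respectively.
print(total, mean, variance)
```

Setting `alpha = 1` in `gamma_pdf` reduces the density to β·e^{−βx}, the Exponential(β) special case listed above.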

2. Chi-Squared Distribution

If Z₁, Z₂, ..., Z_k are i.i.d. N(0, 1), then:

$$
\chi^2_k = \sum_{i=1}^{k} Z_i^2 \sim \chi^2(k)
$$

Notation: X ~ χ²(k) where k = degrees of freedom

Relation to Gamma: χ²(k) = Gamma(k/2, 1/2)

PDF:

$$
f(x) = \frac{1}{2^{k/2}\,\Gamma(k/2)}\, x^{k/2-1} e^{-x/2}, \qquad x > 0
$$

Mean and Variance:

$$
E[\chi^2_k] = k, \qquad \operatorname{Var}(\chi^2_k) = 2k
$$

Key properties:
- If X₁ ~ χ²(k₁) and X₂ ~ χ²(k₂) are independent, X₁ + X₂ ~ χ²(k₁ + k₂)
- For normal samples: (n−1)S²/σ² ~ χ²(n−1), where S² is the sample variance (this is foundational for confidence intervals on σ²)
- As k → ∞, χ²(k) approximates N(k, 2k)

Applications: Goodness-of-fit tests, tests of independence, variance estimation.
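The defining construction (a sum of k squared standard normals) can be illustrated by simulation. The sketch below is a minimal example using only the standard library; the seed, k = 5, and the number of draws are arbitrary choices, and the printed values are only expected to land near k and 2k, not to match them exactly.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

def chi_squared_draw(k):
    """One draw of Z_1^2 + ... + Z_k^2 with Z_i i.i.d. N(0, 1)."""
    return sum(random.gauss(0.0, 1.0) ** 2 for _ in range(k))

k = 5               # degrees of freedom (illustrative assumption)
n_draws = 100_000
draws = [chi_squared_draw(k) for _ in range(n_draws)]

sample_mean = statistics.fmean(draws)
sample_var = statistics.variance(draws)

# Theory: E = k and Var = 2k.
print(sample_mean, sample_var)
```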

3. Student's t-Distribution

If Z ~ N(0, 1) and V ~ χ²(ν) are independent, then:

$$
T = \frac{Z}{\sqrt{V/\nu}} \sim t(\nu)
$$

where ν = degrees of freedom.

PDF:

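$$
f(t) = \frac{\Gamma\!\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\!\left(\frac{\nu}{2}\right)} \left(1 + \frac{t^{2}}{\nu}\right)^{-(\nu+1)/2}, \qquad -\infty < t < \infty
$$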

Mean and Variance:
- E[T] = 0 for ν > 1 (undefined for ν ≤ 1)
- Var(T) = ν/(ν−2) for ν > 2 (infinite for 1 < ν ≤ 2, undefined for ν ≤ 1)

Key properties:
- Symmetric about 0, bell-shaped but with heavier tails than normal
- As ν → ∞, t(ν) → N(0, 1): the t converges to the standard normal
- t(1) = Cauchy(0, 1) (the standard Cauchy distribution)
- Heavier tails mean extreme values are more likely than under normality

Applications: The t-distribution is used for inference about the mean when σ is unknown:

$$
\frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t(n-1)
$$

This is the basis of one-sample and two-sample t-tests and confidence intervals.

Critical values: t_{α, ν} denotes the upper α critical value. As ν increases, t-critical values approach Z-critical values. For ν = 30, t_{0.025} ≈ 2.042 vs. Z_{0.025} = 1.96 — the difference is modest but real.
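The critical value quoted above and the convergence to the normal can both be checked directly from the PDF. The sketch below is one possible way to do it, assuming Simpson's rule is accurate enough for this purpose; the function names and the grid on [−5, 5] are illustrative choices.

```python
import math

def t_pdf(t, nu):
    """Student's t density with nu degrees of freedom."""
    log_c = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log(nu * math.pi))
    return math.exp(log_c - (nu + 1) / 2 * math.log1p(t * t / nu))

def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)

def t_upper_tail(c, nu, n=20_000):
    """P(T > c) for c >= 0: by symmetry, 0.5 minus Simpson's rule on [0, c]."""
    h = c / n  # n must be even for Simpson's rule
    s = t_pdf(0.0, nu) + t_pdf(c, nu)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    return 0.5 - s * h / 3

# Upper-tail probability at the tabulated critical value for nu = 30.
tail_30 = t_upper_tail(2.042, 30)
print(tail_30)  # should be close to 0.025

# Largest pointwise gap between the t and normal densities on a grid.
grid = [i / 10 for i in range(-50, 51)]
gaps = {nu: max(abs(t_pdf(x, nu) - normal_pdf(x)) for x in grid)
        for nu in (1, 5, 30, 1000)}
for nu, gap in gaps.items():
    print(nu, gap)  # the gap should shrink as nu grows
```

With ν = 1 the density at 0 equals 1/π, which matches the standard Cauchy PDF.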

4. Beta Distribution

The Beta distribution models random variables restricted to [0, 1], making it natural for proportions, probabilities, and Bayesian priors.

Notation: X ~ Beta(α, β) with shape parameters α > 0, β > 0

PDF:

$$
f(x) = \frac{1}{B(\alpha, \beta)}\, x^{\alpha-1} (1-x)^{\beta-1}, \qquad 0 < x < 1
$$

where B(α, β) = Γ(α)Γ(β)/Γ(α+β) is the Beta function.

Mean and Variance:

$$
E[X] = \frac{\alpha}{\alpha+\beta}, \qquad \operatorname{Var}(X) = \frac{\alpha\beta}{(\alpha+\beta)^{2}(\alpha+\beta+1)}
$$

Shape characteristics:
- α = β = 1: Uniform(0, 1)
- α, β > 1: Unimodal, mode at (α−1)/(α+β−2)
- α < 1, β < 1: U-shaped (bimodal at 0 and 1)
- α = β: Symmetric about 0.5
- As α, β increase, the distribution concentrates around the mean

Bayesian interpretation: Beta is the conjugate prior for the binomial. If prior is Beta(α, β) and we observe k successes in n trials, posterior is Beta(α+k, β+n−k).

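Relationship to Gamma: If X ~ Gamma(α, θ) and Y ~ Gamma(β, θ) are independent and share the same second parameter θ (a common rate or, equivalently, a common scale), then X/(X+Y) ~ Beta(α, β). Here α and β are the two Gamma shape parameters; β is not a rate in this statement.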
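The Gamma-to-Beta relationship can be checked by simulation. The sketch below is a minimal illustration with arbitrary shapes (2 and 5) and an arbitrary common rate of 3. Note that Python's `random.gammavariate` takes a shape and a scale, so the rate has to be inverted.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

a, b = 2.0, 5.0    # Beta shape parameters (illustrative assumption)
rate = 3.0         # common Gamma rate; the ratio should not depend on it
n_draws = 100_000

ratios = []
for _ in range(n_draws):
    # random.gammavariate takes (shape, scale), so the scale is 1 / rate.
    x = random.gammavariate(a, 1.0 / rate)
    y = random.gammavariate(b, 1.0 / rate)
    ratios.append(x / (x + y))

beta_mean = a / (a + b)
beta_var = a * b / ((a + b) ** 2 * (a + b + 1))

# Simulated moments next to the Beta(a, b) formulas.
print(statistics.fmean(ratios), beta_mean)
print(statistics.variance(ratios), beta_var)
```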

5. Cauchy Distribution

The Cauchy is a pathological distribution with no finite mean or variance, but it arises naturally as the ratio of two independent standard normals.

Notation: X ~ Cauchy(x₀, γ) where x₀ = location, γ = scale

Standard Cauchy: Cauchy(0, 1)

PDF:

$$
f(x) = \frac{1}{\pi(1 + x^{2})}, \qquad -\infty < x < \infty
$$

CDF:

$$
F(x) = \frac{1}{2} + \frac{1}{\pi}\arctan(x)
$$

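Key properties:
- E[|X|] = ∞, so the mean does not exist.
- Var(X) is undefined: the variance requires a finite mean, and the second moment E[X²] is infinite.
- The sample mean of n i.i.d. Cauchy(0, γ) variables does NOT converge to a constant. It is again Cauchy(0, γ), the same distribution as a single observation. The law of large numbers fails because the mean does not exist, and the CLT fails because the variance is not finite.
- The Cauchy is stable: the sum of n independent Cauchy(0, γ) variables is Cauchy(0, nγ), so dividing by n gives X̄ ~ Cauchy(0, γ). Averaging produces no concentration.
- t(1) = Cauchy(0, 1)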

Why does the mean not exist?

$$
E[|X|] = 2\int_0^{\infty} \frac{x}{\pi(1+x^{2})}\,dx = \frac{1}{\pi}\Big[\ln(1+x^{2})\Big]_0^{\infty} = \infty
$$

The integral diverges logarithmically. Both the positive and negative parts have infinite expectation.

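Pathology: If you sample 1000 Cauchy(0, 1) values and compute the mean, the result is distributed exactly like a single Cauchy(0, 1) draw. Repeating the experiment gives means that are usually of order 1 but occasionally in the tens or hundreds, and taking a larger sample does not narrow the spread: the mean never settles. The Cauchy is a sobering reminder that "mean" and "variance" are not guaranteed to exist.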
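The lack of concentration can be seen in a small simulation. The sketch below draws standard Cauchy values through the inverse CDF, x = tan(π(u − 1/2)), and compares the interquartile range (IQR) of sample means for Cauchy and normal data. The seed, sample sizes, and number of repetitions are arbitrary assumptions. A standard Cauchy has quartiles at ±1, so its IQR is 2.

```python
import math
import random
import statistics

random.seed(2)  # fixed seed so the run is repeatable

def cauchy_draw():
    """Standard Cauchy via the inverse CDF: x = tan(pi * (u - 1/2))."""
    return math.tan(math.pi * (random.random() - 0.5))

def normal_draw():
    """Standard normal draw, used as the well-behaved comparison."""
    return random.gauss(0.0, 1.0)

def iqr_of_sample_means(draw, n, reps=500):
    """Interquartile range of `reps` independent sample means of size n."""
    means = [statistics.fmean([draw() for _ in range(n)]) for _ in range(reps)]
    q1, _, q3 = statistics.quantiles(means, n=4)
    return q3 - q1

results = {}
for n in (1, 25, 400):
    results[n] = (iqr_of_sample_means(cauchy_draw, n),
                  iqr_of_sample_means(normal_draw, n))
    # Cauchy IQR should stay near 2; normal IQR should shrink like 1/sqrt(n).
    print(n, round(results[n][0], 3), round(results[n][1], 3))
```

The IQR is used here instead of the standard deviation because the sample standard deviation of Cauchy data is itself unstable.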



Key Terms

Worked Examples

Example 1: Gamma Distribution — Waiting Times

Calls arrive as a Poisson process with rate λ = 3 per hour. Find: (a) the probability that the 5th call takes more than 2 hours, (b) the expected time until the 5th call.

Solution:

The waiting time for the 5th event is T ~ Gamma(α=5, β=3).

(a) The 5th call arrives after time t exactly when at most 4 calls have arrived by time t, so P(T > t) = P(N(t) ≤ 4) = Σ_{k=0}^{4} (3t)ᵏ e^{−3t}/k!. At t = 2: 3t = 6, and Σ_{k=0}^{4} 6ᵏ e^{−6}/k! = e^{−6}[1 + 6 + 18 + 36 + 54] = e^{−6}·115 ≈ 0.285. So P(T > 2) ≈ 0.285. (Equivalently, the CDF is P(T ≤ 2) = P(N(2) ≥ 5) = 1 − 0.285 = 0.715.)

(b) E[T] = α/β = 5/3 ≈ 1.667 hours.
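The Poisson sum in part (a) is easy to check numerically. In the sketch below the function and variable names are illustrative. The key identity is that the event {T > 2} (the 5th call has not arrived by time 2) is the same as {N(2) ≤ 4}.

```python
import math

def poisson_cdf(m, mu):
    """P(N <= m) for N ~ Poisson(mu)."""
    return sum(mu ** k * math.exp(-mu) / math.factorial(k) for k in range(m + 1))

rate, t, n_events = 3.0, 2.0, 5

at_most_four = poisson_cdf(n_events - 1, rate * t)  # P(N(2) <= 4) = P(T > 2)
at_least_five = 1.0 - at_most_four                  # P(N(2) >= 5) = P(T <= 2)
expected_wait = n_events / rate                     # E[T] = alpha / beta

print(round(at_most_four, 3), round(at_least_five, 3), round(expected_wait, 3))
# 0.285 0.715 1.667
```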


Example 2: Chi-Squared and Sample Variance

Sample variance S² is computed from n=10 observations from N(μ, σ²). Find P(S² > 2σ²).

Solution:

(n−1)S²/σ² ~ χ²(9). So: P(S² > 2σ²) = P(9·S²/σ² > 18) = P(χ²₉ > 18).

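From chi-squared tables, 18 lies between the upper 5% critical value (16.92) and the upper 2.5% critical value (19.02) of χ²(9), so the probability is between 0.025 and 0.05. Linear interpolation gives about 0.037; the exact tail probability is about 0.035. So there is roughly a 3.5% chance.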
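When tables are too coarse, the tail probability can be obtained by integrating the χ² density numerically. The sketch below uses Simpson's rule on [0, 18] and also evaluates the linear interpolation between the two table points. The helper names are illustrative, and the shortcut of treating the density as 0 at x = 0 assumes k > 2.

```python
import math

def chi2_pdf(x, k):
    """Chi-squared density with k degrees of freedom (assumes k > 2 at x = 0)."""
    if x <= 0:
        return 0.0
    log_f = ((k / 2 - 1) * math.log(x) - x / 2
             - (k / 2) * math.log(2) - math.lgamma(k / 2))
    return math.exp(log_f)

def chi2_upper_tail(c, k, n=20_000):
    """P(X > c) as 1 minus Simpson's rule applied to the density on [0, c]."""
    h = c / n  # n must be even for Simpson's rule
    s = chi2_pdf(0.0, k) + chi2_pdf(c, k)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * chi2_pdf(i * h, k)
    return 1.0 - s * h / 3

p_exact = chi2_upper_tail(18.0, 9)

# Linear interpolation between the table points (16.92, 0.05) and (19.02, 0.025).
p_interp = 0.05 + (18.0 - 16.92) / (19.02 - 16.92) * (0.025 - 0.05)

print(round(p_exact, 4), round(p_interp, 4))
```

Linear interpolation overshoots slightly here because the tail probability is convex in the cutoff over this range.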


Example 3: Beta as a Conjugate Prior

Your prior belief about a coin's bias p is Beta(2, 2) — symmetric around 0.5. You flip it 10 times and get 8 heads. What is your posterior distribution for p?

Solution:

Prior: Beta(α=2, β=2). Data: 8 heads, 2 tails. Posterior: Beta(α+k, β+n−k) = Beta(2+8, 2+2) = Beta(10, 4).

Posterior mean: E[p | data] = 10/(10+4) = 10/14 ≈ 0.714. Posterior variance: (10·4)/(14²·15) = 40/(196·15) ≈ 0.0136.

Prior mean was 0.5; after observing 8/10 heads, our estimate shifts to ~0.714, with substantially reduced uncertainty.
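The conjugate update is simple enough to write as a few lines of code. The sketch below uses exact fractions so the arithmetic in this example can be checked by hand; the function names are illustrative and integer parameters are assumed.

```python
from fractions import Fraction

def beta_posterior(alpha, beta, successes, trials):
    """Conjugate update: Beta(alpha, beta) prior combined with binomial data."""
    return alpha + successes, beta + (trials - successes)

def beta_mean(alpha, beta):
    """Mean of Beta(alpha, beta) for integer parameters."""
    return Fraction(alpha, alpha + beta)

def beta_variance(alpha, beta):
    """Variance of Beta(alpha, beta) for integer parameters."""
    return Fraction(alpha * beta, (alpha + beta) ** 2 * (alpha + beta + 1))

prior = (2, 2)
posterior = beta_posterior(*prior, successes=8, trials=10)

print(posterior)                                 # (10, 4)
print(beta_mean(*prior), beta_variance(*prior))  # 1/2 1/20
print(beta_mean(*posterior), float(beta_variance(*posterior)))
```

The prior variance is 1/20 = 0.05, so the posterior variance of about 0.0136 is roughly a quarter of it.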


Quiz

Q1: The Gamma(α, β) distribution with α = 1 is equivalent to:

A) Chi-squared(1) B) Exponential(β) C) Normal(1, β) D) Uniform(0, β)

Correct: B)


Q2: If Z₁, Z₂, ..., Zₖ are i.i.d. N(0, 1), then Σ Zᵢ² follows:

A) Chi-squared(k) = Gamma(k/2, 1/2) B) Normal(0, k) C) Exponential(k/2) D) Gamma(k/2, k/2)

Correct: A)


Q3: For X ~ χ²(k), E[X] and Var(X) are:

A) E[X] = k/2, Var(X) = k/4 B) E[X] = k, Var(X) = 2k C) E[X] = k, Var(X) = k D) E[X] = 2k, Var(X) = k

Correct: B)


Q4: Student's t-distribution with ν degrees of freedom converges to which distribution as ν → ∞?

A) Chi-squared(ν) B) N(0, 1) C) Cauchy D) Exponential

Correct: B)


Q5: The Beta(α, β) distribution is defined on which interval?

A) (−∞, ∞) B) [0, ∞) C) [0, 1] D) [−1, 1]

Correct: C)


Q6: The Cauchy distribution is notable because:

A) It has the memoryless property B) It has no finite mean or variance C) It is the only symmetric continuous distribution D) It converges to normal for large samples

Correct: B)


Q7: If X ~ Gamma(α₁, β) and Y ~ Gamma(α₂, β) are independent with the SAME rate β, then X + Y ~:

A) Gamma(α₁α₂, β) B) Gamma(α₁ + α₂, β) C) Gamma(α₁, 2β) D) Not Gamma

Correct: B)


Practice Problems

  1. Let X ~ Gamma(3, 2). Find the PDF, E[X], Var(X), and the mode.

  2. If Z₁, Z₂, Z₃ are i.i.d. N(0, 1), find the probability that their sum of squares exceeds 7.815.

  3. Let T ~ t(10). Find P(T > 2.228) and P(|T| > 1.812).

  4. Show that Beta(1, 1) = Uniform(0, 1) by writing out the PDF.

  5. For X ~ Beta(5, 3), find E[X], Var(X), and the mode.

  6. Prove that the mean of the Cauchy distribution does not exist by showing ∫₀^{∞} x f(x) dx diverges.

  7. If X ~ Gamma(α, β), use the MGF to verify that E[X] = α/β and Var(X) = α/β².

Answers

  1. PDF: f(x) = (2³/Γ(3)) x² e^{−2x} = 4x² e^{−2x} for x > 0 (since Γ(3) = 2! = 2). E[X] = 3/2 = 1.5, Var(X) = 3/4 = 0.75. Mode = (3−1)/2 = 1.

  2. Σ Z_i² ~ χ²(3). P(χ²₃ > 7.815) = 0.05 (the 5% critical value for χ²₃).

  3. t(10): P(T > 2.228) = 0.025 (upper 2.5% critical value). P(|T| > 1.812) = 0.10 (the 10% two-sided critical value is 1.812).

  4. Beta(1,1): f(x) = (1/B(1,1)) x⁰ (1−x)⁰, and B(1,1) = Γ(1)Γ(1)/Γ(2) = (1·1)/1 = 1, so f(x) = 1 for 0 < x < 1, which is the Uniform(0, 1) density.

---

### Summary

- Gamma(α, β) has PDF proportional to x^{α−1} e^{−βx}, generalizes the exponential (α = 1), and models waiting times for α events in a Poisson process; mean = α/β, variance = α/β²
- Chi-squared(k) is the sum of k squared standard normals, equivalent to Gamma(k/2, 1/2), and is fundamental to inference on variance
- Student's t(ν) is the ratio of a standard normal to the square root of an independent chi-squared divided by ν; it has heavier tails than the normal but converges to N(0, 1) as ν → ∞; essential for inference when σ is unknown
- Beta(α, β) on [0, 1] is the conjugate prior for the binomial, models proportions, and includes Uniform(0, 1) as a special case
- The standard Cauchy distribution has PDF 1/(π(1+x²)), no finite mean (E[|X|] = ∞), and undefined variance; the sample mean of Cauchy variables does not converge, making it a powerful counterexample

---

### Pitfalls

- **Confusing the Gamma rate parameter β with the scale parameter θ = 1/β.** Some texts parameterize Gamma(α, θ) with scale θ, where E[X] = αθ and Var(X) = αθ². Always check which parameterization is being used. Rate β gives mean α/β; scale θ = 1/β gives mean αθ.
- **Thinking all continuous distributions have finite means.** The Cauchy is the canonical counterexample: ∫ |x| f(x) dx diverges logarithmically. The mean simply does not exist. Before computing E[X] for an unfamiliar distribution, always verify absolute convergence.
- **Forgetting that χ²(k) has mean k and variance 2k.** Students often misremember the variance as k (like Poisson). The factor of 2 comes from the Gamma(k/2, 1/2) parameterization: Var = (k/2)/(1/2)² = 2k.
- **Confusing Student's t degrees of freedom with sample size.** In the one-sample t-test, T = (X̄ − μ)/(S/√n) ~ t(n−1), NOT t(n). The degrees of freedom are n−1 because one degree is "used" to estimate the mean. Using ν = n instead of n−1 inflates the effective sample size and underestimates tail probabilities.
- **Assuming the t-distribution quickly becomes "close enough" to normal for all purposes.** At ν = 30, the 0.975 quantile of t is 2.042 vs. the normal's 1.96, a 4% difference. For more extreme quantiles (e.g., the 0.995 quantile), the difference persists much longer. When ν < 10, the heavier tails of t are practically significant.

---

### Quiz

1. The chi-squared distribution with k degrees of freedom is equivalent to:
   a) Gamma(k, 1) b) Gamma(k/2, 1/2) c) Normal(0, k) d) Exponential(1/k)
   **Answer: b.** χ²(k) = Gamma(k/2, 1/2).

2. If T ~ t(ν), as ν → ∞, T converges in distribution to:
   a) Cauchy(0, 1) b) N(0, 1) c) χ²(ν) d) t remains t-distributed
   **Answer: b.** As degrees of freedom increase, the t-distribution approaches the standard normal.

3. Which distribution has NO finite mean?
   a) t(2) b) t(3) c) Cauchy d) Exponential
   **Answer: c.** The Cauchy has E[|X|] = ∞. t(ν) has finite mean for ν > 1, and t(1) = Cauchy.

4. Beta(1, 1) is equivalent to:
   a) Uniform(0, 1) b) Exponential(1) c) N(0.5, 1/12) d) Gamma(1, 1)
   **Answer: a.** With α = β = 1, f(x) ∝ x⁰(1−x)⁰ = 1, which is constant on [0, 1] and therefore Uniform(0, 1).

5. The sum of k independent squared standard normals follows:
   a) N(0, k) b) χ²(k) c) Gamma(k/2, 1/2) d) Both b and c
   **Answer: d.** χ²(k) = Gamma(k/2, 1/2), so b and c describe the same distribution.

6. For X ~ Gamma(α, β), the mode (for α > 1) is:
   a) α/β b) (α−1)/β c) αβ d) α/β²
   **Answer: b.** Setting f'(x) = 0 gives mode = (α−1)/β for α > 1. For α ≤ 1, the mode is at 0 (for α < 1 the density is unbounded there).

7. The Beta distribution is the conjugate prior for which sampling distribution?
   a) Poisson b) Normal c) Binomial d) Exponential
   **Answer: c.** If the likelihood is Binomial(n, p) and the prior is Beta(α, β), the posterior after k successes is Beta(α+k, β+n−k).

8. Why does the sample mean of Cauchy random variables not converge?
   a) The data are discrete b) The Cauchy has no finite variance, so the CLT does not apply c) The Cauchy is skewed d) The sample mean is always 0
   **Answer: b.** The CLT requires finite variance, and the law of large numbers requires a finite mean; the Cauchy has neither. The sample mean of independent Cauchy variables is itself Cauchy, so no concentration occurs.

---

### Next Steps

Continue to **10-10 Joint Distributions** to learn about joint PMFs/PDFs, marginal and conditional distributions, and independence for jointly distributed random variables.