
Phase 10: Probability Theory

Subject 10-08: Normal (Gaussian) Distribution

Prerequisites: 10-07 (Continuous Random Variables), basic integration, moment generating functions


Learning Objectives

  1. State the PDF of the standard normal distribution and explain why it integrates to 1
  2. Transform between standard normal Z ~ N(0,1) and general normal X ~ N(μ, σ²) via Z = (X − μ)/σ
  3. Apply Z-scores and the 68-95-99.7 rule for rapid probability approximation
  4. Derive and use the moment generating function M(t) = exp(μt + σ²t²/2) to compute moments
  5. Prove that the sum of independent normal random variables is normal with summed means and variances

Core Content

1. The Standard Normal Distribution

The standard normal (or standard Gaussian) distribution has mean 0 and variance 1.

Notation: Z ~ N(0, 1)

PDF:

$φ(z) = (1/√(2π)) · e^{−z²/2},    −∞ < z < ∞
$

Why 1/√(2π)? This is the normalizing constant. To verify:

Let I = ∫_{−∞}^{∞} e^{−x²/2} dx. Then I² = ∫∫ e^{−(x²+y²)/2} dx dy. Switch to polar coordinates: x = r cos θ, y = r sin θ, dx dy = r dr dθ.

$I² = ∫₀^{2π} ∫₀^{∞} e^{−r²/2} r dr dθ = 2π ∫₀^{∞} e^{−r²/2} r dr
$

Let u = r²/2, du = r dr: I² = 2π ∫₀^{∞} e^{−u} du = 2π · 1 = 2π. So I = √(2π).

Therefore ∫ φ(z) dz = (1/√(2π)) · √(2π) = 1. ✓
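
As a numerical sanity check on the polar-coordinates argument, the sketch below integrates φ with a composite trapezoid rule. The names `phi` and `trapezoid`, the truncation of the integral to [−10, 10], and the number of subintervals are assumptions made for illustration; they are not part of the derivation.

```python
import math

def phi(z: float) -> float:
    """Standard normal PDF."""
    return math.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi)

def trapezoid(f, a: float, b: float, n: int) -> float:
    """Composite trapezoid rule on [a, b] with n equal subintervals."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for k in range(1, n):
        total += f(a + k * h)
    return total * h

# Assumption: the tails beyond |z| = 10 are small enough to ignore.
area = trapezoid(phi, -10.0, 10.0, 20_000)
print(area)  # should be very close to 1
```

Dropping the 1/√(2π) factor from `phi` should make the same integral come out near √(2π) ≈ 2.5066 instead.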

CDF:

$Φ(z) = P(Z ≤ z) = ∫_{−∞}^{z} φ(t) dt
$

Φ(z) has no closed form in terms of elementary functions, so it must be evaluated numerically or looked up in tables.

Key properties of φ(z):

  - Symmetric about 0: φ(−z) = φ(z)
  - Bell-shaped, unimodal at z = 0
  - Inflection points at z = ±1
  - Maximum value: φ(0) = 1/√(2π) ≈ 0.3989

Key properties of Φ(z):

  - Φ(−z) = 1 − Φ(z) (by symmetry)
  - Φ(0) = 1/2
  - P(a < Z < b) = Φ(b) − Φ(a)
  - P(|Z| < z) = 2Φ(z) − 1
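
One common way to evaluate Φ numerically is through the error function, using the identity Φ(z) = ½[1 + erf(z/√2)]. The sketch below uses Python's standard `math.erf` and spot-checks the CDF properties at an arbitrary point. The function name `Phi` and the test point are illustrative choices.

```python
import math

def Phi(z: float) -> float:
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

z = 1.3  # arbitrary test point, chosen only for illustration

print(Phi(0.0))                           # Φ(0) = 1/2
print(Phi(-z), 1.0 - Phi(z))              # symmetry: the two values agree
print(Phi(z) - Phi(-z), 2 * Phi(z) - 1)   # P(|Z| < z) computed two ways
```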

2. General Normal Distribution

If Z ~ N(0, 1), then for any real μ and any σ > 0, the linear transformation X = μ + σZ is a normal random variable with mean μ and variance σ².

Notation: X ~ N(μ, σ²)

PDF:

$f(x) = (1/√(2πσ²)) · exp(−(x − μ)²/(2σ²)),    −∞ < x < ∞
$

Parameters:

  - μ = E[X] (location parameter: where the bell is centered)
  - σ² = Var(X) (variance: how spread out the bell is)
  - σ = standard deviation (scale parameter, σ > 0)

Standardization (Z-score): If X ~ N(μ, σ²), then:

$Z = (X − μ)/σ ~ N(0, 1)
$

This lets us convert any normal probability to a standard normal one:

$P(X ≤ x) = P((X − μ)/σ ≤ (x − μ)/σ) = Φ((x − μ)/σ)
$

Z-score interpretation: (x − μ)/σ tells you how many standard deviations x is from the mean.
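
The standardization step can be sketched in a few lines: compute the Z-score, then evaluate the standard normal CDF. The function names and the numbers (μ = 50, σ = 8, x = 62) are hypothetical, picked only to give a clean Z-score. The cross-check uses `statistics.NormalDist` from the Python standard library.

```python
import math
from statistics import NormalDist

def std_normal_cdf(z: float) -> float:
    """Φ(z), computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """P(X <= x) for X ~ N(mu, sigma^2), via the Z-score."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma        # standardize
    return std_normal_cdf(z)    # evaluate the standard normal CDF

# Hypothetical numbers, chosen only for illustration.
mu, sigma, x = 50.0, 8.0, 62.0
z_score = (x - mu) / sigma                 # 1.5 standard deviations above the mean
p = normal_cdf(x, mu, sigma)
p_library = NormalDist(mu, sigma).cdf(x)   # standard-library cross-check
print(z_score, p, p_library)
```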

3. The 68-95-99.7 Rule (Empirical Rule)

For X ~ N(μ, σ²):

  - P(μ − σ < X < μ + σ) ≈ 68%
  - P(μ − 2σ < X < μ + 2σ) ≈ 95%
  - P(μ − 3σ < X < μ + 3σ) ≈ 99.7%

In terms of Z:

  - P(−1 < Z < 1) = Φ(1) − Φ(−1) = 2Φ(1) − 1 ≈ 2(0.8413) − 1 = 0.6826
  - P(−2 < Z < 2) = 2Φ(2) − 1 ≈ 2(0.9772) − 1 = 0.9544
  - P(−3 < Z < 3) = 2Φ(3) − 1 ≈ 2(0.9987) − 1 = 0.9974

Other useful Z-values:

  - P(Z < 1.645) ≈ 0.95 (95th percentile)
  - P(Z < 1.96) ≈ 0.975
  - P(Z < 2.576) ≈ 0.995
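
These figures can be reproduced without a table. The sketch below uses `statistics.NormalDist` from the Python standard library, with `cdf` for probabilities and `inv_cdf` for percentiles. The variable names are illustrative.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mu = 0, sigma = 1

# Probability of landing within k standard deviations of the mean.
within = {k: Z.cdf(k) - Z.cdf(-k) for k in (1, 2, 3)}
for k, prob in within.items():
    print(f"P(|Z| < {k}) = {prob:.4f}")

# Percentiles: the z with Φ(z) = p.
quantiles = {p: Z.inv_cdf(p) for p in (0.95, 0.975, 0.995)}
for p, z in quantiles.items():
    print(f"Φ(z) = {p}  ->  z = {z:.3f}")
```

The computed probabilities (about 0.6827, 0.9545, 0.9973) can differ in the fourth decimal from hand calculations that start from four-decimal table values of Φ.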

4. Moment Generating Function (MGF)

Standard normal: Z ~ N(0, 1)

$M_Z(t) = E[e^{tZ}] = ∫_{−∞}^{∞} e^{tz} (1/√(2π)) e^{−z²/2} dz
$

Complete the square: tz − z²/2 = −(z² − 2tz)/2 = −[(z − t)² − t²]/2 = −(z − t)²/2 + t²/2.

M_Z(t) = e^{t²/2} ∫_{−∞}^{∞} (1/√(2π)) e^{−(z − t)²/2} dz = e^{t²/2} · 1 = e^{t²/2}

General normal: X ~ N(μ, σ²), where X = μ + σZ.

$M_X(t) = E[e^{t(μ + σZ)}] = e^{μt} E[e^{(tσ)Z}] = e^{μt} M_Z(tσ) = e^{μt} e^{σ²t²/2} = exp(μt + σ²t²/2)
$

Computing moments from MGF:

  - M'_X(0) = E[X] = μ
  - M''_X(0) = E[X²] = μ² + σ²
  - Var(X) = E[X²] − (E[X])² = σ²
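
Differentiating M_X(t) = exp(μt + σ²t²/2) gives M'_X(t) = (μ + σ²t) M_X(t) and M''_X(t) = [σ² + (μ + σ²t)²] M_X(t). Setting t = 0 yields the moments listed above. As a rough numerical check, the sketch below approximates both derivatives at 0 with central finite differences. The parameter values (μ = 2, σ = 1.5) and the step size h are assumptions chosen for illustration.

```python
import math

def mgf_normal(t: float, mu: float, sigma: float) -> float:
    """MGF of N(mu, sigma^2): exp(mu*t + sigma^2 * t^2 / 2)."""
    return math.exp(mu * t + 0.5 * sigma**2 * t**2)

# Hypothetical parameters and step size, for illustration only.
mu, sigma = 2.0, 1.5
h = 1e-4

def M(t: float) -> float:
    return mgf_normal(t, mu, sigma)

first_moment = (M(h) - M(-h)) / (2 * h)            # approximates M'(0) = E[X]
second_moment = (M(h) - 2 * M(0.0) + M(-h)) / h**2  # approximates M''(0) = E[X^2]
variance = second_moment - first_moment**2

print(first_moment, second_moment, variance)
# Compare with mu, mu^2 + sigma^2, and sigma^2 respectively.
```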

5. Sum of Independent Normals

Theorem: If X₁ ~ N(μ₁, σ₁²) and X₂ ~ N(μ₂, σ₂²) are independent, then:

$X₁ + X₂ ~ N(μ₁ + μ₂, σ₁² + σ₂²)
$

Proof via MGF: By independence, M_{X₁+X₂}(t) = M_{X₁}(t) M_{X₂}(t) = exp(μ₁t + σ₁²t²/2) · exp(μ₂t + σ₂²t²/2) = exp((μ₁+μ₂)t + (σ₁²+σ₂²)t²/2). This is the MGF of N(μ₁+μ₂, σ₁²+σ₂²). An MGF that is finite in a neighborhood of 0 uniquely determines its distribution, and the normal MGF is finite for every t, so the result follows.

Corollary: For independent Xᵢ ~ N(μᵢ, σᵢ²), the sum Σ Xᵢ ~ N(Σ μᵢ, Σ σᵢ²). The sample mean of n i.i.d. N(μ, σ²) random variables is X̄ ~ N(μ, σ²/n).

Linearity: For independent X₁ ~ N(μ₁, σ₁²) and X₂ ~ N(μ₂, σ₂²) and constants a, b (not both zero), aX₁ + bX₂ ~ N(aμ₁ + bμ₂, a²σ₁² + b²σ₂²). Any nontrivial linear combination of independent normals is normal.

Edge case: If X and Y are JOINTLY normal with correlation ρ, then X+Y ~ N(μ_X+μ_Y, σ_X²+σ_Y²+2ρσ_Xσ_Y). Still normal! This property is special — the sum of normals is normal even when they are dependent, provided they are jointly normal.
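
A quick Monte Carlo experiment can make the independent case concrete. The sketch below draws independent normal samples with the standard `random` module, forms aX₁ + bX₂, and compares the sample mean and variance with aμ₁ + bμ₂ and a²σ₁² + b²σ₂². All parameter values, the seed, and the sample size are hypothetical. The experiment checks only the first two moments, not normality itself.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is reproducible

# Hypothetical parameters, for illustration only.
mu1, sigma1 = 1.0, 2.0
mu2, sigma2 = -3.0, 1.5
a, b = 2.0, -1.0
n = 100_000

# Independent draws of X1 and X2, combined linearly.
samples = [a * rng.gauss(mu1, sigma1) + b * rng.gauss(mu2, sigma2)
           for _ in range(n)]

sample_mean = statistics.fmean(samples)
sample_var = sum((s - sample_mean) ** 2 for s in samples) / (n - 1)

theory_mean = a * mu1 + b * mu2                  # a*mu1 + b*mu2
theory_var = a**2 * sigma1**2 + b**2 * sigma2**2  # a^2*sigma1^2 + b^2*sigma2^2

print(sample_mean, theory_mean)
print(sample_var, theory_var)
```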



Key Terms

Worked Examples

Example 1: Standard Normal Probabilities

Let Z ~ N(0, 1). Find: (a) P(Z < 1.5), (b) P(Z > −0.8), (c) P(−2 < Z < 1), (d) the value c such that P(Z > c) = 0.05.

Solution:

(a) P(Z < 1.5) = Φ(1.5) ≈ 0.9332

(b) P(Z > −0.8) = 1 − Φ(−0.8) = Φ(0.8) ≈ 0.7881 (by symmetry)

(c) P(−2 < Z < 1) = Φ(1) − Φ(−2) = Φ(1) − (1−Φ(2)) = 0.8413 − (1−0.9772) = 0.8413 − 0.0228 = 0.8185

(d) P(Z > c) = 0.05 → Φ(c) = 0.95 → c ≈ 1.645 (the 95th percentile)


Example 2: General Normal Application

IQ scores are N(100, 15²). What proportion of the population has: (a) IQ > 130? (b) IQ between 85 and 115? (c) What IQ is the 98th percentile?

Solution:

(a) Z = (130−100)/15 = 2. P(Z > 2) = 1 − Φ(2) ≈ 1 − 0.9772 = 0.0228. About 2.28%.

(b) Z₁ = (85−100)/15 = −1, Z₂ = (115−100)/15 = 1. P(−1 < Z < 1) ≈ 0.6827. About 68.3%.

(c) Φ(z) = 0.98 → z ≈ 2.054. IQ = 100 + 2.054(15) ≈ 130.8.
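
The three answers can be checked with `statistics.NormalDist` from the Python standard library, which takes the mean and standard deviation directly. The variable names are illustrative.

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)

p_above_130 = 1 - iq.cdf(130)            # (a) P(IQ > 130)
p_85_to_115 = iq.cdf(115) - iq.cdf(85)   # (b) P(85 < IQ < 115)
iq_98th = iq.inv_cdf(0.98)               # (c) 98th percentile

print(f"{p_above_130:.4f}  {p_85_to_115:.4f}  {iq_98th:.1f}")
```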


Example 3: Sum of Independent Normals

Weights of apples from orchard A: N(150g, 20²). From orchard B: N(180g, 25²). If you take one apple from each (independent), what is P(total > 350g)?

Solution:

Total T = X_A + X_B ~ N(150+180, 20²+25²) = N(330, 1025). σ_T = √1025 ≈ 32.02.

Z = (350 − 330)/32.02 ≈ 0.625. P(T > 350) = 1 − Φ(0.625) ≈ 1 − 0.734 = 0.266. About 26.6%.
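
The same calculation can be sketched with `statistics.NormalDist`, whose `+` operator on two instances returns the distribution of the sum under the assumption that they are independent. The variable names are illustrative.

```python
from statistics import NormalDist

apple_a = NormalDist(mu=150, sigma=20)
apple_b = NormalDist(mu=180, sigma=25)

# Adding two NormalDist objects treats them as independent:
# means add, and variances add.
total = apple_a + apple_b

p_over_350 = 1 - total.cdf(350)
print(total.mean, total.variance, round(total.stdev, 2))
print(round(p_over_350, 3))
```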


Quiz

Q1: The standard normal PDF φ(z) includes the factor 1/√(2π). This factor ensures:

A) The function is differentiable B) The total area under the curve equals 1 C) The peak height equals 1 D) The function is symmetric about zero

Correct: B)


Q2: If X ~ N(μ, σ²), the Z-score Z = (X − μ)/σ follows:

A) N(0, 1) B) N(μ, σ²) C) N(0, σ²) D) N(μ, 1)

Correct: A)


Q3: The 68-95-99.7 rule for N(μ, σ²) states that approximately 95% of observations fall within:

A) μ ± σ B) μ ± 2σ C) μ ± 3σ D) μ ± 1.96σ

Correct: B)


Q4: If X₁ ~ N(μ₁, σ₁²) and X₂ ~ N(μ₂, σ₂²) are independent, then X₁ + X₂ ~:

A) N(μ₁ + μ₂, σ₁² + σ₂²) B) N(μ₁μ₂, σ₁²σ₂²) C) N(μ₁ + μ₂, σ₁σ₂) D) Approximately normal by CLT

Correct: A)


Q5: By symmetry of the standard normal, Φ(−z) equals:

A) Φ(z) B) 1 − Φ(z) C) −Φ(z) D) 2Φ(z) − 1

Correct: B)


Q6: The MGF of X ~ N(μ, σ²) is M(t) = exp(μt + σ²t²/2). M'(0) equals:

A) μ B) σ² C) μ + σ² D) 0

Correct: A)


Practice Problems

  1. If Z ~ N(0, 1), find P(Z < −1.96), P(|Z| > 2.33), and the interquartile range (IQR: distance between 25th and 75th percentiles).

  2. Heights are N(170cm, 10²). Find the proportion between 165cm and 180cm. Find the height that only 5% exceed.

  3. Derive E[Z²] for Z ~ N(0,1) using the MGF: verify M''_Z(0) = 1.

  4. If X₁ ~ N(10, 4) and X₂ ~ N(15, 9) are independent, find the distribution of Y = 2X₁ − X₂ + 5.

  5. The MGF of X is M(t) = exp(3t + 8t²). Find E[X] and Var(X).

  6. Show that the mode of N(μ, σ²) is at x = μ by finding where f'(x) = 0.

  7. If 10 independent measurements Xᵢ ~ N(μ, 4), what is P(|X̄ − μ| < 0.5)?

Answers

  1. P(Z < −1.96) = 0.025. P(|Z| > 2.33) = 2(1−Φ(2.33)) ≈ 2(0.0099) = 0.0198. IQR: Φ(z₀.₂₅) = 0.25 → z = −0.674; Φ(z₀.₇₅) = 0.75 → z = 0.674. IQR = 0.674 − (−0.674) = 1.348 in Z-units.

  2. Z₁ = (165−170)/10 = −0.5, Z₂ = (180−170)/10 = 1.0. P = Φ(1)−Φ(−0.5) = 0.8413−0.3085 = 0.5328. 95th percentile: z = 1.645, height = 170+1.645·10 = 186.45 cm.

  3. M_Z(t) = e^{t²/2}. M'_Z(t) = t e^{t²/2}, so M'_Z(0) = 0. M''_Z(t) = e^{t²/2} + t²e^{t²/2}, so M''_Z(0) = 1. E[Z²] = 1, Var(Z) = 1−0 = 1. ✓

  4. Y = 2X₁ − X₂ + 5. E[Y] = 2(10) − 15 + 5 = 10. Var(Y) = 4(4) + 1(9) = 25. Y ~ N(10, 25).

  5. M(t) = exp(3t + 8t²). Comparing to exp(μt + σ²t²/2): μ = 3, σ²/2 = 8 → σ² = 16. So E[X] = 3, Var(X) = 16, X ~ N(3, 16).

  6. f(x) = (1/√(2πσ²)) exp(−(x−μ)²/(2σ²)). f'(x) = f(x) · [−(x−μ)/σ²] = 0 only at x = μ. Second derivative confirms it's a maximum.

  7. X̄ ~ N(μ, 4/10) = N(μ, 0.4). σ_X̄ = √0.4 ≈ 0.6325. P(|X̄−μ| < 0.5) = P(|Z| < 0.5/0.6325) = P(|Z| < 0.79) = 2Φ(0.79)−1 ≈ 2(0.7852)−1 = 0.5704.

Summary


Pitfalls


Quiz

  1. If X ~ N(100, 15²), the Z-score for X = 130 is:
     a) 1.0 b) 1.5 c) 2.0 d) 3.0
     Answer: c. Z = (130−100)/15 = 30/15 = 2.0.

  2. For Z ~ N(0, 1), P(−1 < Z < 1) is approximately:
     a) 50% b) 68% c) 95% d) 99.7%
     Answer: b. The 68-95-99.7 rule: ~68% within 1σ.

  3. The MGF of a normal distribution uniquely identifies it because:
     a) All MGFs are unique b) MGFs uniquely determine distributions when they exist in a neighborhood of 0 c) Normals have the simplest MGF d) The normal MGF is always positive
     Answer: b. MGFs uniquely characterize distributions in general when they converge near 0.

  4. If X₁ ~ N(2, 1) and X₂ ~ N(3, 4) are independent, Var(X₁ + X₂) = ?
     a) 3 b) 5 c) √5 d) 7
     Answer: b. Var(X₁+X₂) = 1 + 4 = 5.

  5. The normal PDF is symmetric about:
     a) σ b) μ c) σ² d) The y-axis
     Answer: b. f(μ+x) = f(μ−x), which is symmetry about the mean μ.

  6. Φ(−z) equals:
     a) −Φ(z) b) 1/Φ(z) c) Φ(z) d) 1 − Φ(z)
     Answer: d. By symmetry of the standard normal: Φ(−z) = 1 − Φ(z).

  7. The sample mean of n i.i.d. N(μ, σ²) observations has variance:
     a) σ² b) σ²/n c) σ/√n d) nσ²
     Answer: b. X̄ ~ N(μ, σ²/n). The variance shrinks as sample size increases.

  8. The sum of two independent normal RVs is:
     a) Always normal b) Normal only if they have equal variances c) Not necessarily normal d) Approximately normal by CLT
     Answer: a. The normal is closed under convolution: the sum of independent normals is exactly normal with summed parameters.


Next Steps

Continue to 10-09 More Continuous Distributions to learn about the Gamma, chi-squared, Student's t, Beta, and Cauchy distributions.