Phase 10: Probability Theory
Subject 10-08: Normal (Gaussian) Distribution
Prerequisites: 10-07 (Continuous Random Variables), basic integration, moment generating functions
Learning Objectives
- State the PDF of the standard normal distribution and explain why it integrates to 1
- Transform between standard normal Z ~ N(0,1) and general normal X ~ N(μ, σ²) via Z = (X − μ)/σ
- Apply Z-scores and the 68-95-99.7 rule for rapid probability approximation
- Derive and use the moment generating function M(t) = exp(μt + σ²t²/2) to compute moments
- Prove that the sum of independent normal random variables is normal with summed means and variances
Core Content
1. The Standard Normal Distribution
The standard normal (or standard Gaussian) distribution has mean 0 and variance 1.
Notation: Z ~ N(0, 1)
PDF:
$φ(z) = (1/√(2π)) · e^{−z²/2}, −∞ < z < ∞ $
Why 1/√(2π)? This is the normalizing constant. To verify:
Let I = ∫_{−∞}^{∞} e^{−x²/2} dx. Then I² = ∫∫ e^{−(x²+y²)/2} dx dy. Switch to polar coordinates: x = r cos θ, y = r sin θ, dx dy = r dr dθ.
$I² = ∫₀^{2π} ∫₀^{∞} e^{−r²/2} r dr dθ = 2π ∫₀^{∞} e^{−r²/2} r dr $
Let u = r²/2, du = r dr: I² = 2π ∫₀^{∞} e^{−u} du = 2π · 1 = 2π. So I = √(2π).
Therefore ∫ φ(z) dz = (1/√(2π)) · √(2π) = 1. ✓
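As a numerical sanity check of the polar-coordinates argument, the sketch below approximates I with a trapezoid rule on a truncated interval. The function name gaussian_integral, the truncation at ±10, and the step count are illustrative assumptions, not part of the derivation.

```python
import math

def gaussian_integral(lower=-10.0, upper=10.0, steps=20_000):
    """Trapezoid-rule approximation of the integral of exp(-x^2/2) on [lower, upper].

    Truncating at +/-10 is an assumption: the integrand there is about e^-50,
    so the neglected tails are far below floating-point resolution.
    """
    h = (upper - lower) / steps
    total = 0.5 * (math.exp(-lower * lower / 2) + math.exp(-upper * upper / 2))
    for k in range(1, steps):
        x = lower + k * h
        total += math.exp(-x * x / 2)
    return total * h

approx_I = gaussian_integral()
exact_I = math.sqrt(2 * math.pi)
print(f"numerical I = {approx_I:.10f}, sqrt(2*pi) = {exact_I:.10f}")
```

The two printed values should agree to many decimal places, which is consistent with I = √(2π).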
CDF:
$Φ(z) = P(Z ≤ z) = ∫_{−∞}^{z} φ(t) dt $
Φ(z) has no closed form in terms of elementary functions. It must be evaluated numerically (for example through the error function, Φ(z) = ½[1 + erf(z/√2)]) or looked up in tables.
Key properties of φ(z):
- Symmetric about 0: φ(−z) = φ(z)
- Bell-shaped, unimodal at z = 0
- Inflection points at z = ±1
- Maximum value: φ(0) = 1/√(2π) ≈ 0.3989
Key properties of Φ(z):
- Φ(−z) = 1 − Φ(z) (by symmetry)
- Φ(0) = 1/2
- P(a < Z < b) = Φ(b) − Φ(a)
- P(|Z| < z) = 2Φ(z) − 1
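The properties listed above can be checked numerically. The sketch below is one possible way to evaluate φ and Φ with the Python standard library, using the identity Φ(z) = ½[1 + erf(z/√2)]. The function names std_normal_pdf and std_normal_cdf are illustrative choices.

```python
import math

def std_normal_pdf(z):
    """phi(z) = exp(-z^2/2) / sqrt(2*pi)."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

def std_normal_cdf(z):
    """Phi(z) written in terms of the error function erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2)))

z = 1.3  # arbitrary test point
print("peak height phi(0):", std_normal_pdf(0.0))
print("Phi(0):", std_normal_cdf(0.0))
print("Phi(-z) vs 1 - Phi(z):", std_normal_cdf(-z), 1 - std_normal_cdf(z))
print("P(|Z| < z):", std_normal_cdf(z) - std_normal_cdf(-z), 2 * std_normal_cdf(z) - 1)
```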
2. General Normal Distribution
If Z ~ N(0, 1), then the linear transformation X = μ + σZ (with σ > 0) yields a normal random variable with mean μ and variance σ².
Notation: X ~ N(μ, σ²)
PDF:
$f(x) = (1/√(2πσ²)) · exp(−(x − μ)²/(2σ²)), −∞ < x < ∞ $
Parameters:
- μ = E[X] (location parameter: where the bell is centered)
- σ² = Var(X) (scale parameter: how spread out it is)
- σ = standard deviation
Standardization (Z-score): If X ~ N(μ, σ²), then:
$Z = (X − μ)/σ ~ N(0, 1) $
This lets us convert any normal probability to a standard normal one:
$P(X ≤ x) = P((X − μ)/σ ≤ (x − μ)/σ) = Φ((x − μ)/σ) $
Z-score interpretation: (x − μ)/σ tells you how many standard deviations x is from the mean.
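A minimal sketch of standardization in code, assuming Python's statistics.NormalDist is available (Python 3.8 or later). The helper name normal_cdf_via_z and the numbers μ = 100, σ = 15, x = 115 are illustrative. It compares the probability obtained by standardizing first with the one computed directly on N(μ, σ²).

```python
from statistics import NormalDist

def normal_cdf_via_z(x, mu, sigma):
    """P(X <= x) for X ~ N(mu, sigma^2), computed as Phi((x - mu) / sigma)."""
    z = (x - mu) / sigma          # Z-score: standard deviations from the mean
    return NormalDist().cdf(z)    # NormalDist() defaults to N(0, 1)

mu, sigma, x = 100.0, 15.0, 115.0
via_z = normal_cdf_via_z(x, mu, sigma)
direct = NormalDist(mu=mu, sigma=sigma).cdf(x)
print(f"z = {(x - mu) / sigma}, via Z: {via_z:.4f}, direct: {direct:.4f}")
```

Note that NormalDist takes the standard deviation σ, not the variance σ², as its second argument.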
3. The 68-95-99.7 Rule (Empirical Rule)
For X ~ N(μ, σ²):
- P(μ − σ < X < μ + σ) ≈ 0.6827 (about 68%)
- P(μ − 2σ < X < μ + 2σ) ≈ 0.9545 (about 95%)
- P(μ − 3σ < X < μ + 3σ) ≈ 0.9973 (about 99.7%)
In terms of Z (using four-decimal table values of Φ, so the last digit can differ slightly from the figures above):
- P(−1 < Z < 1) = Φ(1) − Φ(−1) = 2Φ(1) − 1 ≈ 2(0.8413) − 1 = 0.6826
- P(−2 < Z < 2) = 2Φ(2) − 1 ≈ 2(0.9772) − 1 = 0.9544
- P(−3 < Z < 3) = 2Φ(3) − 1 ≈ 2(0.9987) − 1 = 0.9974
Other useful Z-values:
- P(Z < 1.645) ≈ 0.95 (95th percentile)
- P(Z < 1.96) ≈ 0.975
- P(Z < 2.576) ≈ 0.995
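The coverage probabilities and critical values in this section can be reproduced without a table. The following sketch assumes statistics.NormalDist from the Python standard library. The dictionary names within and quantiles are illustrative.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal N(0, 1)

# P(-k < Z < k) = 2*Phi(k) - 1 for k = 1, 2, 3 standard deviations
within = {k: 2 * Z.cdf(k) - 1 for k in (1, 2, 3)}

# Critical values c with P(Z < c) = p
quantiles = {p: Z.inv_cdf(p) for p in (0.95, 0.975, 0.995)}

for k, prob in within.items():
    print(f"within {k} sd: {prob:.4f}")
for p, c in quantiles.items():
    print(f"P(Z < c) = {p}: c = {c:.3f}")
```

Working at full precision avoids the last-digit discrepancies that arise when Φ is rounded to four decimals before doubling.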
4. Moment Generating Function (MGF)
Standard normal: Z ~ N(0, 1)
$M_Z(t) = E[e^{tZ}] = ∫_{−∞}^{∞} e^{tz} (1/√(2π)) e^{−z²/2} dz $
Complete the square: tz − z²/2 = −(z² − 2tz)/2 = −[(z − t)² − t²]/2 = −(z − t)²/2 + t²/2.
M_Z(t) = e^{t²/2} ∫_{−∞}^{∞} (1/√(2π)) e^{−(z − t)²/2} dz = e^{t²/2} · 1 = e^{t²/2}
General normal: X ~ N(μ, σ²), where X = μ + σZ.
$M_X(t) = E[e^{t(μ + σZ)}] = e^{μt} E[e^{(tσ)Z}] = e^{μt} M_Z(tσ) = e^{μt} e^{σ²t²/2} = exp(μt + σ²t²/2) $
Computing moments from MGF:
- M'_X(0) = E[X] = μ
- M''_X(0) = E[X²] = μ² + σ²
- Var(X) = E[X²] − (E[X])² = σ²
- E[(X−μ)³] = 0 (skewness = 0 — normal is symmetric)
- E[(X−μ)⁴] = 3σ⁴ (kurtosis = 3 — benchmark for "normal" tail weight)
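The first two moment identities can be checked numerically by differentiating the MGF at t = 0 with central finite differences. This is only a rough sketch: the step size h, the function names, and the values μ = 2, σ = 3 are assumptions chosen for illustration, and finite differences are an approximation rather than a derivation.

```python
import math

def normal_mgf(t, mu, sigma):
    """MGF of N(mu, sigma^2): exp(mu*t + sigma^2 * t^2 / 2)."""
    return math.exp(mu * t + (sigma ** 2) * (t ** 2) / 2)

def moments_from_mgf(mu, sigma, h=1e-4):
    """Approximate M'(0) and M''(0) with central finite differences of step h."""
    m_plus = normal_mgf(h, mu, sigma)
    m_zero = normal_mgf(0.0, mu, sigma)   # equals 1 for any MGF
    m_minus = normal_mgf(-h, mu, sigma)
    first = (m_plus - m_minus) / (2 * h)               # approximates E[X]
    second = (m_plus - 2 * m_zero + m_minus) / h ** 2  # approximates E[X^2]
    return first, second

mu, sigma = 2.0, 3.0  # illustrative values
first, second = moments_from_mgf(mu, sigma)
variance = second - first ** 2
print(f"E[X] ~ {first:.4f}, E[X^2] ~ {second:.4f}, Var(X) ~ {variance:.4f}")
```

With these values the results should be close to μ = 2, μ² + σ² = 13, and σ² = 9.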
5. Sum of Independent Normals
Theorem: If X₁ ~ N(μ₁, σ₁²) and X₂ ~ N(μ₂, σ₂²) are independent, then:
$X₁ + X₂ ~ N(μ₁ + μ₂, σ₁² + σ₂²) $
Proof via MGF: By independence, M_{X₁+X₂}(t) = M_{X₁}(t) M_{X₂}(t) = exp(μ₁t + σ₁²t²/2) · exp(μ₂t + σ₂²t²/2) = exp((μ₁+μ₂)t + (σ₁²+σ₂²)t²/2). This is the MGF of N(μ₁+μ₂, σ₁²+σ₂²). Since MGFs uniquely determine distributions, the result follows.
Corollary: For independent Xᵢ ~ N(μᵢ, σᵢ²), the sum Σ Xᵢ ~ N(Σ μᵢ, Σ σᵢ²). The sample mean of n i.i.d. N(μ, σ²) random variables is X̄ ~ N(μ, σ²/n).
Linearity: aX₁ + bX₂ ~ N(aμ₁ + bμ₂, a²σ₁² + b²σ₂²). Any linear combination of independent normals is normal.
Edge case: If X and Y are JOINTLY normal with correlation ρ, then X+Y ~ N(μ_X+μ_Y, σ_X²+σ_Y²+2ρσ_Xσ_Y). Still normal! This property is special — the sum of normals is normal even when they are dependent, provided they are jointly normal.
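The theorem for independent normals can be illustrated by simulation. The sketch below draws independent samples, adds them, and compares the sample mean, sample variance, and one empirical CDF value with the N(μ₁ + μ₂, σ₁² + σ₂²) prediction. The parameter values, seed, and sample size are arbitrary assumptions. A simulation only illustrates the result; the MGF argument is the proof.

```python
import random
from statistics import NormalDist

rng = random.Random(0)           # fixed seed so the run is reproducible
n = 100_000
mu1, sigma1 = 1.0, 2.0           # X1 ~ N(1, 4)
mu2, sigma2 = -3.0, 1.5          # X2 ~ N(-3, 2.25), drawn independently of X1

sums = [rng.gauss(mu1, sigma1) + rng.gauss(mu2, sigma2) for _ in range(n)]

# Theoretical distribution of the sum: means add, variances add
theory = NormalDist(mu1 + mu2, (sigma1 ** 2 + sigma2 ** 2) ** 0.5)

sample_mean = sum(sums) / n
sample_var = sum((s - sample_mean) ** 2 for s in sums) / n

threshold = theory.mean + theory.stdev   # one standard deviation above the mean
empirical_cdf = sum(s <= threshold for s in sums) / n

print(f"mean: {sample_mean:.3f} vs {theory.mean}")
print(f"variance: {sample_var:.3f} vs {theory.variance:.3f}")
print(f"P(S <= mean + sd): {empirical_cdf:.4f} vs {theory.cdf(threshold):.4f}")
```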
Worked Examples
Example 1: Standard Normal Probabilities
Let Z ~ N(0, 1). Find: (a) P(Z < 1.5), (b) P(Z > −0.8), (c) P(−2 < Z < 1), (d) the value c such that P(Z > c) = 0.05.
Solution:
(a) P(Z < 1.5) = Φ(1.5) ≈ 0.9332
(b) P(Z > −0.8) = 1 − Φ(−0.8) = Φ(0.8) ≈ 0.7881 (by symmetry)
(c) P(−2 < Z < 1) = Φ(1) − Φ(−2) = Φ(1) − (1−Φ(2)) = 0.8413 − (1−0.9772) = 0.8413 − 0.0228 = 0.8185
(d) P(Z > c) = 0.05 → Φ(c) = 0.95 → c ≈ 1.645 (the 95th percentile)
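Part (d) asks for an inverse lookup: given a probability, find the Z-value. Because Φ is strictly increasing, one simple way to do this without a table is bisection. The sketch below assumes the erf form of Φ, and the bracket [−10, 10], the tolerance, and the function names are illustrative choices.

```python
import math

def std_normal_cdf(z):
    """Phi(z) via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2)))

def std_normal_quantile(p, lo=-10.0, hi=10.0, tol=1e-10):
    """Solve Phi(c) = p by bisection; valid because Phi is strictly increasing."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if std_normal_cdf(mid) < p:
            lo = mid      # the root lies to the right of mid
        else:
            hi = mid      # the root lies at or to the left of mid
    return (lo + hi) / 2

c = std_normal_quantile(0.95)   # P(Z > c) = 0.05 is the same as Phi(c) = 0.95
print(f"c = {c:.4f}, check Phi(c) = {std_normal_cdf(c):.6f}")
```

Library routines typically use faster rational approximations. Bisection is used here only because it makes the monotonicity argument explicit.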
Example 2: General Normal Application
IQ scores are N(100, 15²). What proportion of the population has: (a) IQ > 130? (b) IQ between 85 and 115? (c) What IQ is the 98th percentile?
Solution:
(a) Z = (130−100)/15 = 2. P(Z > 2) = 1 − Φ(2) ≈ 1 − 0.9772 = 0.0228. About 2.28%.
(b) Z₁ = (85−100)/15 = −1, Z₂ = (115−100)/15 = 1. P(−1 < Z < 1) ≈ 0.6827. About 68.3%.
(c) Φ(z) = 0.98 → z ≈ 2.054. IQ = 100 + 2.054(15) ≈ 130.8.
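The three parts of this example can be cross-checked in code. This sketch assumes statistics.NormalDist from the Python standard library, and the variable names are illustrative.

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)   # sigma is the standard deviation, not the variance

p_above_130 = 1 - iq.cdf(130)              # part (a)
p_85_to_115 = iq.cdf(115) - iq.cdf(85)     # part (b)
iq_98th = iq.inv_cdf(0.98)                 # part (c)

print(f"(a) P(IQ > 130) = {p_above_130:.4f}")
print(f"(b) P(85 < IQ < 115) = {p_85_to_115:.4f}")
print(f"(c) 98th percentile = {iq_98th:.1f}")
```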
Example 3: Sum of Independent Normals
Weights of apples from orchard A: N(150g, 20²). From orchard B: N(180g, 25²). If you take one apple from each (independent), what is P(total > 350g)?
Solution:
Total T = X_A + X_B ~ N(150+180, 20²+25²) = N(330, 1025). σ_T = √1025 ≈ 32.02.
Z = (350 − 330)/32.02 ≈ 0.625. P(T > 350) = 1 − Φ(0.625) ≈ 1 − 0.734 = 0.266. About 26.6%.
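For a quick check of this calculation, Python's statistics.NormalDist supports adding two instances, which it treats as independent normal variables. The sketch below relies on that documented behavior, and the variable names are illustrative.

```python
from statistics import NormalDist

apple_a = NormalDist(mu=150, sigma=20)   # orchard A, grams
apple_b = NormalDist(mu=180, sigma=25)   # orchard B, grams

# '+' on two NormalDist objects assumes independence: means add, variances add
total = apple_a + apple_b

p_over_350 = 1 - total.cdf(350)
print(f"T ~ N({total.mean}, {total.variance:.0f}), sd = {total.stdev:.2f}")
print(f"P(T > 350) = {p_over_350:.3f}")
```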
Quiz
Q1: The standard normal PDF φ(z) includes the factor 1/√(2π). This factor ensures:
A) The function is differentiable B) The total area under the curve equals 1 C) The peak height equals 1 D) The function is symmetric about zero
Correct: B)
- If you chose B: Correct! The normalizing constant 1/√(2π) ensures ∫ φ(z) dz = 1. This is verified by squaring the integral and switching to polar coordinates.
- If you chose A: Differentiability comes from the exponential form, not the constant.
- If you chose C: The peak height φ(0) = 1/√(2π) ≈ 0.399, not 1.
- If you chose D: Symmetry comes from the z² term in the exponent (even function), not the constant.
Q2: If X ~ N(μ, σ²), the Z-score Z = (X − μ)/σ follows:
A) N(0, 1) B) N(μ, σ²) C) N(0, σ²) D) N(μ, 1)
Correct: A)
- If you chose A: Correct! Standardization transforms any normal to standard normal: E[Z] = (μ−μ)/σ = 0, Var(Z) = Var(X)/σ² = σ²/σ² = 1.
- If you chose B: This would be the original distribution — the Z-score changes it.
- If you chose C: The mean becomes 0 but the variance should also become 1.
- If you chose D: The variance becomes 1 but the mean should also become 0.
Q3: The 68-95-99.7 rule for N(μ, σ²) states that approximately 95% of observations fall within:
A) μ ± σ B) μ ± 2σ C) μ ± 3σ D) μ ± 1.96σ
Correct: B)
- If you chose B: Correct! Approximately 95% of the probability mass lies within 2 standard deviations of the mean. More precisely: P(|Z| < 2) ≈ 0.9545, while P(|Z| < 1.96) ≈ 0.95.
- If you chose A: μ ± σ contains approximately 68% of observations.
- If you chose C: μ ± 3σ contains approximately 99.7% of observations.
- If you chose D: μ ± 1.96σ is the more precise 95% interval, but the 68-95-99.7 rule of thumb rounds 1.96 to 2.
Q4: If X₁ ~ N(μ₁, σ₁²) and X₂ ~ N(μ₂, σ₂²) are independent, then X₁ + X₂ ~:
A) N(μ₁ + μ₂, σ₁² + σ₂²) B) N(μ₁μ₂, σ₁²σ₂²) C) N(μ₁ + μ₂, σ₁σ₂) D) Approximately normal by CLT
Correct: A)
- If you chose A: Correct! Independent normals add: means add, variances add. This is an exact result, not just asymptotic.
- If you chose B: Means and variances don't multiply — that's not how normal sums work.
- If you chose C: Variances add, not standard deviations. Var(X₁+X₂) = σ₁² + σ₂².
- If you chose D: This is an exact result for normals (CLT gives approximate normality for sums of non-normal RVs).
Q5: By symmetry of the standard normal, Φ(−z) equals:
A) Φ(z) B) 1 − Φ(z) C) −Φ(z) D) 2Φ(z) − 1
Correct: B)
- If you chose B: Correct! Since φ(z) is symmetric about 0, P(Z ≤ −z) = P(Z ≥ z) = 1 − P(Z ≤ z) = 1 − Φ(z).
- If you chose A: Φ(−z) = Φ(z) would make the CDF an even function, but Φ is strictly increasing, so this holds only at z = 0. The symmetry belongs to the PDF: φ(−z) = φ(z).
- If you chose C: CDF values are always between 0 and 1; this could be negative.
- If you chose D: This is P(|Z| < z) = Φ(z) − Φ(−z) = 2Φ(z) − 1.
Q6: The MGF of X ~ N(μ, σ²) is M(t) = exp(μt + σ²t²/2). M'(0) equals:
A) μ B) σ² C) μ + σ² D) 0
Correct: A)
- If you chose A: Correct! The first derivative of the MGF at 0 gives E[X] = μ. M'(t) = (μ + σ²t)exp(μt + σ²t²/2), so M'(0) = μ.
- If you chose B: M''(0) − (M'(0))² = (μ² + σ²) − μ² = σ² gives the variance, not the mean.
- If you chose C: This doesn't correspond to a standard moment.
- If you chose D: M'(0) = μ, which is 0 only in the special case μ = 0 (such as the standard normal). Do not confuse this with M(0) = 1, which holds for every MGF.
Practice Problems
1. If Z ~ N(0, 1), find P(Z < −1.96), P(|Z| > 2.33), and the interquartile range (IQR: distance between 25th and 75th percentiles).
2. Heights are N(170cm, 10²). Find the proportion between 165cm and 180cm. Find the height that only 5% exceed.
3. Derive E[Z²] for Z ~ N(0,1) using the MGF: verify M''_Z(0) = 1.
4. If X₁ ~ N(10, 4) and X₂ ~ N(15, 9) are independent, find the distribution of Y = 2X₁ − X₂ + 5.
5. The MGF of X is M(t) = exp(3t + 8t²). Find E[X] and Var(X).
6. Show that the mode of N(μ, σ²) is at x = μ by finding where f'(x) = 0.
7. If 10 independent measurements Xᵢ ~ N(μ, 4), what is P(|X̄ − μ| < 0.5)?
Answers
1. P(Z < −1.96) = 0.025. P(|Z| > 2.33) = 2(1−Φ(2.33)) ≈ 2(0.0099) = 0.0198. IQR: Φ(z₀.₂₅) = 0.25 → z = −0.674; Φ(z₀.₇₅) = 0.75 → z = 0.674. IQR = 0.674 − (−0.674) = 1.348 in Z-units.
2. Z₁ = (165−170)/10 = −0.5, Z₂ = (180−170)/10 = 1.0. P = Φ(1)−Φ(−0.5) = 0.8413−0.3085 = 0.5328. 95th percentile: z=1.645, height = 170+1.645·10 = 186.45 cm.
3. M_Z(t) = e^{t²/2}. M'_Z(t) = t e^{t²/2}, so M'_Z(0) = 0. M''_Z(t) = e^{t²/2} + t²e^{t²/2}, so M''_Z(0) = 1. E[Z²] = 1, Var(Z) = 1−0 = 1. ✓
4. Y = 2X₁ − X₂ + 5. E[Y] = 2(10) − 15 + 5 = 10. Var(Y) = 4(4) + 1(9) = 25. Y ~ N(10, 25).
5. M(t) = exp(3t + 8t²). Comparing to exp(μt + σ²t²/2): μ = 3, σ²/2 = 8 → σ² = 16. So E[X] = 3, Var(X) = 16, X ~ N(3, 16).
6. f(x) = (1/√(2πσ²)) exp(−(x−μ)²/(2σ²)). f'(x) = f(x) · [−(x−μ)/σ²] = 0 only at x = μ. Second derivative confirms it's a maximum.
7. X̄ ~ N(μ, 4/10) = N(μ, 0.4). σ_X̄ = √0.4 ≈ 0.6325. P(|X̄−μ| < 0.5) = P(|Z| < 0.5/0.6325) = P(|Z| < 0.79) = 2Φ(0.79)−1 ≈ 2(0.7852)−1 = 0.5704.
Summary
- The standard normal N(0, 1) has PDF φ(z) = (1/√(2π)) e^{−z²/2}; its CDF Φ(z) has no closed form but is universally tabulated
- Any normal X ~ N(μ, σ²) standardizes to Z = (X−μ)/σ ~ N(0, 1); Z-scores measure distance from the mean in standard deviation units
- The 68-95-99.7 rule provides quick approximations: ~68% within 1σ, ~95% within 2σ, ~99.7% within 3σ of μ
- MGF is M(t) = exp(μt + σ²t²/2); it shows skewness = 0 and excess kurtosis = 0 for the normal
- Sums of independent normals are normal (closed under convolution): Σ Xᵢ ~ N(Σμᵢ, Σσᵢ²); this makes the normal especially convenient for inference
Pitfalls
- Forgetting to standardize before using the standard normal table. If X ~ N(100, 15²) and you want P(X > 115), you MUST compute Z = (115−100)/15 = 1 first, then look up Φ(1). Looking up 115 directly in the standard normal table gives nonsense.
- Confusing variance σ² with standard deviation σ. The normal is parameterized by variance: X ~ N(μ, σ²). So N(0, 4) has standard deviation 2, not 4. When summing independent normals, VARIANCES add: if X ~ N(0, 4) and Y ~ N(0, 9) are independent, then X + Y ~ N(0, 13), with standard deviation √13 ≈ 3.61. The standard deviations do NOT add: the result is not 2 + 3 = 5.
- Treating the 68-95-99.7 rule as exact. These are approximations. The exact values are: 68.27% within 1σ, 95.45% within 2σ, 99.73% within 3σ. For precise work, especially tail probabilities, use Φ(z) values, not the empirical rule.
- Assuming the sum of normals is always normal. If X and Y are each marginally normal but NOT jointly normal, X + Y may NOT be normal. Example: let X ~ N(0,1) and, for a fixed c > 0, let Y = X if |X| < c and Y = −X otherwise. Both are marginally N(0,1), but X + Y equals 2X when |X| < c and 0 otherwise, so it has a point mass at 0 and cannot be normal. Independence or joint normality is required.
- Misapplying Φ(−z) = 1 − Φ(z) for general normals. The symmetry formula Φ(−z) = 1 − Φ(z) applies to the STANDARD normal Z ~ N(0, 1). It does NOT hold for a general normal X ~ N(μ, σ²) unless μ = 0. For general normals, use standardization first.
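The counterexample in the fourth pitfall (Y = X when |X| < c, Y = −X otherwise) can be explored by simulation. The sketch below uses the arbitrary choices c = 1, a fixed seed, and 100,000 draws, all of which are assumptions for illustration. It checks that Y still looks like N(0, 1) marginally, while X + Y is exactly 0 with probability P(|X| ≥ c) = 2(1 − Φ(c)), which no normal distribution allows.

```python
import random
from statistics import NormalDist

rng = random.Random(1)   # fixed seed for reproducibility
n = 100_000
c = 1.0                  # illustrative cutoff

xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
ys = [x if abs(x) < c else -x for x in xs]   # flip the sign outside (-c, c)
sums = [x + y for x, y in zip(xs, ys)]       # 2x inside (-c, c), exactly 0 outside

# Marginal check on Y: mean near 0 and variance near 1
y_mean = sum(ys) / n
y_var = sum(y * y for y in ys) / n - y_mean ** 2

# Point mass at 0 in the sum
zero_fraction = sum(s == 0.0 for s in sums) / n
expected_zero = 2 * (1 - NormalDist().cdf(c))

print(f"Y mean = {y_mean:.3f}, Y variance = {y_var:.3f}")
print(f"P(X + Y = 0): simulated {zero_fraction:.4f}, theory {expected_zero:.4f}")
```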
Quiz
- If X ~ N(100, 15²), the Z-score for X = 130 is: a) 1.0 b) 1.5 c) 2.0 d) 3.0 Answer: c. Z = (130−100)/15 = 30/15 = 2.0.
- For Z ~ N(0, 1), P(−1 < Z < 1) is approximately: a) 50% b) 68% c) 95% d) 99.7% Answer: b. The 68-95-99.7 rule: ~68% within 1σ.
- The MGF of a normal distribution uniquely identifies it because: a) All MGFs are unique b) MGFs uniquely determine distributions when they exist in a neighborhood of 0 c) Normals have the simplest MGF d) The normal MGF is always positive Answer: b. MGFs uniquely characterize distributions in general when they converge near 0.
- If X₁ ~ N(2, 1) and X₂ ~ N(3, 4) are independent, Var(X₁ + X₂) = ? a) 3 b) 5 c) √5 d) 7 Answer: b. Var(X₁+X₂) = 1 + 4 = 5.
- The normal PDF is symmetric about: a) σ b) μ c) σ² d) The y-axis Answer: b. f(μ+x) = f(μ−x), which is symmetry about the mean μ.
- Φ(−z) equals: a) −Φ(z) b) 1/Φ(z) c) Φ(z) d) 1 − Φ(z) Answer: d. By symmetry of the standard normal: Φ(−z) = 1 − Φ(z).
- The sample mean of n i.i.d. N(μ, σ²) observations has variance: a) σ² b) σ²/n c) σ/√n d) nσ² Answer: b. X̄ ~ N(μ, σ²/n). The variance shrinks as sample size increases.
- The sum of two independent normal RVs is: a) Always normal b) Normal only if they have equal variances c) Not necessarily normal d) Approximately normal by CLT Answer: a. The normal is closed under convolution: the sum of independent normals is exactly normal with summed parameters.
Next Steps
Continue to 10-09 More Continuous Distributions to learn about the Gamma, chi-squared, Student's t, Beta, and Cauchy distributions.