
Phase 10: Probability Theory

Subject 10-08: Normal (Gaussian) Distribution

Prerequisites: 10-07 (Continuous Random Variables), basic integration, moment generating functions

Learning Objectives

State the PDF of the standard normal distribution and explain why it integrates to 1
Transform between standard normal Z ~ N(0,1) and general normal X ~ N(μ, σ²) via Z = (X − μ)/σ
Apply Z-scores and the 68-95-99.7 rule for rapid probability approximation
Derive and use the moment generating function M(t) = exp(μt + σ²t²/2) to compute moments
Prove that the sum of independent normal random variables is normal with summed means and variances

Core Content

1. The Standard Normal Distribution

The standard normal (or standard Gaussian) distribution has mean 0 and variance 1.

Notation: Z ~ N(0, 1)

PDF:

$φ(z) = (1/√(2π)) · e^{−z²/2},    −∞ < z < ∞
$

Why 1/√(2π)? This is the normalizing constant. To verify:

Let I = ∫_{−∞}^{∞} e^{−x²/2} dx. Then I² = ∫∫ e^{−(x²+y²)/2} dx dy. Switch to polar coordinates: x = r cos θ, y = r sin θ, dx dy = r dr dθ.

$I² = ∫₀^{2π} ∫₀^{∞} e^{−r²/2} r dr dθ = 2π ∫₀^{∞} e^{−r²/2} r dr
$

Let u = r²/2, du = r dr: I² = 2π ∫₀^{∞} e^{−u} du = 2π · 1 = 2π. So I = √(2π).

Therefore ∫ φ(z) dz = (1/√(2π)) · √(2π) = 1. ✓
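The polar-coordinate argument above can also be sanity-checked numerically. The sketch below is only an illustration, not part of the proof: it approximates both integrals with a composite trapezoid rule, and it assumes that truncating the range to [−10, 10] loses a negligible amount of area. The helper names `phi` and `trapezoid` are made up for this example.

```python
import math

def phi(z: float) -> float:
    """Standard normal PDF: (1/sqrt(2*pi)) * exp(-z^2 / 2)."""
    return math.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi)

def trapezoid(f, a: float, b: float, n: int) -> float:
    """Composite trapezoid rule with n equal subintervals on [a, b]."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for k in range(1, n):
        total += f(a + k * h)
    return total * h

# Assumption: the tails beyond |z| = 10 contribute a negligible amount.
area = trapezoid(phi, -10.0, 10.0, 20_000)
gauss_integral = trapezoid(lambda x: math.exp(-x * x / 2.0), -10.0, 10.0, 20_000)

print(area)                                     # close to 1
print(gauss_integral, math.sqrt(2.0 * math.pi)) # both close to 2.5066
```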

CDF:

$Φ(z) = P(Z ≤ z) = ∫_{−∞}^{z} φ(t) dt
$

Φ(z) has no closed form in terms of elementary functions; it must be evaluated numerically or looked up in tables.

Key properties of φ(z):
- Symmetric about 0: φ(−z) = φ(z)
- Bell-shaped, unimodal at z = 0
- Inflection points at z = ±1
- Maximum value: φ(0) = 1/√(2π) ≈ 0.3989

Key properties of Φ(z):
- Φ(−z) = 1 − Φ(z) (by symmetry)
- Φ(0) = 1/2
- P(a < Z < b) = Φ(b) − Φ(a)
- P(|Z| < z) = 2Φ(z) − 1
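One common way to evaluate Φ numerically is through the error function, using the identity Φ(z) = (1 + erf(z/√2))/2. The sketch below uses Python's `math.erf` for this and spot-checks the CDF properties listed above. The function name `Phi` and the test values of z, a, and b are arbitrary choices for illustration.

```python
import math

def Phi(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

z, a, b = 1.3, -0.5, 2.0  # arbitrary illustrative values

print(Phi(0.0))                              # 0.5
print(Phi(-z), 1.0 - Phi(z))                 # the two values agree
print(Phi(b) - Phi(a))                       # P(a < Z < b)
print(Phi(z) - Phi(-z), 2.0 * Phi(z) - 1.0)  # P(|Z| < z), computed two ways
```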

2. General Normal Distribution

If Z ~ N(0, 1), then the linear transformation X = μ + σZ (with σ > 0) yields a normal random variable with mean μ and variance σ².

Notation: X ~ N(μ, σ²)

PDF:

$f(x) = (1/√(2πσ²)) · exp(−(x − μ)²/(2σ²)),    −∞ < x < ∞
$

Parameters:
- μ = E[X] (location parameter: where the bell is centered)
- σ² = Var(X) (scale parameter: how spread out it is)
- σ = standard deviation

Standardization (Z-score): If X ~ N(μ, σ²), then:

$Z = (X − μ)/σ ~ N(0, 1)
$

This lets us convert any normal probability to a standard normal one:

$P(X ≤ x) = P((X − μ)/σ ≤ (x − μ)/σ) = Φ((x − μ)/σ)
$

Z-score interpretation: (x − μ)/σ tells you how many standard deviations x is from the mean.
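The standardization step can be sketched in a few lines. The example below computes P(X ≤ x) by converting to a Z-score first and compares the result with `statistics.NormalDist` from the Python standard library. The values μ = 50, σ = 8, and x = 62 are hypothetical and chosen only so that the Z-score comes out to 1.5.

```python
import math
from statistics import NormalDist

def Phi(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """P(X <= x) for X ~ N(mu, sigma^2), computed by standardizing first."""
    z = (x - mu) / sigma  # number of standard deviations x lies from the mean
    return Phi(z)

mu, sigma, x = 50.0, 8.0, 62.0  # hypothetical parameters for illustration
z = (x - mu) / sigma

p_standardized = normal_cdf(x, mu, sigma)
p_direct = NormalDist(mu, sigma).cdf(x)  # note: NormalDist takes sigma, not sigma^2

print(z)                         # 1.5
print(p_standardized, p_direct)  # both about 0.9332
```

Note that `NormalDist` is parameterized by the standard deviation, whereas the N(μ, σ²) notation in this lesson uses the variance.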

3. The 68-95-99.7 Rule (Empirical Rule)

For X ~ N(μ, σ²):

P(μ − σ < X < μ + σ) ≈ 0.6827 (about 68%)
P(μ − 2σ < X < μ + 2σ) ≈ 0.9545 (about 95%)
P(μ − 3σ < X < μ + 3σ) ≈ 0.9973 (about 99.7%)

In terms of Z:
- P(−1 < Z < 1) = Φ(1) − Φ(−1) = 2Φ(1) − 1 ≈ 2(0.8413) − 1 = 0.6826
- P(−2 < Z < 2) = 2Φ(2) − 1 ≈ 2(0.9772) − 1 = 0.9544
- P(−3 < Z < 3) = 2Φ(3) − 1 ≈ 2(0.9987) − 1 = 0.9974

Other useful Z-values:
- P(Z < 1.645) ≈ 0.95 (95th percentile)
- P(Z < 1.96) ≈ 0.975
- P(Z < 2.576) ≈ 0.995
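These probabilities and critical values can be reproduced with `statistics.NormalDist`, as in the sketch below. Values computed from four-decimal table entries can differ from these in the last digit because the table entries are themselves rounded.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mu = 0, sigma = 1

# Probability within k standard deviations of the mean, for k = 1, 2, 3.
within = {k: Z.cdf(k) - Z.cdf(-k) for k in (1, 2, 3)}
for k, p in within.items():
    print(f"P(|Z| < {k}) = {p:.4f}")   # about 0.6827, 0.9545, 0.9973

# Critical values: the z with P(Z < z) = p.
quantiles = {p: Z.inv_cdf(p) for p in (0.95, 0.975, 0.995)}
for p, z in quantiles.items():
    print(f"P(Z < {z:.3f}) = {p}")     # z is about 1.645, 1.960, 2.576
```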

4. Moment Generating Function (MGF)

Standard normal: Z ~ N(0, 1)

$M_Z(t) = E[e^{tZ}] = ∫_{−∞}^{∞} e^{tz} (1/√(2π)) e^{−z²/2} dz
$

Complete the square: tz − z²/2 = −(z² − 2tz)/2 = −[(z − t)² − t²]/2 = −(z − t)²/2 + t²/2.

M_Z(t) = e^{t²/2} ∫_{−∞}^{∞} (1/√(2π)) e^{−(z − t)²/2} dz = e^{t²/2} · 1 = e^{t²/2}

General normal: X ~ N(μ, σ²), where X = μ + σZ.

$M_X(t) = E[e^{t(μ + σZ)}] = e^{μt} E[e^{(tσ)Z}] = e^{μt} M_Z(tσ) = e^{μt} e^{σ²t²/2} = exp(μt + σ²t²/2)
$

Computing moments from MGF:
- M'_X(0) = E[X] = μ
- M''_X(0) = E[X²] = μ² + σ²
- Var(X) = E[X²] − (E[X])² = σ²

E[(X−μ)³] = 0 (skewness = 0 — normal is symmetric)
E[(X−μ)⁴] = 3σ⁴ (kurtosis = 3 — benchmark for "normal" tail weight)
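The first two moment formulas can be checked numerically by differentiating the MGF at t = 0 with central finite differences. This is a rough sketch rather than a derivation: the parameters μ = 3 and σ = 2 and the step size h are hypothetical choices, and the results are accurate only up to finite-difference error.

```python
import math

mu, sigma = 3.0, 2.0  # hypothetical parameters for illustration

def mgf(t: float) -> float:
    """MGF of N(mu, sigma^2): exp(mu*t + sigma^2 * t^2 / 2)."""
    return math.exp(mu * t + sigma**2 * t**2 / 2.0)

h = 1e-3  # finite-difference step (an assumption; smaller is not always better)

first = (mgf(h) - mgf(-h)) / (2.0 * h)                # approximates M'(0) = E[X]
second = (mgf(h) - 2.0 * mgf(0.0) + mgf(-h)) / h**2   # approximates M''(0) = E[X^2]
variance = second - first**2

print(first)     # close to mu = 3
print(second)    # close to mu^2 + sigma^2 = 13
print(variance)  # close to sigma^2 = 4
```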

5. Sum of Independent Normals

Theorem: If X₁ ~ N(μ₁, σ₁²) and X₂ ~ N(μ₂, σ₂²) are independent, then:

$X₁ + X₂ ~ N(μ₁ + μ₂, σ₁² + σ₂²)
$

Proof via MGF: By independence, M_{X₁+X₂}(t) = M_{X₁}(t) M_{X₂}(t) = exp(μ₁t + σ₁²t²/2) · exp(μ₂t + σ₂²t²/2) = exp((μ₁+μ₂)t + (σ₁²+σ₂²)t²/2). This is the MGF of N(μ₁+μ₂, σ₁²+σ₂²). Since MGFs uniquely determine distributions, the result follows.

Corollary: For independent Xᵢ ~ N(μᵢ, σᵢ²), the sum Σ Xᵢ ~ N(Σ μᵢ, Σ σᵢ²). The sample mean of n i.i.d. N(μ, σ²) random variables is X̄ ~ N(μ, σ²/n).

Linearity: For independent X₁, X₂ and constants a, b (not both zero), aX₁ + bX₂ ~ N(aμ₁ + bμ₂, a²σ₁² + b²σ₂²). Any such linear combination of independent normals is normal.

Edge case: If X and Y are JOINTLY normal with correlation ρ, then X+Y ~ N(μ_X+μ_Y, σ_X²+σ_Y²+2ρσ_Xσ_Y). Still normal! This property is special — the sum of normals is normal even when they are dependent, provided they are jointly normal.
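A quick simulation can illustrate the theorem for the independent case. The sketch below draws independent normal pairs with Python's `random.gauss`, adds them, and compares the sample mean, sample variance, and the fraction of sums within one theoretical standard deviation against the predicted values. The parameters, seed, and sample size are arbitrary, and a simulation only suggests the result; the MGF argument is what proves it.

```python
import math
import random

rng = random.Random(0)      # fixed seed so the run is repeatable

mu1, sigma1 = 1.0, 2.0      # hypothetical X1 ~ N(1, 2^2)
mu2, sigma2 = -3.0, 1.5     # hypothetical X2 ~ N(-3, 1.5^2)
n = 100_000

# Independent draws: each call to gauss uses fresh randomness.
sums = [rng.gauss(mu1, sigma1) + rng.gauss(mu2, sigma2) for _ in range(n)]

sample_mean = sum(sums) / n
sample_var = sum((s - sample_mean) ** 2 for s in sums) / (n - 1)

theory_mean = mu1 + mu2              # means add
theory_var = sigma1**2 + sigma2**2   # variances add (not standard deviations)
theory_sd = math.sqrt(theory_var)

# If the sum is normal, about 68.27% of draws fall within one sd of the mean.
frac_within_1sd = sum(abs(s - theory_mean) < theory_sd for s in sums) / n

print(sample_mean, theory_mean)   # close to each other
print(sample_var, theory_var)     # close to each other
print(frac_within_1sd)            # close to 0.68
```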

Key Terms

MGF of X
standard normal

Worked Examples

Example 1: Standard Normal Probabilities

Let Z ~ N(0, 1). Find: (a) P(Z < 1.5), (b) P(Z > −0.8), (c) P(−2 < Z < 1), (d) the value c such that P(Z > c) = 0.05.

Solution:

(a) P(Z < 1.5) = Φ(1.5) ≈ 0.9332

(b) P(Z > −0.8) = 1 − Φ(−0.8) = Φ(0.8) ≈ 0.7881 (by symmetry)

(c) P(−2 < Z < 1) = Φ(1) − Φ(−2) = Φ(1) − (1 − Φ(2)) ≈ 0.8413 − 0.0228 = 0.8185

(d) P(Z > c) = 0.05 → Φ(c) = 0.95 → c ≈ 1.645 (the 95th percentile)

Example 2: General Normal Application

IQ scores are N(100, 15²). What proportion of the population has: (a) IQ > 130? (b) IQ between 85 and 115? (c) What IQ is the 98th percentile?

Solution:

(a) Z = (130−100)/15 = 2. P(Z > 2) = 1 − Φ(2) ≈ 1 − 0.9772 = 0.0228. About 2.28%.

(b) Z₁ = (85−100)/15 = −1, Z₂ = (115−100)/15 = 1. P(−1 < Z < 1) ≈ 0.6827. About 68.3%.

(c) Φ(z) = 0.98 → z ≈ 2.054. IQ = 100 + 2.054 · 15 ≈ 130.8.
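The three IQ questions in this example can be cross-checked with `statistics.NormalDist`, which handles the standardization internally. This is a verification sketch only; the variable names are illustrative.

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)  # IQ ~ N(100, 15^2); sigma is the standard deviation

p_above_130 = 1.0 - iq.cdf(130)        # (a) P(IQ > 130)
p_85_to_115 = iq.cdf(115) - iq.cdf(85) # (b) P(85 < IQ < 115)
iq_98th = iq.inv_cdf(0.98)             # (c) 98th percentile

print(f"{p_above_130:.4f}")  # about 0.0228
print(f"{p_85_to_115:.4f}")  # about 0.6827
print(f"{iq_98th:.1f}")      # about 130.8
```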

Example 3: Sum of Independent Normals

Weights of apples from orchard A: N(150g, 20²). From orchard B: N(180g, 25²). If you take one apple from each (independent), what is P(total > 350g)?

Solution:

Total T = X_A + X_B ~ N(150+180, 20²+25²) = N(330, 1025). σ_T = √1025 ≈ 32.02.

Z = (350 − 330)/32.02 ≈ 0.625. P(T > 350) = 1 − Φ(0.625) ≈ 1 − 0.734 = 0.266. About 26.6%.
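As a cross-check, `statistics.NormalDist` supports adding two instances, and that operation treats them as independent, which matches the assumption in this example. The sketch below rebuilds the distribution of the total weight and recomputes the tail probability.

```python
from statistics import NormalDist

apple_a = NormalDist(mu=150, sigma=20)  # orchard A, grams
apple_b = NormalDist(mu=180, sigma=25)  # orchard B, grams

# Adding two NormalDist objects assumes independence:
# means add, and variances (not standard deviations) add.
total = apple_a + apple_b

p_over_350 = 1.0 - total.cdf(350)

print(total.mean)            # 330.0
print(total.variance)        # approximately 1025
print(f"{total.stdev:.2f}")  # about 32.02
print(f"{p_over_350:.3f}")   # about 0.266
```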

Quiz

Q1: The standard normal PDF φ(z) includes the factor 1/√(2π). This factor ensures:

A) The function is differentiable B) The total area under the curve equals 1 C) The peak height equals 1 D) The function is symmetric about zero

Correct: B)

If you chose B: Correct! The normalizing constant 1/√(2π) ensures ∫ φ(z) dz = 1. This is verified by squaring the integral and switching to polar coordinates.
If you chose A: Differentiability comes from the exponential form, not the constant.
If you chose C: The peak height φ(0) = 1/√(2π) ≈ 0.399, not 1.
If you chose D: Symmetry comes from the z² term in the exponent (even function), not the constant.

Q2: If X ~ N(μ, σ²), the Z-score Z = (X − μ)/σ follows:

A) N(0, 1) B) N(μ, σ²) C) N(0, σ²) D) N(μ, 1)

Correct: A)

If you chose A: Correct! Standardization transforms any normal to standard normal: E[Z] = (μ−μ)/σ = 0, Var(Z) = Var(X)/σ² = σ²/σ² = 1.
If you chose B: This would be the original distribution — the Z-score changes it.
If you chose C: The mean becomes 0 but the variance should also become 1.
If you chose D: The variance becomes 1 but the mean should also become 0.

Q3: The 68-95-99.7 rule for N(μ, σ²) states that approximately 95% of observations fall within:

A) μ ± σ B) μ ± 2σ C) μ ± 3σ D) μ ± 1.96σ

Correct: B)

If you chose B: Correct! Approximately 95% of the probability mass lies within 2 standard deviations of the mean (about 95.45%). The multiplier that gives 95% to two decimal places is 1.96: P(|Z| < 1.96) ≈ 0.95.
If you chose A: μ ± σ contains approximately 68% of observations.
If you chose C: μ ± 3σ contains approximately 99.7% of observations.
If you chose D: μ ± 1.96σ is the more precise 95% interval, not the rule of thumb that the question asks about.

Q4: If X₁ ~ N(μ₁, σ₁²) and X₂ ~ N(μ₂, σ₂²) are independent, then X₁ + X₂ ~:

A) N(μ₁ + μ₂, σ₁² + σ₂²) B) N(μ₁μ₂, σ₁²σ₂²) C) N(μ₁ + μ₂, σ₁σ₂) D) Approximately normal by CLT

Correct: A)

If you chose A: Correct! Independent normals add: means add, variances add. This is an exact result, not just asymptotic.
If you chose B: Means and variances don't multiply — that's not how normal sums work.
If you chose C: Variances add, not standard deviations. Var(X₁+X₂) = σ₁² + σ₂².
If you chose D: This is an exact result for normals (CLT gives approximate normality for sums of non-normal RVs).

Q5: By symmetry of the standard normal, Φ(−z) equals:

A) Φ(z) B) 1 − Φ(z) C) −Φ(z) D) 2Φ(z) − 1

Correct: B)

If you chose B: Correct! Since φ(z) is symmetric about 0, P(Z ≤ −z) = P(Z ≥ z) = 1 − P(Z ≤ z) = 1 − Φ(z).
If you chose A: The PDF φ is symmetric (even), but the CDF Φ is not. Φ is strictly increasing, so Φ(−z) < Φ(z) for every z > 0.
If you chose C: CDF values are always between 0 and 1; this could be negative.
If you chose D: This is P(|Z| < z) = Φ(z) − Φ(−z) = 2Φ(z) − 1.

Q6: The MGF of X ~ N(μ, σ²) is M(t) = exp(μt + σ²t²/2). M'(0) equals:

A) μ B) σ² C) μ + σ² D) 0

Correct: A)

If you chose A: Correct! The first derivative of the MGF at 0 gives E[X] = μ. M'(t) = (μ + σ²t)exp(μt + σ²t²/2), so M'(0) = μ.
If you chose B: M''(0) − (M'(0))² = (μ² + σ²) − μ² = σ² gives the variance, not the mean.
If you chose C: This doesn't correspond to a standard moment.
If you chose D: M(0) = 1 (always true for MGFs), but M'(0) = μ.

Practice Problems

If Z ~ N(0, 1), find P(Z < −1.96), P(|Z| > 2.33), and the interquartile range (IQR: distance between 25th and 75th percentiles).
Heights are N(170cm, 10²). Find the proportion between 165cm and 180cm. Find the height that only 5% exceed.
Derive E[Z²] for Z ~ N(0,1) using the MGF: verify M''_Z(0) = 1.
If X₁ ~ N(10, 4) and X₂ ~ N(15, 9) are independent, find the distribution of Y = 2X₁ − X₂ + 5.
The MGF of X is M(t) = exp(3t + 8t²). Find E[X] and Var(X).
Show that the mode of N(μ, σ²) is at x = μ by finding where f'(x) = 0.
If X₁, …, X₁₀ are 10 independent measurements with Xᵢ ~ N(μ, 4), what is P(|X̄ − μ| < 0.5)?

Answers

1. P(Z < −1.96) = 0.025. P(|Z| > 2.33) = 2(1−Φ(2.33)) ≈ 2(0.0099) = 0.0198. IQR: Φ(z₀.₂₅) = 0.25 → z = −0.674; Φ(z₀.₇₅) = 0.75 → z = 0.674. IQR = 0.674 − (−0.674) = 1.348 in Z-units.
2. Z₁ = (165−170)/10 = −0.5, Z₂ = (180−170)/10 = 1.0. P = Φ(1)−Φ(−0.5) = 0.8413−0.3085 = 0.5328. 95th percentile: z = 1.645, height = 170+1.645·10 = 186.45 cm.
3. M_Z(t) = e^{t²/2}. M'_Z(t) = t e^{t²/2}, so M'_Z(0) = 0. M''_Z(t) = e^{t²/2} + t²e^{t²/2}, so M''_Z(0) = 1. E[Z²] = 1, Var(Z) = 1−0 = 1. ✓
4. Y = 2X₁ − X₂ + 5. E[Y] = 2(10) − 15 + 5 = 10. Var(Y) = 4(4) + 1(9) = 25. Y ~ N(10, 25).
5. M(t) = exp(3t + 8t²). Comparing to exp(μt + σ²t²/2): μ = 3, σ²/2 = 8 → σ² = 16. So E[X] = 3, Var(X) = 16, X ~ N(3, 16).
6. f(x) = (1/√(2πσ²)) exp(−(x−μ)²/(2σ²)). f'(x) = f(x) · [−(x−μ)/σ²] = 0 only at x = μ. Second derivative confirms it's a maximum.
7. X̄ ~ N(μ, 4/10) = N(μ, 0.4). σ_X̄ = √0.4 ≈ 0.6325. P(|X̄−μ| < 0.5) = P(|Z| < 0.5/0.6325) = P(|Z| < 0.79) = 2Φ(0.79)−1 ≈ 2(0.7852)−1 = 0.5704.
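A few of these answers can be double-checked with `statistics.NormalDist`, as sketched below for problems 1, 4, and 7. Small differences from the table-based answers are expected because the hand calculations round z to two or three decimals. The variable names are illustrative.

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal

# Problem 1: interquartile range of Z.
iqr = Z.inv_cdf(0.75) - Z.inv_cdf(0.25)

# Problem 4: Y = 2*X1 - X2 + 5 with independent X1 ~ N(10, 4), X2 ~ N(15, 9).
# NormalDist takes standard deviations, so sigma = 2 and sigma = 3.
X1 = NormalDist(10, 2)
X2 = NormalDist(15, 3)
Y = 2 * X1 - X2 + 5   # NormalDist arithmetic treats X1 and X2 as independent

# Problem 7: Xbar - mu ~ N(0, 4/10) for n = 10 measurements with variance 4.
xbar_centered = NormalDist(0, math.sqrt(4 / 10))
p_within_half = xbar_centered.cdf(0.5) - xbar_centered.cdf(-0.5)

print(f"{iqr:.3f}")            # about 1.349
print(Y.mean, Y.variance)      # mean 10, variance about 25
print(f"{p_within_half:.3f}")  # about 0.571
```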

Summary

The standard normal N(0, 1) has PDF φ(z) = (1/√(2π)) e^{−z²/2}; its CDF Φ(z) has no closed form but is universally tabulated
Any normal X ~ N(μ, σ²) standardizes to Z = (X−μ)/σ ~ N(0, 1); Z-scores measure distance from the mean in standard deviation units
The 68-95-99.7 rule provides quick approximations: ~68% within 1σ, ~95% within 2σ, ~99.7% within 3σ of μ
MGF is M(t) = exp(μt + σ²t²/2); it shows skewness = 0 and excess kurtosis = 0 for the normal
Sums of independent normals are normal (closed under convolution): Σ Xᵢ ~ N(Σμᵢ, Σσᵢ²). This closure makes the normal especially convenient for inference

Pitfalls

Forgetting to standardize before using the standard normal table. If X ~ N(100, 15²) and you want P(X > 115), you MUST compute Z = (115−100)/15 = 1 first, then look up Φ(1). Looking up 115 directly in the standard normal table gives nonsense.
Confusing variance σ² with standard deviation σ. The normal is parameterized by variance: X ~ N(μ, σ²). So N(0, 4) has standard deviation 2, not 4. When summing independent normals, VARIANCES add: N(0,4) + N(0,9) = N(0,13), NOT N(0, 2+3=5). The standard deviations do NOT add.
Treating the 68-95-99.7 rule as exact. These are approximations. The exact values are: 68.27% within 1σ, 95.45% within 2σ, 99.73% within 3σ. For precise work, especially tail probabilities, use Φ(z) values, not the empirical rule.
Assuming the sum of normals is always normal. If X and Y are each marginally normal but NOT jointly normal, X + Y may NOT be normal. Example: let X ~ N(0,1) and, for a fixed c > 0, let Y = X if |X| < c and Y = −X otherwise. Both are marginally N(0,1), but X + Y equals 2X when |X| < c and 0 otherwise, so it has a point mass at 0 and cannot be normal. Independence or joint normality is required.
Misapplying Φ(−z) = 1 − Φ(z) for general normals. The symmetry formula Φ(−z) = 1 − Φ(z) applies to the STANDARD normal Z ~ N(0, 1). For a general normal X ~ N(μ, σ²) with CDF F, the analogous identity F(−x) = 1 − F(x) does NOT hold unless μ = 0; the symmetry is about μ, so F(μ − d) = 1 − F(μ + d). For general normals, use standardization first.
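The counterexample in the "sum of normals" pitfall can be seen in a short simulation. The sketch below uses the hypothetical threshold c = 1 and an arbitrary seed and sample size. It shows that Y still looks like N(0, 1) marginally, while X + Y is exactly 0 on a large fraction of draws, which no normal distribution with positive variance allows.

```python
import random

rng = random.Random(1)  # fixed seed so the run is repeatable
c = 1.0                 # hypothetical threshold for the counterexample
n = 100_000

xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
ys = [x if abs(x) < c else -x for x in xs]  # Y = X inside (-c, c), Y = -X outside
sums = [x + y for x, y in zip(xs, ys)]      # 2X inside (-c, c), exactly 0 outside

# Marginally, Y has mean near 0 and variance near 1, like X.
y_mean = sum(ys) / n
y_var = sum((y - y_mean) ** 2 for y in ys) / (n - 1)

# X + Y has a point mass at 0 of size P(|X| >= c), about 0.317 when c = 1.
frac_zero = sum(1 for s in sums if s == 0.0) / n

print(y_mean, y_var)  # close to 0 and 1
print(frac_zero)      # close to 0.32
```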

Quiz

If X ~ N(100, 15²), the Z-score for X = 130 is: a) 1.0 b) 1.5 c) 2.0 d) 3.0 Answer: c. Z = (130−100)/15 = 30/15 = 2.0.
For Z ~ N(0, 1), P(−1 < Z < 1) is approximately: a) 50% b) 68% c) 95% d) 99.7% Answer: b. The 68-95-99.7 rule: ~68% within 1σ.
The MGF of a normal distribution uniquely identifies it because: a) All MGFs are unique b) MGFs uniquely determine distributions when they exist in a neighborhood of 0 c) Normals have the simplest MGF d) The normal MGF is always positive Answer: b. MGFs uniquely characterize distributions in general when they converge near 0.
If X₁ ~ N(2, 1) and X₂ ~ N(3, 4) are independent, Var(X₁ + X₂) = ? a) 3 b) 5 c) √5 d) 7 Answer: b. Var(X₁+X₂) = 1 + 4 = 5.
The normal PDF is symmetric about: a) σ b) μ c) σ² d) The y-axis Answer: b. f(μ+x) = f(μ−x) — symmetry about the mean μ.
Φ(−z) equals: a) −Φ(z) b) 1/Φ(z) c) Φ(z) d) 1 − Φ(z) Answer: d. By symmetry of the standard normal: Φ(−z) = 1 − Φ(z).
The sample mean of n i.i.d. N(μ, σ²) observations has variance: a) σ² b) σ²/n c) σ/√n d) nσ² Answer: b. X̄ ~ N(μ, σ²/n). The variance shrinks as sample size increases.
The sum of two independent normal RVs is: a) Always normal b) Normal only if they have equal variances c) Not necessarily normal d) Approximately normal by CLT Answer: a. The normal is closed under convolution — the sum of independent normals is exactly normal with summed parameters.

Next Steps

Continue to 10-09 More Continuous Distributions to learn about the Gamma, chi-squared, Student's t, Beta, and Cauchy distributions.
