
Phase 11: Probability Theory II

Subject 11-01: Expectation for Continuous Random Variables

Prerequisites: 10-07 (Continuous Random Variables), 10-06 (Expectation of Discrete RVs), basic calculus


Learning Objectives

  1. Compute expected values for continuous random variables using LOTUS (Law of the Unconscious Statistician)
  2. Define and derive moment generating functions (MGFs) for continuous distributions
  3. Define characteristic functions and explain their advantages over MGFs
  4. Compute moments (mean, variance, skewness, kurtosis) from MGFs and characteristic functions
  5. Apply Jensen's inequality to relate E[g(X)] and g(E[X])

Core Content

1. LOTUS for Continuous Random Variables

For a continuous random variable X with PDF f_X(x), the Law of the Unconscious Statistician states:

$E[g(X)] = ∫_{−∞}^{∞} g(x) f_X(x) dx$

You do NOT need to find the distribution of Y = g(X) — just integrate g(x) weighted by the PDF of X.

⚠️ CRITICAL: E[g(X)] ≠ g(E[X]) in general. For convex functions g, Jensen's inequality gives E[g(X)] ≥ g(E[X]).

Example (variance via LOTUS): Var(X) = E[(X − μ)²] = ∫ (x − μ)² f_X(x) dx. No need to find the distribution of (X − μ)².

Edge case: E[g(X)] exists only if the integral converges absolutely, i.e. ∫ |g(x)| f_X(x) dx < ∞.
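As a rough numerical illustration of LOTUS, the sketch below integrates g(x) f_X(x) directly for an Exponential(λ) variable, without ever deriving the distribution of g(X). The rate `lam = 2.0`, the truncation point `upper`, and the helper `integrate` are assumptions chosen for illustration; the midpoint rule is just one simple quadrature choice.

```python
import math

def integrate(func, a, b, n=100_000):
    """Composite midpoint rule for the integral of func over [a, b]."""
    h = (b - a) / n
    return h * sum(func(a + (i + 0.5) * h) for i in range(n))

lam = 2.0  # hypothetical rate parameter, chosen only for illustration

def pdf(x):
    """PDF of Exponential(lam) on x >= 0."""
    return lam * math.exp(-lam * x)

# Truncate the infinite upper limit: the tail beyond x = 40 is negligible here.
upper = 40.0

# LOTUS: integrate g(x) * f_X(x) for g(x) = x, x^2, and (x - mean)^2.
mean = integrate(lambda x: x * pdf(x), 0.0, upper)
second_moment = integrate(lambda x: x**2 * pdf(x), 0.0, upper)
variance = integrate(lambda x: (x - mean) ** 2 * pdf(x), 0.0, upper)

# Exact values for comparison: 1/lam, 2/lam**2, 1/lam**2.
print(mean, second_moment, variance)
# E[g(X)] and g(E[X]) differ for g(x) = x^2:
print(second_moment, mean**2)
```

The last line makes the warning above concrete: E[X²] and (E[X])² are different numbers, and their difference is the variance.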

2. Moment Generating Functions (MGF)

The moment generating function of X is:

$M_X(t) = E[e^{tX}] = ∫_{−∞}^{∞} e^{tx} f_X(x) dx$

defined for all t where the integral converges (the MGF may not exist for all t, or may not exist at all — e.g., the Cauchy distribution has no MGF).

Key property — generating moments: The k-th moment is obtained by differentiating M_X(t) k times and evaluating at t = 0:

$E[X^k] = M_X^{(k)}(0)$

Proof sketch: Expand e^{tX} = 1 + tX + t²X²/2! + t³X³/3! + ... and take expectation term by term:

$M_X(t) = E[1 + tX + t²X²/2! + ...] = 1 + t E[X] + t²E[X²]/2! + ...$

Differentiating k times and setting t = 0 isolates the k-th moment.
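The derivative property can be checked numerically with central finite differences. The sketch below does this for the Normal(μ, σ²) MGF exp(μt + σ²t²/2); the parameter values μ = 1, σ = 2, the step size `h`, and the helper names are assumptions made for illustration, and finite differences are only an approximation to the exact derivatives.

```python
import math

def mgf_normal(t, mu=1.0, sigma=2.0):
    """MGF of Normal(mu, sigma^2): exp(mu*t + sigma^2 * t^2 / 2)."""
    return math.exp(mu * t + 0.5 * sigma**2 * t**2)

def first_derivative_at_zero(m, h=1e-3):
    """Central difference approximation of m'(0)."""
    return (m(h) - m(-h)) / (2 * h)

def second_derivative_at_zero(m, h=1e-3):
    """Central difference approximation of m''(0)."""
    return (m(h) - 2 * m(0.0) + m(-h)) / h**2

mean_est = first_derivative_at_zero(mgf_normal)            # approximates E[X] = mu
second_moment_est = second_derivative_at_zero(mgf_normal)  # approximates E[X^2] = sigma^2 + mu^2
var_est = second_moment_est - mean_est**2                  # approximates sigma^2

print(mean_est, second_moment_est, var_est)
```

With these assumed parameters the exact targets are E[X] = 1, E[X²] = 5, and Var(X) = 4; the estimates should agree to several decimal places.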

Common MGFs (continuous):

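Distribution               MGF M_X(t)              Domain
-------------------------  ----------------------  ------------------------
Uniform(0, 1)              (eᵗ − 1)/t              all t (with M_X(0) = 1)
Exponential(λ)             λ/(λ − t)               t < λ
Gamma(α, β), β = rate      (1 − t/β)^{−α}          t < β
Normal(μ, σ²)              exp(μt + σ²t²/2)        all t
Chi-squared(k)             (1 − 2t)^{−k/2}         t < 1/2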

Why MGFs matter:

  1. Uniqueness: If M_X(t) = M_Y(t) for all t in a neighborhood of 0 (with both finite there), then X and Y have the same distribution.
  2. Sums of independent RVs: M_{X+Y}(t) = M_X(t) · M_Y(t) when X, Y are independent.
  3. Proving limit theorems: If the MGFs of X_n converge to M_X(t) on a neighborhood of 0, then X_n converges to X in distribution. This is the MGF analogue of Lévy's continuity theorem, which is stated for characteristic functions.
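The product rule for independent sums can be illustrated by simulation. The sketch below draws independent Exponential(λ) pairs, estimates E[e^{t(X+Y)}] by Monte Carlo, and compares it with M_X(t) · M_Y(t). The values of `lam`, `t`, the sample size `n`, and the seed are assumptions for illustration, and a Monte Carlo estimate only agrees up to sampling error.

```python
import math
import random

rng = random.Random(0)           # fixed seed so the run is reproducible
lam, t, n = 2.0, 0.25, 200_000   # hypothetical rate, evaluation point, sample size

def mgf_exponential(t, lam):
    """MGF of Exponential(lam), valid only for t < lam."""
    if t >= lam:
        raise ValueError("MGF of Exponential(lam) requires t < lam")
    return lam / (lam - t)

# Monte Carlo estimate of E[exp(t * (X + Y))] with X, Y independent.
total = 0.0
for _ in range(n):
    x = rng.expovariate(lam)
    y = rng.expovariate(lam)
    total += math.exp(t * (x + y))
empirical = total / n

product = mgf_exponential(t, lam) ** 2   # M_X(t) * M_Y(t)
gamma_mgf = (1 - t / lam) ** (-2)        # Gamma(alpha=2, rate=lam) MGF from the table

print(empirical, product, gamma_mgf)
```

The product also coincides with the Gamma(2, λ) entry in the table, which by the uniqueness property is consistent with the sum of two independent Exponential(λ) variables being Gamma(2, λ).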

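Pitfall: The MGF may fail to be finite on any open interval around 0. Example: for the log-normal distribution, E[e^{tX}] diverges for every t > 0 (it is finite only for t ≤ 0), so the moment-generating property does not apply, even though every moment of the log-normal is finite.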

3. Characteristic Functions

The characteristic function of X is:

$φ_X(t) = E[e^{itX}] = ∫_{−∞}^{∞} e^{itx} f_X(x) dx$

where i = √(−1). This is essentially the Fourier transform of the PDF.

Key advantages over MGFs:

  1. Always exists: |e^{itx}| = 1, so |φ_X(t)| ≤ 1 for all t. The characteristic function exists for EVERY distribution, including the Cauchy.
  2. Inversion formula: When φ_X is absolutely integrable, the PDF can be recovered from φ_X(t) via the inverse Fourier transform: $f_X(x) = (1/(2π)) ∫_{−∞}^{∞} e^{−itx} φ_X(t) dt$
  3. Uniqueness: φ_X uniquely determines the distribution. Unlike the MGF uniqueness result, this needs no existence assumption, because φ_X always exists.

Moment generation from characteristic functions:

$E[X^k] = i^{−k} φ_X^{(k)}(0)$

provided the k-th moment exists.

Example (Standard Cauchy): PDF: f(x) = 1/(π(1+x²)). Characteristic function: φ(t) = e^{−|t|}. Notice φ is NOT differentiable at t = 0. If E|X| were finite, φ would be differentiable at 0 with φ'(0) = iE[X], so the kink at 0 shows that E[X] does not exist.
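The Cauchy characteristic function can be approximated empirically even though the distribution has no mean. The sketch below samples a standard Cauchy by inverse-CDF sampling and averages e^{itX}; the sample size, seed, evaluation points, and the helper name `empirical_cf` are assumptions for illustration, and agreement is only up to Monte Carlo error.

```python
import cmath
import math
import random

rng = random.Random(0)   # fixed seed for reproducibility
n = 200_000              # hypothetical sample size

# Inverse-CDF sampling: tan(pi * (U - 1/2)) is standard Cauchy for U ~ Uniform(0, 1).
samples = [math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)]

def empirical_cf(t, xs):
    """Sample average of exp(i * t * x), an estimate of E[exp(itX)]."""
    return sum(cmath.exp(1j * t * x) for x in xs) / len(xs)

# Compare the estimate with exp(-|t|) at a few points.
results = {t: empirical_cf(t, samples) for t in (0.5, 1.0, 2.0)}
for t, est in results.items():
    print(t, est.real, est.imag, math.exp(-abs(t)))
```

Because |e^{itx}| = 1, each term in the average is bounded, which is why this estimate is stable even for a heavy-tailed distribution like the Cauchy.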

Common characteristic functions:

Distribution      φ(t)
----------------  ------------
Normal(0, 1)      e^{−t²/2}
Exponential(λ)    λ/(λ − it)
Cauchy(0, 1)      e^{−|t|}
Uniform(−a, a)    sin(at)/(at)

4. Moments: Mean, Variance, Skewness, Kurtosis

For a continuous RV X with PDF f:

Raw moments: μ'_k = E[Xᵏ] = ∫ xᵏ f(x) dx

Central moments: μ_k = E[(X − μ)ᵏ] where μ = E[X].

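Key standardized moments (with σ² = μ₂ = Var(X)):

  - Skewness: γ₁ = μ₃/σ³ = E[(X − μ)³]/σ³. Positive = right-skewed, negative = left-skewed.
  - Kurtosis: μ₄/σ⁴ = E[(X − μ)⁴]/σ⁴. Equals 3 for the normal distribution.
  - Excess kurtosis: μ₄/σ⁴ − 3. Equals 0 for the normal distribution.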

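Common Pitfall: Kurtosis measures tail weight, NOT "peakedness." High kurtosis does not by itself imply a sharp central peak: Student's t-distribution with 5 degrees of freedom (in its usual, unstandardized form) has excess kurtosis 6, yet its density at 0 is lower than that of the standard normal.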

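Computing skewness/kurtosis from MGF: Use the cumulant generating function K(t) = ln M_X(t), whose r-th derivative at t = 0 is the cumulant κᵣ. The first four cumulants are κ₁ = μ, κ₂ = σ², κ₃ = μ₃, and κ₄ = μ₄ − 3σ⁴, so skewness = κ₃/κ₂^{3/2} and excess kurtosis = κ₄/κ₂². For the normal distribution, κ₁ = μ, κ₂ = σ², κ₃ = κ₄ = ... = 0: all cumulants beyond order 2 are zero, which characterizes the normal distribution.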
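To make the moment bookkeeping concrete, the sketch below starts from the raw moments of Exponential(λ), which can be read off the series λ/(λ − t) = Σ (t/λ)ᵏ as E[Xᵏ] = k!/λᵏ, converts them to central moments, and then forms κ₃ = μ₃ and κ₄ = μ₄ − 3μ₂². The function names and the choice λ = 2 are assumptions for illustration; this route via raw moments is an alternative to differentiating K(t) directly.

```python
import math

def exponential_raw_moments(lam, order=4):
    """Raw moments E[X^k] = k! / lam^k for k = 1..order."""
    return [math.factorial(k) / lam**k for k in range(1, order + 1)]

def shape_from_raw_moments(m1, m2, m3, m4):
    """Return (skewness, excess kurtosis, kappa3, kappa4) from the first four raw moments."""
    # Central moments via binomial expansion of E[(X - m1)^k].
    mu2 = m2 - m1**2
    mu3 = m3 - 3 * m1 * m2 + 2 * m1**3
    mu4 = m4 - 4 * m1 * m3 + 6 * m1**2 * m2 - 3 * m1**4
    # Third and fourth cumulants in terms of central moments.
    kappa3 = mu3
    kappa4 = mu4 - 3 * mu2**2
    skewness = kappa3 / mu2**1.5
    excess_kurtosis = kappa4 / mu2**2
    return skewness, excess_kurtosis, kappa3, kappa4

lam = 2.0  # hypothetical rate parameter
skew, excess_kurt, kappa3, kappa4 = shape_from_raw_moments(*exponential_raw_moments(lam))
print(skew, excess_kurt)
```

For the exponential distribution the skewness is 2 and the excess kurtosis is 6 regardless of λ, so changing `lam` should leave both printed values unchanged.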

5. Jensen's Inequality

Theorem: If g is a convex function, then:

$E[g(X)] ≥ g(E[X])$

If g is strictly convex, equality holds iff X is constant (almost surely).

Convexity check: g''(x) ≥ 0 for all x ⇒ g is convex. Examples: g(x) = x², x⁴, eˣ, −ln(x), 1/x (for x > 0).

Examples:

  - E[X²] ≥ (E[X])² (since g(x) = x² is convex); this is equivalent to Var(X) ≥ 0
  - E[e^X] ≥ e^{E[X]}
  - E[1/X] ≥ 1/E[X] for X > 0 (since g(x) = 1/x is convex for x > 0)

Concave functions (reverse inequality): g''(x) ≤ 0 ⇒ E[g(X)] ≤ g(E[X]). Examples: g(x) = ln(x), √x.
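Both directions of the inequality can be checked numerically. The sketch below takes X ~ Uniform(1, 3), an assumed example distribution supported on x > 0, and compares E[g(X)] with g(E[X]) for the convex g(x) = 1/x and the concave g(x) = ln(x). The helper `expectation_uniform` is a hypothetical name, and it evaluates the LOTUS integral with a simple midpoint rule.

```python
import math

def expectation_uniform(g, a, b, n=100_000):
    """E[g(X)] for X ~ Uniform(a, b), via LOTUS and the composite midpoint rule."""
    h = (b - a) / n
    integral = h * sum(g(a + (i + 0.5) * h) for i in range(n))
    return integral / (b - a)   # the Uniform(a, b) PDF is the constant 1/(b - a)

a, b = 1.0, 3.0   # hypothetical support, chosen so that X > 0
mean = expectation_uniform(lambda x: x, a, b)

# Convex case: E[1/X] >= 1/E[X]. Exact value of E[1/X] is ln(3)/2.
e_inv = expectation_uniform(lambda x: 1.0 / x, a, b)
# Concave case: E[ln X] <= ln E[X]. Exact value of E[ln X] is (3 ln 3 - 2)/2.
e_log = expectation_uniform(math.log, a, b)

print("convex  1/x :", e_inv, ">=", 1.0 / mean)
print("concave ln x:", e_log, "<=", math.log(mean))
```

Since 1/x and ln(x) are strictly convex and strictly concave on this range and X is not constant, both inequalities are strict here.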

Application — Information theory: By Jensen, E[−ln(f(X))] ≥ −ln(E[f(X)]) — the foundation for entropy bounds.



Key Terms

Worked Examples

Example 1: Computing MGF and Moments

Let X ~ Exponential(λ). (a) Derive the MGF. (b) Use it to compute E[X], E[X²], and Var(X).

Solution:

(a) M_X(t) = E[e^{tX}] = ∫₀^{∞} e^{tx} λ e^{−λx} dx = λ ∫₀^{∞} e^{−(λ−t)x} dx

For t < λ: = λ [−e^{−(λ−t)x}/(λ−t)]₀^{∞} = λ · (1/(λ−t)) = λ/(λ−t).

(b) Differentiate and evaluate at t = 0:

M'_X(t) = λ/(λ−t)², so E[X] = M'_X(0) = λ/λ² = 1/λ.

M''_X(t) = 2λ/(λ−t)³, so E[X²] = M''_X(0) = 2λ/λ³ = 2/λ².

Var(X) = E[X²] − (E[X])² = 2/λ² − 1/λ² = 1/λ². ✓


Example 2: Characteristic Function of Normal

Find the characteristic function of Z ~ N(0, 1).

Solution:

φ_Z(t) = E[e^{itZ}] = ∫_{−∞}^{∞} e^{itz} · (1/√(2π)) e^{−z²/2} dz

Complete the square in the exponent: itz − z²/2 = −(z² − 2itz)/2 = −((z − it)² + t²)/2 = −(z − it)²/2 − t²/2.

φ_Z(t) = e^{−t²/2} ∫_{−∞}^{∞} (1/√(2π)) e^{−(z−it)²/2} dz

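The remaining integral equals 1: the integrand is a normal density shifted by it into the complex plane, and a contour-integration argument (Cauchy's theorem) shows that this shift does not change the value of the integral. Equivalently, the MGF M_Z(s) = e^{s²/2} extends analytically to s = it. Thus φ_Z(t) = e^{−t²/2}.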

For X ~ N(μ, σ²): X = μ + σZ, so φ_X(t) = e^{iμt} φ_Z(σt) = exp(iμt − σ²t²/2).
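The closed form can be sanity-checked by integrating e^{itz} against the standard normal density numerically. The sketch below uses a trapezoidal rule on a truncated interval; the truncation `half_width`, the grid size `n`, the test points, and the parameters μ = 1, σ = 1.5 are all assumptions for illustration.

```python
import cmath
import math

def cf_standard_normal_numeric(t, half_width=10.0, n=20_000):
    """Trapezoidal approximation of the integral of exp(itz) * pdf(z) over [-half_width, half_width]."""
    h = 2 * half_width / n
    total = 0j
    for k in range(n + 1):
        z = -half_width + k * h
        weight = 0.5 if k in (0, n) else 1.0   # endpoints get half weight
        total += weight * cmath.exp(1j * t * z) * math.exp(-0.5 * z * z)
    return total * h / math.sqrt(2 * math.pi)

# Compare with exp(-t^2 / 2) at a few points.
errors = [abs(cf_standard_normal_numeric(t) - math.exp(-0.5 * t * t))
          for t in (0.0, 1.0, 2.5)]
print(errors)

def cf_normal_numeric(t, mu, sigma):
    """Uses X = mu + sigma * Z, so phi_X(t) = exp(i*mu*t) * phi_Z(sigma*t)."""
    return cmath.exp(1j * mu * t) * cf_standard_normal_numeric(sigma * t)

mu, sigma, t = 1.0, 1.5, 0.8   # hypothetical parameters and evaluation point
numeric = cf_normal_numeric(t, mu, sigma)
closed_form = cmath.exp(1j * mu * t - 0.5 * sigma**2 * t**2)
print(numeric, closed_form)
```

The imaginary part of the standard normal result should be numerically zero, reflecting that the sine component integrates to zero against a symmetric density.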


Example 3: Jensen's Inequality in Action

Let X ~ Uniform(0, 2). Compute E[X²] and (E[X])², and verify E[X²] ≥ (E[X])².

Solution:

E[X] = (0+2)/2 = 1. So (E[X])² = 1.

E[X²] = ∫₀² x² · (1/2) dx = (1/2)[x³/3]₀² = (1/2)(8/3) = 4/3 ≈ 1.333.

Clearly 4/3 > 1, so E[X²] > (E[X])², consistent with Jensen (x² is strictly convex). The gap E[X²] − (E[X])² = 4/3 − 1 = 1/3 is Var(X). ✓

Quiz

Q1: LOTUS for a continuous random variable states that E[g(X)] equals:

A) g(E[X]) B) ∫ g(x) f_X(x) dx C) Σ g(x) p_X(x) D) g(∫ x f_X(x) dx)

Correct: B)


Q2: The moment generating function M_X(t) = E[e^{tX}] has the property that:

A) M_X(1) = E[X] B) M'_X(0) = E[X] C) M_X(0) = E[X] D) M''_X(0) = E[X]

Correct: B)


Q3: If the MGF of X exists in a neighborhood of 0, then:

A) X must be normally distributed B) The MGF uniquely determines the distribution of X C) All moments of X are zero D) X has finite support

Correct: B)


Q5: Characteristic functions differ from MGFs in that:

A) They always exist for any random variable B) They only work for discrete distributions C) They don't generate moments D) They are always real-valued

Correct: A)


Practice Problems

  1. Let X ~ Uniform(0, θ). Derive the MGF and use it to compute E[X] and Var(X).
  2. Compute the characteristic function of X ~ Exponential(λ). Use it to verify E[X] = 1/λ.
  3. For X ~ Gamma(α, β), the MGF is M(t) = (1 − t/β)^{−α}. Find E[X] and Var(X) by differentiating the MGF.
  4. Let X have PDF f(x) = (1/2)e^{−|x|} for −∞ < x < ∞ (Laplace distribution). Find E[X], E[|X|], and Var(X).
  5. Prove that for any random variable X with finite variance, E[X²] ≥ (E[X])². When does equality hold?
  6. Show that the characteristic function of the Cauchy(0, 1) distribution is e^{−|t|}. Explain why this implies no finite moments.
  7. If X and Y are independent, prove that φ_{X+Y}(t) = φ_X(t) φ_Y(t).

Answers

  1. M_X(t) = (e^{θt} − 1)/(θt) for t ≠ 0. Using the series expansion: M(t) = 1 + (θt)/2 + (θt)²/6 + ... so E[X] = M'(0) = θ/2, E[X²] = M''(0) = θ²/3, Var(X) = θ²/3 − (θ/2)² = θ²/12.
  2. φ_X(t) = ∫₀^{∞} e^{itx} λ e^{−λx} dx = λ ∫₀^{∞} e^{−(λ−it)x} dx = λ/(λ−it). φ'_X(t) = iλ/(λ−it)². E[X] = i^{−1} φ'_X(0) = −i · iλ/λ² = 1/λ. ✓
  3. M'(t) = (α/β)(1 − t/β)^{−α−1}, so E[X] = M'(0) = α/β. M''(t) = (α(α+1)/β²)(1 − t/β)^{−α−2}, so E[X²] = M''(0) = α(α+1)/β². Var(X) = α(α+1)/β² − (α/β)² = α/β².
  4. By symmetry, E[X] = 0. E[|X|] = 2∫₀^{∞} x·(1/2)e^{−x} dx = ∫₀^{∞} x e^{−x} dx = 1. E[X²] = 2∫₀^{∞} x²·(1/2)e^{−x} dx = ∫₀^{∞} x² e^{−x} dx = Γ(3) = 2. Var(X) = 2 − 0² = 2.
  5. Var(X) = E[(X−μ)²] = E[X²] − 2μE[X] + μ² = E[X²] − μ² ≥ 0, so E[X²] ≥ μ² = (E[X])². Equality holds iff Var(X) = 0, i.e., P(X = c) = 1 for some constant c.
  6. φ(t) = ∫ e^{itx}/(π(1+x²)) dx = e^{−|t|} (by contour integration, or by recognizing the integral as a standard Fourier transform pair). φ is not differentiable at t = 0, whereas a finite E|X| would force φ to be differentiable there. Hence E|X| = ∞, and since a finite k-th absolute moment would imply a finite first absolute moment, no moment of order k ≥ 1 is finite.
  7. φ_{X+Y}(t) = E[e^{it(X+Y)}] = E[e^{itX} e^{itY}]. By independence, E[e^{itX} e^{itY}] = E[e^{itX}] E[e^{itY}] = φ_X(t) φ_Y(t).

Summary


Pitfalls


Quiz

  1. The Law of the Unconscious Statistician (LOTUS) for continuous RVs says: a) E[g(X)] = g(E[X]) b) E[g(X)] = ∫ g(x) f_X(x) dx c) E[g(X)] = ∫ x f_{g(X)}(x) dx d) E[g(X)] = g(∫ x f_X(x) dx) Answer: b. LOTUS lets you use the original PDF of X, not the distribution of g(X).

  2. Which distribution has NO moment generating function (for any t ≠ 0)? a) Normal b) Exponential c) Cauchy d) Uniform Answer: c. The Cauchy has no finite mean, and its MGF does not exist for any t ≠ 0. Its characteristic function does exist: φ(t) = e^{−|t|}.

  3. The MGF of a sum of independent random variables is: a) The sum of individual MGFs b) The product of individual MGFs c) The average of individual MGFs d) Undefined Answer: b. M_{X+Y}(t) = E[e^{t(X+Y)}] = E[e^{tX}]E[e^{tY}] = M_X(t) M_Y(t) by independence.

  4. Jensen's inequality for a convex function g states: a) E[g(X)] ≤ g(E[X]) b) E[g(X)] = g(E[X]) c) E[g(X)] ≥ g(E[X]) d) E[g(X)] = E[X] · g(1) Answer: c. For convex g, the function of the expectation is ≤ the expectation of the function.

  5. Skewness measures: a) The spread of the distribution b) The asymmetry of the distribution c) The peakedness of the distribution d) The range of the distribution Answer: b. Skewness γ₁ = E[(X−μ)³]/σ³. Positive = right-skewed, negative = left-skewed.

  6. The characteristic function φ_X(t) is defined as: a) E[e^{tX}] b) E[e^{itX}] c) E[e^{−tX}] d) E[cos(tX)] Answer: b. φ_X(t) = E[e^{itX}] where i = √(−1). This always exists because |e^{itX}| = 1.

  7. If M_X(t) = exp(2t + 8t²), then Var(X) = ? a) 2 b) 8 c) 16 d) 4 Answer: c. This is the MGF of N(μ, σ²) with μ = 2 and σ²/2 = 8 → σ² = 16.

  8. The excess kurtosis of the normal distribution is: a) 3 b) 0 c) −3 d) 1 Answer: b. Excess kurtosis = (μ₄/σ⁴) − 3. For normal, μ₄/σ⁴ = 3, so excess = 0 by definition.


Next Steps

Continue to 11-02 Covariance and Correlation for a deeper treatment of multivariate relationships, the covariance matrix, and the geometric interpretation of correlation.