Phase 11: Probability Theory II

Subject 11-01: Expectation for Continuous Random Variables

Prerequisites: 10-07 (Continuous Random Variables), 10-06 (Expectation of Discrete RVs), basic calculus

Learning Objectives

Compute expected values for continuous random variables using LOTUS (Law of the Unconscious Statistician)
Define and derive moment generating functions (MGFs) for continuous distributions
Define characteristic functions and explain their advantages over MGFs
Compute moments (mean, variance, skewness, kurtosis) from MGFs and characteristic functions
Apply Jensen's inequality to relate E[g(X)] and g(E[X])

Core Content

1. LOTUS for Continuous Random Variables

For a continuous random variable X with PDF f_X(x), the Law of the Unconscious Statistician states:

$E[g(X)] = \int_{-\infty}^{\infty} g(x)\, f_X(x)\, dx$

You do NOT need to find the distribution of Y = g(X) — just integrate g(x) weighted by the PDF of X.

⚠️ CRITICAL: E[g(X)] ≠ g(E[X]) in general. For convex functions g, Jensen's inequality gives E[g(X)] ≥ g(E[X]).

Example (variance via LOTUS): Var(X) = E[(X − μ)²] = ∫ (x − μ)² f_X(x) dx. No need to find the distribution of (X − μ)².

Edge case: E[g(X)] exists only if the integral converges absolutely, that is, ∫ |g(x)| f_X(x) dx < ∞.
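The LOTUS recipe can be checked numerically. The sketch below is illustrative only: it assumes X ~ Exponential(λ) with the hypothetical choice λ = 2, truncates the integral at a finite upper limit, and uses a hand-written Simpson rule (`simpson` and `lotus` are helper names introduced here, not part of the lesson).

```python
import math

def simpson(f, a, b, n=20_000):
    """Composite Simpson rule on [a, b]; n is the (even) number of subintervals."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

lam = 2.0  # hypothetical rate for the Exponential(lam) example

def pdf(x):
    return lam * math.exp(-lam * x)

def lotus(g, upper=40.0):
    # The true integral runs to infinity; the tail beyond `upper` is negligible here.
    return simpson(lambda x: g(x) * pdf(x), 0.0, upper)

mean = lotus(lambda x: x)                     # exact value: 1/lam
second_moment = lotus(lambda x: x * x)        # exact value: 2/lam**2
variance = lotus(lambda x: (x - mean) ** 2)   # exact value: 1/lam**2
```

At no point is the density of X² or (X − μ)² needed; every expectation integrates against the PDF of X itself.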

2. Moment Generating Functions (MGF)

The moment generating function of X is:

$M_X(t) = E[e^{tX}] = \int_{-\infty}^{\infty} e^{tx}\, f_X(x)\, dx$

defined for all t where the integral converges (the MGF may not exist for all t, or may not exist at all — e.g., the Cauchy distribution has no MGF).
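As a rough illustration of the definition, the integral can be evaluated numerically and compared with a known closed form. The sketch assumes X ~ Uniform(0, 1), whose MGF is (e^t − 1)/t for t ≠ 0 and 1 at t = 0; the helper names are introduced here for illustration.

```python
import math

def simpson(f, a, b, n=2_000):
    """Composite Simpson rule on [a, b]; n is the (even) number of subintervals."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def mgf_uniform_numeric(t):
    # The Uniform(0, 1) density equals 1 on [0, 1], so M(t) is the integral of e^{tx}.
    return simpson(lambda x: math.exp(t * x), 0.0, 1.0)

def mgf_uniform_closed(t):
    # (e^t - 1)/t, with the limiting value 1 at t = 0.
    return 1.0 if t == 0 else math.expm1(t) / t

checks = [(t, mgf_uniform_numeric(t), mgf_uniform_closed(t))
          for t in (-2.0, 0.0, 0.5, 3.0)]
```

Because the support is bounded, the integral converges for every t, which is why this MGF is defined on the whole real line.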

Key property — generating moments: The k-th moment is obtained by differentiating M_X(t) k times and evaluating at t = 0:

$E[X^k] = M_X^{(k)}(0)$

Proof sketch: Expand e^{tX} = 1 + tX + t²X²/2! + t³X³/3! + ... and take expectation term by term:

$M_X(t) = E\left[1 + tX + \frac{t^2 X^2}{2!} + \cdots\right] = 1 + t\,E[X] + \frac{t^2\, E[X^2]}{2!} + \cdots$

Differentiating k times and setting t = 0 isolates the k-th moment.
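The derivative rule can be approximated with finite differences. This is a sketch only, assuming the Exponential(λ) MGF λ/(λ − t) with the hypothetical value λ = 2; the step size h is an arbitrary choice, and symbolic differentiation would give exact results.

```python
lam = 2.0  # hypothetical rate

def mgf(t):
    """MGF of Exponential(lam), valid only for t < lam."""
    if t >= lam:
        raise ValueError("MGF of Exponential(lam) is defined only for t < lam")
    return lam / (lam - t)

def first_derivative(f, x, h=1e-3):
    # Central difference approximation to f'(x).
    return (f(x + h) - f(x - h)) / (2 * h)

def second_derivative(f, x, h=1e-3):
    # Central difference approximation to f''(x).
    return (f(x + h) - 2 * f(x) + f(x - h)) / (h * h)

mean = first_derivative(mgf, 0.0)            # approximates E[X] = 1/lam
second_moment = second_derivative(mgf, 0.0)  # approximates E[X^2] = 2/lam**2
variance = second_moment - mean ** 2         # approximates Var(X) = 1/lam**2
```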

Common MGFs (continuous):

Distribution	MGF	Domain
Uniform(0, 1)	(eᵗ − 1)/t, with M(0) = 1	all t
Exponential(λ)	λ/(λ − t)	t < λ
Gamma(α, rate β)	(1 − t/β)^{−α}	t < β
Normal(μ, σ²)	exp(μt + σ²t²/2)	all t
Chi-squared(k)	(1 − 2t)^{−k/2}	t < 1/2

Why MGFs matter:
1. Uniqueness: If M_X(t) = M_Y(t) for all t in a neighborhood of 0, then X and Y have the same distribution.
2. Sums of independent RVs: M_{X+Y}(t) = M_X(t) · M_Y(t) when X and Y are independent.
3. Proving limit theorems: If the MGFs of X_n converge to M_X(t) for all t in a neighborhood of 0, then X_n converges to X in distribution. This is the MGF analogue of Lévy's continuity theorem, which is stated for characteristic functions.
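The product rule for independent sums can be sanity-checked on a case where the density of the sum is known. The sketch assumes X and Y are independent Exponential(λ), so that X + Y has the Gamma(2, λ) density λ² s e^{−λs}; the values λ = 2 and t = 0.5 are hypothetical choices.

```python
import math

def simpson(f, a, b, n=20_000):
    """Composite Simpson rule on [a, b]; n is the (even) number of subintervals."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

lam, t = 2.0, 0.5  # hypothetical rate and evaluation point (t < lam)

def mgf_exponential(t):
    return lam / (lam - t)

def pdf_sum(s):
    # Density of X + Y for independent Exponential(lam): Gamma(2, lam).
    return lam ** 2 * s * math.exp(-lam * s)

# MGF of the sum computed directly from its density (integral truncated at 60).
mgf_sum_numeric = simpson(lambda s: math.exp(t * s) * pdf_sum(s), 0.0, 60.0)
# MGF of the sum computed as the product of the individual MGFs.
mgf_product = mgf_exponential(t) ** 2
```

Both routes give (λ/(λ − t))², which is the Gamma(2, rate λ) entry of the table with α = 2 and β = λ.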

Pitfall: The MGF may fail to exist in any neighborhood of 0. Example: for the log-normal distribution, E[e^{tX}] is finite for t ≤ 0 but diverges for every t > 0, so it has no MGF even though all of its moments are finite.

3. Characteristic Functions

The characteristic function of X is:

$\varphi_X(t) = E[e^{itX}] = \int_{-\infty}^{\infty} e^{itx}\, f_X(x)\, dx$

where i = √(−1). This is essentially the Fourier transform of the PDF.

Key advantages over MGFs:
1. Always exists: |e^{itx}| = 1, so |φ_X(t)| ≤ 1 for all t. The characteristic function exists for EVERY distribution, including the Cauchy.
2. Inversion formula: When φ_X is absolutely integrable, the PDF can be recovered via the inverse Fourier transform: $f_X(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-itx}\, \varphi_X(t)\, dt$
3. Uniqueness: φ_X uniquely determines the distribution. Because φ_X always exists, this holds for every distribution, whereas the MGF uniqueness result applies only when the MGF exists near 0.

Moment generation from characteristic functions:

$E[X^k] = i^{-k}\, \varphi_X^{(k)}(0)$

provided the k-th moment exists.
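The same finite-difference idea works for characteristic functions, using Python's built-in complex numbers. This is a sketch under assumptions: it takes φ(t) = λ/(λ − it) for Exponential(λ) with the hypothetical value λ = 2, and the step size is an arbitrary choice.

```python
lam = 2.0  # hypothetical rate

def phi(t):
    """Characteristic function of Exponential(lam): lam / (lam - i t)."""
    return lam / (lam - 1j * t)

def first_derivative(f, t, h=1e-3):
    # Central difference approximation to f'(t); works for complex-valued f.
    return (f(t + h) - f(t - h)) / (2 * h)

def second_derivative(f, t, h=1e-3):
    # Central difference approximation to f''(t).
    return (f(t + h) - 2 * f(t) + f(t - h)) / (h * h)

# E[X] = i^{-1} phi'(0): dividing by 1j undoes the factor i.
mean = (first_derivative(phi, 0.0) / 1j).real
# E[X^2] = i^{-2} phi''(0) = -phi''(0).
second_moment = (-second_derivative(phi, 0.0)).real
```

The factors i^{−k} are what turn the complex derivatives back into real moments.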

Example (standard Cauchy): PDF: f(x) = 1/(π(1+x²)). Characteristic function: φ(t) = e^{−|t|}. Notice that φ is NOT differentiable at t = 0. If E[X] existed, φ would be differentiable at 0 with φ'(0) = iE[X], so the kink at t = 0 shows that the Cauchy has no mean.
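The kink at t = 0 is easy to see numerically: the one-sided difference quotients of e^{−|t|} approach different limits. A minimal sketch, with an arbitrary small step h:

```python
import math

def phi_cauchy(t):
    """Characteristic function of the standard Cauchy distribution."""
    return math.exp(-abs(t))

h = 1e-6  # small step; an arbitrary choice

# Slope approaching 0 from the right tends to -1 as h -> 0.
right_slope = (phi_cauchy(h) - phi_cauchy(0.0)) / h
# Slope approaching 0 from the left tends to +1 as h -> 0.
left_slope = (phi_cauchy(0.0) - phi_cauchy(-h)) / h
```

Since the two one-sided slopes disagree, φ'(0) does not exist.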

Common characteristic functions:

Distribution	φ(t)
Normal(0, 1)	e^{−t²/2}
Exponential(λ)	λ/(λ − it)
Cauchy(0, 1)	e^{−|t|}
Uniform(−a, a)	sin(at)/(at)

4. Moments: Mean, Variance, Skewness, Kurtosis

For a continuous RV X with PDF f:

Raw moments: μ'_k = E[Xᵏ] = ∫ xᵏ f(x) dx

Central moments: μ_k = E[(X − μ)ᵏ] where μ = E[X].

Key standardized moments:

Mean: μ = E[X] — first raw moment
Variance: σ² = E[(X − μ)²] = μ₂ — second central moment
Skewness: γ₁ = E[(X − μ)³]/σ³ = μ₃/σ³ — measures asymmetry
γ₁ > 0: right-skewed (long right tail — exponential)
γ₁ = 0: symmetric (normal)
γ₁ < 0: left-skewed
Excess Kurtosis: γ₂ = E[(X − μ)⁴]/σ⁴ − 3 — measures tail weight relative to normal
γ₂ > 0: heavier tails than normal (leptokurtic — t-distribution)
γ₂ = 0: normal tail weight (mesokurtic — normal)
γ₂ < 0: lighter tails than normal (platykurtic — uniform)

Common Pitfall: Kurtosis measures tail weight, NOT "peakedness." High kurtosis does not require a sharp peak: a t-distribution with few degrees of freedom ν has heavy tails (excess kurtosis 6/(ν − 4) for ν > 4), yet its density at 0 is lower than that of the standard normal.

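Computing skewness/kurtosis from the MGF: Use the cumulant generating function K(t) = ln M_X(t), whose r-th derivative at 0 is the cumulant κᵣ. The first four cumulants are κ₁ = μ, κ₂ = σ², κ₃ = μ₃, and κ₄ = μ₄ − 3σ⁴, so γ₁ = κ₃/κ₂^{3/2} and γ₂ = κ₄/κ₂². For the normal distribution, K(t) = μt + σ²t²/2, so κ₁ = μ, κ₂ = σ², and κ₃ = κ₄ = ... = 0. The normal is the only distribution whose cumulants all vanish beyond order 2.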
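As a concrete instance, skewness and excess kurtosis can be computed from raw moments. The sketch below assumes X ~ Exponential(λ), whose raw moments E[X^k] = k!/λ^k follow from differentiating the MGF λ/(λ − t); the value λ = 2 is a hypothetical choice and does not affect the standardized results.

```python
import math

lam = 2.0  # hypothetical rate

# Raw moments of Exponential(lam): E[X^k] = M^(k)(0) = k! / lam^k.
m1, m2, m3, m4 = (math.factorial(k) / lam ** k for k in range(1, 5))

# Central moments from raw moments (binomial expansion of (X - mu)^k).
mu = m1
var = m2 - mu ** 2
mu3 = m3 - 3 * mu * m2 + 2 * mu ** 3
mu4 = m4 - 4 * mu * m3 + 6 * mu ** 2 * m2 - 3 * mu ** 4

skewness = mu3 / var ** 1.5          # exact value for the exponential: 2
excess_kurtosis = mu4 / var ** 2 - 3  # exact value for the exponential: 6
```

Positive skewness and positive excess kurtosis match the exponential's long right tail.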

5. Jensen's Inequality

Theorem: If g is a convex function, then:

$E[g(X)] \ge g(E[X])$

If g is strictly convex, equality holds iff X is constant (almost surely).

Convexity check: g''(x) ≥ 0 for all x ⇒ g is convex. Examples: g(x) = x², x⁴, eˣ, −ln(x), 1/x (for x > 0).

Examples:
E[X²] ≥ (E[X])², since g(x) = x² is convex; this is equivalent to Var(X) ≥ 0
E[e^X] ≥ e^{E[X]}
E[1/X] ≥ 1/E[X] for X > 0, since g(x) = 1/x is convex for x > 0

Concave functions (reverse inequality): g''(x) ≤ 0 ⇒ E[g(X)] ≤ g(E[X]). Examples: g(x) = ln(x), √x.

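Application (information theory): Since −ln is convex, Jensen gives E[−ln(f(X))] ≥ −ln(E[f(X)]). The left side is the differential entropy of X, and E[f(X)] = ∫ f(x)² dx, so this is a lower bound on the entropy.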
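Both directions of Jensen's inequality can be checked on one distribution. The sketch assumes X ~ Uniform(1, 3), a hypothetical choice that keeps X > 0, and compares the convex function 1/x with the concave function ln(x) using a hand-written Simpson rule.

```python
import math

def simpson(f, a, b, n=2_000):
    """Composite Simpson rule on [a, b]; n is the (even) number of subintervals."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

a, b = 1.0, 3.0  # hypothetical support of the uniform distribution

def pdf(x):
    return 1.0 / (b - a)

mean = simpson(lambda x: x * pdf(x), a, b)                    # E[X] = 2
e_inverse = simpson(lambda x: (1.0 / x) * pdf(x), a, b)       # E[1/X] = ln(3)/2
e_log = simpson(lambda x: math.log(x) * pdf(x), a, b)         # E[ln X]

convex_holds = e_inverse >= 1.0 / mean     # convex g: E[g(X)] >= g(E[X])
concave_holds = e_log <= math.log(mean)    # concave g: E[g(X)] <= g(E[X])
```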

Key Terms

characteristic function
moment generating function

Worked Examples

Example 1: Computing MGF and Moments

Let X ~ Exponential(λ). (a) Derive the MGF. (b) Use it to compute E[X], E[X²], and Var(X).

Solution:

(a) M_X(t) = E[e^{tX}] = ∫₀^{∞} e^{tx} λ e^{−λx} dx = λ ∫₀^{∞} e^{−(λ−t)x} dx

For t < λ: = λ [−e^{−(λ−t)x}/(λ−t)]₀^{∞} = λ · (1/(λ−t)) = λ/(λ−t).

(b) M'_X(t) = λ/(λ−t)², so E[X] = M'_X(0) = λ/λ² = 1/λ. M''_X(t) = 2λ/(λ−t)³, so E[X²] = M''_X(0) = 2λ/λ³ = 2/λ². Therefore Var(X) = E[X²] − (E[X])² = 2/λ² − 1/λ² = 1/λ². ✓

Example 2: Characteristic Function of Normal

Find the characteristic function of Z ~ N(0, 1).

Solution:

φ_Z(t) = E[e^{itZ}] = ∫_{−∞}^{∞} e^{itz} · (1/√(2π)) e^{−z²/2} dz

Complete the square in the exponent: itz − z²/2 = −(z² − 2itz)/2 = −((z − it)² + t²)/2 = −(z − it)²/2 − t²/2.

φ_Z(t) = e^{−t²/2} ∫_{−∞}^{∞} (1/√(2π)) e^{−(z−it)²/2} dz

The remaining integral equals 1: the integrand is a normal density shifted by the complex constant it, and a contour-integration argument (the integrand is entire and decays rapidly as |Re z| → ∞) shows that shifting the line of integration does not change its value. Thus φ_Z(t) = e^{−t²/2}.

For X ~ N(μ, σ²): X = μ + σZ, so φ_X(t) = e^{iμt} φ_Z(σt) = exp(iμt − σ²t²/2).
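The closed form for φ_Z can be cross-checked without any complex analysis. By symmetry of the standard normal density, E[sin(tZ)] = 0, so φ_Z(t) = E[cos(tZ)], which is a real integral. The sketch below truncates it to [−12, 12]; the bound and the test points are arbitrary choices.

```python
import math

def simpson(f, a, b, n=20_000):
    """Composite Simpson rule on [a, b]; n is the (even) number of subintervals."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def normal_pdf(z):
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

def phi_numeric(t, bound=12.0):
    # Real part of E[e^{itZ}]; the imaginary part vanishes by symmetry.
    return simpson(lambda z: math.cos(t * z) * normal_pdf(z), -bound, bound)

# Pairs of (numerical value, closed form e^{-t^2/2}) at a few test points.
results = {t: (phi_numeric(t), math.exp(-t * t / 2)) for t in (0.0, 0.5, 1.0, 2.0)}
```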

Example 3: Jensen's Inequality in Action

Let X ~ Uniform(0, 2). Compute E[X²] and (E[X])², and verify E[X²] ≥ (E[X])².

Solution:

E[X] = (0+2)/2 = 1. So (E[X])² = 1.

E[X²] = ∫₀² x² · (1/2) dx = (1/2)[x³/3]₀² = (1/2)(8/3) = 4/3 ≈ 1.333.

Clearly 4/3 > 1, so E[X²] > (E[X])², consistent with Jensen (x² is strictly convex). The gap E[X²] − (E[X])² = 4/3 − 1 = 1/3 is Var(X). ✓

Quiz

Q1: LOTUS for a continuous random variable states that E[g(X)] equals:

A) g(E[X]) B) ∫ g(x) f_X(x) dx C) Σ g(x) p_X(x) D) g(∫ x f_X(x) dx)

Correct: B)

If you chose B: Correct! LOTUS lets you compute E[g(X)] by integrating g(x) times the PDF, without finding the distribution of g(X) first.
If you chose A: This only holds when g is linear. In general, E[g(X)] ≠ g(E[X]) — Jensen's inequality.
If you chose C: This is the discrete LOTUS. Continuous uses integration.
If you chose D: This equals g(E[X]), which is generally incorrect for nonlinear g.

Q2: The moment generating function M_X(t) = E[e^{tX}] has the property that:

A) M_X(1) = E[X] B) M'_X(0) = E[X] C) M_X(0) = E[X] D) M''_X(0) = E[X]

Correct: B)

If you chose B: Correct! The n-th derivative at 0 gives the n-th moment: M^(n)_X(0) = E[X^n]. So M'_X(0) = E[X] and M''_X(0) = E[X²].
If you chose A: M_X(1) = E[e^X], not E[X].
If you chose C: M_X(0) = E[1] = 1 for any random variable.
If you chose D: M''_X(0) = E[X²] (the second moment), not E[X].

Q3: If the MGF of X exists in a neighborhood of 0, then:

A) X must be normally distributed B) The MGF uniquely determines the distribution of X C) All moments of X are zero D) X has finite support

Correct: B)

If you chose B: Correct! When the MGF exists, it uniquely characterizes the distribution. Two RVs with the same MGF have the same distribution.
If you chose A: Many distributions have MGFs (normal, gamma, Poisson, etc.), not just the normal.
If you chose C: If all moments were zero, the MGF would be identically 1.
If you chose D: The normal distribution has an MGF and infinite support.

Q4: Characteristic functions differ from MGFs in that:

A) They always exist for any random variable B) They only work for discrete distributions C) They don't generate moments D) They are always real-valued

Correct: A)

If you chose A: Correct! The characteristic function φ(t) = E[e^{itX}] always exists because |e^{itX}| = 1. The MGF E[e^{tX}] may not exist for some distributions (e.g., Cauchy).
If you chose B: Characteristic functions work for all distributions.
If you chose C: Characteristic functions DO generate moments via derivatives at 0.
If you chose D: Characteristic functions are generally complex-valued.

Practice Problems

Let X ~ Uniform(0, θ). Derive the MGF and use it to compute E[X] and Var(X).
Compute the characteristic function of X ~ Exponential(λ). Use it to verify E[X] = 1/λ.
For X ~ Gamma(α, β), the MGF is M(t) = (1 − t/β)^{−α}. Find E[X] and Var(X) by differentiating the MGF.
Let X have PDF f(x) = (1/2)e^{−|x|} for −∞ < x < ∞ (Laplace distribution). Find E[X], E[|X|], and Var(X).
Prove that for any random variable X with finite variance, E[X²] ≥ (E[X])². When does equality hold?
Show that the characteristic function of the Cauchy(0, 1) distribution is e^{−|t|}. Explain why this implies no finite moments.
If X and Y are independent, prove that φ_{X+Y}(t) = φ_X(t) φ_Y(t).

Answers

1. M_X(t) = (e^{θt} − 1)/(θt). Using the series expansion M(t) = 1 + (θt)/2 + (θt)²/6 + ..., we get E[X] = M'(0) = θ/2, E[X²] = M''(0) = θ²/3, and Var(X) = θ²/3 − (θ/2)² = θ²/12.
2. φ_X(t) = ∫₀^{∞} e^{itx} λ e^{−λx} dx = λ ∫₀^{∞} e^{−(λ−it)x} dx = λ/(λ−it). φ'_X(t) = iλ/(λ−it)². E[X] = i^{−1} φ'_X(0) = −i · iλ/λ² = 1/λ. ✓
3. M'(t) = (α/β)(1 − t/β)^{−α−1}, so E[X] = M'(0) = α/β. M''(t) = (α(α+1)/β²)(1 − t/β)^{−α−2}, so E[X²] = M''(0) = α(α+1)/β². Var(X) = α(α+1)/β² − (α/β)² = α/β².
4. By symmetry, E[X] = 0. E[|X|] = 2∫₀^{∞} x·(1/2)e^{−x} dx = ∫₀^{∞} x e^{−x} dx = 1. E[X²] = 2∫₀^{∞} x²·(1/2)e^{−x} dx = ∫₀^{∞} x² e^{−x} dx = Γ(3) = 2. Var(X) = 2 − 0² = 2.
5. Var(X) = E[(X−μ)²] = E[X²] − 2μE[X] + μ² = E[X²] − μ² ≥ 0, so E[X²] ≥ μ² = (E[X])². Equality holds iff Var(X) = 0, i.e., P(X = c) = 1 for some constant c.
6. φ(t) = ∫ e^{itx}/(π(1+x²)) dx = e^{−|t|} (this requires contour integration, or recognizing the integral as a standard Fourier transform pair). φ is not differentiable at t = 0. If E|X| were finite, φ would be differentiable at 0, so the mean does not exist, and therefore no higher moment is finite either.
7. φ_{X+Y}(t) = E[e^{it(X+Y)}] = E[e^{itX} e^{itY}]. By independence, E[e^{itX} e^{itY}] = E[e^{itX}] E[e^{itY}] = φ_X(t) φ_Y(t).
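Answer 4 can be sanity-checked numerically. The sketch uses the same symmetry reduction as the answer, so only integrals over [0, ∞) are needed; they are truncated at 60, an arbitrary bound beyond which e^{−x} is negligible.

```python
import math

def simpson(f, a, b, n=20_000):
    """Composite Simpson rule on [a, b]; n is the (even) number of subintervals."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

upper = 60.0  # truncation point for the integrals over [0, infinity)

# By symmetry of f(x) = (1/2) e^{-|x|}: E[|X|] = integral of x e^{-x} over x >= 0.
e_abs = simpson(lambda x: x * math.exp(-x), 0.0, upper)
# Likewise E[X^2] = integral of x^2 e^{-x} over x >= 0.
e_square = simpson(lambda x: x * x * math.exp(-x), 0.0, upper)
# E[X] = 0 by symmetry, so Var(X) = E[X^2].
variance = e_square - 0.0 ** 2
```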

Summary

LOTUS: E[g(X)] = ∫ g(x) f_X(x) dx — compute expectations of functions without finding the distribution of g(X); Jensen's inequality gives direction: E[g(X)] ≥ g(E[X]) for convex g
The MGF M_X(t) = E[e^{tX}] generates moments via derivatives at 0, uniquely identifies distributions, and factorizes for independent sums — but it may not exist (e.g., Cauchy)
The characteristic function φ_X(t) = E[e^{itX}] ALWAYS exists, uniquely identifies the distribution, and its differentiability at 0 indicates which moments exist
Skewness (γ₁) measures asymmetry; excess kurtosis (γ₂) measures tail weight relative to normal — NOT peakedness
Moments can be extracted from MGFs (M^{(k)}(0)) or characteristic functions (i^{−k} φ^{(k)}(0))

Pitfalls

Assuming E[g(X)] = g(E[X]). LOTUS says E[g(X)] = ∫ g(x) f_X(x) dx — you integrate g(x) against the original PDF. Plugging E[X] into g is correct ONLY when g is linear. For g(x) = x², E[X²] ≠ (E[X])²; for g(x) = 1/x, E[1/X] ≠ 1/E[X]. Jensen's inequality gives the direction for convex/concave functions.
Assuming the MGF always exists. The MGF M_X(t) = E[e^{tX}] requires the integral to converge. For the Cauchy distribution, E[e^{tX}] diverges for all t ≠ 0. Even for distributions where E[X] exists, the MGF may not (e.g., log-normal). When in doubt, use the characteristic function, which ALWAYS exists.
Confusing the characteristic function with the MGF. φ_X(t) = E[e^{itX}] (contains i = √(−1)); M_X(t) = E[e^{tX}] (no i). The characteristic function is always bounded (|φ_X(t)| ≤ 1) and always exists; the MGF is unbounded and may not exist. Their derivatives give moments differently: E[Xᵏ] = M^{(k)}(0) = i^{−k} φ^{(k)}(0).
Thinking kurtosis measures "peakedness." Excess kurtosis γ₂ measures TAIL WEIGHT relative to the normal distribution. A t-distribution with few degrees of freedom has heavy tails and high kurtosis, yet its density at 0 is lower than that of the standard normal. The interpretation as "peakedness" is a persistent misconception.
Forgetting the direction of Jensen's inequality. For convex g: E[g(X)] ≥ g(E[X]). For concave g: E[g(X)] ≤ g(E[X]). Check g''(x) to verify: g''(x) ≥ 0 ⇒ convex; g''(x) ≤ 0 ⇒ concave. Common examples: x², eˣ, 1/x (for x>0) are convex; ln(x), √x are concave.

Quiz

The Law of the Unconscious Statistician (LOTUS) for continuous RVs says: a) E[g(X)] = g(E[X]) b) E[g(X)] = ∫ g(x) f_X(x) dx c) E[g(X)] = ∫ x f_{g(X)}(x) dx d) E[g(X)] = g(∫ x f_X(x) dx) Answer: b. LOTUS lets you use the original PDF of X, not the distribution of g(X).
Which distribution has NO moment generating function (for any t ≠ 0)? a) Normal b) Exponential c) Cauchy d) Uniform Answer: c. The Cauchy has no finite mean, and its MGF does not exist for any t ≠ 0. Its characteristic function does exist: φ(t) = e^{−|t|}.
The MGF of a sum of independent random variables is: a) The sum of individual MGFs b) The product of individual MGFs c) The average of individual MGFs d) Undefined Answer: b. M_{X+Y}(t) = E[e^{t(X+Y)}] = E[e^{tX}]E[e^{tY}] = M_X(t) M_Y(t) by independence.
Jensen's inequality for a convex function g states: a) E[g(X)] ≤ g(E[X]) b) E[g(X)] = g(E[X]) c) E[g(X)] ≥ g(E[X]) d) E[g(X)] = E[X] · g(1) Answer: c. For convex g, the function of the expectation is ≤ the expectation of the function.
Skewness measures: a) The spread of the distribution b) The asymmetry of the distribution c) The peakedness of the distribution d) The range of the distribution Answer: b. Skewness γ₁ = E[(X−μ)³]/σ³. Positive = right-skewed, negative = left-skewed.
The characteristic function φ_X(t) is defined as: a) E[e^{tX}] b) E[e^{itX}] c) E[e^{−tX}] d) E[cos(tX)] Answer: b. φ_X(t) = E[e^{itX}] where i = √(−1). This always exists because |e^{itX}| = 1.
If M_X(t) = exp(2t + 8t²), then Var(X) = ? a) 2 b) 8 c) 16 d) 4 Answer: c. This is the MGF of N(μ, σ²) with μ = 2 and σ²/2 = 8 → σ² = 16.
The excess kurtosis of the normal distribution is: a) 3 b) 0 c) −3 d) 1 Answer: b. Excess kurtosis = (μ₄/σ⁴) − 3. For normal, μ₄/σ⁴ = 3, so excess = 0 by definition.

Next Steps

Continue to 11-02 Covariance and Correlation for a deeper treatment of multivariate relationships, the covariance matrix, and the geometric interpretation of correlation.
