
Phase 10: Probability Theory

Subject 10-06: Expectation of Discrete Random Variables

Prerequisites: 10-04 (Discrete Random Variables), basic summation


Learning Objectives

  1. Define expected value E[X] for discrete random variables and compute it from the PMF
  2. Apply linearity of expectation E[aX + bY] = aE[X] + bE[Y] without independence requirements
  3. Use the Law of the Unconscious Statistician (LOTUS) to compute E[g(X)] directly from the PMF
  4. Define variance Var(X) = E[(X − μ)²], standard deviation, and their properties under linear transformations
  5. Define covariance Cov(X, Y) and correlation, and explain how they measure linear dependence

Core Content

1. Definition of Expected Value

For a discrete random variable X with PMF p_X(x), the expected value (mean) is:

$$E[X] = \sum_x x \, p_X(x)$$

where the sum is over all values x in the support of X. For the sum to be well-defined, we require absolute convergence Σ |x| p_X(x) < ∞.

Interpretation: E[X] is the probability-weighted average of all possible values — the "long-run average" if the experiment were repeated infinitely many times.
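
A short simulation makes the long-run interpretation concrete. The sketch below simulates a fair six-sided die, whose mean is 3.5. It compares the sample average of many rolls with the probability-weighted average Σ x·p_X(x). It uses only Python's standard `random` module with a fixed seed, and the helper names `pmf_mean` and `sample_mean` are our own.

```python
import random

def pmf_mean(pmf):
    """Probability-weighted average: the sum of x * p(x) over the support."""
    return sum(x * p for x, p in pmf.items())

def sample_mean(pmf, n_trials, seed=0):
    """Average of n_trials independent draws from the PMF (seeded for reproducibility)."""
    rng = random.Random(seed)
    draws = rng.choices(list(pmf), weights=list(pmf.values()), k=n_trials)
    return sum(draws) / n_trials

die_pmf = {k: 1 / 6 for k in range(1, 7)}  # fair six-sided die
exact = pmf_mean(die_pmf)                  # 3.5, up to floating-point rounding
for n in (10, 1_000, 100_000):
    print(f"n = {n:>7}: sample mean = {sample_mean(die_pmf, n):.4f}, E[X] = {exact:.4f}")
```

Small-n averages can be noticeably off. The n = 100,000 average should sit close to 3.5, although the exact digits depend on the seed.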

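Expected values of common discrete distributions are listed below. Geometric(p) counts the trials up to and including the first success, and NegBin(r, p) counts the trials up to and including the r-th success. Hypergeometric(N, K, n) counts the successes in n draws without replacement from N items, K of which are successes.

| Distribution | E[X] |
|---|---|
| Bernoulli(p) | p |
| Binomial(n, p) | np |
| Geometric(p) | 1/p |
| NegBin(r, p) | r/p |
| Poisson(λ) | λ |
| Hypergeometric(N, K, n) | n·(K/N) |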
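
As a sanity check on the table, the sketch below evaluates Σ x·p_X(x) numerically from each PMF and compares it with the closed form. It assumes the conventions stated above, so Geometric and NegBin count trials. The infinite Poisson and NegBin supports are truncated far into the tail, and the parameter values are arbitrary. The helper functions are illustrative and do not come from any library.

```python
import math

def mean_from_pmf(pmf_items):
    """E[X] = sum of x * p over (value, probability) pairs."""
    return sum(x * p for x, p in pmf_items)

def binomial_pmf(n, p):
    return [(k, math.comb(n, k) * p**k * (1 - p) ** (n - k)) for k in range(n + 1)]

def poisson_pmf(lam, k_max):
    """Poisson PMF truncated at k_max, built with p(k+1) = p(k) * lam / (k + 1)."""
    pairs, p_k = [], math.exp(-lam)
    for k in range(k_max + 1):
        pairs.append((k, p_k))
        p_k *= lam / (k + 1)
    return pairs

def negbin_pmf(r, p, k_max):
    """Trials up to and including the r-th success, truncated at k_max (r = 1 is Geometric)."""
    return [(k, math.comb(k - 1, r - 1) * p**r * (1 - p) ** (k - r)) for k in range(r, k_max + 1)]

def hypergeom_pmf(N, K, n):
    """Successes in n draws without replacement from N items, K of them successes."""
    total = math.comb(N, n)
    k_lo, k_hi = max(0, n - (N - K)), min(n, K)
    return [(k, math.comb(K, k) * math.comb(N - K, n - k) / total) for k in range(k_lo, k_hi + 1)]

checks = {
    "Binomial(10, 0.3)": (mean_from_pmf(binomial_pmf(10, 0.3)), 10 * 0.3),
    "Geometric(0.25)": (mean_from_pmf(negbin_pmf(1, 0.25, 400)), 1 / 0.25),
    "NegBin(3, 0.4)": (mean_from_pmf(negbin_pmf(3, 0.4, 400)), 3 / 0.4),
    "Poisson(4)": (mean_from_pmf(poisson_pmf(4.0, 100)), 4.0),
    "Hypergeometric(20, 7, 5)": (mean_from_pmf(hypergeom_pmf(20, 7, 5)), 5 * 7 / 20),
}
for name, (numeric, closed_form) in checks.items():
    print(f"{name:<25} from PMF: {numeric:.6f}   table: {closed_form:.6f}")
```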

Derivation for Binomial: Let X ~ Binomial(n, p). Using the definition:

E[X] = Σ_{k=0}^{n} k · C(n,k) pᵏ (1−p)^{n−k}

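Use the identity k·C(n,k) = n·C(n−1, k−1). The k = 0 term is zero, so the sum can start at k = 1. Then substitute j = k − 1 and apply the binomial theorem: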

$$
\begin{aligned}
E[X] &= \sum_{k=1}^{n} n \binom{n-1}{k-1} p^{k} (1-p)^{n-k} \\
&= np \sum_{k=1}^{n} \binom{n-1}{k-1} p^{k-1} (1-p)^{(n-1)-(k-1)} \\
&= np \sum_{j=0}^{n-1} \binom{n-1}{j} p^{j} (1-p)^{(n-1)-j} = np \cdot 1 = np
\end{aligned}
$$
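
The derivation rests on the identity k·C(n,k) = n·C(n−1, k−1) and ends at np. Both can be checked exactly with Python's `fractions.Fraction` and `math.comb`, so no floating-point tolerance is needed. The value p = 2/7 and the range of n are arbitrary choices.

```python
from fractions import Fraction
from math import comb

def binomial_mean_exact(n, p):
    """E[X] = sum_k k * C(n, k) * p^k * (1 - p)^(n - k), in exact rational arithmetic."""
    return sum(k * comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1))

# The identity k * C(n, k) = n * C(n - 1, k - 1) used above, for 1 <= k <= n.
for n in range(1, 15):
    for k in range(1, n + 1):
        assert k * comb(n, k) == n * comb(n - 1, k - 1)

p = Fraction(2, 7)
for n in (1, 5, 12):
    print(n, binomial_mean_exact(n, p), n * p)  # the last two columns are equal
```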

Derivation for Geometric:

$$E[X] = \sum_{k=1}^{\infty} k (1-p)^{k-1} p = p \cdot \frac{1}{p^2} = \frac{1}{p}$$

using the identity Σ_{k=1}^{∞} k q^{k−1} = 1/(1−q)² for |q| < 1 with q = 1−p.
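
The series identity can also be checked numerically: partial sums of Σ k(1−p)^{k−1} p climb toward 1/p as the cutoff grows. The value p = 0.2 and the cutoffs are arbitrary. This sketch only illustrates convergence and is not a proof.

```python
def geometric_mean_partial(p, k_max):
    """Partial sum of E[X] = sum_{k>=1} k (1-p)^(k-1) p, stopped at k = k_max."""
    q = 1 - p
    return sum(k * q ** (k - 1) * p for k in range(1, k_max + 1))

p = 0.2  # arbitrary success probability; 1/p = 5
for k_max in (5, 20, 100, 500):
    print(f"k_max = {k_max:>3}: partial sum = {geometric_mean_partial(p, k_max):.8f}")
print(f"1/p = {1 / p:.8f}")
```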

2. Linearity of Expectation

Theorem: For any random variables X, Y and constants a, b:

$$E[aX + bY] = aE[X] + bE[Y]$$

Crucially, this holds even when X and Y are DEPENDENT. This is one of the most powerful tools in probability.

Proof sketch:

$$
\begin{aligned}
E[aX + bY] &= \sum_x \sum_y (ax + by) \, P(X=x, Y=y) \\
&= a \sum_x x \sum_y P(X=x, Y=y) + b \sum_y y \sum_x P(X=x, Y=y) \\
&= a \sum_x x \, P(X=x) + b \sum_y y \, P(Y=y) \\
&= aE[X] + bE[Y]
\end{aligned}
$$
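
The proof never uses independence. The sketch below illustrates this with a small, made-up joint PMF in which X and Y are clearly dependent. The helper `expect` is a hypothetical name that applies the joint-PMF form of the definition, and exact `Fraction` arithmetic makes the equality check exact.

```python
from fractions import Fraction as F

# Made-up joint PMF of a dependent pair (X, Y), keyed by (x, y).
joint = {
    (0, 0): F(3, 10), (0, 1): F(1, 10),
    (1, 0): F(1, 10), (1, 1): F(2, 10),
    (2, 1): F(1, 10), (2, 2): F(2, 10),
}

def expect(g):
    """E[g(X, Y)] = sum over (x, y) of g(x, y) * P(X = x, Y = y)."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

a, b = 3, -2
lhs = expect(lambda x, y: a * x + b * y)
rhs = a * expect(lambda x, y: x) + b * expect(lambda x, y: y)
print(lhs, rhs)  # 11/10 11/10

# X and Y are dependent: the joint PMF does not factor into its marginals at (0, 0).
px0 = sum(p for (x, _), p in joint.items() if x == 0)
py0 = sum(p for (_, y), p in joint.items() if y == 0)
print(joint[(0, 0)], px0 * py0)  # 3/10 4/25
```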

Corollary: E[Σ Xᵢ] = Σ E[Xᵢ] for any finite collection. This extends to countably infinite collections provided Σ E[|Xᵢ|] < ∞.

Example — Indicator variables: Let I_A be the indicator of event A: I_A = 1 if A occurs, 0 otherwise. Then E[I_A] = 1·P(A) + 0·P(Aᶜ) = P(A). This simple fact combined with linearity is extremely powerful.

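Application to Binomial mean (alternative derivation): Write X ~ Binomial(n, p) as X = I_1 + ⋯ + I_n, where I_i is the indicator that trial i succeeds, so each I_i ~ Bernoulli(p). By linearity, E[X] = Σ_{i=1}^{n} E[I_i] = Σ_{i=1}^{n} p = np. This is much simpler than the direct sum, and it does not even use the independence of the trials.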

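Application to Hypergeometric mean: Let X be the number of successes in a sample of n items drawn without replacement from N items, K of which are successes. Write X = I_1 + ⋯ + I_n, where I_i indicates that draw i is a success. The draws are dependent, but by symmetry each draw is equally likely to be any of the N items, so E[I_i] = P(draw i is a success) = K/N. By linearity, E[X] = n·K/N.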
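
The symmetry argument can be checked by brute force on a tiny population. The sketch below assumes N = 6 items with K = 2 successes and n = 3 draws, which are arbitrary small values. It enumerates every equally likely ordered sample with `itertools.permutations`. It then confirms that each draw succeeds with probability K/N even though the indicators are dependent.

```python
from fractions import Fraction
from itertools import permutations
from math import comb

# Hypothetical small population: 1 marks a success, 0 a failure.
population = [1, 1, 0, 0, 0, 0]
N, K, n = len(population), sum(population), 3

# Drawing without replacement: every ordered sample of n distinct positions is equally likely.
samples = list(permutations(range(N), n))

def p_success(i):
    """P(draw i is a success) = E[I_i], by counting equally likely ordered samples."""
    return Fraction(sum(population[s[i]] for s in samples), len(samples))

for i in range(n):
    print(f"P(draw {i + 1} is a success) = {p_success(i)}")  # 1/3 = K/N for every draw

# The indicators are dependent: P(I_1 = 1 and I_2 = 1) differs from P(I_1 = 1) * P(I_2 = 1).
both = Fraction(sum(population[s[0]] * population[s[1]] for s in samples), len(samples))
print(both, p_success(0) * p_success(1))  # 1/15 1/9

# E[X] from the hypergeometric PMF agrees with the linearity answer n*K/N.
mean_pmf = sum(Fraction(k * comb(K, k) * comb(N - K, n - k), comb(N, n)) for k in range(min(n, K) + 1))
print(mean_pmf, Fraction(n * K, N))  # 1 1
```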

3. Law of the Unconscious Statistician (LOTUS)

For any function g(·):

$$E[g(X)] = \sum_x g(x) \, p_X(x)$$

The law is called "unconscious" because you never need to find the distribution of Y = g(X). You simply weight each value g(x) by p_X(x), using the PMF of X.

Example: E[X²] = Σ_x x² p_X(x).

Warning: in general, E[g(X)] ≠ g(E[X]). For convex g, Jensen's inequality gives E[g(X)] ≥ g(E[X]). For example, g(x) = x² yields E[X²] ≥ (E[X])², which is equivalent to Var(X) ≥ 0.
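
The sketch below contrasts LOTUS with the longer route of first building the PMF of Y = g(X). It uses g(x) = x² and X uniform on {−2, −1, 0, 1, 2}, which is an arbitrary example, and it also shows the Jensen gap for this convex g. The function names are illustrative.

```python
from collections import defaultdict
from fractions import Fraction

def lotus(g, pmf):
    """E[g(X)] = sum_x g(x) p_X(x), computed without the distribution of g(X)."""
    return sum(g(x) * p for x, p in pmf.items())

def pushforward(g, pmf):
    """PMF of Y = g(X): add up the probabilities of all x with the same g(x)."""
    pmf_y = defaultdict(Fraction)
    for x, p in pmf.items():
        pmf_y[g(x)] += p
    return dict(pmf_y)

def square(x):
    return x * x

pmf_x = {x: Fraction(1, 5) for x in (-2, -1, 0, 1, 2)}  # X uniform on {-2, ..., 2}

via_lotus = lotus(square, pmf_x)
pmf_y = pushforward(square, pmf_x)          # Y = X^2 takes the values 4, 1, 0
via_y = sum(y * p for y, p in pmf_y.items())
g_of_mean = square(lotus(lambda x: x, pmf_x))

print(via_lotus, via_y)  # 2 2
print(g_of_mean)         # 0, below E[X^2] = 2, as Jensen predicts for convex g
```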

Edge case: If X takes infinitely many values, the sum must converge absolutely for E[g(X)] to exist.

4. Variance and Standard Deviation

Definition: Let μ = E[X]. The variance of X is:

$$\operatorname{Var}(X) = E[(X - \mu)^2] = \sum_x (x - \mu)^2 \, p_X(x)$$

Alternative computational formula:

$$\operatorname{Var}(X) = E[X^2] - (E[X])^2$$

Proof: Var(X) = E[(X−μ)²] = E[X² − 2μX + μ²] = E[X²] − 2μE[X] + μ² = E[X²] − 2μ² + μ² = E[X²] − μ².
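
Both formulas can be evaluated side by side. This sketch uses an arbitrary PMF on {0, 2, 5} and exact `Fraction` arithmetic, so the two results agree exactly.

```python
from fractions import Fraction as F

def mean(pmf):
    return sum(x * p for x, p in pmf.items())

def var_definition(pmf):
    """Var(X) = E[(X - mu)^2], summed directly over the support."""
    mu = mean(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

def var_shortcut(pmf):
    """Var(X) = E[X^2] - (E[X])^2."""
    return sum(x * x * p for x, p in pmf.items()) - mean(pmf) ** 2

pmf = {0: F(1, 4), 2: F(1, 2), 5: F(1, 4)}  # arbitrary example PMF
print(var_definition(pmf), var_shortcut(pmf))  # 51/16 51/16
```

With floating-point numbers, the shortcut subtracts two nearly equal quantities when the mean is large relative to the spread. For that reason, numerical code often prefers the definition. With exact fractions the choice does not matter.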

Standard deviation: σ_X = √Var(X) — same units as X.

Variances of common distributions:

| Distribution | Var(X) |
|---|---|
| Bernoulli(p) | p(1−p) |
| Binomial(n, p) | np(1−p) |
| Geometric(p) | (1−p)/p² |
| Poisson(λ) | λ |
| NegBin(r, p) | r(1−p)/p² |

Properties of variance:

- Var(X) ≥ 0 (variance is non-negative).
- Var(aX + b) = a² Var(X). Shifting by a constant b does not change the variance, while scaling by a multiplies it by a².
- Var(X) = 0 if and only if P(X = c) = 1 for some constant c (a degenerate distribution).
- For any constants a, b: Var(aX + bY) = a²Var(X) + b²Var(Y) + 2ab Cov(X, Y), where Cov is the covariance defined in Section 5.
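
The rule Var(aX + b) = a² Var(X) can be checked by transforming the PMF directly: the value x moves to ax + b while its probability stays the same. The sketch assumes a ≠ 0 so that distinct values stay distinct, and it uses an arbitrary three-point PMF. It also checks the degenerate case.

```python
from fractions import Fraction as F

def variance(pmf):
    """Var(X) = E[(X - mu)^2] for a PMF given as {value: probability}."""
    mu = sum(x * p for x, p in pmf.items())
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

def affine(pmf, a, b):
    """PMF of aX + b. Requires a != 0 so that distinct values stay distinct."""
    assert a != 0
    return {a * x + b: p for x, p in pmf.items()}

pmf = {1: F(1, 6), 2: F(1, 3), 4: F(1, 2)}  # arbitrary example
for a, b in [(1, 10), (3, 0), (-2, 7)]:
    print(f"a={a}, b={b}: Var(aX+b) = {variance(affine(pmf, a, b))}, a^2 Var(X) = {a * a * variance(pmf)}")

print(variance({7: F(1)}))  # 0: a point mass P(X = 7) = 1 has no spread
```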

5. Covariance and Correlation

Covariance:

$$\operatorname{Cov}(X, Y) = E[(X - E[X])(Y - E[Y])] = E[XY] - E[X]E[Y]$$

Properties:

- Cov(X, X) = Var(X)
- Cov(X, Y) = Cov(Y, X) (symmetry)
- Cov(aX + b, cY + d) = ac·Cov(X, Y)
- Cov(X, Y + Z) = Cov(X, Y) + Cov(X, Z). Together with the previous rule, this makes covariance bilinear.

Independence ⇒ zero covariance (but not conversely!): If X and Y are independent, then E[XY] = E[X]E[Y], so Cov(X, Y) = 0. Zero covariance does NOT imply independence — it only means no linear relationship.
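
A standard counterexample makes the last point concrete. Take X uniform on {−1, 0, 1} and Y = X². Then Cov(X, Y) = E[X³] − E[X]E[X²] = 0, yet Y is completely determined by X. The sketch below verifies both facts with exact fractions.

```python
from fractions import Fraction as F

# X uniform on {-1, 0, 1} and Y = X^2, so Y is a function of X.
joint = {(x, x * x): F(1, 3) for x in (-1, 0, 1)}

def expect(g):
    return sum(g(x, y) * p for (x, y), p in joint.items())

cov = expect(lambda x, y: x * y) - expect(lambda x, y: x) * expect(lambda x, y: y)
print(cov)  # 0

# Not independent: P(X = 0, Y = 0) differs from P(X = 0) * P(Y = 0).
p_x0 = sum(p for (x, _), p in joint.items() if x == 0)
p_y0 = sum(p for (_, y), p in joint.items() if y == 0)
print(joint[(0, 0)], p_x0 * p_y0)  # 1/3 1/9
```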

Correlation coefficient:

$$\rho(X, Y) = \frac{\operatorname{Cov}(X, Y)}{\sigma_X \sigma_Y}$$

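ρ is defined when σ_X > 0 and σ_Y > 0. It always satisfies −1 ≤ ρ ≤ 1, because the Cauchy-Schwarz inequality gives |Cov(X, Y)| ≤ σ_X σ_Y. Moreover, ρ = ±1 if and only if Y = aX + b almost surely for some constants a ≠ 0 and b (a perfect linear relationship). In that case ρ = +1 when a > 0 and ρ = −1 when a < 0.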
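
The sketch below computes ρ from a joint PMF and checks the equality case. For Y = aX + b it returns +1 when a > 0 and −1 when a < 0, up to floating-point rounding. A nonlinear relationship such as Y = X² gives |ρ| < 1. The PMF of X is an arbitrary example, and `correlation` and `joint_of` are hypothetical helper names.

```python
import math
from fractions import Fraction as F

def correlation(joint):
    """rho = Cov(X, Y) / (sigma_X sigma_Y) for a joint PMF {(x, y): p}; both variances must be > 0."""
    def expect(g):
        return sum(g(x, y) * p for (x, y), p in joint.items())
    mx, my = expect(lambda x, y: x), expect(lambda x, y: y)
    cov = expect(lambda x, y: (x - mx) * (y - my))
    var_x = expect(lambda x, y: (x - mx) ** 2)
    var_y = expect(lambda x, y: (y - my) ** 2)
    return float(cov) / math.sqrt(var_x * var_y)

pmf_x = {0: F(1, 5), 1: F(1, 2), 3: F(3, 10)}  # arbitrary example

def joint_of(g):
    """Joint PMF of (X, g(X)): all probability sits on the graph of g."""
    return {(x, g(x)): p for x, p in pmf_x.items()}

print(correlation(joint_of(lambda x: 2 * x + 1)))         # approximately +1
print(correlation(joint_of(lambda x: -F(1, 2) * x + 4)))  # approximately -1
print(correlation(joint_of(lambda x: x * x)))             # strictly between -1 and 1
```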

Variance of sum:

$$\operatorname{Var}(X + Y) = \operatorname{Var}(X) + \operatorname{Var}(Y) + 2\operatorname{Cov}(X, Y)$$

For independent X, Y: Var(X + Y) = Var(X) + Var(Y).

General sum:

$$\operatorname{Var}\Big(\sum_i X_i\Big) = \sum_i \operatorname{Var}(X_i) + 2 \sum_{i<j} \operatorname{Cov}(X_i, X_j)$$
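
The general formula can be checked on a small, made-up joint PMF of three dependent variables. The sketch computes Var(X₁ + X₂ + X₃) as Cov(S, S), where S is the sum, using the property Cov(X, X) = Var(X). It then compares the result with the sum of the variances plus twice the pairwise covariances.

```python
from fractions import Fraction as F
from itertools import combinations

# Made-up joint PMF of (X1, X2, X3), keyed by outcome tuples; the coordinates are dependent.
joint = {
    (0, 0, 1): F(1, 4), (1, 0, 0): F(1, 8), (1, 1, 1): F(3, 8),
    (2, 0, 1): F(1, 8), (2, 1, 0): F(1, 8),
}

def expect(g):
    return sum(g(w) * p for w, p in joint.items())

def cov(f, g):
    mf, mg = expect(f), expect(g)
    return expect(lambda w: (f(w) - mf) * (g(w) - mg))

def total(w):
    return sum(w)  # S = X1 + X2 + X3

coords = [lambda w, i=i: w[i] for i in range(3)]  # coords[i](w) = X_{i+1}

lhs = cov(total, total)  # Var(S) = Cov(S, S)
rhs = sum(cov(f, f) for f in coords) + 2 * sum(cov(f, g) for f, g in combinations(coords, 2))
print(lhs, rhs)
```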



Worked Examples

Example 1: Expected Value of Geometric Distribution

Let X ~ Geometric(p). Compute E[X] directly from the definition.

Solution:

$$E[X] = \sum_{k=1}^{\infty} k (1-p)^{k-1} p = p \sum_{k=1}^{\infty} k q^{k-1}, \qquad \text{where } q = 1 - p$$

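Recall the following, obtained by differentiating the geometric series term by term (valid because 0 ≤ q < 1 when 0 < p ≤ 1): Σ_{k=1}^{∞} k q^{k−1} = d/dq (Σ_{k=0}^{∞} qᵏ) = d/dq (1/(1−q)) = 1/(1−q)² = 1/p².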

Therefore E[X] = p · (1/p²) = 1/p.

For p = 0.5 (a fair coin, waiting for the first heads), E[X] = 2 flips on average. ✓


Example 2: Variance via LOTUS

Let X be the outcome of a fair die roll. Find Var(X).

Solution:

PMF: p(k) = 1/6 for k = 1, ..., 6.

E[X] = (1+2+3+4+5+6)/6 = 21/6 = 3.5.

E[X²] = (1+4+9+16+25+36)/6 = 91/6 ≈ 15.1667.

Var(X) = E[X²] − (E[X])² = 91/6 − (7/2)² = 91/6 − 49/4 = (182 − 147)/12 = 35/12 ≈ 2.917.

σ = √(35/12) ≈ 1.708.
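
The same computation in exact arithmetic checks the fractions above. This sketch uses only the standard-library `fractions` and `math` modules.

```python
import math
from fractions import Fraction

die = {k: Fraction(1, 6) for k in range(1, 7)}

mean = sum(k * p for k, p in die.items())          # E[X]
second = sum(k * k * p for k, p in die.items())    # E[X^2] via LOTUS with g(x) = x^2
var = second - mean ** 2

print(mean, second, var)           # 7/2 91/6 35/12
print(round(math.sqrt(var), 3))    # 1.708
```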


Example 3: Covariance and Correlation

Joint PMF of (X, Y):

| X \ Y | Y = 0 | Y = 1 |
|---|---|---|
| X = 0 | 0.2 | 0.1 |
| X = 1 | 0.3 | 0.4 |

Find Cov(X, Y) and ρ(X, Y).

Solution:

Marginals: P(X=0) = 0.3, P(X=1) = 0.7; P(Y=0) = 0.5, P(Y=1) = 0.5.

E[X] = 0·0.3 + 1·0.7 = 0.7

E[Y] = 0·0.5 + 1·0.5 = 0.5

E[XY] = 0·0·0.2 + 0·1·0.1 + 1·0·0.3 + 1·1·0.4 = 0.4

Cov(X, Y) = E[XY] − E[X]E[Y] = 0.4 − (0.7)(0.5) = 0.4 − 0.35 = 0.05

E[X²] = 0²·0.3 + 1²·0.7 = 0.7, so Var(X) = E[X²] − (E[X])² = 0.7 − 0.49 = 0.21. Similarly, E[Y²] = 0.5, so Var(Y) = 0.5 − 0.25 = 0.25.

ρ = 0.05 / √(0.21 · 0.25) = 0.05 / √0.0525 = 0.05 / 0.229 ≈ 0.218

The result indicates a weak positive linear relationship.
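
A short script can reproduce every number in this example from the joint PMF table. The sketch uses exact `Fraction` arithmetic for the moments and a float only for the final square root.

```python
import math
from fractions import Fraction as F

# Joint PMF from the table above, keyed by (x, y).
joint = {(0, 0): F(2, 10), (0, 1): F(1, 10), (1, 0): F(3, 10), (1, 1): F(4, 10)}

def E(g):
    return sum(g(x, y) * p for (x, y), p in joint.items())

ex, ey, exy = E(lambda x, y: x), E(lambda x, y: y), E(lambda x, y: x * y)
cov = exy - ex * ey
var_x = E(lambda x, y: x * x) - ex ** 2
var_y = E(lambda x, y: y * y) - ey ** 2
rho = float(cov) / math.sqrt(var_x * var_y)

print(ex, ey, exy, cov)   # 7/10 1/2 2/5 1/20
print(var_x, var_y)       # 21/100 1/4
print(round(rho, 3))      # 0.218
```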


Quiz

Q1: Linearity of expectation E[aX + bY] = aE[X] + bE[Y] holds:

A) Only when X and Y are independent B) Only when X and Y are identically distributed C) For any random variables X and Y, regardless of dependence D) Only for discrete random variables

Correct: C)


Q2: E[X] for a Bernoulli(p) random variable equals:

A) p(1−p) B) p C) 1/p D) √(p(1−p))

Correct: B)


Q3: The Law of the Unconscious Statistician (LOTUS) states that E[g(X)] equals:

A) g(E[X]) B) Σ g(x) p_X(x) for discrete X C) g(Σ x p_X(x)) D) E[X] · E[g(X)]

Correct: B)


Q4: If X ~ Binomial(n, p), then E[X] equals:

A) np B) n/p C) p/n D) np²

Correct: A)


Q5: Var(aX + b) for constants a and b equals:

A) a·Var(X) + b B) a²·Var(X) C) a·Var(X) D) Var(X) + b²

Correct: B)


Q6: For which discrete distribution does E[X] = 1/p?

A) Binomial(n, p) B) Poisson(p) C) Geometric(p) D) Bernoulli(p)

Correct: C)


Q7: Covariance Cov(X, Y) measures:

A) The probability that X and Y are independent B) The strength of linear dependence between X and Y C) Whether X is always larger than Y D) The ratio of E[X] to E[Y]

Correct: B)


Practice Problems

  1. Derive E[X] for X ~ Bernoulli(p) directly from the definition. Then derive Var(X).

  2. Let X ~ NegBin(2, 0.3). Find E[X] directly from the formula r/p.

  3. A random variable X has PMF: p(1) = 0.2, p(2) = 0.3, p(3) = 0.5. Find E[X], E[X²], Var(X), and E[1/X].

  4. Using indicator variables, find the expected number of heads when 10 fair coins are flipped. Then find the variance.

  5. Prove that Var(aX + b) = a² Var(X) for constants a, b.

  6. If Var(X) = 3, Var(Y) = 5, and Cov(X, Y) = 2, find Var(2X − 3Y).

  7. Show that if P(X = c) = 1, then E[X] = c and Var(X) = 0. Is the converse true?

Answers

  1. E[X] = 0·(1−p) + 1·p = p. E[X²] = 0²·(1−p) + 1²·p = p. Var(X) = E[X²] − (E[X])² = p − p² = p(1−p).

  2. E[X] = r/p = 2/0.3 ≈ 6.667. On average, it takes about 6.67 trials to get 2 successes.

  3. E[X] = 1(0.2) + 2(0.3) + 3(0.5) = 0.2 + 0.6 + 1.5 = 2.3. E[X²] = 1(0.2) + 4(0.3) + 9(0.5) = 0.2 + 1.2 + 4.5 = 5.9. Var(X) = 5.9 − (2.3)² = 5.9 − 5.29 = 0.61. By LOTUS, E[1/X] = 1(0.2) + (1/2)(0.3) + (1/3)(0.5) = 0.2 + 0.15 + 0.1667 = 0.5167.

  4. Let Iᵢ be the indicator of heads on flip i, so E[Iᵢ] = 0.5 and X = Σ Iᵢ. By linearity, E[X] = 10·0.5 = 5. Since the flips are independent, all covariances vanish and Var(X) = Σ Var(Iᵢ) = 10·(0.5·0.5) = 10·0.25 = 2.5.

  5. Var(aX+b) = E[(aX+b − E[aX+b])²] = E[(aX+b − aE[X] − b)²] = E[a²(X−E[X])²] = a² E[(X−E[X])²] = a² Var(X).

  6. Var(2X−3Y) = 4Var(X) + 9Var(Y) − 12Cov(X,Y) = 4·3 + 9·5 − 12·2 = 12 + 45 − 24 = 33.

  7. If P(X=c) = 1, then E[X] = c·1 = c and E[X²] = c², so Var(X) = c² − c² = 0. The converse is also true. If Var(X) = 0, then Σ_x (x−μ)² p_X(x) = 0. Every term in this sum is non-negative, so each term must be zero, which forces x = μ for every x with p_X(x) > 0. Hence P(X = μ) = 1.
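
Answers 3 and 6 can be checked mechanically. The sketch below applies LOTUS to the PMF of Problem 3 in exact arithmetic. It then evaluates Var(aX + bY) = a²Var(X) + b²Var(Y) + 2ab Cov(X, Y) for Problem 6; `var_linear_combo` is a hypothetical helper name.

```python
from fractions import Fraction as F

def var_linear_combo(a, b, var_x, var_y, cov_xy):
    """Var(aX + bY) = a^2 Var(X) + b^2 Var(Y) + 2ab Cov(X, Y)."""
    return a * a * var_x + b * b * var_y + 2 * a * b * cov_xy

pmf = {1: F(2, 10), 2: F(3, 10), 3: F(5, 10)}        # Problem 3
ex = sum(x * p for x, p in pmf.items())
ex2 = sum(x * x * p for x, p in pmf.items())
e_inv = sum(F(1, x) * p for x, p in pmf.items())    # LOTUS with g(x) = 1/x

print(ex, ex2, ex2 - ex ** 2, e_inv)     # 23/10 59/10 61/100 31/60
print(var_linear_combo(2, -3, 3, 5, 2))  # Problem 6: 33
```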


Quiz

  1. Linearity of expectation E[X + Y] = E[X] + E[Y] holds: a) Only if X and Y are independent b) Only if X and Y have the same distribution c) Always (provided expectations exist) d) Only for continuous random variables Answer: c. Linearity is unconditional — it follows from the definition of expectation and the distributive property of sums.

  2. If X ~ Binomial(100, 0.2), E[X] is: a) 20 b) 80 c) 5 d) 50 Answer: a. E[X] = np = 100·0.2 = 20.

  3. The Law of the Unconscious Statistician (LOTUS) states: a) E[g(X)] = g(E[X]) b) E[g(X)] = Σ g(x) p_X(x) c) E[g(X)] = ∫ g(x) dx d) E[g(X)] = E[X] · E[g(X)] Answer: b. You compute the expectation of g(X) by summing g(x) weighted by the PMF of X.

  4. Var(X) = 0 implies: a) X has a symmetric distribution b) X is constant with probability 1 c) E[X] = 0 d) X is discrete Answer: b. Zero variance means no variability; the random variable is degenerate (constant almost surely).

  5. If Cov(X, Y) = 0, which must be true? a) X and Y are independent b) E[XY] = E[X]E[Y] c) Var(X + Y) > Var(X) + Var(Y) d) ρ = 1 Answer: b. Cov(X,Y) = E[XY] − E[X]E[Y] = 0 ⇔ E[XY] = E[X]E[Y]. Independence is sufficient but not necessary for zero covariance.

  6. The variance of a sum Var(X + Y) equals: a) Var(X) + Var(Y) b) Var(X) + Var(Y) + Cov(X, Y) c) Var(X) + Var(Y) + 2Cov(X, Y) d) Var(X)Var(Y) + 2Cov(X, Y) Answer: c. Var(X+Y) = Var(X) + Var(Y) + 2Cov(X,Y). When independent, the covariance term drops.

  7. For X ~ Geometric(0.25), E[X] = ? a) 2 b) 4 c) 0.25 d) 8 Answer: b. E[X] = 1/p = 1/0.25 = 4.

  8. Which is always true about the correlation coefficient ρ? a) ρ > 0 b) −1 ≤ ρ ≤ 1 c) If ρ = 0, X and Y are independent d) ρ = Cov(X,Y) · Var(X) · Var(Y) Answer: b. By Cauchy-Schwarz, |Cov(X,Y)| ≤ σ_X σ_Y, so −1 ≤ ρ ≤ 1.


Next Steps

Continue to 10-07 Continuous Random Variables to learn about PDFs, CDFs, the uniform distribution, and the exponential distribution in the continuous setting.