
Phase 10: Probability Theory

Subject 10-10: Joint Distributions

Prerequisites: 10-04 (Discrete RVs), 10-07 (Continuous RVs), 10-06 (Expectation), multivariable calculus (partial derivatives, double integrals)


Learning Objectives

  1. Define and compute joint PMFs for discrete random vectors and joint PDFs for continuous random vectors
  2. Derive marginal distributions by summing (discrete) or integrating (continuous) over other variables
  3. Define conditional distributions and compute conditional expectations E[Y | X = x]
  4. State and verify independence for jointly distributed random variables: f(x, y) = f_X(x) f_Y(y)
  5. Apply the bivariate normal distribution and compute probabilities for linear combinations

Core Content

1. Joint PMF (Discrete Case)

For discrete random variables X and Y, the joint probability mass function is:

$$p_{X,Y}(x, y) = P(X = x,\ Y = y)$$

Properties:
  1. p_{X,Y}(x, y) ≥ 0 for all (x, y)
  2. Σ_x Σ_y p_{X,Y}(x, y) = 1
  3. For any event A ⊆ R²: P((X,Y) ∈ A) = Σ_{(x,y)∈A} p_{X,Y}(x, y)

Joint CDF:

$$F_{X,Y}(x, y) = P(X \le x,\ Y \le y) = \sum_{u \le x} \sum_{v \le y} p_{X,Y}(u, v)$$
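A minimal Python sketch of the discrete joint CDF: it stores a joint PMF as a dictionary keyed by (x, y) pairs (a representation chosen here for illustration) and sums the mass at all support points with u ≤ x and v ≤ y. The two-dice PMF is a hypothetical example, not part of the text.

```python
from fractions import Fraction

def joint_cdf(pmf, x, y):
    """F(x, y): total mass of support points (u, v) with u <= x and v <= y."""
    return sum((p for (u, v), p in pmf.items() if u <= x and v <= y), Fraction(0))

# Hypothetical example: X and Y are the faces of two fair six-sided dice,
# so each of the 36 pairs (u, v) has probability 1/36.
dice_pmf = {(u, v): Fraction(1, 36) for u in range(1, 7) for v in range(1, 7)}

print(joint_cdf(dice_pmf, 3, 4))  # 12 of the 36 cells qualify: prints 1/3
print(joint_cdf(dice_pmf, 6, 6))  # the whole support: prints 1
```

Exact `Fraction` arithmetic keeps the CDF values free of rounding, which makes properties such as F(6, 6) = 1 easy to check.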

2. Joint PDF (Continuous Case)

For continuous random variables X and Y, the joint probability density function f_{X,Y}(x, y) satisfies:

Properties:
  1. f_{X,Y}(x, y) ≥ 0 for all (x, y)
  2. ∫∫ f_{X,Y}(x, y) dx dy = 1
  3. P((X,Y) ∈ A) = ∬_A f_{X,Y}(x, y) dx dy

Joint CDF:

$$F_{X,Y}(x, y) = P(X \le x,\ Y \le y) = \int_{-\infty}^{x} \int_{-\infty}^{y} f_{X,Y}(u, v)\, dv\, du$$

Recovering PDF from CDF:

$$f_{X,Y}(x, y) = \frac{\partial^2 F_{X,Y}(x, y)}{\partial x\, \partial y}$$
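As a numerical sanity check of the mixed-partial relation, the sketch below approximates ∂²F/∂x∂y by central differences. It assumes a hypothetical CDF F(x, y) = x²y³ on the unit square, whose exact mixed partial is 6xy²; the helper name `mixed_partial` and the step size h = 10⁻³ are illustrative choices.

```python
def F(x, y):
    """Hypothetical joint CDF F(x, y) = x^2 * y^3, valid for 0 <= x, y <= 1."""
    return x**2 * y**3

def mixed_partial(F, x, y, h=1e-3):
    """Central-difference approximation of d^2 F / (dx dy) at an interior point (x, y)."""
    return (F(x + h, y + h) - F(x + h, y - h)
            - F(x - h, y + h) + F(x - h, y - h)) / (4 * h * h)

x, y = 0.5, 0.4
approx = mixed_partial(F, x, y)
exact = 6 * x * y**2          # analytic mixed partial of x^2 y^3
print(approx, exact)          # agree to about 1e-6
```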

3. Marginal Distributions

Discrete — Marginal PMF:

$$p_X(x) = \sum_y p_{X,Y}(x, y) \quad \text{(sum over all values of } Y\text{)}$$
$$p_Y(y) = \sum_x p_{X,Y}(x, y) \quad \text{(sum over all values of } X\text{)}$$

Continuous — Marginal PDF:

$$f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\, dy \quad \text{(integrate out } Y\text{)}$$
$$f_Y(y) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\, dx \quad \text{(integrate out } X\text{)}$$

Intuition: The marginal distribution of X is what you get if you "average over" or "ignore" Y — it's the distribution of X alone, without reference to Y.

Example (continuous):

f_{X,Y}(x, y) = 6xy²    for 0 < x < 1, 0 < y < 1

Marginal of X:

$$f_X(x) = \int_0^1 6xy^2\, dy = 6x \left[\frac{y^3}{3}\right]_0^1 = 6x \cdot \frac{1}{3} = 2x, \quad 0 < x < 1$$

Marginal of Y:

$$f_Y(y) = \int_0^1 6xy^2\, dx = 6y^2 \left[\frac{x^2}{2}\right]_0^1 = 6y^2 \cdot \frac{1}{2} = 3y^2, \quad 0 < y < 1$$

Verify normalization: ∫₀¹ 2x dx = 1 ✓, ∫₀¹ 3y² dy = 1 ✓.
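The marginals above can be checked numerically by integrating out the other variable. This sketch uses a plain composite midpoint rule (a quadrature choice made here; any accurate rule would do) and hypothetical helper names `midpoint`, `f_X`, `f_Y`.

```python
def midpoint(g, a, b, n=1000):
    """Composite midpoint rule for the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

def f(x, y):
    """Joint PDF from the example: 6 x y^2 on the unit square, 0 elsewhere."""
    return 6 * x * y**2 if 0 < x < 1 and 0 < y < 1 else 0.0

def f_X(x):
    return midpoint(lambda y: f(x, y), 0, 1)   # integrate out y

def f_Y(y):
    return midpoint(lambda x: f(x, y), 0, 1)   # integrate out x

print(f_X(0.3), 2 * 0.3)            # numeric marginal vs analytic 2x
print(f_Y(0.5), 3 * 0.5**2)         # numeric marginal vs analytic 3y^2
print(midpoint(f_X, 0, 1, n=200))   # total probability, close to 1
```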

4. Conditional Distributions

Discrete:

$$p_{Y|X}(y \mid x) = P(Y = y \mid X = x) = \frac{p_{X,Y}(x, y)}{p_X(x)} \quad \text{for } p_X(x) > 0$$

Continuous:

f_{Y|X}(y|x) = f_{X,Y}(x, y) / f_X(x)    for f_X(x) > 0

For fixed x, f_{Y|X}(y|x) as a function of y is a valid PDF: it's non-negative and integrates to 1.

Conditional Expectation:

Discrete:

$$E[Y \mid X = x] = \sum_y y\, p_{Y|X}(y \mid x)$$

Continuous:

$$E[Y \mid X = x] = \int_{-\infty}^{\infty} y\, f_{Y|X}(y \mid x)\, dy$$
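For the example density f_{X,Y}(x, y) = 6xy², the conditional density is f_{Y|X}(y|x) = 6xy²/(2x) = 3y² on 0 < y < 1, so E[Y | X = x] = ∫₀¹ 3y³ dy = 3/4 for every x in (0, 1). A small sketch that evaluates the continuous formula numerically (midpoint rule, hypothetical helper names) should reproduce that constant:

```python
def midpoint(g, a, b, n=2000):
    """Composite midpoint rule for the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

def f(x, y):
    """Joint PDF 6 x y^2 on the unit square (the marginal-distribution example)."""
    return 6 * x * y**2 if 0 < x < 1 and 0 < y < 1 else 0.0

def cond_expectation_y(x):
    """E[Y | X = x] = (integral of y f(x, y) dy) / f_X(x); defined only where f_X(x) > 0."""
    f_x = midpoint(lambda y: f(x, y), 0, 1)
    if f_x <= 0:
        raise ValueError("conditional density undefined where f_X(x) = 0")
    return midpoint(lambda y: y * f(x, y), 0, 1) / f_x

for x in (0.2, 0.5, 0.9):
    print(x, cond_expectation_y(x))   # each value is close to 0.75
```

That the answer does not vary with x is consistent with X and Y being independent here, since 6xy² = (2x)(3y²).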

Law of Total Expectation (Law of Iterated Expectations):

$$E[Y] = E\big[E[Y \mid X]\big]$$

In the continuous case:

$$E[Y] = \int_{-\infty}^{\infty} E[Y \mid X = x]\, f_X(x)\, dx$$

This is a powerful tool: you can compute E[Y] by first conditioning on X, finding the conditional expectation, then averaging over X.

Law of Total Variance:

$$\operatorname{Var}(Y) = E\big[\operatorname{Var}(Y \mid X)\big] + \operatorname{Var}\big(E[Y \mid X]\big)$$
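Both laws can be verified exactly on a small discrete example. The sketch below assumes a hypothetical joint PMF (not from the text) and uses `fractions.Fraction` so the two sides of each identity compare with exact equality rather than a floating-point tolerance.

```python
from fractions import Fraction as Fr

# Hypothetical joint PMF p(x, y), for illustration only; the four masses sum to 1.
pmf = {(0, 0): Fr(1, 8), (0, 1): Fr(3, 8), (1, 0): Fr(1, 4), (1, 2): Fr(1, 4)}

xs = {x for x, _ in pmf}
p_X = {x: sum(p for (u, _), p in pmf.items() if u == x) for x in xs}

def cond_moment(x, k):
    """E[Y^k | X = x] = sum over y of y^k p(x, y) / p_X(x)."""
    return sum(v**k * p for (u, v), p in pmf.items() if u == x) / p_X[x]

E_Y = sum(v * p for (_, v), p in pmf.items())
Var_Y = sum(v**2 * p for (_, v), p in pmf.items()) - E_Y**2

cond_mean = {x: cond_moment(x, 1) for x in xs}
cond_var = {x: cond_moment(x, 2) - cond_mean[x]**2 for x in xs}

tower = sum(cond_mean[x] * p_X[x] for x in xs)                # E[E[Y | X]]
within = sum(cond_var[x] * p_X[x] for x in xs)                # E[Var(Y | X)]
between = sum((cond_mean[x] - E_Y)**2 * p_X[x] for x in xs)   # Var(E[Y | X])

print(E_Y, tower)               # 7/8 7/8
print(Var_Y, within + between)  # 39/64 39/64
```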

5. Independence

X and Y are independent if and only if:

Discrete:

p_{X,Y}(x, y) = p_X(x) · p_Y(y)    for all x, y

Continuous:

f_{X,Y}(x, y) = f_X(x) · f_Y(y)    for all x, y

Equivalently: the conditional distribution equals the marginal:

$$f_{Y|X}(y \mid x) = f_Y(y) \quad \text{(X gives no information about Y)}$$

And: the joint CDF factors: F_{X,Y}(x, y) = F_X(x) F_Y(y).

Theorem: If X and Y are independent, then Cov(X, Y) = 0 (and E[XY] = E[X]E[Y]). The converse is FALSE — zero correlation does NOT imply independence (except for jointly normal variables).

Theorem: If X and Y are independent, then g(X) and h(Y) are independent for any functions g, h.
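To see why the converse fails, here is a sketch of the standard counterexample: X uniform on {−1, 0, 1} and Y = X². The helper names are illustrative. The covariance comes out exactly 0, yet the factorization check fails (for instance p(0, 0) = 1/3 while p_X(0)p_Y(0) = 1/9).

```python
from fractions import Fraction

# Counterexample: X uniform on {-1, 0, 1} and Y = X^2.
pmf = {(x, x * x): Fraction(1, 3) for x in (-1, 0, 1)}

def marginal(pmf, axis):
    """Sum the joint PMF over the other coordinate (axis 0 -> p_X, axis 1 -> p_Y)."""
    out = {}
    for key, p in pmf.items():
        out[key[axis]] = out.get(key[axis], 0) + p
    return out

p_X, p_Y = marginal(pmf, 0), marginal(pmf, 1)

def expect(g):
    """E[g(X, Y)] under the joint PMF."""
    return sum(g(x, y) * p for (x, y), p in pmf.items())

cov = expect(lambda x, y: x * y) - expect(lambda x, y: x) * expect(lambda x, y: y)

# Independence needs p(x, y) = p_X(x) p_Y(y) for EVERY pair, including pairs with p(x, y) = 0.
independent = all(pmf.get((x, y), 0) == p_X[x] * p_Y[y] for x in p_X for y in p_Y)

print(cov)          # 0
print(independent)  # False
```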

6. Bivariate Normal Distribution

The most important joint continuous distribution. (X, Y) is bivariate normal with parameters μ_X, μ_Y, σ_X², σ_Y² (with σ_X, σ_Y > 0) and correlation ρ, where −1 < ρ < 1.

Joint PDF:

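$$f(x, y) = \frac{1}{2\pi \sigma_X \sigma_Y \sqrt{1-\rho^2}} \exp\!\left( -\frac{1}{2(1-\rho^2)} \left[ \left(\frac{x-\mu_X}{\sigma_X}\right)^2 - 2\rho \left(\frac{x-\mu_X}{\sigma_X}\right)\left(\frac{y-\mu_Y}{\sigma_Y}\right) + \left(\frac{y-\mu_Y}{\sigma_Y}\right)^2 \right] \right)$$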

Key properties:
- Marginals: X ~ N(μ_X, σ_X²), Y ~ N(μ_Y, σ_Y²)
- Conditional distributions: Y | X=x ~ N(μ_Y + ρ(σ_Y/σ_X)(x−μ_X), σ_Y²(1−ρ²))
- ρ = 0 ⇔ X and Y are independent. This is a special property of jointly normal variables; for joint distributions in general, zero correlation does NOT imply independence.
- Linear combinations are normal: aX + bY ~ N(aμ_X + bμ_Y, a²σ_X² + b²σ_Y² + 2abρσ_Xσ_Y)
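A sketch that codes the joint PDF above and checks the first property numerically: integrating out y should reproduce the N(μ_X, σ_X²) density. The parameter values, the integration window of ±12σ_Y around μ_Y, and the midpoint rule are assumptions made for this check; `statistics.NormalDist` supplies the reference marginal density.

```python
import math
from statistics import NormalDist

def bvn_pdf(x, y, mx, my, sx, sy, rho):
    """Bivariate normal density; assumes sx, sy > 0 and -1 < rho < 1."""
    zx, zy = (x - mx) / sx, (y - my) / sy
    # Quadratic form from the exponent, already divided by (1 - rho^2)
    q = (zx * zx - 2 * rho * zx * zy + zy * zy) / (1 - rho * rho)
    return math.exp(-0.5 * q) / (2 * math.pi * sx * sy * math.sqrt(1 - rho * rho))

# Illustrative parameters, assumed for this check only
mx, my, sx, sy, rho = 1.0, -2.0, 1.5, 0.5, 0.7

def marginal_x(x, n=4000, width=12.0):
    """Integrate out y with the midpoint rule over [my - width*sy, my + width*sy]."""
    a, b = my - width * sy, my + width * sy
    h = (b - a) / n
    return h * sum(bvn_pdf(x, a + (i + 0.5) * h, mx, my, sx, sy, rho) for i in range(n))

reference = NormalDist(mx, sx)   # the claimed marginal X ~ N(mu_X, sigma_X^2)
for x in (-1.0, 1.0, 2.5):
    print(x, marginal_x(x), reference.pdf(x))   # the two columns should agree closely
```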

Conditional expectation given X is LINEAR in X:

$$E[Y \mid X = x] = \mu_Y + \rho\,\frac{\sigma_Y}{\sigma_X}\,(x - \mu_X)$$

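This is the "regression line." Because the conditional mean E[Y | X] minimizes mean squared prediction error among all functions of X, the optimal predictor of Y given X is linear for the bivariate normal, and it coincides with the best linear predictor.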
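One way to see the regression line in simulated data, sketched under assumed parameters (μ_X = 170, μ_Y = 65, σ_X = 10, σ_Y = 8, ρ = 0.6, chosen for illustration): generate pairs with X = μ_X + σ_X Z₁ and Y = μ_Y + σ_Y(ρZ₁ + √(1−ρ²) Z₂), where Z₁, Z₂ are independent standard normals, then fit an ordinary least-squares line. The fitted slope should land near ρσ_Y/σ_X = 0.48.

```python
import math
import random

def sample_bvn(n, mx, my, sx, sy, rho, seed=0):
    """Draw n (x, y) pairs: X = mx + sx*Z1, Y = my + sy*(rho*Z1 + sqrt(1 - rho^2)*Z2)."""
    rng = random.Random(seed)
    c = math.sqrt(1 - rho * rho)
    pairs = []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        pairs.append((mx + sx * z1, my + sy * (rho * z1 + c * z2)))
    return pairs

mx, my, sx, sy, rho = 170.0, 65.0, 10.0, 8.0, 0.6   # assumed parameters
pairs = sample_bvn(100_000, mx, my, sx, sy, rho)

# Ordinary least-squares fit y = intercept + slope * x
n = len(pairs)
x_bar = sum(x for x, _ in pairs) / n
y_bar = sum(y for _, y in pairs) / n
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in pairs)
s_xx = sum((x - x_bar) ** 2 for x, _ in pairs)
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar

print(slope, rho * sy / sx)                                       # fitted vs theoretical slope
print(intercept + slope * 180, my + rho * sy / sx * (180 - mx))   # fitted vs E[Y | X = 180]
```

The construction works because Var(Y) = σ_Y²(ρ² + 1 − ρ²) = σ_Y² and Cov(X, Y) = ρσ_Xσ_Y, while both coordinates are linear in independent normals, so the pair is jointly normal.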




Worked Examples

Example 1: Discrete Joint Distribution

Joint PMF of (X, Y):

| X \ Y | 1   | 2   | 3   |
|-------|-----|-----|-----|
| 0     | 0.1 | 0.1 | 0.0 |
| 1     | 0.2 | 0.3 | 0.3 |

Find: (a) marginal PMFs, (b) P(Y=2 | X=1), (c) E[Y | X=0], (d) Are X and Y independent?

Solution:

(a) p_X(0) = 0.1+0.1+0.0 = 0.2; p_X(1) = 0.2+0.3+0.3 = 0.8. p_Y(1) = 0.1+0.2 = 0.3; p_Y(2) = 0.1+0.3 = 0.4; p_Y(3) = 0.0+0.3 = 0.3.

(b) p_{Y|X}(2|1) = p(1,2)/p_X(1) = 0.3/0.8 = 0.375.

(c) Conditional PMF given X=0: p(1|0)=0.1/0.2=0.5, p(2|0)=0.5, p(3|0)=0. E[Y|X=0] = 1(0.5)+2(0.5)+3(0) = 1.5.

(d) Check: p(0,1) = 0.1, p_X(0)p_Y(1) = 0.2·0.3 = 0.06. Not equal, so dependent.
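The four parts of Example 1 can be reproduced in a few lines of Python. This sketch stores the table as a dictionary keyed by (x, y) and uses `fractions.Fraction` for exact arithmetic; the helper name `cond_pmf_y` is illustrative.

```python
from fractions import Fraction as Fr

# Joint PMF from Example 1, keyed by (x, y)
pmf = {(0, 1): Fr("0.1"), (0, 2): Fr("0.1"), (0, 3): Fr("0.0"),
       (1, 1): Fr("0.2"), (1, 2): Fr("0.3"), (1, 3): Fr("0.3")}

xs = sorted({x for x, _ in pmf})
ys = sorted({y for _, y in pmf})
p_X = {x: sum(pmf[(x, y)] for y in ys) for x in xs}   # (a) sum over y
p_Y = {y: sum(pmf[(x, y)] for x in xs) for y in ys}   # (a) sum over x

def cond_pmf_y(x):
    """p_{Y|X}(y | x) = p(x, y) / p_X(x) for each y."""
    return {y: pmf[(x, y)] / p_X[x] for y in ys}

p_b = cond_pmf_y(1)[2]                                  # (b) P(Y = 2 | X = 1)
e_c = sum(y * p for y, p in cond_pmf_y(0).items())      # (c) E[Y | X = 0]
independent = all(pmf[(x, y)] == p_X[x] * p_Y[y] for x in xs for y in ys)  # (d)

print(p_X, p_Y)
print(p_b, e_c, independent)   # 3/8 3/2 False
```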


Example 2: Continuous Joint Distribution

Let f(x, y) = 2 for 0 < x < y < 1, zero otherwise.

(a) Verify it's a valid joint PDF. (b) Find marginal PDFs f_X(x) and f_Y(y). (c) Find P(X + Y < 1). (d) Find E[Y | X = 0.5].

Solution:

(a) ∫₀¹ ∫ₓ¹ 2 dy dx = ∫₀¹ 2(1−x) dx = 2[x−x²/2]₀¹ = 2(1−1/2) = 1 ✓. Non-negative on support ✓.

(b) f_X(x) = ∫ₓ¹ 2 dy = 2(1−x) for 0 < x < 1. f_Y(y) = ∫₀ʸ 2 dx = 2y for 0 < y < 1.

(c) The event {X + Y < 1} is the region where 0 < x < y and x + y < 1. For fixed x, y runs from x to 1−x, which is nonempty only when x < 1−x, i.e., x < 0.5. So: P(X + Y < 1) = ∫₀^{0.5} ∫ₓ^{1−x} 2 dy dx = ∫₀^{0.5} 2(1−2x) dx = 2[x−x²]₀^{0.5} = 2(0.5−0.25) = 0.5.

(d) f_{Y|X}(y|0.5) = f(0.5, y)/f_X(0.5) = 2/(2(1−0.5)) = 2 for 0.5 < y < 1, so Y | X=0.5 ~ Uniform(0.5, 1). E[Y|X=0.5] = ∫_{0.5}^{1} y·2 dy = [y²]_{0.5}^{1} = 1 − 0.25 = 0.75.
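A Monte Carlo cross-check of parts (c) and (d), a simulation technique swapped in here rather than the integration used above. It relies on the fact that (min(U, V), max(U, V)) for independent U, V ~ Uniform(0, 1) has joint density 2 on 0 < x < y < 1. Conditioning on the single point X = 0.5 is approximated by a thin band |X − 0.5| < 0.01, an arbitrary choice; the seed and sample size are also assumptions.

```python
import random

def sample_xy(rng):
    """(min, max) of two independent Uniform(0, 1) draws: joint density 2 on 0 < x < y < 1."""
    u, v = rng.random(), rng.random()
    return (u, v) if u < v else (v, u)

rng = random.Random(42)
samples = [sample_xy(rng) for _ in range(200_000)]

# (c) P(X + Y < 1): fraction of simulated points inside the event
p_sum = sum(x + y < 1 for x, y in samples) / len(samples)

# (d) E[Y | X = 0.5]: average Y over points whose X falls in a thin band around 0.5
band = [y for x, y in samples if abs(x - 0.5) < 0.01]
e_cond = sum(band) / len(band)

print(p_sum, e_cond)   # close to 0.5 and 0.75
```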


Example 3: Bivariate Normal

Let (X, Y) be bivariate normal with μ_X = 170, μ_Y = 65, σ_X = 10, σ_Y = 8, ρ = 0.6.

(a) Find P(Y > 70). (b) Find the conditional distribution of Y given X = 180. (c) Find E[2X − 3Y] and Var(2X − 3Y).

Solution:

(a) Y ~ N(65, 64). Z = (70−65)/8 = 5/8 = 0.625. P(Y > 70) = 1 − Φ(0.625) ≈ 0.266.

(b) Y|X=180 ~ N(μ_{Y|X}, σ²_{Y|X}). μ_{Y|X} = 65 + 0.6(8/10)(180−170) = 65 + 0.6·0.8·10 = 65 + 4.8 = 69.8. σ²_{Y|X} = 64(1−0.36) = 64·0.64 = 40.96 (so σ_{Y|X} = 6.4).

(c) E[2X−3Y] = 2·170 − 3·65 = 340 − 195 = 145. Var(2X−3Y) = 4·100 + 9·64 + 2·2·(−3)·0.6·10·8 = 400 + 576 − 576 = 400. So 2X−3Y ~ N(145, 400).
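The arithmetic in Example 3 can be checked with the standard library's `statistics.NormalDist` (Python 3.8+), whose `cdf` method supplies Φ. A minimal sketch:

```python
from statistics import NormalDist

mu_x, mu_y, sd_x, sd_y, rho = 170, 65, 10, 8, 0.6   # Example 3 parameters

# (a) Marginal Y ~ N(mu_y, sd_y^2), so P(Y > 70) = 1 - Phi((70 - mu_y) / sd_y)
p_a = 1 - NormalDist(mu_y, sd_y).cdf(70)

# (b) Conditional distribution of Y given X = 180
x0 = 180
cond_mean = mu_y + rho * (sd_y / sd_x) * (x0 - mu_x)
cond_var = sd_y**2 * (1 - rho**2)

# (c) Mean and variance of the linear combination aX + bY with a = 2, b = -3
a, b = 2, -3
mean_c = a * mu_x + b * mu_y
var_c = a**2 * sd_x**2 + b**2 * sd_y**2 + 2 * a * b * rho * sd_x * sd_y

print(round(p_a, 3), round(cond_mean, 2), round(cond_var, 2), mean_c, round(var_c, 6))
```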


Quiz

Q1: For jointly distributed discrete random variables, the marginal PMF p_X(x) is obtained by:

A) Multiplying p_{X,Y}(x,y) by p_Y(y)
B) Summing p_{X,Y}(x,y) over all values of y
C) Integrating p_{X,Y}(x,y) over y
D) Taking the derivative of the joint CDF

Correct: B)


Q2: For continuous X and Y, which condition is equivalent to independence?

A) f_{X,Y}(x,y) = f_X(x) + f_Y(y)
B) F_{X,Y}(x,y) = F_X(x) F_Y(y) for all x,y
C) E[XY] = E[X]E[Y]
D) Cov(X,Y) = 0

Correct: B)


Q3: To find the marginal PDF f_X(x) from a joint PDF f_{X,Y}(x,y), you:

A) Integrate over x
B) Integrate over y
C) Set the other variable to its mean
D) Take the partial derivative

Correct: B)


Q5: For the bivariate normal distribution, zero correlation implies:

A) Nothing about independence
B) Independence
C) Identical distributions
D) The variables are uncorrelated but dependent

Correct: B)


Practice Problems

  1. For the joint PMF: p(0,0)=0.4, p(0,1)=0.1, p(1,0)=0.2, p(1,1)=0.3. Find marginals, check independence, and find E[XY].

  2. Let f(x,y) = (3/2)(x² + y²) for 0 < x < 1, 0 < y < 1. Verify it's valid, find marginals, and compute P(X < 0.5).

  3. For f(x,y) = 4xy for 0 < x < 1, 0 < y < 1: find f_{Y|X}(y|x) and E[Y | X = x].

  4. If (X, Y) is bivariate normal with μ_X=0, μ_Y=0, σ_X=1, σ_Y=1, ρ=0.5, find P(Y > 1 | X = 0.5).

  5. Prove the law of total expectation for discrete RVs: E[Y] = Σ_x E[Y | X=x] p_X(x).

  6. Show that if X and Y are independent, then f_{Y|X}(y|x) = f_Y(y).

  7. Let f(x, y) = e^{−(x+y)} for x > 0, y > 0. Find P(X < Y) and show X and Y are independent.

Answers

  1. p_X(0)=0.5, p_X(1)=0.5; p_Y(0)=0.6, p_Y(1)=0.4. Check: p(0,0)=0.4 but p_X(0)p_Y(0)=0.3; not equal, so dependent. E[XY] = 0·0·0.4 + 0·1·0.1 + 1·0·0.2 + 1·1·0.3 = 0.3.
  2. ∫₀¹∫₀¹ (3/2)(x²+y²) dx dy = (3/2)[(1/3)+(1/3)] = (3/2)(2/3) = 1, and f ≥ 0, so f is valid. f_X(x) = ∫₀¹ (3/2)(x²+y²) dy = (3/2)(x²+1/3) for 0 < x < 1; by symmetry f_Y(y) = (3/2)(y²+1/3) for 0 < y < 1. P(X<0.5) = ∫₀^{0.5} (3/2)(x²+1/3) dx = (3/2)[x³/3 + x/3]₀^{0.5} = (3/2)(0.0417 + 0.1667) = 0.3125.
  3. f_X(x) = ∫₀¹ 4xy dy = 4x/2 = 2x for 0 < x < 1. f_{Y|X}(y|x) = 4xy/(2x) = 2y for 0 < y < 1, which does not depend on x. E[Y | X = x] = ∫₀¹ y·2y dy = 2/3.
  4. Y | X=0.5 ~ N(0 + 0.5·(1/1)·(0.5 − 0), 1·(1 − 0.5²)) = N(0.25, 0.75). P(Y > 1 | X = 0.5) = 1 − Φ((1 − 0.25)/√0.75) = 1 − Φ(0.866) ≈ 0.193.
  5. E[Y] = Σ_y y p_Y(y) = Σ_y y Σ_x p(x,y) = Σ_x Σ_y y p(x,y) = Σ_x Σ_y y p(y|x) p_X(x) = Σ_x p_X(x) Σ_y y p(y|x) = Σ_x E[Y|X=x] p_X(x). ✓
  6. If independent, f(x,y) = f_X(x)f_Y(y). Then, wherever f_X(x) > 0, f_{Y|X}(y|x) = f(x,y)/f_X(x) = f_X(x)f_Y(y)/f_X(x) = f_Y(y). ✓
  7. f(x,y) = e^{−x}e^{−y} = f_X(x)f_Y(y), where f_X(x) = e^{−x} (Exp(1)) and f_Y(y) = e^{−y}. Independent. P(X < Y) = ∫₀^∞ ∫ₓ^∞ e^{−x}e^{−y} dy dx = ∫₀^∞ e^{−2x} dx = 1/2, as symmetry suggests for i.i.d. continuous X and Y.

---

### Summary

- Joint PMF/PDF describes the simultaneous behavior of multiple random variables; it must be non-negative and sum/integrate to 1
- Marginal distributions are obtained by summing (discrete) or integrating (continuous) over the other variables: f_X(x) = ∫ f(x,y) dy
- Conditional distributions are f_{Y|X}(y|x) = f(x,y)/f_X(x); the conditional expectation E[Y|X=x] is the mean of this distribution
- Independence means the joint factors: f(x,y) = f_X(x)f_Y(y); it implies zero covariance, but zero covariance does NOT imply independence (except for jointly normal variables)
- The bivariate normal distribution has normal marginals, normal conditionals with linear conditional expectations, and for it ρ = 0 ⇔ independence, an equivalence that does not hold for joint distributions in general

---

### Pitfalls

- **Forgetting to check the full support region when computing marginal densities.** When the joint support is not rectangular (e.g., 0 < x < y < 1), the integration limits depend on the variable being retained. For f_X(x), integrate y from x to 1, not 0 to 1. Using the wrong limits produces a function that doesn't integrate to 1.
- **Confusing independence (joint PDF factors) with zero correlation.** X and Y are independent iff f(x,y) = f_X(x)f_Y(y) for ALL (x,y). Zero covariance only guarantees E[XY] = E[X]E[Y], a much weaker condition. For example, if X is symmetric about 0 (with finite fourth moment) and Y = X², then Cov(X, Y) = 0 even though Y is completely determined by X.
- **Applying the "bivariate normal implies zero correlation = independence" property to other distributions.** The equivalence ρ = 0 ⇔ independence is a SPECIAL property of jointly (multivariate) normal variables. For other joint distributions (even those with normal marginals!), zero correlation does NOT guarantee independence.
- **Miscomputing conditional probabilities from a joint PDF.** The conditional PDF is f_{Y|X}(y|x) = f(x,y)/f_X(x), NOT f(x,y) alone. Forgetting to divide by the marginal f_X(x) means the conditional "PDF" won't integrate to 1. Always compute the marginal first.
- **Forgetting that the law of total expectation requires iterated expectations over the conditioning variable.** E[Y] = E[E[Y|X]] means: (1) find g(x) = E[Y|X=x] as a function of x, (2) then compute E[g(X)] using the marginal distribution of X. A common error is to stop at step 1 and treat g(x) as if it were E[Y].

---

### Quiz

1. To obtain the marginal PDF of X from f_{X,Y}(x, y), you:
   a) Differentiate with respect to x  b) Integrate over y  c) Set Y to its mean  d) Divide by f_Y(y)
   **Answer: b.** f_X(x) = ∫ f_{X,Y}(x, y) dy. You "integrate out" Y.
2. If f_{X,Y}(x, y) = f_X(x) f_Y(y) for all (x, y), then:
   a) X and Y are correlated  b) Cov(X, Y) > 0  c) X and Y are independent  d) X and Y are identically distributed
   **Answer: c.** Factorization of the joint PDF into the product of marginals is the definition of independence.
3. For continuous RVs, the conditional expectation E[Y | X = x] is:
   a) Always linear in x  b) ∫ y f_{Y|X}(y|x) dy  c) E[Y] regardless of x  d) f_Y(y) integrated over x
   **Answer: b.** It's the expected value computed using the conditional distribution.
4. The law of total expectation states:
   a) E[XY] = E[X]E[Y]  b) E[Y] = E[E[Y | X]]  c) E[X+Y] = E[X]+E[Y]  d) E[Y] = E[Y | E[X]]
   **Answer: b.** E[Y] = E[E[Y|X]]: average the conditional expectations.
5. In a bivariate normal distribution, if ρ = 0 then:
   a) X and Y are dependent  b) X and Y are independent  c) X and Y have the same mean  d) X + Y is not normal
   **Answer: b.** For jointly normal variables, zero correlation implies independence. This does NOT hold for joint distributions in general.
6. The conditional distribution Y | X = x for bivariate normal is:
   a) Cauchy  b) Normal  c) t-distribution  d) Chi-squared
   **Answer: b.** Y|X=x is normal with mean μ_Y + ρ(σ_Y/σ_X)(x−μ_X) and variance σ_Y²(1−ρ²).
7. If the support of (X, Y) is the unit square [0,1]×[0,1] and f(x, y) = 1, what are the marginals?
   a) Both N(0.5, 1/12)  b) Both Uniform(0, 1)  c) Both Exponential(1)  d) X ~ Uniform(0,1), Y depends on X
   **Answer: b.** f_X(x) = ∫₀¹ 1 dy = 1 for 0 < x < 1, and likewise f_Y(y) = 1 for 0 < y < 1.