
Phase 10: Probability Theory

Subject 10-04: Discrete Random Variables

Prerequisites: 10-01 (Probability Foundations), 10-02 (Conditional Probability), 10-03 (Independence), basic combinatorics (binomial coefficients)


Learning Objectives

  1. Define a discrete random variable, its probability mass function (PMF), and cumulative distribution function (CDF)
  2. Compute and interpret PMF and CDF for Bernoulli, binomial, geometric, negative binomial, and hypergeometric distributions
  3. Derive the binomial distribution from Bernoulli trials and the geometric from waiting times
  4. Relate the hypergeometric distribution to sampling without replacement and show its convergence to the binomial
  5. Use the CDF to compute probabilities of interval events: P(a < X ≤ b) = F(b) − F(a)

Core Content

1. Random Variables and PMFs

A random variable X is a function X: Ω → R that assigns a real number to each outcome in the sample space Ω. A random variable is discrete if its range (set of possible values) is finite or countably infinite.

The probability mass function (PMF) of a discrete random variable X is:

p_X(x) = P(X = x)    for x ∈ range(X)

Properties of a PMF:

  1. p_X(x) ≥ 0 for all x
  2. Σ_x p_X(x) = 1 (sum over all values in the range)
  3. For any set B ⊆ R: P(X ∈ B) = Σ_{x ∈ B} p_X(x), where the sum runs over the values of B that lie in the range of X

The cumulative distribution function (CDF) is:

$F_X(x) = P(X \le x) = \sum_{t \le x} p_X(t)
$

Properties of a CDF:

  - Non-decreasing: if a < b, then F(a) ≤ F(b)
  - Right-continuous: lim_{h→0⁺} F(x + h) = F(x)
  - Limits: lim_{x→−∞} F(x) = 0 and lim_{x→∞} F(x) = 1

For discrete RVs, the CDF is a step function with jumps at the support points.

Recovering PMF from CDF: p_X(x) = F_X(x) − F_X(x⁻), where F_X(x⁻) is the left limit of F_X at x (so the PMF value is the size of the jump at x).
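The relationship between a PMF, its CDF, and the jump formula can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the example PMF (number of heads in two fair coin flips) and the helper names `cdf`, `jump`, and `interval_prob` are hypothetical and not part of the lesson.

```python
from fractions import Fraction

# Hypothetical example PMF: number of heads in two fair coin flips.
pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

def cdf(pmf, x):
    """F_X(x): sum of p_X(t) over support points t <= x."""
    return sum((p for t, p in pmf.items() if t <= x), Fraction(0))

def jump(pmf, x):
    """p_X(x) recovered as F_X(x) - F_X(x^-), the jump of the CDF at x."""
    left_limit = sum((p for t, p in pmf.items() if t < x), Fraction(0))
    return cdf(pmf, x) - left_limit

def interval_prob(pmf, a, b):
    """P(a < X <= b) = F(b) - F(a)."""
    return cdf(pmf, b) - cdf(pmf, a)

# The CDF is flat between support points and jumps at each of them.
for x in (-1, 0, 0.5, 1, 2, 3):
    print(x, cdf(pmf, x), jump(pmf, x))
```

Exact fractions are used here so that the step heights can be compared without floating-point rounding.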

2. Bernoulli Distribution

A single trial with two outcomes: "success" (1) with probability p, "failure" (0) with probability 1−p.

Notation: X ~ Bernoulli(p)

PMF: p_X(1) = p, p_X(0) = 1 − p, zero elsewhere.

CDF:

$F_X(x) = \begin{cases}
0, & x < 0 \\
1 - p, & 0 \le x < 1 \\
1, & x \ge 1
\end{cases}
$

Applications: Any binary experiment — coin flip, pass/fail test, yes/no survey response.

3. Binomial Distribution

n independent Bernoulli trials, each with success probability p. Let X = number of successes.

Notation: X ~ Binomial(n, p)

PMF:

$P(X = k) = C(n, k)\, p^k (1-p)^{n-k}, \qquad k = 0, 1, \ldots, n
$

Derivation: There are C(n, k) ways to choose which k trials are successes. By independence, each specific sequence of k successes and n−k failures has probability pᵏ(1−p)^{n−k}. Since these sequences are disjoint events, their probabilities add, which gives the PMF.

CDF: F(k) = Σ_{i=0}^{k} C(n, i) pⁱ (1−p)^{n−i} (no simple closed form).

Special case: Binomial(1, p) = Bernoulli(p)

Shape: Symmetric when p = 0.5; skewed right when p < 0.5; skewed left when p > 0.5.

Recursion for computation:

$P(X = k+1) = P(X = k) \cdot \frac{n-k}{k+1} \cdot \frac{p}{1-p}
$
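The recursion can be checked numerically against the direct formula. The following is a minimal Python sketch, assuming 0 < p < 1 (the recursion divides by 1−p); the function names are illustrative and not part of the lesson.

```python
from math import comb, isclose

def binomial_pmf_direct(n, p):
    """PMF values P(X = 0), ..., P(X = n) from C(n, k) p^k (1-p)^(n-k)."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

def binomial_pmf_recursive(n, p):
    """Same PMF built from P(X = 0) = (1-p)^n and the ratio recursion.

    Assumes 0 < p < 1 so that p / (1 - p) is well defined.
    """
    probs = [(1 - p) ** n]
    for k in range(n):
        probs.append(probs[-1] * (n - k) / (k + 1) * p / (1 - p))
    return probs

direct = binomial_pmf_direct(10, 0.3)
recursive = binomial_pmf_recursive(10, 0.3)
agree = all(isclose(a, b) for a, b in zip(direct, recursive))
print(agree, sum(recursive))
```

The recursive version avoids recomputing binomial coefficients, which is the usual reason for preferring it when the whole PMF table is needed.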

4. Geometric Distribution

Independent Bernoulli trials until the first success. Let X = number of trials needed.

Notation: X ~ Geometric(p)

PMF:

$P(X = k) = (1-p)^{k-1} p, \qquad k = 1, 2, 3, \ldots
$

Derivation: To need exactly k trials, the first k−1 must be failures (probability (1−p)^{k−1}) and the k-th must be a success (probability p). By independence, multiply.

CDF:

$F(k) = P(X \le k) = 1 - P(X > k) = 1 - P(\text{first } k \text{ trials all failures}) = 1 - (1-p)^k
$

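Memoryless property (discrete): P(X > m + n | X > m) = P(X > n) for all integers m, n ≥ 0. Given that you have already waited m trials without a success, the additional waiting time follows the same geometric distribution as starting fresh. This follows from P(X > k) = (1−p)ᵏ, since (1−p)^{m+n}/(1−p)^m = (1−p)ⁿ. The geometric is the only distribution on the positive integers {1, 2, 3, ...} with this property.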

Alternative definition: Some texts define Y = number of failures before first success. Then Y = X − 1, and P(Y = k) = (1−p)ᵏp for k = 0, 1, 2, ...
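The geometric CDF and the memoryless property can be verified numerically. This is a small Python sketch, assuming the trials-until-first-success convention (support 1, 2, 3, ...); the function names and the values p = 0.2, m = 3, n = 2 are chosen only for illustration.

```python
from math import isclose

def geometric_pmf(k, p):
    """P(X = k) = (1-p)^(k-1) p for k = 1, 2, 3, ..."""
    return (1 - p) ** (k - 1) * p

def geometric_cdf(k, p):
    """F(k) = 1 - (1-p)^k for integer k >= 0."""
    return 1 - (1 - p) ** k

def survival(k, p):
    """P(X > k) = (1-p)^k: the first k trials are all failures."""
    return (1 - p) ** k

p, m, n = 0.2, 3, 2

# Summing the PMF up to k should reproduce the closed-form CDF.
cdf_by_sum = sum(geometric_pmf(k, p) for k in range(1, 6))

# Memorylessness: P(X > m + n | X > m) = P(X > m + n) / P(X > m) = P(X > n).
conditional = survival(m + n, p) / survival(m, p)

print(cdf_by_sum, geometric_cdf(5, p))
print(conditional, survival(n, p))
```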

5. Negative Binomial Distribution

Independent Bernoulli trials until r successes are observed. Let X = number of trials needed.

Notation: X ~ NegBin(r, p)

PMF:

$P(X = k) = C(k-1, r-1)\, p^r (1-p)^{k-r}, \qquad k = r, r+1, r+2, \ldots
$

Derivation: For the r-th success to occur on trial k, we need exactly r−1 successes in the first k−1 trials (C(k−1, r−1) ways, each with probability p^{r−1}(1−p)^{(k−1)−(r−1)}) and then a success on trial k (probability p). Product: C(k−1, r−1) pʳ (1−p)^{k−r}.

Special cases:

  - NegBin(1, p) = Geometric(p)
  - A NegBin(r, p) random variable has the same distribution as the sum of r independent Geometric(p) random variables

Alternative parameterization: Some texts define Y = number of failures before r successes. Then Y = X − r, and P(Y = k) = C(k+r−1, r−1) pʳ (1−p)ᵏ for k = 0, 1, 2, ...
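The special cases and the two parameterizations can be cross-checked numerically. The sketch below is illustrative only: the function names are hypothetical, p = 0.4 is an arbitrary choice, and the sum-of-geometrics claim is checked just for r = 2 by direct convolution.

```python
from math import comb, isclose

def geometric_pmf(k, p):
    """P(X = k) for the number of trials until the first success."""
    return (1 - p) ** (k - 1) * p if k >= 1 else 0.0

def negbin_pmf(k, r, p):
    """P(X = k) for the number of trials until the r-th success."""
    if k < r:
        return 0.0
    return comb(k - 1, r - 1) * p**r * (1 - p) ** (k - r)

def negbin_failures_pmf(j, r, p):
    """P(Y = j) for Y = X - r, the number of failures before the r-th success."""
    return comb(j + r - 1, r - 1) * p**r * (1 - p) ** j

def sum_of_two_geometrics_pmf(k, p):
    """P(X1 + X2 = k) for independent Geometric(p) variables, by convolution."""
    return sum(geometric_pmf(i, p) * geometric_pmf(k - i, p) for i in range(1, k))

p = 0.4
checks = {
    "NegBin(1, p) matches Geometric(p)": all(
        isclose(negbin_pmf(k, 1, p), geometric_pmf(k, p)) for k in range(1, 20)
    ),
    "NegBin(2, p) matches sum of two geometrics": all(
        isclose(negbin_pmf(k, 2, p), sum_of_two_geometrics_pmf(k, p))
        for k in range(1, 20)
    ),
    "failure-count form is a shift by r": all(
        isclose(negbin_failures_pmf(j, 3, p), negbin_pmf(j + 3, 3, p))
        for j in range(0, 20)
    ),
}
for name, ok in checks.items():
    print(name, ok)
```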

6. Hypergeometric Distribution

Drawing n items without replacement from a population of N items containing K "successes." Let X = number of successes in the sample.

Notation: X ~ Hypergeometric(N, K, n)

PMF:

$P(X = k) = \frac{C(K, k) \cdot C(N-K, n-k)}{C(N, n)}, \qquad \max(0, n-(N-K)) \le k \le \min(n, K)
$

Derivation: There are C(N, n) equally likely ways to choose n items. Among these, there are C(K, k) ways to choose k successes and C(N−K, n−k) ways to choose n−k failures. Multiply the two counts and divide by C(N, n).

Key difference from binomial: The hypergeometric has dependent draws (sampling without replacement).

Convergence to binomial: As N → ∞ with K/N = p fixed, Hypergeometric(N, K, n) → Binomial(n, p). When N is large relative to n, the distinction is negligible (sampling with and without replacement give nearly identical probabilities).
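The convergence can be observed numerically by holding K/N fixed and letting N grow. The following Python sketch uses illustrative values (n = 5 and K/N = 3/10) and hypothetical function names; it reports the largest pointwise gap between the two PMFs for each population size.

```python
from math import comb

def hypergeom_pmf(k, N, K, n):
    """P(X = k) when drawing n items without replacement, K successes among N."""
    if k < max(0, n - (N - K)) or k > min(n, K):
        return 0.0
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def binomial_pmf(k, n, p):
    """P(X = k) for n independent trials with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n = 5  # sample size, held fixed

def max_gap(N):
    """Largest |hypergeometric - binomial| over k, with K/N fixed at 3/10."""
    K = 3 * N // 10
    return max(
        abs(hypergeom_pmf(k, N, K, n) - binomial_pmf(k, n, 0.3))
        for k in range(n + 1)
    )

population_sizes = (20, 200, 2000, 20000)
gaps = [max_gap(N) for N in population_sizes]
for N, gap in zip(population_sizes, gaps):
    print(N, gap)
```

In this setup the gap shrinks as N grows, which is the numerical counterpart of the limit statement above.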



Key Terms

Worked Examples

Example 1: Binomial Computation

A fair coin is flipped 10 times. Find: (a) P(exactly 6 heads), (b) P(at least 8 heads), (c) P(3 to 7 heads inclusive).

Solution:

X ~ Binomial(10, 0.5)

(a) P(X=6) = C(10,6)(0.5)¹⁰ = 210/1024 = 105/512 ≈ 0.2051

(b) P(X≥8) = P(X=8) + P(X=9) + P(X=10) = [C(10,8) + C(10,9) + C(10,10)]/1024 = (45 + 10 + 1)/1024 = 56/1024 = 7/128 ≈ 0.0547

(c) P(3≤X≤7) = 1 − P(X≤2) − P(X≥8) = 1 − [C(10,0)+C(10,1)+C(10,2)]/1024 − 56/1024 = 1 − (1+10+45+56)/1024 = 1 − 112/1024 = 912/1024 = 57/64 ≈ 0.8906
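The three answers can be confirmed with exact rational arithmetic. This is a short Python check, not part of the original solution; the variable names are illustrative.

```python
from fractions import Fraction
from math import comb

n = 10
# For a fair coin, P(X = k) = C(10, k) / 2^10.
pmf = [Fraction(comb(n, k), 2**n) for k in range(n + 1)]

part_a = pmf[6]          # P(X = 6)
part_b = sum(pmf[8:])    # P(X >= 8)
part_c = sum(pmf[3:8])   # P(3 <= X <= 7)

print(part_a, part_b, part_c)
```

Part (c) is computed here by summing the five middle terms directly, which gives an independent check on the complement argument used in the solution.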


Example 2: Geometric Waiting Time

A basketball player has a 70% free-throw success rate. Shots are independent. What is the probability: (a) her first miss occurs on the 4th shot? (b) she makes at least her first 5 shots?

Solution:

Let X = number of shots until first MISS. This is Geometric with p = P(miss) = 0.30.

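(a) P(X=4) = (0.7)³(0.3) = 0.343 × 0.3 = 0.1029

(b) She makes her first 5 shots exactly when the first miss comes after shot 5, so P(makes first 5) = P(X > 5) = (1−p)⁵, where p = P(miss) = 0.30 plays the role of the "success" probability. This gives (0.7)⁵ = 0.16807.

Alternatively, P(X > 5) = 1 − F(5) = 1 − [1 − (1−0.3)⁵] = 0.7⁵.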
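A quick simulation gives a sanity check on both answers. The sketch below is a rough Monte Carlo estimate under the example's assumptions (independent shots, miss probability 0.30); the seed, the number of trials, and the function name are arbitrary choices, and the estimates will only match the exact values approximately.

```python
import random

def shots_until_first_miss(rng, p_miss):
    """Simulate independent shots and return the index of the first miss."""
    shots = 1
    while rng.random() >= p_miss:  # this shot was made, so take another
        shots += 1
    return shots

rng = random.Random(0)  # fixed seed so the run is repeatable
trials = 100_000
samples = [shots_until_first_miss(rng, 0.30) for _ in range(trials)]

est_a = sum(x == 4 for x in samples) / trials  # estimate of P(X = 4)
est_b = sum(x > 5 for x in samples) / trials   # estimate of P(X > 5)

print(est_a, 0.7**3 * 0.3)
print(est_b, 0.7**5)
```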


Example 3: Hypergeometric Card Problem

From a standard 52-card deck, 5 cards are dealt. Let X = number of aces. Find P(X = 2).

Solution:

X ~ Hypergeometric(N=52, K=4, n=5). We want 2 aces, 3 non-aces from 48 non-aces:

$\begin{aligned}
P(X=2) &= \frac{C(4,2) \cdot C(48,3)}{C(52,5)} \\
       &= \frac{6 \cdot 17296}{2598960} \\
       &= \frac{103776}{2598960} \approx 0.03993
\end{aligned}
$

Compare with the binomial approximation: n=5, p=4/52≈0.0769. P(X=2) ≈ C(5,2)(0.0769)²(0.9231)³ ≈ 10·0.00592·0.786 ≈ 0.0465. The exact hypergeometric value (≈ 0.0399) is lower because draws without replacement deplete the aces.
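Both numbers in this example can be reproduced directly. The following is a minimal Python check with illustrative variable names; it computes the exact hypergeometric probability as a fraction and the binomial approximation as a float.

```python
from fractions import Fraction
from math import comb

# Exact: 2 of the 4 aces and 3 of the 48 non-aces, out of all 5-card hands.
exact = Fraction(comb(4, 2) * comb(48, 3), comb(52, 5))

# Binomial approximation: treat the 5 draws as independent with p = 4/52.
p = 4 / 52
approx = comb(5, 2) * p**2 * (1 - p) ** 3

print(float(exact), approx)
```

With n = 5 draws from only N = 52 cards, the population is not large relative to the sample, so a visible gap between the two values is expected.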


Quiz

Q1: Which of the following is NOT a required property of a probability mass function (PMF)?

A) p_X(x) ≥ 0 for all x B) Σ p_X(x) = 1 C) p_X(x) ≤ 1 for all x D) p_X(x) is a continuous function

Correct: D)


Q2: The cumulative distribution function (CDF) of a discrete random variable is best described as:

A) A smooth continuous curve B) A step function that jumps at each point in the support C) The same as the PMF D) Always equal to 1

Correct: B)


Q3: The memoryless property of the geometric distribution means:

A) The distribution has no memory of its parameter B) P(X > m + n | X > m) = P(X > n) C) The expected value is constant regardless of p D) Each trial depends on previous ones

Correct: B)


Q4: X ~ Negative Binomial(r, p) counts the number of trials until r successes. Its minimum possible value is:

A) 0 B) 1 C) r D) r + 1

Correct: C)


Q5: The hypergeometric distribution differs from the binomial primarily because:

A) It uses a different PMF formula B) It models sampling WITH replacement C) It models sampling WITHOUT replacement, making draws dependent D) It only applies to card problems

Correct: C)


Q6: As N → ∞ with K/N = p fixed, the Hypergeometric(N, K, n) distribution converges to:

A) Poisson(np) B) Binomial(n, p) C) Geometric(p) D) Uniform

Correct: B)


Q7: For X ~ Binomial(10, 0.5), the shape of the PMF is:

A) Skewed right B) Skewed left C) Symmetric D) Bimodal

Correct: C)


Practice Problems

  1. Verify that the binomial PMF sums to 1: Σ_{k=0}^{n} C(n,k) pᵏ (1−p)^{n−k} = 1.

  2. Let X ~ Binomial(5, 0.4). Compute the full PMF table. Verify that Σ p(k) = 1.

  3. A machine produces defective items with probability 0.05. Items are independently defective. In a batch of 20, find P(no defectives), P(exactly 2 defectives), and P(at most 1 defective).

  4. Derive the geometric CDF: show P(X > k) = (1−p)ᵏ and hence F(k) = 1 − (1−p)ᵏ.

  5. Let X ~ Geometric(0.2). Find P(X > 5 | X > 3) and verify the memoryless property.

  6. From a set of 10 transistors (7 good, 3 defective), 4 are selected without replacement. Let X = number of good ones. Find the PMF.

  7. A student takes a 12-question multiple-choice test, each question with 4 options, guessing randomly. Assuming independence, what is the probability of passing, where passing means at least 50% correct (≥ 6 questions)?

Answers

  1. By the binomial theorem: Σ C(n,k) pᵏ (1−p)^{n−k} = (p + (1−p))ⁿ = 1ⁿ = 1.

  2. P(0) = C(5,0)(0.4)⁰(0.6)⁵ = 0.07776; P(1) = 5(0.4)(0.6)⁴ = 0.2592; P(2) = 10(0.16)(0.216) = 0.3456; P(3) = 10(0.064)(0.36) = 0.2304; P(4) = 5(0.0256)(0.6) = 0.0768; P(5) = (0.4)⁵ = 0.01024. Sum = 1.0.

  3. X ~ Bin(20, 0.05). P(0) = (0.95)²⁰ ≈ 0.3585. P(2) = C(20,2)(0.05)²(0.95)¹⁸ = 190·0.0025·0.397 ≈ 0.1887. P(≤1) = 0.3585 + 20(0.05)(0.95)¹⁹ ≈ 0.3585 + 0.3774 = 0.7358.

  4. P(X>k) = P(first k are failures) = (1−p)ᵏ. Therefore F(k) = P(X≤k) = 1 − P(X>k) = 1 − (1−p)ᵏ.

  5. P(X>5|X>3) = P(X>5)/P(X>3) = (0.8)⁵/(0.8)³ = (0.8)² = 0.64. Memoryless property: P(X>k+m|X>m) = P(X>k). Here k=2, m=3: P(X>5|X>3) = P(X>2) = (0.8)² = 0.64. ✓

  6. X ~ Hypergeometric(N=10, K=7, n=4). Range: k=1 to 4 (can't get 0 good because only 3 bad). P(1) = C(7,1)C(3,3)/C(10,4) = 7·1/210 = 1/30; P(2) = C(7,2)C(3,2)/210 = 21·3/210 = 63/210 = 0.3; P(3) = C(7,3)C(3,1)/210 = 35·3/210 = 105/210 = 0.5; P(4) = C(7,4)C(3,0)/210 = 35·1/210 = 1/6. Sum = 1/30 + 63/210 + 105/210 + 35/210 = 7/210 + 63/210 + 105/210 + 35/210 = 210/210 = 1. ✓

  7. X ~ Bin(12, 0.25). Need P(X≥6). P(X≥6) = Σ_{k=6}^{12} C(12,k)(0.25)ᵏ(0.75)^{12−k} ≈ 0.0544. About 5.4%.
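The numerical answers to problems 3, 6, and 7 can be reproduced with a few lines of Python. This is a checking sketch only: the helper names are hypothetical, and exact fractions are used so that rounding does not affect the comparison.

```python
from fractions import Fraction
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p); exact when p is a Fraction."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def hypergeom_pmf(k, N, K, n):
    """P(X = k) for X ~ Hypergeometric(N, K, n), as an exact fraction."""
    return Fraction(comb(K, k) * comb(N - K, n - k), comb(N, n))

# Problem 3: batch of 20 items, defect probability 0.05 = 1/20.
p_def = Fraction(1, 20)
p3_none = binom_pmf(0, 20, p_def)
p3_two = binom_pmf(2, 20, p_def)
p3_at_most_one = p3_none + binom_pmf(1, 20, p_def)

# Problem 6: 4 transistors drawn from 10 (7 good, 3 defective); X = good ones.
p6 = {k: hypergeom_pmf(k, 10, 7, 4) for k in range(1, 5)}

# Problem 7: at least 6 correct out of 12 when guessing among 4 options.
p7 = sum(binom_pmf(k, 12, Fraction(1, 4)) for k in range(6, 13))

print(float(p3_none), float(p3_two), float(p3_at_most_one))
print(p6)
print(float(p7))
```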

Summary


Pitfalls


Quiz

  1. Which property does NOT hold for every PMF? a) p(x) ≥ 0 for all x b) Σ p(x) = 1 c) p(x) ≤ 1 for all x d) p(x) is continuous Answer: d. PMFs are defined on discrete sets and are not continuous functions.

  2. If X ~ Binomial(10, 0.5), what is the shape of its PMF? a) Skewed right b) Skewed left c) Symmetric d) Uniform Answer: c. When p = 0.5, the binomial PMF is symmetric because C(n,k) = C(n, n−k).

  3. The memoryless property of the geometric distribution means: a) X has no effect on future trials b) P(X > m+n | X > m) = P(X > n) c) The expected value is constant d) X is independent of n Answer: b. Given you've already waited m trials, the additional waiting time follows the same geometric distribution.

  4. The range of a NegBin(r, p) random variable (counting trials, not failures) is: a) {0, 1, 2, ...} b) {1, 2, ..., r} c) {r, r+1, r+2, ...} d) {0, 1, ..., ∞} Answer: c. You need at least r trials to get r successes.

  5. In the hypergeometric PMF, the denominator C(N, n) represents: a) The number of ways to draw successes b) The total number of equally likely samples of size n c) The number of failures in the population d) The binomial coefficient from the limit Answer: b. It's the total number of ways to choose n items from N without replacement.

  6. The CDF of a discrete RV at a point x is: a) Always equal to the PMF at x b) The sum of PMF values for all t c) The sum of PMF values for all t ≤ x d) The derivative of the PMF Answer: c. F_X(x) = Σ_{t ≤ x} p_X(t) by definition.

  7. For X ~ Binomial(n, p), the mode (most likely value) is approximately: a) n/2 b) np c) ⌊(n+1)p⌋ or ⌈(n+1)p⌉ − 1 d) n(1−p) Answer: c. The mode of the binomial is at or near ⌊(n+1)p⌋.

  8. If you draw 2 cards without replacement from a 52-card deck, the probability both are aces is (4/52)(3/51). This uses which distribution? a) Binomial b) Geometric c) Hypergeometric d) Bernoulli Answer: c. Drawing without replacement from a finite population — this is hypergeometric.


Next Steps

Continue to 10-05 Poisson Process and Distribution to learn about the Poisson as the limit of the binomial, the Poisson process, memorylessness, and exponential inter-arrival times.