Phase 10: Probability Theory
Subject 10-04: Discrete Random Variables
Prerequisites: 10-01 (Probability Foundations), 10-02 (Conditional Probability), 10-03 (Independence), basic combinatorics (binomial coefficients)
Learning Objectives
- Define a discrete random variable, its probability mass function (PMF), and cumulative distribution function (CDF)
- Compute and interpret PMF and CDF for Bernoulli, binomial, geometric, negative binomial, and hypergeometric distributions
- Derive the binomial distribution from Bernoulli trials and the geometric from waiting times
- Relate the hypergeometric distribution to sampling without replacement and show its convergence to the binomial
- Use the CDF to compute probabilities of interval events: P(a < X ≤ b) = F(b) − F(a)
Core Content
1. Random Variables and PMFs
A random variable X is a function X: Ω → R that assigns a real number to each outcome. A random variable is discrete if its range (set of possible values) is finite or countably infinite.
The probability mass function (PMF) of a discrete random variable X is:
p_X(x) = P(X = x) for x ∈ range(X)
Properties of a PMF:
1. p_X(x) ≥ 0 for all x
2. Σ_x p_X(x) = 1 (sum over all values in the range)
3. For any set B ⊆ R: P(X ∈ B) = Σ_{x ∈ B} p_X(x)
The cumulative distribution function (CDF) is:
$F_X(x) = P(X \le x) = \sum_{t \le x} p_X(t)$
Properties of a CDF:
- Non-decreasing: if a < b, then F(a) ≤ F(b)
- Right-continuous: lim_{h→0⁺} F(x + h) = F(x)
- Limits: lim_{x→−∞} F(x) = 0 and lim_{x→∞} F(x) = 1
For discrete RVs, the CDF is a step function with jumps at the support points.
Recovering PMF from CDF: p_X(x) = F_X(x) − F_X(x⁻), the size of the jump at x, where F_X(x⁻) is the left limit of F_X at x.
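As a minimal sketch of these definitions, the following Python builds the step-function CDF from a PMF and recovers the PMF from the jumps. The fair-die PMF and the helper names `cdf` and `pmf_from_cdf` are illustrative assumptions, not part of the lesson.

```python
from fractions import Fraction

# Hypothetical example: PMF of a fair six-sided die, stored as {value: probability}.
pmf = {k: Fraction(1, 6) for k in range(1, 7)}

def cdf(pmf, x):
    """F(x) = P(X <= x): sum the PMF over all support points t <= x."""
    return sum((p for t, p in pmf.items() if t <= x), Fraction(0))

def pmf_from_cdf(pmf, x):
    """Recover p(x) as the jump F(x) - F(x^-).

    For a discrete RV the left limit F(x^-) equals F at the largest
    support point strictly below x (or 0 if there is none).
    """
    below = [t for t in pmf if t < x]
    left_limit = cdf(pmf, max(below)) if below else Fraction(0)
    return cdf(pmf, x) - left_limit

# Interval probability: P(2 < X <= 5) = F(5) - F(2)
interval = cdf(pmf, 5) - cdf(pmf, 2)
```

Exact fractions are used here so that the jump sizes can be compared to the PMF values without floating-point noise.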
2. Bernoulli Distribution
A single trial with two outcomes: "success" (1) with probability p, "failure" (0) with probability 1−p.
Notation: X ~ Bernoulli(p)
PMF: p_X(1) = p, p_X(0) = 1 − p, zero elsewhere.
CDF:
$F_X(x) = \begin{cases} 0, & x < 0 \\ 1 - p, & 0 \le x < 1 \\ 1, & x \ge 1 \end{cases}$
Applications: Any binary experiment, such as a coin flip, a pass/fail test, or a yes/no survey response.
3. Binomial Distribution
n independent Bernoulli trials, each with success probability p. Let X = number of successes.
Notation: X ~ Binomial(n, p)
PMF:
$P(X = k) = C(n, k)\, p^k (1-p)^{n-k}, \quad k = 0, 1, \dots, n$
Derivation: There are C(n, k) ways to choose which k trials are successes. By independence, each specific sequence of k successes and n−k failures has probability pᵏ(1−p)^{n−k}. Since these sequences are disjoint events, their probabilities add, which gives the PMF.
CDF: F(k) = Σ_{i=0}^{k} C(n, i) pⁱ (1−p)^{n−i} (no simple closed form).
Special case: Binomial(1, p) = Bernoulli(p)
Shape: Symmetric when p = 0.5; skewed right when p < 0.5; skewed left when p > 0.5.
Recursion for computation:
$P(X = k+1) = P(X = k) \cdot \frac{n-k}{k+1} \cdot \frac{p}{1-p}$
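The recursion can be sketched in Python as follows, starting from P(X = 0) = (1−p)ⁿ and checked against the closed-form PMF. The function names are illustrative, and the sketch assumes 0 < p < 1 so that the ratio p/(1−p) is defined.

```python
from math import comb, isclose

def binomial_pmf_recursive(n, p):
    """Return [P(X=0), ..., P(X=n)] via P(k+1) = P(k) * (n-k)/(k+1) * p/(1-p).

    Assumes 0 < p < 1 so that p/(1-p) is well defined.
    """
    probs = [(1 - p) ** n]  # base case: P(X = 0)
    ratio = p / (1 - p)
    for k in range(n):
        probs.append(probs[-1] * (n - k) / (k + 1) * ratio)
    return probs

def binomial_pmf_direct(n, p, k):
    """Closed-form PMF: C(n, k) p^k (1-p)^(n-k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

probs = binomial_pmf_recursive(10, 0.3)
agrees = all(isclose(probs[k], binomial_pmf_direct(10, 0.3, k)) for k in range(11))
```

The recursive form avoids recomputing binomial coefficients and powers for every k, which is why it is convenient when the whole PMF table is needed.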
4. Geometric Distribution
Independent Bernoulli trials until the first success. Let X = number of trials needed.
Notation: X ~ Geometric(p)
PMF:
$P(X = k) = (1-p)^{k-1} p, \quad k = 1, 2, 3, \dots$
Derivation: To need exactly k trials, the first k−1 must be failures (probability (1−p)^{k−1}) and the k-th must be a success (probability p). By independence, multiply.
CDF:
$F(k) = P(X \le k) = 1 - P(X > k) = 1 - P(\text{first } k \text{ all failures}) = 1 - (1-p)^k$
Memoryless property (discrete): P(X > m + n | X > m) = P(X > n). This follows from P(X > m + n)/P(X > m) = (1−p)^{m+n}/(1−p)^m = (1−p)ⁿ. Given you've already waited m trials without success, the additional waiting time follows the same geometric distribution as starting fresh. The geometric is the only distribution on the positive integers {1, 2, 3, ...} with this property.
Alternative definition: Some texts define Y = number of failures before first success. Then Y = X − 1, and P(Y = k) = (1−p)ᵏp for k = 0, 1, 2, ...
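A short numerical check of the CDF formula and the memoryless property, using the trials-counting convention. The values p = 0.25, m = 3, n = 4 and the function names are arbitrary choices for illustration.

```python
from math import isclose

def geometric_pmf(k, p):
    """P(X = k) for X = number of trials until the first success, k = 1, 2, ..."""
    return (1 - p) ** (k - 1) * p

def geometric_cdf(k, p):
    """F(k) = 1 - (1-p)^k."""
    return 1 - (1 - p) ** k

def survival(k, p):
    """P(X > k) = (1-p)^k: the first k trials are all failures."""
    return (1 - p) ** k

p, m, n = 0.25, 3, 4
conditional = survival(m + n, p) / survival(m, p)  # P(X > m+n | X > m)
memoryless_holds = isclose(conditional, survival(n, p))

# Summing the PMF up to k should reproduce the closed-form CDF.
cdf_matches = isclose(sum(geometric_pmf(k, p) for k in range(1, 7)),
                      geometric_cdf(6, p))
```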
5. Negative Binomial Distribution
Independent Bernoulli trials until r successes are observed. Let X = number of trials needed.
Notation: X ~ NegBin(r, p)
PMF:
$P(X = k) = C(k-1, r-1)\, p^r (1-p)^{k-r}, \quad k = r, r+1, r+2, \dots$
Derivation: For the r-th success to occur on trial k, we need exactly r−1 successes in the first k−1 trials (C(k−1, r−1) ways, each with probability p^{r−1}(1−p)^{(k−1)−(r−1)}) and then a success on trial k (probability p). Product: C(k−1, r−1) pʳ (1−p)^{k−r}.
Special cases:
- NegBin(1, p) = Geometric(p)
- NegBin(r, p) is the distribution of the sum of r independent Geometric(p) random variables
Alternative parameterization: Some texts define Y = number of failures before r successes. Then Y = X − r, and P(Y = k) = C(k+r−1, r−1) pʳ (1−p)ᵏ for k = 0, 1, 2, ...
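The claim that NegBin(r, p) is the distribution of a sum of r independent Geometric(p) variables can be checked numerically by convolving PMFs. This is a sketch under assumed values (p = 0.4, r = 3, truncation point K = 30); the helper `convolve` is a plain illustrative implementation, not a library call.

```python
from math import comb

def negbin_pmf(k, r, p):
    """P(X = k), X = trials needed for r successes; zero for k < r."""
    if k < r:
        return 0.0
    return comb(k - 1, r - 1) * p**r * (1 - p) ** (k - r)

def geometric_pmf(k, p):
    """P(X = k) for the trials-counting geometric; zero for k < 1."""
    return (1 - p) ** (k - 1) * p if k >= 1 else 0.0

def convolve(a, b):
    """PMF of the sum of two independent RVs, given as lists indexed by value."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

p, r, K = 0.4, 3, 30
geom = [geometric_pmf(k, p) for k in range(K + 1)]  # index = value, truncated at K
total = geom
for _ in range(r - 1):
    total = convolve(total, geom)  # PMF of the sum of r geometrics

# Entries 0..K of `total` are exact despite truncation, because a sum <= K
# forces every summand to be <= K.
max_error = max(abs(total[k] - negbin_pmf(k, r, p)) for k in range(K + 1))
```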
6. Hypergeometric Distribution
Drawing n items without replacement from a population of N items containing K "successes." Let X = number of successes among the n items drawn.
Notation: X ~ Hypergeometric(N, K, n)
PMF:
$P(X = k) = \frac{C(K, k)\, C(N-K, n-k)}{C(N, n)}, \quad \max(0, n-(N-K)) \le k \le \min(n, K)$
Derivation: There are C(N, n) equally likely ways to choose n items. Among these, there are C(K, k) ways to choose the k successes and C(N−K, n−k) ways to choose the n−k failures. Multiply these two counts and divide by C(N, n).
Key difference from binomial: The hypergeometric has dependent draws (sampling without replacement).
Convergence to binomial: As N → ∞ with K/N = p held fixed, Hypergeometric(N, K, n) → Binomial(n, p). When N is large relative to n, the distinction is negligible because sampling with and without replacement are nearly identical.
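The convergence can be illustrated numerically by fixing n and p and letting N grow. In this sketch the values n = 5, p = 0.2 and the population sizes 50, 500, 5000 are arbitrary assumptions chosen so that K = N/5 is an integer.

```python
from math import comb

def hypergeom_pmf(k, N, K, n):
    """C(K,k) C(N-K,n-k) / C(N,n); zero outside the support."""
    if k < max(0, n - (N - K)) or k > min(n, K):
        return 0.0
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def binomial_pmf(k, n, p):
    """C(n,k) p^k (1-p)^(n-k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 5, 0.2
max_gap = {}
for N in (50, 500, 5000):
    K = N // 5  # keeps K/N = 0.2 exactly for these N
    # Largest pointwise difference between the two PMFs over k = 0..n
    max_gap[N] = max(abs(hypergeom_pmf(k, N, K, n) - binomial_pmf(k, n, p))
                     for k in range(n + 1))
```

The entries of `max_gap` shrink as N increases, which is the pointwise convergence of the PMFs stated above.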
Key Terms
- cumulative distribution function (CDF)
- discrete
- probability mass function (PMF)
- random variable
Worked Examples
Example 1: Binomial Computation
A fair coin is flipped 10 times. Find: (a) P(exactly 6 heads), (b) P(at least 8 heads), (c) P(3 to 7 heads inclusive).
Solution:
X ~ Binomial(10, 0.5)
(a) P(X=6) = C(10,6)(0.5)¹⁰ = 210/1024 = 105/512 ≈ 0.2051
(b) P(X≥8) = P(X=8) + P(X=9) + P(X=10) = [C(10,8) + C(10,9) + C(10,10)]/1024 = (45 + 10 + 1)/1024 = 56/1024 = 7/128 ≈ 0.0547
(c) P(3≤X≤7) = 1 − P(X≤2) − P(X≥8) = 1 − [C(10,0)+C(10,1)+C(10,2)]/1024 − 56/1024 = 1 − (1+10+45+56)/1024 = 1 − 112/1024 = 912/1024 = 57/64 ≈ 0.8906
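The three answers can be double-checked with exact rational arithmetic. This is a small verification sketch; the helper name `fair_coin_pmf` is illustrative.

```python
from fractions import Fraction
from math import comb

def fair_coin_pmf(k, n=10):
    """P(X = k) for X ~ Binomial(n, 1/2), as an exact fraction."""
    return Fraction(comb(n, k), 2**n)

exactly_6 = fair_coin_pmf(6)                                  # part (a)
at_least_8 = sum(fair_coin_pmf(k) for k in range(8, 11))      # part (b)
three_to_seven = sum(fair_coin_pmf(k) for k in range(3, 8))   # part (c)
```

Part (c) is summed directly here rather than by complement, so agreement with 57/64 also confirms the complement calculation in the solution.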
Example 2: Geometric Waiting Time
A basketball player has a 70% free-throw success rate. Shots are independent. What is the probability: (a) her first miss occurs on the 4th shot? (b) she makes at least her first 5 shots?
Solution:
Let X = number of shots until first MISS. This is Geometric with p = P(miss) = 0.30.
(a) P(X=4) = (0.7)³(0.3) = 0.343 × 0.3 = 0.1029
(b) She makes at least her first 5 shots exactly when the first miss comes after shot 5. With "success" meaning a miss, P(X > 5) = (1−p)⁵ = (0.7)⁵ = 0.16807.
Equivalently, via the CDF: P(X > 5) = 1 − F(5) = 1 − [1 − (1−0.3)⁵] = 0.7⁵.
Example 3: Hypergeometric Card Problem
From a standard 52-card deck, 5 cards are dealt. Let X = number of aces. Find P(X = 2).
Solution:
X ~ Hypergeometric(N=52, K=4, n=5). We want 2 aces, 3 non-aces from 48 non-aces:
$P(X=2) = \frac{C(4,2)\, C(48,3)}{C(52,5)} = \frac{6 \cdot 17296}{2598960} = \frac{103776}{2598960} \approx 0.03993$
Compare with the binomial approximation: n=5, p=4/52≈0.0769. P(X=2) ≈ C(5,2)(0.0769)²(0.9231)³ ≈ 10·0.00591·0.786 ≈ 0.0465. The hypergeometric value is lower because each ace drawn without replacement depletes the aces left in the deck, making a second ace less likely.
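Both numbers in this example can be reproduced with `math.comb`. The sketch below also computes the relative error of the binomial approximation, a quantity not given in the solution and added here only for illustration.

```python
from math import comb

# Exact hypergeometric probability of 2 aces in a 5-card hand
hyper = comb(4, 2) * comb(48, 3) / comb(52, 5)

# Binomial approximation: treat the 5 draws as independent with p = 4/52
p = 4 / 52
binom = comb(5, 2) * p**2 * (1 - p) ** 3

# How far the approximation overshoots, relative to the exact value
relative_error = (binom - hyper) / hyper
```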
Quiz
Q1: Which of the following is NOT a required property of a probability mass function (PMF)?
A) p_X(x) ≥ 0 for all x B) Σ p_X(x) = 1 C) p_X(x) ≤ 1 for all x D) p_X(x) is a continuous function
Correct: D)
- If you chose D: Correct! PMFs are defined on discrete sets and are not continuous. They assign probability masses to individual points.
- If you chose A: Non-negativity is a fundamental property of any PMF.
- If you chose B: The probabilities must sum to 1 over the entire support.
- If you chose C: Since individual probabilities are non-negative and sum to 1, each must be ≤ 1.
Q2: The cumulative distribution function (CDF) of a discrete random variable is best described as:
A) A smooth continuous curve B) A step function that jumps at each point in the support C) The same as the PMF D) Always equal to 1
Correct: B)
- If you chose B: Correct! For discrete RVs, the CDF F_X(x) = Σ_{t≤x} p_X(t) is a right-continuous step function with jumps equal to the PMF values at each support point.
- If you chose A: Smooth CDFs are characteristic of continuous random variables, not discrete ones.
- If you chose C: The CDF accumulates probabilities; the PMF gives individual point masses.
- If you chose D: F_X(x) reaches 1 only at or beyond the largest support point (or in the limit x → ∞ when the support is unbounded); below that it lies between 0 and 1.
Q3: The memoryless property of the geometric distribution means:
A) The distribution has no memory of its parameter B) P(X > m + n | X > m) = P(X > n) C) The expected value is constant regardless of p D) Each trial depends on previous ones
Correct: B)
- If you chose B: Correct! Given you've already waited m trials without success, the additional waiting time follows the same geometric distribution as starting fresh. This uniquely characterizes the geometric among discrete distributions.
- If you chose A: The distribution certainly "remembers" its parameter p, which determines its shape.
- If you chose C: E[X] = 1/p, which depends on p.
- If you chose D: Geometric trials are independent; the memoryless property is a consequence of this independence.
Q4: X ~ Negative Binomial(r, p) counts the number of trials until r successes. Its minimum possible value is:
A) 0 B) 1 C) r D) r + 1
Correct: C)
- If you chose C: Correct! You need at least r trials to obtain r successes. The fastest possible sequence is r consecutive successes.
- If you chose A: The count of trials starts at 1, not 0 (unlike the alternative parameterization counting failures).
- If you chose B: This is the minimum for Geometric(p) = NegBin(1, p).
- If you chose D: You can achieve exactly r successes in r trials (all successes), so the minimum is r, not r+1.
Q5: The hypergeometric distribution differs from the binomial primarily because:
A) It uses a different PMF formula B) It models sampling WITH replacement C) It models sampling WITHOUT replacement, making draws dependent D) It only applies to card problems
Correct: C)
- If you chose C: Correct! Hypergeometric samples without replacement: each draw changes the composition of the remaining population, introducing dependence between draws.
- If you chose A: While true, this is a consequence, not the fundamental difference.
- If you chose B: This is backwards: the binomial models sampling WITH replacement (independent draws).
- If you chose D: Hypergeometric applies broadly to any finite-population sampling without replacement.
Q6: As N → ∞ with K/N = p fixed, the Hypergeometric(N, K, n) distribution converges to:
A) Poisson(np) B) Binomial(n, p) C) Geometric(p) D) Uniform
Correct: B)
- If you chose B: Correct! When the population is large relative to the sample size, sampling without replacement is approximately the same as sampling with replacement, so hypergeometric ≈ binomial.
- If you chose A: Binomial converges to Poisson when n→∞ and p→0 with np fixed, which is a different limit.
- If you chose C: Geometric models the number of trials until first success, not counts.
- If you chose D: Hypergeometric is not uniform except in degenerate cases.
Q7: For X ~ Binomial(10, 0.5), the shape of the PMF is:
A) Skewed right B) Skewed left C) Symmetric D) Bimodal
Correct: C)
- If you chose C: Correct! When p = 0.5, P(X=k) = C(10,k)(0.5)¹⁰ = C(10,k)/1024. Since C(n,k) = C(n,n−k), the PMF is symmetric about n/2 = 5.
- If you chose A: Right skew occurs when p < 0.5.
- If you chose B: Left skew occurs when p > 0.5.
- If you chose D: The binomial has a single mode (or two adjacent modes when (n+1)p is integer).
Practice Problems
1. Verify that the binomial PMF sums to 1: Σ_{k=0}^{n} C(n,k) pᵏ (1−p)^{n−k} = 1.
2. Let X ~ Binomial(5, 0.4). Compute the full PMF table. Verify that Σ p(k) = 1.
3. A machine produces defective items with probability 0.05. Items are independently defective. In a batch of 20, find P(no defectives), P(exactly 2 defectives), and P(at most 1 defective).
4. Derive the geometric CDF: show P(X > k) = (1−p)ᵏ and hence F(k) = 1 − (1−p)ᵏ.
5. Let X ~ Geometric(0.2). Find P(X > 5 | X > 3) and verify the memoryless property.
6. From a set of 10 transistors (7 good, 3 defective), 4 are selected without replacement. Let X = number of good ones. Find the PMF.
7. A student takes a 12-question multiple-choice test, each question with 4 options, guessing randomly. Assuming independence, what is the probability of passing, where passing means at least 50% correct?
Answers
1. By the binomial theorem: Σ C(n,k) pᵏ (1−p)^{n−k} = (p + (1−p))ⁿ = 1ⁿ = 1.
2. P(0)=C(5,0)(0.4)⁰(0.6)⁵=0.07776; P(1)=5(0.4)(0.6)⁴=0.2592; P(2)=10(0.16)(0.216)=0.3456; P(3)=10(0.064)(0.36)=0.2304; P(4)=5(0.0256)(0.6)=0.0768; P(5)=(0.4)⁵=0.01024. Sum = 1.0.
3. X ~ Bin(20, 0.05). P(0)=(0.95)²⁰≈0.3585. P(2)=C(20,2)(0.05)²(0.95)¹⁸=190·0.0025·0.397≈0.1887. P(X≤1)=0.3585+20(0.05)(0.95)¹⁹≈0.3585+0.3774=0.7358.
4. P(X>k) = P(first k are failures) = (1−p)ᵏ. Therefore F(k)=P(X≤k)=1−P(X>k)=1−(1−p)ᵏ.
5. P(X>5|X>3) = P(X>5)/P(X>3) = (0.8)⁵/(0.8)³ = (0.8)² = 0.64. Memoryless property: P(X>k+m|X>m)=P(X>k). Here k=2, m=3: P(X>5|X>3)=P(X>2)=(0.8)²=0.64. ✓
6. X ~ Hypergeometric(N=10, K=7, n=4). Range: k=1 to 4 (can't get 0 good because only 3 are bad). P(1)=C(7,1)C(3,3)/C(10,4)=7·1/210=1/30; P(2)=C(7,2)C(3,2)/210=21·3/210=63/210=0.3; P(3)=C(7,3)C(3,1)/210=35·3/210=105/210=0.5; P(4)=C(7,4)C(3,0)/210=35·1/210=1/6. Sum = 1/30+63/210+105/210+35/210 = 7/210+63/210+105/210+35/210 = 210/210 = 1. ✓
7. X ~ Bin(12, 0.25). Need P(X≥6). P(X≥6) = Σ_{k=6}^{12} C(12,k)(0.25)ᵏ(0.75)^{12−k} ≈ 0.0544. About 5.4%.
Summary
- A discrete random variable is characterized by its PMF p_X(x) = P(X = x); the CDF F_X(x) = P(X ≤ x) is a step function with jumps at support points
- Bernoulli(p) models a single binary trial; Binomial(n, p) models the number of successes in n independent Bernoulli trials with PMF C(n,k) pᵏ (1−p)^{n−k}
- Geometric(p) models the number of trials until the first success with PMF (1−p)^{k−1}p; it uniquely possesses the discrete memoryless property
- Negative Binomial(r, p) generalizes the geometric to r successes with PMF C(k−1, r−1) pʳ (1−p)^{k−r}
- Hypergeometric(N, K, n) models sampling without replacement and converges to Binomial(n, K/N) as N → ∞ with K/N held fixed
Pitfalls
- Confusing PMF and CDF. The PMF gives point probabilities P(X = x); the CDF accumulates P(X ≤ x). For interval probabilities, take differences of the CDF: P(a < X ≤ b) = F(b) − F(a) and P(a ≤ X ≤ b) = F(b) − F(a⁻). Subtracting PMF values does not give an interval probability.
- Using the binomial distribution for sampling without replacement. The binomial assumes independent draws (with replacement). For finite-population sampling without replacement, use the hypergeometric distribution. The binomial is a good approximation only when N >> n.
- Forgetting the support bounds of each distribution. Binomial ranges 0 to n, geometric (trials count) starts at 1, negative binomial (trials count) starts at r. Applying PMF formulas outside the support gives nonsense probabilities.
- Confusing the two geometric parameterizations. "Number of trials until first success" starts at 1 with PMF (1-p)^{k-1}p. "Number of failures before first success" starts at 0 with PMF (1-p)^k p. They differ by exactly 1, so always verify which convention a problem or library uses.
- Assuming the hypergeometric-to-binomial approximation always holds. The approximation is reasonable when n/N < 0.1 (the "10% rule"). For larger sampling fractions, the dependence between draws becomes significant and the hypergeometric must be used.
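The parameterization pitfall in the list above can be made concrete with a short sketch. The function names `pmf_trials` and `pmf_failures` and the value p = 0.3 are illustrative assumptions; the point is that the same k gives different probabilities under the two conventions.

```python
def pmf_trials(k, p):
    """Convention A: X = number of trials until first success, support 1, 2, ..."""
    return (1 - p) ** (k - 1) * p if k >= 1 else 0.0

def pmf_failures(k, p):
    """Convention B: Y = number of failures before first success, support 0, 1, ..."""
    return (1 - p) ** k * p if k >= 0 else 0.0

p = 0.3

# Y = X - 1, so P(Y = k) must equal P(X = k + 1) for every k >= 0.
shift_matches = all(abs(pmf_failures(k, p) - pmf_trials(k + 1, p)) < 1e-15
                    for k in range(50))

# Evaluating the wrong convention at the same k gives a different number.
mismatch = pmf_trials(3, p) - pmf_failures(3, p)
```

Returning 0.0 outside the support also guards against the support-bounds pitfall: a formula evaluated at k = 0 under convention A would otherwise yield a value greater than p.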
Quiz
1. Which property does NOT hold for every PMF? a) p(x) ≥ 0 for all x b) Σ p(x) = 1 c) p(x) ≤ 1 for all x d) p(x) is continuous Answer: d. PMFs are defined on discrete sets and are not continuous functions.
2. If X ~ Binomial(10, 0.5), what is the shape of its PMF? a) Skewed right b) Skewed left c) Symmetric d) Uniform Answer: c. When p = 0.5, the binomial PMF is symmetric because C(n,k) = C(n, n−k).
3. The memoryless property of the geometric distribution means: a) X has no effect on future trials b) P(X > m+n | X > m) = P(X > n) c) The expected value is constant d) X is independent of n Answer: b. Given you've already waited m trials, the additional waiting time follows the same geometric distribution.
4. The range of a NegBin(r, p) random variable (counting trials, not failures) is: a) {0, 1, 2, ...} b) {1, 2, ..., r} c) {r, r+1, r+2, ...} d) {0, 1, ..., ∞} Answer: c. You need at least r trials to get r successes.
5. In the hypergeometric PMF, the denominator C(N, n) represents: a) The number of ways to draw successes b) The total number of equally likely samples of size n c) The number of failures in the population d) The binomial coefficient from the limit Answer: b. It's the total number of ways to choose n items from N without replacement.
6. The CDF of a discrete RV at a point x is: a) Always equal to the PMF at x b) The sum of PMF values for all t c) The sum of PMF values for all t ≤ x d) The derivative of the PMF Answer: c. F_X(x) = Σ_{t ≤ x} p_X(t) by definition.
7. For X ~ Binomial(n, p), the mode (most likely value) is approximately: a) n/2 b) np c) ⌊(n+1)p⌋ or ⌈(n+1)p⌉ − 1 d) n(1−p) Answer: c. The mode of the binomial is at or near ⌊(n+1)p⌋.
8. If you draw 2 cards without replacement from a 52-card deck, the probability both are aces is (4/52)(3/51). This uses which distribution? a) Binomial b) Geometric c) Hypergeometric d) Bernoulli Answer: c. Drawing without replacement from a finite population is the hypergeometric setting.
Next Steps
Continue to 10-05 Poisson Process and Distribution to learn about the Poisson as the limit of the binomial, the Poisson process, memorylessness, and exponential inter-arrival times.