Phase 11: Probability Theory II
Subject 11-09: Random Walks
Prerequisites: 10-04 (Discrete Random Variables), 10-08 (Normal Distribution), 11-06 (Limit Theorems), 11-07 (Markov Chains)
Learning Objectives
- Define the simple symmetric random walk on Z and compute hitting probabilities via the reflection principle
- Solve the gambler's ruin problem: P(ruin | start at k) for both fair and biased walks
- Apply the ballot theorem and reflection principle to count constrained path problems
- Determine recurrence vs. transience: random walk on Z is recurrent; on Zᵈ, recurrent for d=1,2; transient for d≥3
- Relate random walks to the CLT — long-term behavior approximates Brownian motion
Core Content
1. Definition and Basic Properties
A simple random walk on Z starts at S₀ = 0 and at each step moves +1 with probability p or −1 with probability q = 1−p.
$Sₙ = X₁ + X₂ + ... + Xₙ where Xᵢ = ±1, P(Xᵢ = +1) = p $
Key properties: - E[Sₙ] = n(p − q) = n(2p − 1) - Var(Sₙ) = 4npq (since Var(Xᵢ) = 4pq) - Sₙ has the Markov property (it's a spatial Markov chain on Z) - Sₙ is a martingale when p = 1/2 (the symmetric walk): E[Sₙ₊₁ | Sₙ] = Sₙ
⚠️ CRITICAL: The random walk is a sum of i.i.d. increments. By the CLT, for large n:
$Sₙ ≈ N(n(p−q), 4npq) $
(approximately normal, with a drift if p ≠ q).
2. Gambler's Ruin
A gambler starts with k dollars, plays a game where each bet wins $1 with probability p or loses $1 with probability q, and stops when reaching $0 (ruin) or $N (goal). What is P(ruin)?
Fair game (p = 1/2): Let u_k = P(ruin | start at k). The difference equation:
$u_k = (1/2) u_{k+1} + (1/2) u_{k-1}, 1 ≤ k ≤ N−1
$
with boundary conditions u₀ = 1, u_N = 0.
This is linear: u_{k+1} − 2u_k + u_{k-1} = 0. Solution: u_k = A + Bk. From boundary: u₀ = A = 1, u_N = 1 + BN = 0 → B = −1/N. So:
$u_k = 1 − k/N → P(reach N) = k/N $
Biased game (p ≠ q, p ≠ 1/2): The difference equation is u_k = p u_{k+1} + q u_{k-1}. Rewrite: p(u_{k+1} − u_k) = q(u_k − u_{k-1}) → u_{k+1} − u_k = (q/p)(u_k − u_{k-1}).
Let r = q/p. Then u_k − u_{k-1} = r^{k−1} (u₁ − u₀). Summing:
$u_k = u₀ + (u₁ − u₀)(1 + r + r² + ... + r^{k−1}) = u₀ + (u₁ − u₀)(1−r^k)/(1−r)
$
Using u₀ = 1, u_N = 0:
$P(ruin | start at k) = (r^k − r^N)/(1 − r^N) where r = q/p $
For p < q (game favors casino): r > 1. As N → ∞ (infinitely wealthy casino):
P(ultimate ruin) = 1 if p ≤ 1/2 (ruin is certain!)
P(ultimate ruin) = (q/p)^k if p > 1/2
Moral: Even with a 0.49 probability of winning each bet, ruin against an infinitely wealthy opponent is certain. With p > 0.50, the gambler has a positive probability (but not certainty) of never going broke.
3. Reflection Principle
Theorem (Reflection Principle): For the symmetric random walk (p = 1/2), the number of paths from (0, a) to (n, b) that hit or cross the x-axis equals the number of paths from (0, −a) to (n, b).
Application — Ballot Theorem: In an election, candidate A gets a votes, candidate B gets b votes (a > b). If votes are counted in random order, the probability that A always leads B throughout the count is:
$P(A always ahead) = (a − b)/(a + b) $
Derivation via reflection: Total paths from (0, 0) to (a+b, a−b): C(a+b, a). Paths where A never trails use the reflection principle on paths that ever hit −1.
Application — Maximum of a random walk: For the symmetric walk, P(max_{0≤k≤n} S_k ≥ m) = 2 P(Sₙ ≥ m) − P(Sₙ = m) = P(Sₙ ≥ m) + P(Sₙ ≥ m+1).
This follows from the reflection principle: paths that reach m are in one-to-one correspondence (via reflection) with paths that end above m, plus those that end exactly at m.
4. Recurrence and Transience
One dimension (Z): The simple symmetric random walk is recurrent — it returns to 0 infinitely often with probability 1. Proof: P(return to 0) = 1 − 1/(2π) ∫{−π}^{π} (the characteristic function method), or via the fact that Σ P(S{2n} = 0) = Σ C(2n,n)/4ⁿ ~ Σ 1/√(πn) = ∞.
Two dimensions (Z²): The simple symmetric random walk on the 2D integer lattice is also recurrent. Pólya's theorem (1921): the random walker returns home infinitely often in 1D and 2D.
Three dimensions and higher (Zᵈ, d ≥ 3): The random walk is transient — there is positive probability of never returning. In 3D, P(return) ≈ 0.3405.
Intuition: In high dimensions, there's "too much space" — the walker gets lost and may never find its way back. The transition at d=2 is a famous result (Pólya's theorem).
Asymmetric random walk (p ≠ 1/2): Always transient in any dimension (drift carries the walker away).
5. Connection to Brownian Motion
As the step size shrinks and the number of steps increases with appropriate scaling, the random walk converges to Brownian motion (Wiener process):
Let S_{⌊nt⌋} be a simple symmetric random walk observed at time ⌊nt⌋. Then:
$W^{(n)}(t) = S_{⌊nt⌋} / √n → W(t) (convergence in distribution)
$
where W(t) is standard Brownian motion: W(0) = 0, independent increments, W(t) ~ N(0, t), continuous paths.
This is Donsker's invariance principle — the "functional CLT." It says the entire path of the random walk, properly scaled, looks like Brownian motion.
Applications: - Option pricing (Black-Scholes relies on geometric Brownian motion) - Physics (diffusion, heat equation — random walk is discrete heat kernel) - Algorithm analysis (randomized algorithms, mixing times)
Key Terms
- Brownian motion
Worked Examples
Example 1: Gambler's Ruin — Casino Gambler
A gambler has $50 and plays roulette betting $1 on red each spin (p = 18/38 ≈ 0.4737, q = 20/38 ≈ 0.5263). The casino has effectively infinite money. What's the probability the gambler eventually goes broke?
Solution:
r = q/p = 20/18 = 10/9 ≈ 1.111. Since p < 1/2, r > 1.
P(ultimate ruin) = 1. Even though the game is "almost fair," ruin is certain against an infinitely wealthy opponent. The gambler's expected fortune after n bets is n(p−q) = n(18/38−20/38) = −n/19 — losing about 5.26 cents per dollar bet, on average.
Example 2: Ballot Problem
In an election, A gets 60 votes, B gets 40 votes, counted in random order. What's the probability A is always strictly ahead of B throughout the count?
Solution:
By the ballot theorem: P(A always ahead) = (a−b)/(a+b) = (60−40)/(100) = 20/100 = 1/5 = 0.20.
Despite A winning by a 60-40 landslide, the probability they lead at every stage of the count is only 20%. Most of the time, B will be tied or ahead at some point.
Example 3: Maximum of a Random Walk
A fair coin is tossed 100 times (Xᵢ = ±1). What's the approximate probability that the random walk Sₙ ever exceeds 15?
Solution:
We want P(M₁₀₀ ≥ 16) for M₁₀₀ = max_{0≤k≤100} S_k.
By reflection: P(M₁₀₀ ≥ 16) = P(S₁₀₀ ≥ 16) + P(S₁₀₀ > 16) = 2P(S₁₀₀ > 16) + P(S₁₀₀ = 16).
By CLT: S₁₀₀ ≈ N(0, 100). Z = 16/10 = 1.6. P(S₁₀₀ > 16) ≈ 1 − Φ(1.6) ≈ 0.0548. P(S₁₀₀ = 16) ≈ φ(1.6)/10 (continuity correction) ≈ 0.1109/10 ≈ 0.0111.
P(M₁₀₀ ≥ 16) ≈ 2(0.0548) + 0.0111 ≈ 0.1207.
About 12% chance of ever reaching +16 during the 100 tosses.
Quiz
Q1: A simple symmetric random walk S_n = Σ_{i=1}^n X_i with P(X_i = ±1) = 1/2 has:
A) E[S_n] = 0, Var(S_n) = n B) E[S_n] = n, Var(S_n) = 0 C) E[S_n] = 0, Var(S_n) = 1 D) E[S_n] = n/2, Var(S_n) = n/4
Correct: A)
- If you chose A: Correct! Each step has mean 0 and variance 1, so by independence: E[S_n] = n·0 = 0, Var(S_n) = n·1 = n. The standard deviation grows as √n.
- If you chose B: This would be a deterministic walk always going up.
- If you chose C: Var(S_1) = 1, but Var(S_n) = n, not 1.
- If you chose D: This would be the mean and variance for a binomial proportion, not a random walk with ±1 steps.
Q2: In the Gambler's Ruin problem starting with $i and target $N, with P(win $1) = p:
A) The probability of ruin is always 1/2 B) The probability of reaching N before 0 is (1 − (q/p)^i)/(1 − (q/p)^N) for p ≠ 1/2 C) The gambler always eventually reaches N D) The ruin probability is independent of p
Correct: B)
- If you chose B: Correct! For p ≠ 1/2 (biased coin), the ruin probability depends on the ratio q/p. For p = 1/2, it simplifies to i/N.
- If you chose A: This is only true when p = 1/2 and i = N/2.
- If you chose C: In a fair game, the probability of reaching N is i/N, which is less than 1 for i < N.
- If you chose D: The ruin probability heavily depends on p — a favorable bias dramatically reduces ruin probability.
Q5: A simple symmetric random walk in 1D and 2D is recurrent, but in 3D it is transient. This means:
A) The walk in 3D never returns to the origin B) The walk in 3D returns to the origin infinitely often with probability 1 C) The walk in 3D eventually drifts away with probability 1 D) The walk in 3D returns finitely many times with probability 1
Correct: D)
- If you chose D: Correct! In 3D, P(return to origin) < 1 (transient). In 1D/2D, P(return) = 1 and the walk returns infinitely often (recurrent). Pólya's theorem (1921).
- If you chose A: Transient means returns are finite, not that the origin is never visited.
- If you chose B: This describes recurrence (1D/2D), not transience (3D).
- If you chose C: "Drifts away" is vague — transience means returns occur only finitely many times, but the walk may still revisit the origin occasionally.
Practice Problems
- For a symmetric random walk starting at 0, find P(Sₙ = k) in terms of binomial coefficients. What is P(S_{2n} = 0)?
- Derive P(ultimate ruin) for the gambler's ruin with p > 1/2 when the opponent has infinite wealth.
- Use the reflection principle to show that for symmetric walk: P(S₁ > 0, S₂ > 0, ..., S_{2n} > 0) = C(2n, n)/((n+1)4ⁿ). (These are the Catalan numbers, normalized.)
- For an asymmetric walk with p = 0.6, approximate P(S_{100} > 0) using the CLT.
- Show that Σ_{n=1}^{∞} P(S_{2n} = 0) diverges for the symmetric walk, proving recurrence.
- A particle does a symmetric 2D random walk on Z². Write the probability of being at (x,y) after n steps. Verify recurrence by showing Σ P(return to (0,0)) diverges.
- For the gambler's ruin with p=0.4, starting with $10 against a house with $100, find P(ruin).
Answers
1. n steps: need (n+k)/2 up-steps (must have same parity). P(Sₙ=k) = C(n, (n+k)/2) (1/2)ⁿ when n+k even. P(S_{2n}=0) = C(2n,n)/4ⁿ. 2. For p>1/2, r = q/p < 1. P(ruin from k) = (r^k − r^N)/(1−r^N). As N→∞, r^N→0, so P(ruin) = r^k = (q/p)^k. 3. Catalan number argument: number of paths from (0,0) to (2n,0) staying above the x-axis is C(2n,n)/(n+1). Divided by total paths 4ⁿ gives probability. (This is the probability a random walk stays non-negative for 2n steps and returns to 0.) 4. S_{100} = sum of 100 steps. E[S] = 100(0.6−0.4) = 20. Var = 4·100·0.6·0.4 = 96. σ ≈ 9.80. P(S>0) ≈ 1 − Φ((0−20)/9.80) = 1 − Φ(−2.04) = Φ(2.04) ≈ 0.979. 5. P(S_{2n}=0) = C(2n,n)/4ⁿ ~ 1/√(πn) by Stirling. Σ 1/√n diverges, so Σ P(S_{2n}=0) = ∞, proving recurrence. 6. 2D: each step is (±1,0) or (0,±1) with prob 1/4. After 2n steps with x+y even: P(return) = C(2n,n)²/4^{2n} ~ 1/(πn). Σ 1/n diverges logarithmically → recurrent. (In 3D, P(return) ∼ 1/n^{3/2} and Σ converges → transient.) 7. r = 0.6/0.4 = 1.5. N = 110. P(ruin from 10) = (1.5¹⁰ − 1.5¹¹⁰)/(1 − 1.5¹¹⁰). 1.5¹⁰ ≈ 57.67, 1.5¹¹⁰ is astronomically large. P(ruin) ≈ 1.5¹⁰/1.5¹¹⁰ ≈ 1/1.5¹⁰⁰ ≈ 0. Essentially certain ruin: ≈ 1 − 1.5^{−100} ≈ 1.Summary
- Simple random walk: Sₙ = Σ Xᵢ with Xᵢ = ±1; E[Sₙ] = n(p−q), Var(Sₙ) = 4npq, CLT gives Sₙ ≈ N(n(p−q), 4npq)
- Gambler's ruin: P(ruin from k) = (r^k − r^N)/(1−r^N) where r = q/p; with p ≤ 1/2 and N → ∞, ruin is certain
- Reflection principle maps paths hitting a barrier to reflected paths; gives ballot theorem P(A always ahead) = (a−b)/(a+b)
- Pólya's theorem: symmetric random walk is recurrent in 1D and 2D, transient in 3D+ (too much space)
- Random walk scaled by 1/√n converges to Brownian motion — the functional CLT (Donsker's invariance principle)
Pitfalls
- Using the wrong gambler's ruin formula for p = 1/2: The fair-game formula P(reach N) = k/N is a special case. For p ≠ 1/2, you must use P(reach N) = (1 − r^k)/(1 − r^N) with r = q/p. Plugging p=1/2 into the biased formula gives 0/0 — a common algebraic trap.
- Applying the reflection principle to asymmetric walks: The reflection principle (and the ballot theorem) only hold for the symmetric walk with p = 1/2. For biased walks, paths are not equally likely, so the combinatorial bijection argument fails.
- Assuming recurrence in all dimensions: The symmetric random walk is recurrent in 1D and 2D but transient in 3D and higher. A common error is to extrapolate from 1D intuition and assume the walker always returns — in 3D, P(return) ≈ 0.34.
- Using the CLT approximation without checking sample size: For p ≠ 1/2, the random walk has drift, and the CLT approximation Sₙ ≈ N(n(p−q), 4npq) requires n large enough for normality to kick in. For small n or extreme p, the approximation can be poor.
- Confusing "transient" with "never returns": A transient random walk in 3D may still return to the origin several times — it just doesn't return infinitely often. Transience means the total number of returns is finite with probability 1.
Quiz
-
A fair random walk has p = 1/2. After 100 steps, the expected position is: a) 100 b) 50 c) 0 d) 10 Answer: c. E[S₁₀₀] = 100(0.5−0.5) = 0. The walk is a martingale.
-
In the gambler's ruin with p = 0.5, starting with $k and goal $N, P(reach goal) =: a) k/N b) (N−k)/N c) 1/2 d) (1/2)^k Answer: a. For the fair game, probability of reaching N before 0 is proportional to starting wealth.
-
The reflection principle is used to: a) Compute expected values b) Count paths that touch or cross a barrier c) Find stationary distributions d) Calculate variances Answer: b. Reflection creates a bijection between paths hitting a barrier and paths starting from a reflected origin.
-
A simple symmetric random walk on Z² is: a) Transient b) Recurrent c) Explosive d) Absorbing Answer: b. Pólya proved recurrence for d = 1 and d = 2, transience for d ≥ 3.
-
The ballot theorem says: if A gets a votes and B gets b votes (a > b), P(A always ahead) =: a) a/(a+b) b) b/(a+b) c) (a−b)/(a+b) d) 1/2 Answer: c. The margin of victory relative to total votes cast determines the probability of leading throughout.
-
For biased walk with p = 0.6, the drift per step is: a) 0.6 b) 0.2 c) 1.0 d) 0.4 Answer: b. E[Xᵢ] = p−q = 0.6−0.4 = 0.2 per step.
-
The limiting behavior of a properly scaled random walk is: a) A Poisson process b) Brownian motion c) A stationary Markov chain d) An explosive process Answer: b. Donsker's theorem: S_{⌊nt⌋}/√n → W(t), standard Brownian motion.
-
In a 3D symmetric random walk, the probability of ever returning to the origin is: a) 1 b) 0 c) About 0.34 d) 1/2 Answer: c. In 3D, P(return) ≈ 0.3405 — transient but with a substantial chance of at least one return.
Next Steps
Continue to 11-10 Information Theory Connection for the mathematical bridge between probability and information: entropy, KL divergence, mutual information, and their probabilistic foundations.