09-07 — Eigendecomposition Algorithms
Phase: 9 — Matrix Decompositions & Advanced Linear Algebra Subject: 09-07 Prerequisites: 09-06 — Cholesky Next subject: 09-08 — Tensor Algebra Introduction
Learning Objectives
- Implement and analyze the power iteration method for computing the dominant eigenvalue and eigenvector
- Apply inverse iteration to find any eigenvalue given a close shift, including Rayleigh quotient iteration for cubic convergence
- Explain the QR algorithm as the workhorse for computing all eigenvalues, including practical shifts and the Hessenberg form
- Compare convergence rates: linear (power iteration), quadratic (inverse iteration), cubic (Rayleigh quotient iteration)
- Understand deflation techniques to compute subsequent eigenpairs after finding the dominant one
Core Content
1. Power Iteration
The power method finds the eigenvalue of largest magnitude (dominant eigenvalue) and its eigenvector.
Algorithm: Given a random starting vector v^{(0)}:
for k = 1, 2, 3, ...:
w = A v^{(k-1)}
v^{(k)} = w / ||w||
λ^{(k)} = (v^{(k)})^T A v^{(k)} (Rayleigh quotient estimate)
Why it works: Expand v^{(0)} in the eigenbasis: $v^{(0)} = Σ c_i q_i$ where $A q_i = λ_i q_i$. Assume $|λ₁| > |λ₂| ≥ ... ≥ |λ_n|$ and $c₁ ≠ 0$. Then:
$A^k v^{(0)} = Σ c_i λ_i^k q_i = λ₁^k [c₁ q₁ + Σ_{i=2}^{n} c_i (λ_i/λ₁)^k q_i]
$
Since $|λ_i/λ₁| < 1$ for $i ≥ 2$, the terms $(λ_i/λ₁)^k → 0$ as $k → ∞$. So A^k v^{(0)} converges to a multiple of q₁.
Convergence rate: Linear, with ratio $|λ₂/λ₁|$. Each iteration multiplies the error by approximately $|λ₂/λ₁|$.
Limitations: - Only finds one eigenpair (the dominant one) - Converges slowly if $|λ₂/λ₁|$ is close to 1 - Fails if $c₁ = 0$ (starting vector orthogonal to q₁) — but round-off error usually introduces a tiny component - Doesn't work well if there are multiple eigenvalues of equal magnitude (e.g., complex conjugate pairs)
Normalization: Any norm works; ℓ₂ (Euclidean) is standard, but ℓ_∞ (divide by max component) avoids square roots.
2. Inverse Iteration
To find the eigenvalue closest to a shift $σ$ and its eigenvector:
for k = 1, 2, 3, ...:
Solve (A - σI) w = v^{(k-1)}
v^{(k)} = w / ||w||
λ^{(k)} = σ + (v^{(k)})^T (A - σI) v^{(k)}
Why it works: The eigenvalues of (A - σI)^{-1} are $1/(λ_i - σ)$. The eigenvalue closest to σ has the largest magnitude in (A - σI)^{-1}, so power iteration on (A - σI)^{-1} converges to that eigenvector.
Common Pitfall: Power iteration finds only the dominant eigenvalue. For smallest, use inverse iteration. For interior, use shift-invert. Know which method for which eigenvalue.
Convergence: Linear with ratio $|λ_closest - σ| / |λ_second_closest - σ|$. If σ is very close to an eigenvalue, convergence is extremely fast.
Practical note: Don't invert $(A - σI)$ explicitly! Instead, factor $A - σI = LU$ once, then each iteration solves via forward/backward substitution.
3. Rayleigh Quotient Iteration (RQI)
RQI is inverse iteration where the shift is updated to the current Rayleigh quotient estimate:
σ_k = (v^{(k)})^T A v^{(k)} / ((v^{(k)})^T v^{(k)}) = (v^{(k)})^T A v^{(k)}
(since v is normalized).
Algorithm:
v^{(0)} = random unit vector
for k = 1, 2, 3, ...:
σ = v^T A v
Solve (A - σI) w = v
v = w / ||w||
Convergence: Cubic for symmetric matrices! Each iteration triples the number of correct digits (once close enough). The cost: factoring $A - σI$ each iteration (σ changes each step).
4. The QR Algorithm
The QR algorithm computes ALL eigenvalues of a matrix. It's the foundation of modern eigenvalue computation (LAPACK, MATLAB, NumPy).
CRITICAL -- Foundational: The QR algorithm is the standard method for all eigenvalues. Iterates A_k = Q_k R_k, A_{k+1} = R_k Q_k, converging to Schur form. Used by numpy.linalg.eig and MATLAB eig.
Basic (unshifted) QR algorithm:
A_0 = A
for k = 0, 1, 2, ...:
Compute QR factorization: A_k = Q_k R_k
A_{k+1} = R_k Q_k
Since $A_{k+1} = R_k Q_k = Q_k^T A_k Q_k$, each iterate is orthogonally similar to the original — same eigenvalues.
Convergence: Under mild conditions (distinct eigenvalue magnitudes), A_k converges to an upper-triangular (Schur) form. The diagonal entries converge to the eigenvalues in descending order of magnitude.
Convergence is linear, with rate $|λ_{i+1}/λ_i|$ for the (i,i) entry.
Practical improvements:
1. Reduction to Hessenberg form: Before the QR iterations, reduce A to upper Hessenberg form (zero below the first subdiagonal) via orthogonal similarity transforms (Householder). This preserves the Hessenberg structure under QR iterations, reducing the cost of each QR step from O(n³) to O(n²).
2. Shifts (Wilkinson shift): $A_k - μ_k I = Q_k R_k$, then $A_{k+1} = R_k Q_k + μ_k I$. With the Wilkinson shift (eigenvalue of the bottom-right 2×2 block closest to A_{nn}), convergence is quadratic (cubic for symmetric).
3. Deflation: When a subdiagonal entry becomes negligible (~ machine epsilon), split the problem:
$A = [A_{11} A_{12}]
[ 0 A_{22}]
$
Eigenvalues of A = eigenvalues of A_{11} ∪ eigenvalues of A_{22}. Continue QR on smaller blocks.
4. Implicit shifts (Francis QR step): Apply the shift implicitly using bulge-chasing — avoids explicitly subtracting the shift (numerically more stable).
The full practical QR algorithm (symmetric case):
1. Tridiagonalize: A → T (Householder, O(n³))
2. QR iteration on T with Wilkinson shifts and deflation (O(n²) per eigenvalue found)
3. Accumulate eigenvectors if needed
5. Comparison of Methods
| Method | What it finds | Convergence | Cost per iter | Notes |
|---|---|---|---|---|
| Power iteration | Dominant eigenpair | Linear ( | λ₂/λ₁ | ) |
| Inverse iteration | Eigenpair near σ | Linear (faster near σ) | O(n²) + 1 factor | Pick σ close to desired λ |
| RQI | Nearest eigenpair | Cubic (symmetric) | O(n³) per iter | Need good starting guess |
| QR algorithm | All eigenvalues | Linear→quadratic (shifted) | O(n²) Hessenberg | Production standard |
| Jacobi (symmetric) | All eigenpairs | Quadratic | O(n³) | Simple, highly accurate |
Key Terms
- Convergence is linear
- Cubic
Worked Examples
Example 1: Power Iteration on a Simple Matrix
$A = [2 1]
[1 2]
$
Eigenvalues: λ₁ = 3 (dominant), λ₂ = 1. Ratio = 1/3 ≈ 0.333.
Start: $v^{(0)} = [1, 0]^T$
Iteration 1: $w = A v^{(0)} = [2, 1]^T$. $||w|| = √5 ≈ 2.236$. $v^{(1)} = [0.8944, 0.4472]^T$. $λ ≈ (v^{(1)})^T A v^{(1)} = 2.60$.
Iteration 2: $w = A v^{(1)} = [2.2360, 1.7889]^T$. $||w|| = 2.863$. $v^{(2)} = [0.7809, 0.6247]^T$. $λ ≈ 2.886$.
Iteration 3: $w = A v^{(2)} = [2.1865, 2.0303]^T$. $v^{(3)} = [0.7327, 0.6805]^T$. $λ ≈ 2.964$.
Iteration 4: $w = [2.1459, 2.0938]^T$. $v^{(4)} = [0.7156, 0.6985]^T$. $λ ≈ 2.988$.
True v₁ = [0.7071, 0.7071]^T. After 4 iterations: error ≈ 0.012. Convergence factor ~0.33 per iteration.
Example 2: Inverse Iteration with Shift
Same A. To find λ₂ = 1, use shift σ = 0.5 (closer to 1 than to 3).
$A - σI = [1.5 1.0]
[1.0 1.5]
$
Factor once: $L = [[1,0],[2/3,1]]$, $U = [[1.5,1],[0,5/6]]$.
Start $v^{(0)} = [1, 0]^T$.
Iteration 1: Solve (A - σI) w = v^{(0)}.
Forward: $L y = [1,0]^T$ → $y = [1, -2/3]^T$.
Backward: $U w = y$ → $w₂ = (-2/3)/(5/6) = -4/5 = -0.8$, $w₁ = (1 - 1·(-0.8))/1.5 = 1.8/1.5 = 1.2$.
$||w|| = √(1.44+0.64) = √2.08 ≈ 1.442$. $v^{(1)} = [0.832, -0.555]^T$.
True eigenvector for λ₂=1: $[0.7071, -0.7071]^T$. One iteration gets decently close.
Eigenvalue estimate: $λ ≈ 0.5 + (v^{(1)})^T (A - σI) v^{(1)} = 0.5 + 0.5 = 1.0$. Already exact!
Example 3: Rayleigh Quotient Iteration
Same A. Start with $v^{(0)} = [1, 0.5]^T/√1.25 = [0.8944, 0.4472]^T$.
σ₀ = v^T A v = [0.8944, 0.4472] [2,1; 1,2] [0.8944, 0.4472]^T = [0.8944, 0.4472] [2.236, 1.789]^T = 2.0 + 0.8 = 2.8.
Solve $(A - 2.8I) w = v$: $[-0.8, 1; 1, -0.8] w = [0.8944, 0.4472]^T$. → $w = [-1.906, -2.942]^T$. Normalize: $v = [-0.544, -0.839]^T$.
σ₁ = v^T A v ≈ 2.987. Already very close to λ₁=3. Next iteration gives ~machine precision.
Quiz
Q1: What does the concept of Cubic primarily refer to in this subject?
A) A computational error related to Cubic B) The definition and application of Cubic C) A visual representation of Cubic D) A historical anecdote about Cubic
Correct: B)
- If you chose A: This is incorrect. Cubic is defined as: the definition and application of cubic. The other options describe different aspects that are not the primary focus.
- If you chose B: Cubic is defined as: the definition and application of cubic. The other options describe different aspects that are not the primary focus. Correct!
- If you chose C: This is incorrect. Cubic is defined as: the definition and application of cubic. The other options describe different aspects that are not the primary focus.
- If you chose D: This is incorrect. Cubic is defined as: the definition and application of cubic. The other options describe different aspects that are not the primary focus.
Q2: What is the primary purpose of Convergence is linear?
A) It replaces all other methods in this domain B) It is primarily a historical notation system C) It is used only in advanced research contexts D) It is used to convergence is linear in mathematical analysis
Correct: D)
- If you chose A: This is incorrect. Convergence is linear serves the purpose described in the correct answer. The other options misrepresent its role.
- If you chose B: This is incorrect. Convergence is linear serves the purpose described in the correct answer. The other options misrepresent its role.
- If you chose C: This is incorrect. Convergence is linear serves the purpose described in the correct answer. The other options misrepresent its role.
- If you chose D: Convergence is linear serves the purpose described in the correct answer. The other options misrepresent its role. Correct!
Q3: Which statement about Common Pitfalls is TRUE?
A) Common Pitfalls is not related to this subject B) Common Pitfalls is an advanced topic beyond this subject's scope C) Common Pitfalls is mentioned only as a historical footnote D) Common Pitfalls is a fundamental concept covered in this subject
Correct: D)
- If you chose A: This is incorrect. Common Pitfalls is a fundamental concept covered in this subject. This subject covers Common Pitfalls as part of its core content.
- If you chose B: This is incorrect. Common Pitfalls is a fundamental concept covered in this subject. This subject covers Common Pitfalls as part of its core content.
- If you chose C: This is incorrect. Common Pitfalls is a fundamental concept covered in this subject. This subject covers Common Pitfalls as part of its core content.
- If you chose D: Common Pitfalls is a fundamental concept covered in this subject. This subject covers Common Pitfalls as part of its core content. Correct!
Q4: Based on the worked examples in this subject, what is the correct result?
A) [0.970,0.243]^T, λ≈4.059. B) A different result from a common mistake C) An unrelated numerical value D) The inverse of the correct answer
Correct: A)
- If you chose A: The worked examples show that the result is [0.970,0.243]^T, λ≈4.059.. The other options represent common errors. Correct!
- If you chose B: This is incorrect. The worked examples show that the result is [0.970,0.243]^T, λ≈4.059.. The other options represent common errors.
- If you chose C: This is incorrect. The worked examples show that the result is [0.970,0.243]^T, λ≈4.059.. The other options represent common errors.
- If you chose D: This is incorrect. The worked examples show that the result is [0.970,0.243]^T, λ≈4.059.. The other options represent common errors.
Practice Problems
-
Apply 3 iterations of the power method to $A = [[4, 1], [1, 3]]$ starting with $v^{(0)} = [1, 0]^T$. Estimate the dominant eigenvalue after each iteration.
-
Use inverse iteration with shift σ = 2 on the matrix from problem 1 to find the eigenvalue near 2. (The eigenvalues are (7+√5)/2 ≈ 4.618 and (7-√5)/2 ≈ 2.382.)
-
For a 2×2 symmetric matrix with eigenvalues λ₁ = 10, λ₂ = 1, how many power iterations are needed to reduce the eigenvector error by a factor of 10⁻⁶?
-
Explain why the QR algorithm without shifts converges at the rate |λ_{i+1}/λ_i| for the i-th diagonal entry.
-
Show that if A is symmetric, Rayleigh quotient iteration converges cubically (give the key idea — that the Rayleigh quotient is a stationary point at eigenvectors).
Answers
1. Iter 1: w = [4,1]^T, ||w||=√17≈4.123, v=[0.970,0.243]^T, λ≈4.059. Iter 2: w=[4.124,1.699]^T, ||w||≈4.460, v=[0.925,0.381]^T, λ≈4.397. Iter 3: w=[4.081,2.068]^T, ||w||≈4.575, v=[0.892,0.452]^T, λ≈4.520. True λ₁≈4.618. Error decreases by ≈|λ₂/λ₁|≈2.382/4.618≈0.516 per iteration. 2. A - 2I = [[2,1],[1,1]]. Factor: L=[[1,0],[1/2,1]], U=[[2,1],[0,1/2]]. Iter 1: v=[1,0]^T. Solve Ly=[1,0]^T→y=[1,-1/2]^T. Uw=y→w=[3/4,-1]^T. ||w||=5/4. v=[0.6,-0.8]^T. λ≈2+v^T(A-2I)v. v^T(A-2I)=[0.6,-0.8][[2,1],[1,1]]=[0.4,-0.2]. v^T(A-2I)v = 0.4·0.6 + (-0.2)(-0.8) = 0.24+0.16=0.4. λ≈2.4. True λ₂≈2.382. Good after one iteration. 3. Convergence factor = λ₂/λ₁ = 0.1. Need (0.1)^k < 10⁻⁶. k > 6/log₁₀(1/0.1) = 6/1 = 6. So about 6 iterations. 4. In the QR algorithm, A_k converges to upper-triangular form. For the (i,i) entry, the rate of convergence to λ_i is |λ_{i+1}/λ_i| (assuming |λ₁| > |λ₂| > ... > |λ_n|). This comes from analyzing the QR iteration as subspace iteration: the first i columns of the accumulated Q span approximately the invariant subspace of the i largest eigenvalues, with error decaying like |λ_{i+1}/λ_i|^k. 5. Let f(v) = R(v) = v^T A v / v^T v. At an eigenvector v*, ∇R(v*) = 0 and the Hessian is singular along the tangent space (eigenvalue is stationary). The cubic convergence of RQI comes from the fact that for symmetric A, the error after one step behaves as ||e_{k+1}|| = O(||e_k||³) because the Newton iteration for finding the eigenvector, when the Rayleigh quotient is used as the shift, has a zero second-derivative term (a form of "superlinear" convergence sharpened to cubic by symmetry).Summary
- Power iteration finds the dominant eigenpair: $v_{k+1} = A v_k / ||A v_k||$, converging linearly at rate |λ₂/λ₁|
- Inverse iteration finds the eigenpair closest to a shift σ: solve $(A - σI)w = v_k$, converging linearly with amplification factor proportional to 1/|λ_nearest - σ|
- Rayleigh quotient iteration updates the shift to v^T A v each step, achieving cubic convergence for symmetric matrices
- The QR algorithm (with Hessenberg reduction and Wilkinson shifts) is the production method for computing all eigenvalues: O(n³) tridiagonalization + O(n²) per eigenvalue found
- Deflation techniques peel off converged eigenvalues, enabling all eigenpairs to be found sequentially
Pitfalls
- Using power iteration when you need all eigenvalues. Power iteration finds only the dominant eigenpair. For all eigenvalues, use the QR algorithm (the production standard, used by
numpy.linalg.eig). - Expecting fast convergence when eigenvalues are nearly equal. Power iteration converges at rate |λ₂/λ₁| — if |λ₂| ≈ |λ₁|, convergence stalls. Consider shift-invert strategies to amplify the gap.
- Forgetting to reduce to Hessenberg form before QR. Without Hessenberg reduction, each QR step costs O(n³) instead of O(n²). Always tridiagonalize (symmetric) or Hessenberg-reduce (general) first.
- Assuming Rayleigh quotient iteration guarantees cubic convergence from any start. RQI converges cubically only when already close to an eigenvector. A poor initial guess can converge to the wrong eigenpair or diverge.
- Explicitly inverting (A - σI) in inverse iteration. Factor $A - σI = LU$ once, then solve via forward/backward substitution in each iteration. Explicit inversion is O(n³) per iteration; solving with factors is O(n²).
Next Steps
Continue to 09-08 Tensor Algebra Introduction to extend linear algebra concepts beyond matrices to higher-order tensors.