09-06 — Cholesky Decomposition
Phase: 9 — Matrix Decompositions & Advanced Linear Algebra Subject: 09-06 Prerequisites: 09-05 — Matrix Norms Next subject: 09-07 — Eigendecomposition Algorithms
Learning Objectives
- State the Cholesky decomposition $A = L L^T$ and its requirements (A must be symmetric positive definite)
- Derive the Cholesky algorithm row by row and explain its computational efficiency (half the operations of LU)
- Apply Cholesky to solve $Ax = b$ via forward/backward substitution
- Identify applications including Monte Carlo simulation, optimization (Newton's method), and Kalman filtering
- Explain the LDL^T variant and its advantages for indefinite or semidefinite matrices
Core Content
1. Definition and Existence
If A is symmetric positive definite (SPD), then there exists a unique lower-triangular matrix L with positive diagonal entries such that:
Common Pitfall: Cholesky ONLY works for SPD matrices. Attempting on indefinite matrices tries sqrt(negative) and fails. Check definiteness first, or use LDL^T for semidefinite.
$A = L L^T $
This is the Cholesky decomposition (or Cholesky factorization).
CRITICAL -- Foundational: Cholesky A = LL^T is 2x faster than LU for SPD matrices, with no pivoting needed. SPD matrices appear everywhere: covariance, normal equations, kernel matrices.
Why SPD is required: If $A = L L^T$, then for any nonzero x, $x^T A x = x^T L L^T x = ||L^T x||² > 0$ (since L is invertible when diagonal entries are positive). So A must be positive definite. The converse also holds: every SPD matrix has a Cholesky factorization.
Relationship to LU: If $A = L L^T$, then this is also an LU decomposition with $U = L^T$. So Cholesky exists when the standard LU (without pivoting) works AND A is symmetric.
2. The Algorithm
Write $A = L L^T$ element-wise. Since L is lower-triangular:
a_{ij} = Σ_{k=1}^{min(i,j)} L_{ik} L_{jk}
For $i = j$ (diagonal):
$a_{ii} = Σ_{k=1}^{i} L_{ik}² = L_{i1}² + L_{i2}² + ... + L_{ii}²
=> L_{ii} = sqrt(a_{ii} - Σ_{k=1}^{i-1} L_{ik}²)
$
For i > j (below diagonal):
a_{ij} = Σ_{k=1}^{j} L_{ik} L_{jk}
=> L_{ij} = (a_{ij} - Σ_{k=1}^{j-1} L_{ik} L_{jk}) / L_{jj}
Algorithm (column-by-column):
for j = 1 to n:
L_{jj} = sqrt(a_{jj} - Σ_{k=1}^{j-1} L_{jk}²)
for i = j+1 to n:
L_{ij} = (a_{ij} - Σ_{k=1}^{j-1} L_{ik} L_{jk}) / L_{jj}
Computational cost: ~ n³/3 flops (half the 2n³/3 of LU). Also requires only n square roots.
Row-by-row variant (useful for sparse matrices):
for i = 1 to n:
for j = 1 to i-1:
L_{ij} = (a_{ij} - Σ_{k=1}^{j-1} L_{ik} L_{jk}) / L_{jj}
L_{ii} = sqrt(a_{ii} - Σ_{k=1}^{i-1} L_{ik}²)
Memory note: L can be stored in the lower triangle of A (in-place computation).
Edge case — near-singular SPD: If A is nearly singular (some eigenvalues near zero), the subtraction $a_{ii} - Σ L_{ik}²$ can yield a negative number due to round-off, causing the square root to fail. This signals that A is numerically not positive definite.
3. Solving Ax = b with Cholesky
Given $A = L L^T$:
Forward substitution: Solve $L y = b$
y_i = (b_i - Σ_{k=1}^{i-1} L_{ik} y_k) / L_{ii}
Backward substitution: Solve $L^T x = y$
x_i = (y_i - Σ_{k=i+1}^{n} L_{ki} x_k) / L_{ii}
This is identical in structure to LU but exploits symmetry for efficiency.
4. LDL^T Decomposition
For SPD matrices where we want to avoid square roots (or for symmetric indefinite matrices), we factor:
$A = L D L^T $
where L is unit lower triangular and D is diagonal.
The diagonal entries of D are the pivots. For SPD matrices, all D_{ii} > 0, and we can recover Cholesky by setting $L_chol = L sqrt(D)$.
For symmetric indefinite matrices (some negative pivots), LDL^T still works with pivoting: $P A P^T = L D L^T$. This handles cases where Cholesky fails.
5. Applications
Monte Carlo simulation: Generate correlated random variables. If we need $X ~ N(μ, Σ)$, compute Cholesky $Σ = L L^T$. Then $X = μ + L Z$ where Z ~ N(0, I). (Since Cov(LZ) = L Cov(Z) L^T = L L^T = Σ.)
Newton's method in optimization: The Newton step requires solving $H p = -g$ where H is the Hessian. For convex problems, H is SPD, and Cholesky provides stable, efficient solves.
Kalman filtering: The covariance matrix updates remain SPD, and Cholesky factors are propagated for numerical stability ("square-root Kalman filter").
Solving normal equations: A^T A is SPD for full-column-rank A. Cholesky of A^T A is used, but beware: forming A^T A squares the condition number! QR is preferred when possible.
Key Terms
- Cholesky decomposition
Worked Examples
Example 1: 2×2 Cholesky
Find Cholesky of:
$A = [4 2]
[2 5]
$
First check SPD: $Δ₁ = 4 > 0$, $Δ₂ = 20-4 = 16 > 0$. ✓
$L_{11} = sqrt(a_{11}) = sqrt(4) = 2
L_{21} = a_{21} / L_{11} = 2/2 = 1
L_{22} = sqrt(a_{22} - L_{21}²) = sqrt(5 - 1) = sqrt(4) = 2
$
$L = [2 0] Verify: L L^T = [2 0][2 1] = [4 2]
[1 2] [1 2][0 2] [2 5] ✓
$
Example 2: 3×3 Cholesky
$A = [ 4 2 2]
[ 2 5 3]
[ 2 3 6]
$
Leading minors: 4, 16, 36 → all positive. SPD confirmed.
Column 1:
$L_{11} = sqrt(4) = 2
L_{21} = 2/2 = 1
L_{31} = 2/2 = 1
$
Column 2:
$L_{22} = sqrt(5 - 1²) = sqrt(4) = 2
L_{32} = (3 - 1·1)/2 = 2/2 = 1
$
Column 3:
$L_{33} = sqrt(6 - 1² - 1²) = sqrt(4) = 2
$
$L = [2 0 0]
[1 2 0]
[1 1 2]
$
Verify:
$L L^T = [2 0 0][2 1 1] [4 2 2]
[1 2 0][0 2 1] = [2 5 3] ✓
[1 1 2][0 0 2] [2 3 6]
$
Example 3: Solving with Cholesky
Using the factorization from Example 2, solve $Ax = [8, 11, 13]^T$.
Forward: $L y = b$
$y₁ = 8/2 = 4 y₂ = (11 - 1·4)/2 = 7/2 = 3.5 y₃ = (13 - 1·4 - 1·3.5)/2 = (13 - 7.5)/2 = 5.5/2 = 2.75 $
Backward: $L^T x = y$
$x₃ = 2.75/2 = 1.375 x₂ = (3.5 - 1·1.375)/2 = 2.125/2 = 1.0625 x₁ = (4 - 1·1.0625 - 1·1.375)/2 = (4 - 2.4375)/2 = 0.78125 $
Verify: $Ax = [4(0.78125)+2(1.0625)+2(1.375), ...] = [8, 11, 13]$ ✓
Quiz
Q1: What does the concept of Cholesky decomposition primarily refer to in this subject?
A) The definition and application of Cholesky decomposition B) A computational error related to Cholesky decomposition C) A visual representation of Cholesky decomposition D) A historical anecdote about Cholesky decomposition
Correct: A)
- If you chose A: Cholesky decomposition is defined as: the definition and application of cholesky decomposition. The other options describe different aspects that are not the primary focus. Correct!
- If you chose B: This is incorrect. Cholesky decomposition is defined as: the definition and application of cholesky decomposition. The other options describe different aspects that are not the primary focus.
- If you chose C: This is incorrect. Cholesky decomposition is defined as: the definition and application of cholesky decomposition. The other options describe different aspects that are not the primary focus.
- If you chose D: This is incorrect. Cholesky decomposition is defined as: the definition and application of cholesky decomposition. The other options describe different aspects that are not the primary focus.
Q2: What is the primary purpose of Common Pitfalls?
A) It replaces all other methods in this domain B) It is primarily a historical notation system C) It is used to common pitfalls in mathematical analysis D) It is used only in advanced research contexts
Correct: C)
- If you chose A: This is incorrect. Common Pitfalls serves the purpose described in the correct answer. The other options misrepresent its role.
- If you chose B: This is incorrect. Common Pitfalls serves the purpose described in the correct answer. The other options misrepresent its role.
- If you chose C: Common Pitfalls serves the purpose described in the correct answer. The other options misrepresent its role. Correct!
- If you chose D: This is incorrect. Common Pitfalls serves the purpose described in the correct answer. The other options misrepresent its role.
Q3: Based on the worked examples in this subject, what is the correct result?
A) The inverse of the correct answer B) A different result from a common mistake C) An unrelated numerical value D) √10.
Correct: D)
- If you chose A: This is incorrect. The worked examples show that the result is √10.. The other options represent common errors.
- If you chose B: This is incorrect. The worked examples show that the result is √10.. The other options represent common errors.
- If you chose C: This is incorrect. The worked examples show that the result is √10.. The other options represent common errors.
- If you chose D: The worked examples show that the result is √10.. The other options represent common errors. Correct!
Practice Problems
-
Verify that $A = [[9, -3], [-3, 2]]$ is SPD and compute its Cholesky decomposition.
-
Compute the Cholesky factorization of $A = [[16, 4, 8], [4, 5, 2], [8, 2, 14]]$.
-
Use your factorization from problem 2 to solve $Ax = [28, 11, 24]^T$.
-
Show that the Cholesky factor L of a diagonal SPD matrix is also diagonal, and find L for $A = diag(4, 9, 16)$.
-
For the 2×2 SPD matrix $A = [[a, b], [b, c]]$, derive the condition for Cholesky to exist in terms of a, b, c.
-
Explain why Cholesky fails for $A = [[1, 2], [2, 1]]$ and what decomposition would work instead.
Answers
1. Δ₁ = 9 > 0, Δ₂ = 18-9 = 9 > 0. SPD ✓. L_{11} = 3, L_{21} = -3/3 = -1, L_{22} = √(2 - 1) = 1. L = [[3, 0], [-1, 1]]. 2. Δ₁ = 16, Δ₂ = 80-16 = 64, Δ₃ = 16(70-4) - 4(56-16) + 8(8-40) = 16·66 - 4·40 + 8·(-32) = 1056 - 160 - 256 = 640. All positive. L_{11} = 4. L_{21} = 1, L_{31} = 2. L_{22} = √(5 - 1) = 2. L_{32} = (2 - 2·1)/2 = 0. L_{33} = √(14 - 4 - 0) = √10. L = [[4,0,0],[1,2,0],[2,0,√10]]. 3. Forward: y₁=28/4=7, y₂=(11-7)/2=2, y₃=(24-14-0)/√10=10/√10=√10. Backward: x₃=√10/√10=1, x₂=(2-0·1)/2=1, x₁=(7-1-2)/4=1. Solution: x = [1, 1, 1]^T. Check: Ax = [16+4+8, 4+5+2, 8+2+14]^T = [28,11,24]^T ✓. 4. For diagonal A: L_{11}=√a₁₁, L_{ii}=√a_{ii} (no off-diagonal contributions). L = diag(2, 3, 4). Indeed LL^T = diag(4, 9, 16). 5. Cholesky condition: a > 0 and c - b²/a > 0, i.e., a > 0 and ac > b². This is exactly Sylvester's criterion for 2×2 SPD. 6. For [[1,2],[2,1]]: Δ₁=1>0 but Δ₂=-3<0, so not SPD. Eigenvalues are 3, -1 (indefinite). LDL^T with pivoting would work: P A P^T = L D L^T. Or just use standard LU.Summary
- Cholesky factorizes SPD matrices as $A = L L^T$ where L is lower-triangular with positive diagonal — it exploits symmetry for 2× efficiency over LU
- The algorithm computes L column by column: diagonal entries are square roots of pivots, off-diagonals from inner products
- Cholesky exists iff A is SPD (all leading principal minors > 0, all eigenvalues > 0)
- Solving $Ax = b$ via Cholesky uses forward/backward substitution just like LU, but at half the factorization cost
- Applications include correlated random variable generation ($X = μ + LZ$), Newton's method, and square-root Kalman filtering
- LDL^T generalizes Cholesky by avoiding square roots and handling symmetric indefinite matrices with pivoting
Pitfalls
- Attempting Cholesky on non-SPD matrices. Cholesky requires symmetric positive definiteness. Attempting on indefinite matrices produces negative values under the square root and fails. Check leading principal minors or eigenvalues first, or use LDL^T with pivoting.
- Forming A^T A then using Cholesky for least squares. This squares the condition number. For least squares problems, use QR decomposition (Householder) instead.
- Ignoring numerical breakdown near singularity. Even when mathematically SPD, round-off can produce a negative value under the square root in $a_{ii} - Σ L_{ik}²$, causing failure. This signals near-singularity — consider regularization or LDL^T.
- Confusing Cholesky with LDL^T. Cholesky uses square roots and gives
L L^Twith positive diagonal L. LDL^T avoids square roots by storing pivots in D and can handle symmetric indefinite matrices with pivoting. - Forgetting that Cholesky needs no pivoting. Unlike LU, Cholesky is numerically stable without row exchanges for SPD matrices. Adding unnecessary pivoting changes the factorization to $P A P^T = L L^T$.
Next Steps
Continue to 09-07 Eigendecomposition Algorithms to learn practical iterative methods for computing eigenvalues and eigenvectors, including power iteration and the QR algorithm.