📐 Concept diagram

09-06 — Cholesky Decomposition

Phase: 9 — Matrix Decompositions & Advanced Linear Algebra Subject: 09-06 Prerequisites: 09-05 — Matrix Norms Next subject: 09-07 — Eigendecomposition Algorithms

Learning Objectives

State the Cholesky decomposition $A = L L^T$ and its requirements (A must be symmetric positive definite)
Derive the Cholesky algorithm row by row and explain its computational efficiency (half the operations of LU)
Apply Cholesky to solve $Ax = b$ via forward/backward substitution
Identify applications including Monte Carlo simulation, optimization (Newton's method), and Kalman filtering
Explain the LDL^T variant and its advantages for indefinite or semidefinite matrices

Core Content

1. Definition and Existence

If A is symmetric positive definite (SPD), then there exists a unique lower-triangular matrix L with positive diagonal entries such that:

Common Pitfall: Cholesky ONLY works for SPD matrices. Attempting on indefinite matrices tries sqrt(negative) and fails. Check definiteness first, or use LDL^T for semidefinite.

$A = L L^T
$

This is the Cholesky decomposition (or Cholesky factorization).

CRITICAL -- Foundational: Cholesky A = LL^T is 2x faster than LU for SPD matrices, with no pivoting needed. SPD matrices appear everywhere: covariance, normal equations, kernel matrices.

Why SPD is required: If $A = L L^T$, then for any nonzero x, $x^T A x = x^T L L^T x = ||L^T x||² > 0$ (since L is invertible when diagonal entries are positive). So A must be positive definite. The converse also holds: every SPD matrix has a Cholesky factorization.

Relationship to LU: If $A = L L^T$, then this is also an LU decomposition with $U = L^T$. So Cholesky exists when the standard LU (without pivoting) works AND A is symmetric.

2. The Algorithm

Write $A = L L^T$ element-wise. Since L is lower-triangular:

a_{ij} = Σ_{k=1}^{min(i,j)} L_{ik} L_{jk}

For $i = j$ (diagonal):

$a_{ii} = Σ_{k=1}^{i} L_{ik}² = L_{i1}² + L_{i2}² + ... + L_{ii}²
=> L_{ii} = sqrt(a_{ii} - Σ_{k=1}^{i-1} L_{ik}²)
$

For i > j (below diagonal):

a_{ij} = Σ_{k=1}^{j} L_{ik} L_{jk}
=> L_{ij} = (a_{ij} - Σ_{k=1}^{j-1} L_{ik} L_{jk}) / L_{jj}

Algorithm (column-by-column):

for j = 1 to n:
    L_{jj} = sqrt(a_{jj} - Σ_{k=1}^{j-1} L_{jk}²)
    for i = j+1 to n:
        L_{ij} = (a_{ij} - Σ_{k=1}^{j-1} L_{ik} L_{jk}) / L_{jj}

Computational cost: ~ n³/3 flops (half the 2n³/3 of LU). Also requires only n square roots.

Row-by-row variant (useful for sparse matrices):

for i = 1 to n:
    for j = 1 to i-1:
        L_{ij} = (a_{ij} - Σ_{k=1}^{j-1} L_{ik} L_{jk}) / L_{jj}
    L_{ii} = sqrt(a_{ii} - Σ_{k=1}^{i-1} L_{ik}²)

Memory note: L can be stored in the lower triangle of A (in-place computation).

Edge case — near-singular SPD: If A is nearly singular (some eigenvalues near zero), the subtraction $a_{ii} - Σ L_{ik}²$ can yield a negative number due to round-off, causing the square root to fail. This signals that A is numerically not positive definite.

3. Solving Ax = b with Cholesky

Given $A = L L^T$:

Forward substitution: Solve $L y = b$

y_i = (b_i - Σ_{k=1}^{i-1} L_{ik} y_k) / L_{ii}

Backward substitution: Solve $L^T x = y$

x_i = (y_i - Σ_{k=i+1}^{n} L_{ki} x_k) / L_{ii}

This is identical in structure to LU but exploits symmetry for efficiency.

4. LDL^T Decomposition

For SPD matrices where we want to avoid square roots (or for symmetric indefinite matrices), we factor:

$A = L D L^T
$

where L is unit lower triangular and D is diagonal.

The diagonal entries of D are the pivots. For SPD matrices, all D_{ii} > 0, and we can recover Cholesky by setting $L_chol = L sqrt(D)$.

For symmetric indefinite matrices (some negative pivots), LDL^T still works with pivoting: $P A P^T = L D L^T$. This handles cases where Cholesky fails.

5. Applications

Monte Carlo simulation: Generate correlated random variables. If we need $X ~ N(μ, Σ)$, compute Cholesky $Σ = L L^T$. Then $X = μ + L Z$ where Z ~ N(0, I). (Since Cov(LZ) = L Cov(Z) L^T = L L^T = Σ.)

Newton's method in optimization: The Newton step requires solving $H p = -g$ where H is the Hessian. For convex problems, H is SPD, and Cholesky provides stable, efficient solves.

Kalman filtering: The covariance matrix updates remain SPD, and Cholesky factors are propagated for numerical stability ("square-root Kalman filter").

Solving normal equations: A^T A is SPD for full-column-rank A. Cholesky of A^T A is used, but beware: forming A^T A squares the condition number! QR is preferred when possible.

Key Terms

Cholesky decomposition

Worked Examples

Example 1: 2×2 Cholesky

Find Cholesky of:

$A = [4  2]
    [2  5]
$

First check SPD: $Δ₁ = 4 > 0$, $Δ₂ = 20-4 = 16 > 0$. ✓

$L_{11} = sqrt(a_{11}) = sqrt(4) = 2
L_{21} = a_{21} / L_{11} = 2/2 = 1
L_{22} = sqrt(a_{22} - L_{21}²) = sqrt(5 - 1) = sqrt(4) = 2
$

$L = [2  0]    Verify: L L^T = [2 0][2 1] = [4 2]
    [1  2]                    [1 2][0 2]   [2 5] ✓
$

Example 2: 3×3 Cholesky

$A = [ 4  2  2]
    [ 2  5  3]
    [ 2  3  6]
$

Leading minors: 4, 16, 36 → all positive. SPD confirmed.

Column 1:

$L_{11} = sqrt(4) = 2
L_{21} = 2/2 = 1
L_{31} = 2/2 = 1
$

Column 2:

$L_{22} = sqrt(5 - 1²) = sqrt(4) = 2
L_{32} = (3 - 1·1)/2 = 2/2 = 1
$

Column 3:

$L_{33} = sqrt(6 - 1² - 1²) = sqrt(4) = 2
$

$L = [2  0  0]
    [1  2  0]
    [1  1  2]
$

Verify:

$L L^T = [2 0 0][2 1 1]   [4 2 2]
        [1 2 0][0 2 1] = [2 5 3] ✓
        [1 1 2][0 0 2]   [2 3 6]
$

Example 3: Solving with Cholesky

Using the factorization from Example 2, solve $Ax = [8, 11, 13]^T$.

Forward: $L y = b$

$y₁ = 8/2 = 4
y₂ = (11 - 1·4)/2 = 7/2 = 3.5
y₃ = (13 - 1·4 - 1·3.5)/2 = (13 - 7.5)/2 = 5.5/2 = 2.75
$

Backward: $L^T x = y$

$x₃ = 2.75/2 = 1.375
x₂ = (3.5 - 1·1.375)/2 = 2.125/2 = 1.0625
x₁ = (4 - 1·1.0625 - 1·1.375)/2 = (4 - 2.4375)/2 = 0.78125
$

Verify: $Ax = [4(0.78125)+2(1.0625)+2(1.375), ...] = [8, 11, 13]$ ✓

Quiz

Q1: What does the concept of Cholesky decomposition primarily refer to in this subject?

A) The definition and application of Cholesky decomposition B) A computational error related to Cholesky decomposition C) A visual representation of Cholesky decomposition D) A historical anecdote about Cholesky decomposition

Correct: A)

If you chose A: Cholesky decomposition is defined as: the definition and application of cholesky decomposition. The other options describe different aspects that are not the primary focus. Correct!
If you chose B: This is incorrect. Cholesky decomposition is defined as: the definition and application of cholesky decomposition. The other options describe different aspects that are not the primary focus.
If you chose C: This is incorrect. Cholesky decomposition is defined as: the definition and application of cholesky decomposition. The other options describe different aspects that are not the primary focus.
If you chose D: This is incorrect. Cholesky decomposition is defined as: the definition and application of cholesky decomposition. The other options describe different aspects that are not the primary focus.

Q2: What is the primary purpose of Common Pitfalls?

A) It replaces all other methods in this domain B) It is primarily a historical notation system C) It is used to common pitfalls in mathematical analysis D) It is used only in advanced research contexts

Correct: C)

If you chose A: This is incorrect. Common Pitfalls serves the purpose described in the correct answer. The other options misrepresent its role.
If you chose B: This is incorrect. Common Pitfalls serves the purpose described in the correct answer. The other options misrepresent its role.
If you chose C: Common Pitfalls serves the purpose described in the correct answer. The other options misrepresent its role. Correct!
If you chose D: This is incorrect. Common Pitfalls serves the purpose described in the correct answer. The other options misrepresent its role.

Q3: Based on the worked examples in this subject, what is the correct result?

A) The inverse of the correct answer B) A different result from a common mistake C) An unrelated numerical value D) √10.

Correct: D)

If you chose A: This is incorrect. The worked examples show that the result is √10.. The other options represent common errors.
If you chose B: This is incorrect. The worked examples show that the result is √10.. The other options represent common errors.
If you chose C: This is incorrect. The worked examples show that the result is √10.. The other options represent common errors.
If you chose D: The worked examples show that the result is √10.. The other options represent common errors. Correct!

Practice Problems

Verify that $A = [[9, -3], [-3, 2]]$ is SPD and compute its Cholesky decomposition.
Compute the Cholesky factorization of $A = [[16, 4, 8], [4, 5, 2], [8, 2, 14]]$.
Use your factorization from problem 2 to solve $Ax = [28, 11, 24]^T$.
Show that the Cholesky factor L of a diagonal SPD matrix is also diagonal, and find L for $A = diag(4, 9, 16)$.
For the 2×2 SPD matrix $A = [[a, b], [b, c]]$, derive the condition for Cholesky to exist in terms of a, b, c.
Explain why Cholesky fails for $A = [[1, 2], [2, 1]]$ and what decomposition would work instead.

Answers

1. Δ₁ = 9 > 0, Δ₂ = 18-9 = 9 > 0. SPD ✓. L_{11} = 3, L_{21} = -3/3 = -1, L_{22} = √(2 - 1) = 1. L = [[3, 0], [-1, 1]]. 2. Δ₁ = 16, Δ₂ = 80-16 = 64, Δ₃ = 16(70-4) - 4(56-16) + 8(8-40) = 16·66 - 4·40 + 8·(-32) = 1056 - 160 - 256 = 640. All positive. L_{11} = 4. L_{21} = 1, L_{31} = 2. L_{22} = √(5 - 1) = 2. L_{32} = (2 - 2·1)/2 = 0. L_{33} = √(14 - 4 - 0) = √10. L = [[4,0,0],[1,2,0],[2,0,√10]]. 3. Forward: y₁=28/4=7, y₂=(11-7)/2=2, y₃=(24-14-0)/√10=10/√10=√10. Backward: x₃=√10/√10=1, x₂=(2-0·1)/2=1, x₁=(7-1-2)/4=1. Solution: x = [1, 1, 1]^T. Check: Ax = [16+4+8, 4+5+2, 8+2+14]^T = [28,11,24]^T ✓. 4. For diagonal A: L_{11}=√a₁₁, L_{ii}=√a_{ii} (no off-diagonal contributions). L = diag(2, 3, 4). Indeed LL^T = diag(4, 9, 16). 5. Cholesky condition: a > 0 and c - b²/a > 0, i.e., a > 0 and ac > b². This is exactly Sylvester's criterion for 2×2 SPD. 6. For [[1,2],[2,1]]: Δ₁=1>0 but Δ₂=-3<0, so not SPD. Eigenvalues are 3, -1 (indefinite). LDL^T with pivoting would work: P A P^T = L D L^T. Or just use standard LU.

Summary

Cholesky factorizes SPD matrices as $A = L L^T$ where L is lower-triangular with positive diagonal — it exploits symmetry for 2× efficiency over LU
The algorithm computes L column by column: diagonal entries are square roots of pivots, off-diagonals from inner products
Cholesky exists iff A is SPD (all leading principal minors > 0, all eigenvalues > 0)
Solving $Ax = b$ via Cholesky uses forward/backward substitution just like LU, but at half the factorization cost
Applications include correlated random variable generation ($X = μ + LZ$), Newton's method, and square-root Kalman filtering
LDL^T generalizes Cholesky by avoiding square roots and handling symmetric indefinite matrices with pivoting

Pitfalls

Attempting Cholesky on non-SPD matrices. Cholesky requires symmetric positive definiteness. Attempting on indefinite matrices produces negative values under the square root and fails. Check leading principal minors or eigenvalues first, or use LDL^T with pivoting.
Forming A^T A then using Cholesky for least squares. This squares the condition number. For least squares problems, use QR decomposition (Householder) instead.
Ignoring numerical breakdown near singularity. Even when mathematically SPD, round-off can produce a negative value under the square root in $a_{ii} - Σ L_{ik}²$, causing failure. This signals near-singularity — consider regularization or LDL^T.
Confusing Cholesky with LDL^T. Cholesky uses square roots and gives L L^T with positive diagonal L. LDL^T avoids square roots by storing pivots in D and can handle symmetric indefinite matrices with pivoting.
Forgetting that Cholesky needs no pivoting. Unlike LU, Cholesky is numerically stable without row exchanges for SPD matrices. Adding unnecessary pivoting changes the factorization to $P A P^T = L L^T$.

Next Steps

Continue to 09-07 Eigendecomposition Algorithms to learn practical iterative methods for computing eigenvalues and eigenvectors, including power iteration and the QR algorithm.

Progress

Phases