Math graphic
📐 Concept diagram

09-06 — Cholesky Decomposition

Phase: 9 — Matrix Decompositions & Advanced Linear Algebra Subject: 09-06 Prerequisites: 09-05 — Matrix Norms Next subject: 09-07 — Eigendecomposition Algorithms


Learning Objectives

  1. State the Cholesky decomposition $A = L L^T$ and its requirements (A must be symmetric positive definite)
  2. Derive the Cholesky algorithm row by row and explain its computational efficiency (half the operations of LU)
  3. Apply Cholesky to solve $Ax = b$ via forward/backward substitution
  4. Identify applications including Monte Carlo simulation, optimization (Newton's method), and Kalman filtering
  5. Explain the LDL^T variant and its advantages for indefinite or semidefinite matrices

Core Content

1. Definition and Existence

If A is symmetric positive definite (SPD), then there exists a unique lower-triangular matrix L with positive diagonal entries such that:

Common Pitfall: Cholesky ONLY works for SPD matrices. Attempting on indefinite matrices tries sqrt(negative) and fails. Check definiteness first, or use LDL^T for semidefinite.

$A = L L^T
$

This is the Cholesky decomposition (or Cholesky factorization).

CRITICAL -- Foundational: Cholesky A = LL^T is 2x faster than LU for SPD matrices, with no pivoting needed. SPD matrices appear everywhere: covariance, normal equations, kernel matrices.

Why SPD is required: If $A = L L^T$, then for any nonzero x, $x^T A x = x^T L L^T x = ||L^T x||² > 0$ (since L is invertible when diagonal entries are positive). So A must be positive definite. The converse also holds: every SPD matrix has a Cholesky factorization.

Relationship to LU: If $A = L L^T$, then this is also an LU decomposition with $U = L^T$. So Cholesky exists when the standard LU (without pivoting) works AND A is symmetric.

2. The Algorithm

Write $A = L L^T$ element-wise. Since L is lower-triangular:

a_{ij} = Σ_{k=1}^{min(i,j)} L_{ik} L_{jk}

For $i = j$ (diagonal):

$a_{ii} = Σ_{k=1}^{i} L_{ik}² = L_{i1}² + L_{i2}² + ... + L_{ii}²
=> L_{ii} = sqrt(a_{ii} - Σ_{k=1}^{i-1} L_{ik}²)
$

For i > j (below diagonal):

a_{ij} = Σ_{k=1}^{j} L_{ik} L_{jk}
=> L_{ij} = (a_{ij} - Σ_{k=1}^{j-1} L_{ik} L_{jk}) / L_{jj}

Algorithm (column-by-column):

for j = 1 to n:
    L_{jj} = sqrt(a_{jj} - Σ_{k=1}^{j-1} L_{jk}²)
    for i = j+1 to n:
        L_{ij} = (a_{ij} - Σ_{k=1}^{j-1} L_{ik} L_{jk}) / L_{jj}

Computational cost: ~ n³/3 flops (half the 2n³/3 of LU). Also requires only n square roots.

Row-by-row variant (useful for sparse matrices):

for i = 1 to n:
    for j = 1 to i-1:
        L_{ij} = (a_{ij} - Σ_{k=1}^{j-1} L_{ik} L_{jk}) / L_{jj}
    L_{ii} = sqrt(a_{ii} - Σ_{k=1}^{i-1} L_{ik}²)

Memory note: L can be stored in the lower triangle of A (in-place computation).

Edge case — near-singular SPD: If A is nearly singular (some eigenvalues near zero), the subtraction $a_{ii} - Σ L_{ik}²$ can yield a negative number due to round-off, causing the square root to fail. This signals that A is numerically not positive definite.

3. Solving Ax = b with Cholesky

Given $A = L L^T$:

Forward substitution: Solve $L y = b$

y_i = (b_i - Σ_{k=1}^{i-1} L_{ik} y_k) / L_{ii}

Backward substitution: Solve $L^T x = y$

x_i = (y_i - Σ_{k=i+1}^{n} L_{ki} x_k) / L_{ii}

This is identical in structure to LU but exploits symmetry for efficiency.

4. LDL^T Decomposition

For SPD matrices where we want to avoid square roots (or for symmetric indefinite matrices), we factor:

$A = L D L^T
$

where L is unit lower triangular and D is diagonal.

The diagonal entries of D are the pivots. For SPD matrices, all D_{ii} > 0, and we can recover Cholesky by setting $L_chol = L sqrt(D)$.

For symmetric indefinite matrices (some negative pivots), LDL^T still works with pivoting: $P A P^T = L D L^T$. This handles cases where Cholesky fails.

5. Applications

Monte Carlo simulation: Generate correlated random variables. If we need $X ~ N(μ, Σ)$, compute Cholesky $Σ = L L^T$. Then $X = μ + L Z$ where Z ~ N(0, I). (Since Cov(LZ) = L Cov(Z) L^T = L L^T = Σ.)

Newton's method in optimization: The Newton step requires solving $H p = -g$ where H is the Hessian. For convex problems, H is SPD, and Cholesky provides stable, efficient solves.

Kalman filtering: The covariance matrix updates remain SPD, and Cholesky factors are propagated for numerical stability ("square-root Kalman filter").

Solving normal equations: A^T A is SPD for full-column-rank A. Cholesky of A^T A is used, but beware: forming A^T A squares the condition number! QR is preferred when possible.



Key Terms

Worked Examples

Example 1: 2×2 Cholesky

Find Cholesky of:

$A = [4  2]
    [2  5]
$

First check SPD: $Δ₁ = 4 > 0$, $Δ₂ = 20-4 = 16 > 0$. ✓

$L_{11} = sqrt(a_{11}) = sqrt(4) = 2
L_{21} = a_{21} / L_{11} = 2/2 = 1
L_{22} = sqrt(a_{22} - L_{21}²) = sqrt(5 - 1) = sqrt(4) = 2
$
$L = [2  0]    Verify: L L^T = [2 0][2 1] = [4 2]
    [1  2]                    [1 2][0 2]   [2 5] ✓
$

Example 2: 3×3 Cholesky

$A = [ 4  2  2]
    [ 2  5  3]
    [ 2  3  6]
$

Leading minors: 4, 16, 36 → all positive. SPD confirmed.

Column 1:

$L_{11} = sqrt(4) = 2
L_{21} = 2/2 = 1
L_{31} = 2/2 = 1
$

Column 2:

$L_{22} = sqrt(5 - 1²) = sqrt(4) = 2
L_{32} = (3 - 1·1)/2 = 2/2 = 1
$

Column 3:

$L_{33} = sqrt(6 - 1² - 1²) = sqrt(4) = 2
$
$L = [2  0  0]
    [1  2  0]
    [1  1  2]
$

Verify:

$L L^T = [2 0 0][2 1 1]   [4 2 2]
        [1 2 0][0 2 1] = [2 5 3] ✓
        [1 1 2][0 0 2]   [2 3 6]
$

Example 3: Solving with Cholesky

Using the factorization from Example 2, solve $Ax = [8, 11, 13]^T$.

Forward: $L y = b$

$y₁ = 8/2 = 4
y₂ = (11 - 1·4)/2 = 7/2 = 3.5
y₃ = (13 - 1·4 - 1·3.5)/2 = (13 - 7.5)/2 = 5.5/2 = 2.75
$

Backward: $L^T x = y$

$x₃ = 2.75/2 = 1.375
x₂ = (3.5 - 1·1.375)/2 = 2.125/2 = 1.0625
x₁ = (4 - 1·1.0625 - 1·1.375)/2 = (4 - 2.4375)/2 = 0.78125
$

Verify: $Ax = [4(0.78125)+2(1.0625)+2(1.375), ...] = [8, 11, 13]$ ✓


Quiz

Q1: What does the concept of Cholesky decomposition primarily refer to in this subject?

A) The definition and application of Cholesky decomposition B) A computational error related to Cholesky decomposition C) A visual representation of Cholesky decomposition D) A historical anecdote about Cholesky decomposition

Correct: A)

Q2: What is the primary purpose of Common Pitfalls?

A) It replaces all other methods in this domain B) It is primarily a historical notation system C) It is used to common pitfalls in mathematical analysis D) It is used only in advanced research contexts

Correct: C)

Q3: Based on the worked examples in this subject, what is the correct result?

A) The inverse of the correct answer B) A different result from a common mistake C) An unrelated numerical value D) √10.

Correct: D)

Practice Problems

  1. Verify that $A = [[9, -3], [-3, 2]]$ is SPD and compute its Cholesky decomposition.

  2. Compute the Cholesky factorization of $A = [[16, 4, 8], [4, 5, 2], [8, 2, 14]]$.

  3. Use your factorization from problem 2 to solve $Ax = [28, 11, 24]^T$.

  4. Show that the Cholesky factor L of a diagonal SPD matrix is also diagonal, and find L for $A = diag(4, 9, 16)$.

  5. For the 2×2 SPD matrix $A = [[a, b], [b, c]]$, derive the condition for Cholesky to exist in terms of a, b, c.

  6. Explain why Cholesky fails for $A = [[1, 2], [2, 1]]$ and what decomposition would work instead.

Answers 1. Δ₁ = 9 > 0, Δ₂ = 18-9 = 9 > 0. SPD ✓. L_{11} = 3, L_{21} = -3/3 = -1, L_{22} = √(2 - 1) = 1. L = [[3, 0], [-1, 1]]. 2. Δ₁ = 16, Δ₂ = 80-16 = 64, Δ₃ = 16(70-4) - 4(56-16) + 8(8-40) = 16·66 - 4·40 + 8·(-32) = 1056 - 160 - 256 = 640. All positive. L_{11} = 4. L_{21} = 1, L_{31} = 2. L_{22} = √(5 - 1) = 2. L_{32} = (2 - 2·1)/2 = 0. L_{33} = √(14 - 4 - 0) = √10. L = [[4,0,0],[1,2,0],[2,0,√10]]. 3. Forward: y₁=28/4=7, y₂=(11-7)/2=2, y₃=(24-14-0)/√10=10/√10=√10. Backward: x₃=√10/√10=1, x₂=(2-0·1)/2=1, x₁=(7-1-2)/4=1. Solution: x = [1, 1, 1]^T. Check: Ax = [16+4+8, 4+5+2, 8+2+14]^T = [28,11,24]^T ✓. 4. For diagonal A: L_{11}=√a₁₁, L_{ii}=√a_{ii} (no off-diagonal contributions). L = diag(2, 3, 4). Indeed LL^T = diag(4, 9, 16). 5. Cholesky condition: a > 0 and c - b²/a > 0, i.e., a > 0 and ac > b². This is exactly Sylvester's criterion for 2×2 SPD. 6. For [[1,2],[2,1]]: Δ₁=1>0 but Δ₂=-3<0, so not SPD. Eigenvalues are 3, -1 (indefinite). LDL^T with pivoting would work: P A P^T = L D L^T. Or just use standard LU.

Summary


Pitfalls



Next Steps

Continue to 09-07 Eigendecomposition Algorithms to learn practical iterative methods for computing eigenvalues and eigenvectors, including power iteration and the QR algorithm.