📐 Concept diagram

09-03 — Singular Value Decomposition (SVD)

Phase: 9 — Matrix Decompositions & Advanced Linear Algebra Subject: 09-03 Prerequisites: 09-02 — QR Decomposition Next subject: 09-04 — Spectral Theorem and Quadratic Forms

Learning Objectives

State and interpret the SVD $A = U Σ V^T$ and explain the roles of left singular vectors, singular values, and right singular vectors
Relate the SVD to the eigendecomposition of A^T A and A A^T
Distinguish between full SVD, reduced SVD, and economy SVD
Apply the Eckart-Young theorem for low-rank approximation and data compression
Compute the pseudoinverse via SVD and use it to solve linear systems

Core Content

The Singular Value Decomposition (SVD) is the most general and powerful matrix decomposition. Every matrix A ∈ ℝ^{m×n} (of any rank) can be factored as:

CRITICAL -- Foundational: SVD is the most important decomposition -- it exists for EVERY matrix. Reveals fundamental action: rotate, scale, rotate. Powers PCA, pseudoinverses, low-rank approximation.

$A = U Σ V^T
$

where: - U is $m × m$ orthogonal: columns are left singular vectors - $Σ$ is $m × n$ diagonal: $σ_1 ≥ σ_2 ≥ ... ≥ σ_r > 0$ are the singular values (r = rank(A)) - V is $n × n$ orthogonal: columns are right singular vectors

1. Derivation from A^T A and A A^T

Since A^T A is $n × n$ symmetric positive semidefinite, it has orthogonal eigenvectors v_i (right singular vectors) and nonnegative eigenvalues $λ_i$:

$A^T A v_i = λ_i v_i
$

Define the singular values: $σ_i = √λ_i$

For $σ_i > 0$, define the left singular vectors:

$u_i = (1/σ_i) A v_i
$

Check that these are orthonormal:

u_i^T u_j = (1/σ_i σ_j) v_i^T A^T A v_j = (λ_j / σ_i σ_j) v_i^T v_j = 0  (if i ≠ j)
u_i^T u_i = (1/σ_i²) v_i^T A^T A v_i = λ_i/σ_i² = 1

For i > r (zero singular values), the v_i span the nullspace of A, and the u_i span the left nullspace (we complete bases orthogonally).

Now, for any vector x, expand in the V-basis: $x = Σ c_i v_i$. Then:

$A x = Σ c_i A v_i = Σ_{i=1}^{r} c_i σ_i u_i
$

Equivalently:

$A (Σ c_i v_i) = Σ σ_i u_i (v_i^T x) = U Σ V^T x
$

Hence $A = U Σ V^T$. The SVD reveals that A acts by rotating (V^T), scaling (Σ), and rotating again (U).

Geometric interpretation: Any linear transformation A: 1. V^T: rotates/reflects input to align with principal axes of the transformation 2. $Σ$: stretches/compresses along those axes (by σ_i), possibly zeroing dimensions 3. U: rotates/reflects the result to the final orientation

The unit sphere under A becomes an ellipsoid with semi-axes $σ_i u_i$.

2. Forms of the SVD

Full SVD: - U: m×m, $Σ$: m×n, V: n×n - Σ has singular values on the diagonal and zeros elsewhere

Reduced (or "thin") SVD: For m ≥ n:

$A = U_1 Σ_1 V^T
$

where $U_1$ is m×n (first n columns of U), $Σ_1$ is n×n diagonal, V is n×n. (For m < n, the reduced SVD takes first m columns of V.)

Economy SVD (or "compact"): Only keep rank r:

$A = U_r Σ_r V_r^T
$

where U_r is m×r, $Σ_r$ is r×r, V_r is n×r. Drops dimensions where σ=0.

Rank-r approximation (truncated SVD): For any k < r:

$A_k = Σ_{i=1}^{k} σ_i u_i v_i^T
$

3. Eckart-Young Theorem

The Eckart-Young theorem states that the truncated SVD gives the best rank-k approximation to A in both the spectral norm (||·||₂) and Frobenius norm (||·||_F):

Common Pitfall: Truncated SVD of MEAN-CENTERED data gives PCA. If you forget to center, you get a different decomposition. Always center before SVD for PCA.

$min_{rank(B) ≤ k} ||A - B||₂ = σ_{k+1}
min_{rank(B) ≤ k} ||A - B||_F = sqrt(Σ_{i=k+1}^{r} σ_i²)
$

And the minimizer is $A_k = Σ_{i=1}^{k} σ_i u_i v_i^T$.

Proof sketch (Frobenius norm): $||A - B||_F² = ||U^T(A-B)V||_F² = ||Σ - U^T B V||_F²$. Since Σ is diagonal, the optimal U^T B V concentrates its rank on the k largest singular values, making the error the sum of squares of the remaining singular values.

Applications: - Data compression (e.g., image compression): store only k triplets (σ_i, u_i, v_i) instead of full matrix - Principal Component Analysis (PCA): columns of V_r are principal directions - Latent Semantic Analysis (LSA): low-rank approximation of term-document matrices - Noise reduction: truncate small singular values (assumed to be noise)

4. Pseudoinverse via SVD

The Moore-Penrose pseudoinverse generalizes matrix inverse to rectangular and/or singular matrices:

$A^+ = V Σ^+ U^T
$

where $Σ^+$ is formed by reciprocating the nonzero singular values and transposing:

$Σ^+ = diag(1/σ_1, 1/σ_2, ..., 1/σ_r, 0, ..., 0)  (size n×m)
$

Properties: - $A A^+ A = A$, $A^+ A A^+ = A^+$, $(A A^+)^T = A A^+$, $(A^+ A)^T = A^+ A$ - For invertible A: A^+ = A^{-1} - $x = A^+ b$ gives the minimum-norm least-squares solution to $Ax = b$

Connection to condition number:

$κ₂(A) = σ_max / σ_min  (where σ_min is the smallest nonzero singular value)
$

A large condition number means A is close to singular (or rank-deficient).

5. Computing the SVD

Practical computation does NOT form A^T A explicitly (squares condition number). Instead:

Bidiagonalization: Apply Householder reflections alternately from left and right to reduce A to bidiagonal form B
Golub-Reinsch / Demmel-Kahan: Iteratively apply implicit QR steps to B to converge to diagonal Σ while accumulating U and V

The full algorithm is implemented in LAPACK's DGESVD and DGESDD (divide-and-conquer, faster).

Key Terms

09 03 Singular Value Decomposition
09-04 Spectral Theorem & Quadratic Forms
Answer: a.
Answer: b.
Answer: c.
End-of-Subject Quiz
best rank-k approximation
left singular vectors
right singular vectors
singular values

Worked Examples

Example 1: SVD of a 2×2 Matrix

Compute the SVD of:

$A = [3  2]
    [2  3]
$

Solution:

Step 1: Compute A^T A:

$A^T A = [3 2] [3 2] = [13  12]
        [2 3] [2 3]   [12  13]
$

Step 2: Find eigenvalues of A^T A:

Characteristic polynomial: $det(A^T A - λI) = (13-λ)² - 144 = λ² - 26λ + 25 = (λ-1)(λ-25) = 0$

So $λ₁ = 25$, $λ₂ = 1$. Singular values: $σ₁ = 5$, $σ₂ = 1$.

Step 3: Right singular vectors (eigenvectors of A^T A):

For λ₁=25: $(A^T A - 25I)v = 0$

$[-12  12] [v₁] = [0]
[ 12 -12] [v₂]   [0]
$

=> $v₁ = v₂$. Normalized: $v₁ = [1/√2, 1/√2]^T$

For λ₂=1: $(A^T A - I)v = 0$

$[12  12] [v₁] = [0]
[12  12] [v₂]   [0]
$

=> $v₁ = -v₂$. Normalized: $v₂ = [1/√2, -1/√2]^T$

$V = [1/√2   1/√2]
    [1/√2  -1/√2]
$

Step 4: Left singular vectors:

$u₁ = (1/σ₁) A v₁ = (1/5) [3 2; 2 3] [1/√2, 1/√2]^T = (1/5) [5/√2, 5/√2]^T = [1/√2, 1/√2]^T

u₂ = (1/σ₂) A v₂ = (1/1) [3 2; 2 3] [1/√2, -1/√2]^T = [1/√2, -1/√2]^T
$

$U = [1/√2   1/√2]    Σ = [5  0]    V = [1/√2   1/√2]
    [1/√2  -1/√2]        [0  1]        [1/√2  -1/√2]
$

Verify: $U Σ V^T = [[3,2],[2,3]]$ ✓

Example 2: SVD of a Rectangular Matrix

$A = [1  0  1]
    [0  1  1]
$

$A^T A = [[1,0],[0,1],[1,1]] [1 0 1; 0 1 1] = [[1,0,1],[0,1,1],[1,1,2]]$

Eigenvalues: $det(A^T A - λI) = ... = -λ³ + 4λ² - 3λ = -λ(λ-1)(λ-3)$ So $λ = 3, 1, 0$. $σ₁ = √3, σ₂ = 1, σ₃ = 0$. Rank = 2.

Right singular vectors (eigenvectors of A^T A): - λ=3: $v₁ = [1, 1, 2]^T/√6$ - λ=1: $v₂ = [1, -1, 0]^T/√2$ - λ=0: $v₃ = [1, 1, -1]^T/√3$ (nullspace of A)

Left singular vectors (for nonzero σ): $u₁ = A v₁/√3 = [1/√2, 1/√2]^T$ $u₂ = A v₂/1 = [1/√2, -1/√2]^T$

Full SVD:

$U = [1/√2  1/√2]    Σ = [√3  0  0]    V = [1/√6  1/√2  1/√3]
    [1/√2 -1/√2]        [ 0  1  0]        [1/√6 -1/√2  1/√3]
                                          [2/√6   0   -1/√3]
$

Example 3: Low-Rank Approximation (Eckart-Young)

Given $A = [[3,2],[2,3]]$ from Example 1 with SVD $σ₁=5, σ₂=1$:

Rank-1 approximation:

$A₁ = σ₁ u₁ v₁^T = 5 [1/√2, 1/√2]^T [1/√2, 1/√2]
   = 5 [[1/2, 1/2], [1/2, 1/2]] = [[2.5, 2.5], [2.5, 2.5]]
$

Error: $A - A₁ = [[0.5, -0.5], [-0.5, 0.5]]$

Frobenius error: $||A - A₁||_F = σ₂ = 1$ (as predicted by Eckart-Young) Spectral error: $||A - A₁||₂ = σ₂ = 1$

Example 4: Pseudoinverse

For $A = [[3,2],[2,3],[1,1]]$ (3×2, rank 2):

SVD: $σ₁ ≈ 5.13, σ₂ ≈ 0.87$. (Computed numerically.)

$A^+ = V Σ^+ U^T
$

where $Σ^+ = diag(1/5.13, 1/0.87)$ (2×3 zeros elsewhere).

For $b = [1, 0, 1]^T$, $x = A^+ b$ gives the least squares solution minimizing $||Ax - b||$.

Quiz

Q1: What does the concept of End-of-Subject Quiz primarily refer to in this subject?

A) A historical anecdote about End-of-Subject Quiz B) A computational error related to End-of-Subject Quiz C) The definition and application of End-of-Subject Quiz D) A visual representation of End-of-Subject Quiz

Correct: C)

If you chose A: This is incorrect. End-of-Subject Quiz is defined as: the definition and application of end-of-subject quiz. The other options describe different aspects that are not the primary focus.
If you chose B: This is incorrect. End-of-Subject Quiz is defined as: the definition and application of end-of-subject quiz. The other options describe different aspects that are not the primary focus.
If you chose C: End-of-Subject Quiz is defined as: the definition and application of end-of-subject quiz. The other options describe different aspects that are not the primary focus. Correct!
If you chose D: This is incorrect. End-of-Subject Quiz is defined as: the definition and application of end-of-subject quiz. The other options describe different aspects that are not the primary focus.

Q2: What is the primary purpose of Common Pitfalls?

A) It is used to common pitfalls in mathematical analysis B) It is primarily a historical notation system C) It is used only in advanced research contexts D) It replaces all other methods in this domain

Correct: A)

If you chose A: Common Pitfalls serves the purpose described in the correct answer. The other options misrepresent its role. Correct!
If you chose B: This is incorrect. Common Pitfalls serves the purpose described in the correct answer. The other options misrepresent its role.
If you chose C: This is incorrect. Common Pitfalls serves the purpose described in the correct answer. The other options misrepresent its role.
If you chose D: This is incorrect. Common Pitfalls serves the purpose described in the correct answer. The other options misrepresent its role.

Q3: Based on the worked examples in this subject, what is the correct result?

A) The inverse of the correct answer B) A different result from a common mistake C) An unrelated numerical value D) a.**

Correct: D)

If you chose A: This is incorrect. The worked examples show that the result is a.**. The other options represent common errors.
If you chose B: This is incorrect. The worked examples show that the result is a.**. The other options represent common errors.
If you chose C: This is incorrect. The worked examples show that the result is a.**. The other options represent common errors.
If you chose D: The worked examples show that the result is a.**. The other options represent common errors. Correct!

Practice Problems

Compute the full SVD of $A = [[0, 1], [1, 0]]$. How do the singular values relate to the eigenvalues?
For $A = [[2, 0], [0, 0], [0, 3]]$, find the SVD. What is the rank?
Show that if A is symmetric positive semidefinite, its SVD coincides with its eigendecomposition. (Hint: A = Q Λ Q^T = U Σ V^T.)
Using the SVD of $A = [[3, 2], [2, 3]]$ from Example 1, compute the pseudoinverse A^+ and verify $A A^+ A = A$.
For $A = [[2, 1], [1, 2]]$, compute the rank-1 approximation and the Frobenius norm of the error. Verify the Eckart-Young theorem.
Prove that $||A||₂ = σ_max$ (the largest singular value) for any matrix A.

Answers

1. `A^T A = I`, eigenvalues λ₁=λ₂=1, so σ₁=σ₂=1. V = I (or any orthogonal matrix). U = A V Σ^{-1} = A = [[0,1],[1,0]]. Σ = I. Note: eigenvalues of A are ±1, singular values are 1,1 — singular values are always nonnegative. 2. `A = [[2,0],[0,0],[0,3]]`. `A^T A = [[4,0],[0,9]]`. σ₁=3, σ₂=2. Right singular vectors: v₁=[0,1]^T, v₂=[1,0]^T. Left: u₁ = (1/3)[0,0,3]^T = [0,0,1]^T, u₂ = (1/2)[2,0,0]^T = [1,0,0]^T. u₃ = [0,1,0]^T completes basis. Rank = 2. 3. If A = QΛQ^T is SPD, then A^T A = A² has eigenvalues λ_i² and eigenvectors Q. So V = Q, Σ = Λ, and U = A V Σ^{-1} = QΛQ^T Q Λ^{-1} = Q. Thus U = V = Q and A = QΛQ^T. 4. σ₁=5, σ₂=1. Σ^+ = diag(1/5, 1). A^+ = V Σ^+ U^T. Since U=V here: A^+ = [[1/√2,1/√2],[1/√2,-1/√2]] diag(1/5,1) [[1/√2,1/√2],[1/√2,-1/√2]] = (1/10)[[3,-2],[-2,3]] (after computation). Verify: A A^+ A = A. 5. `A^T A = [[5,4],[4,5]]`. Eigenvalues: 9, 1. σ₁=3, σ₂=1. V = [[1/√2,1/√2],[1/√2,-1/√2]]. Rank-1: A₁ = 3 [[1/√2,1/√2]^T [1/√2,1/√2]] = 3[[1/2,1/2],[1/2,1/2]] = [[1.5,1.5],[1.5,1.5]]. Error: ||A-A₁||_F = σ₂ = 1. Eckart-Young says min ||A-B||_F over rank-1 B is σ₂=1, which matches. 6. `||A||₂ = sup_{||x||=1} ||Ax|| = sup_{||x||=1} ||U Σ V^T x|| = sup_{||y||=1} ||Σ y||` (letting y=V^T x). `||Σ y||² = Σ σ_i² y_i² ≤ σ_max² Σ y_i² = σ_max²` with equality when y = e₁. So `||A||₂ = σ_max`.

Summary

Every matrix admits an SVD $A = U Σ V^T$: singular values on Σ's diagonal reveal the "gain" in each principal direction
Singular values are square roots of eigenvalues of A^T A (or A A^T); they are always nonnegative and sorted descending
The Eckart-Young theorem proves truncated SVD gives the optimal low-rank approximation — foundational for PCA, compression, and denoising
The pseudoinverse via SVD ($A^+ = V Σ^+ U^T$) solves least squares and minimum-norm problems for any matrix
The condition number κ₂(A) = σ_max/σ_min quantifies sensitivity to perturbations

Pitfalls

Computing A^T A explicitly to find singular values. Forming A^T A squares the condition number and can cause loss of information for small singular values. A matrix with σ_min = 10⁻⁸ has eigenvalues of A^T A down to 10⁻¹⁶, which may underflow. Use direct SVD algorithms (Golub-Reinsch) that work on A directly.
Forgetting to mean-center data before SVD for PCA. SVD of a data matrix X gives principal components only when X has zero-mean columns. If you skip centering, the first singular vector is dominated by the mean, not the direction of maximum variance. Always subtract column means first.
Confusing left and right singular vectors. U (left) corresponds to the rows/output space (AA^T), V (right) corresponds to the columns/input space (A^T A). In PCA, the principal directions are columns of V, not U. Mixing them up gives meaningless results.
Thinking SVD requires a square matrix. SVD exists for EVERY matrix — rectangular, singular, rank-deficient, whatever. This universality is what makes SVD the most powerful matrix decomposition. If someone says "I need a square matrix for SVD," they're confusing it with eigendecomposition.
Truncating singular values too aggressively. Setting small singular values to zero is standard for denoising, but the cutoff must be chosen carefully. Truncating too much loses signal; truncating too little keeps noise. Use the singular value scree plot (σ_k vs. k) to identify the "elbow" where meaningful signal transitions to noise.

Next Steps

Continue to 09-04 Spectral Theorem & Quadratic Forms for the Rayleigh quotient, Courant-Fischer min-max principle, and classification of quadratic forms.

Progress

Phases