09-03 — Singular Value Decomposition (SVD)
Phase: 9 (Matrix Decompositions & Advanced Linear Algebra)
Subject: 09-03
Prerequisites: 09-02 QR Decomposition
Next subject: 09-04 Spectral Theorem and Quadratic Forms
Learning Objectives
- State and interpret the SVD $A = U Σ V^T$ and explain the roles of left singular vectors, singular values, and right singular vectors
- Relate the SVD to the eigendecompositions of $A^T A$ and $A A^T$
- Distinguish between full SVD, reduced SVD, and economy SVD
- Apply the Eckart-Young theorem for low-rank approximation and data compression
- Compute the pseudoinverse via SVD and use it to solve linear systems
Core Content
CRITICAL -- Foundational: SVD is the most important decomposition -- it exists for EVERY matrix. Reveals fundamental action: rotate, scale, rotate. Powers PCA, pseudoinverses, low-rank approximation.
The Singular Value Decomposition (SVD) is the most general of the standard matrix decompositions: it requires no assumptions on shape, rank, or symmetry. Every matrix $A ∈ ℝ^{m×n}$ (of any rank) can be factored as:
$A = U Σ V^T$
where:
- U is $m × m$ orthogonal: columns are left singular vectors
- $Σ$ is $m × n$ diagonal: its leading diagonal entries $σ_1 ≥ σ_2 ≥ ... ≥ σ_r > 0$ are the nonzero singular values (r = rank(A)); any remaining diagonal entries are zero
- V is $n × n$ orthogonal: columns are right singular vectors
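As a quick numerical illustration of these shapes, the sketch below uses NumPy's `numpy.linalg.svd` on a small 3×2 matrix chosen only for this example. Note that NumPy returns the singular values as a vector `s` and returns $V^T$ (named `Vt` here) rather than V, so the $m × n$ matrix Σ has to be rebuilt by hand.

```python
import numpy as np

# Illustrative 3x2 matrix (an assumption for this sketch, not part of the theory).
A = np.array([[3.0, 2.0],
              [2.0, 3.0],
              [1.0, 1.0]])

# full_matrices=True returns U (m x m), the singular values s, and V^T (n x n).
U, s, Vt = np.linalg.svd(A, full_matrices=True)

# Rebuild the m x n matrix Sigma from the vector of singular values.
Sigma = np.zeros(A.shape)
Sigma[:len(s), :len(s)] = np.diag(s)

print(U.shape, Sigma.shape, Vt.shape)     # (3, 3) (3, 2) (2, 2)
print(np.allclose(U @ Sigma @ Vt, A))     # True: A = U Sigma V^T
print(np.allclose(U.T @ U, np.eye(3)))    # True: U is orthogonal
print(np.allclose(Vt @ Vt.T, np.eye(2)))  # True: V is orthogonal
```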
1. Derivation from A^T A and A A^T
Since A^T A is $n × n$ symmetric positive semidefinite (because $x^T A^T A x = ||Ax||² ≥ 0$), it has an orthonormal basis of eigenvectors v_i (the right singular vectors) with nonnegative eigenvalues $λ_i$, ordered $λ_1 ≥ λ_2 ≥ ... ≥ λ_n ≥ 0$:
$A^T A v_i = λ_i v_i $
Define the singular values: $σ_i = √λ_i$
For $σ_i > 0$, define the left singular vectors:
$u_i = (1/σ_i) A v_i $
Check that these are orthonormal:
$u_i^T u_j = (1/(σ_i σ_j)) v_i^T A^T A v_j = (λ_j/(σ_i σ_j)) v_i^T v_j = 0$ (if i ≠ j)
$u_i^T u_i = (1/σ_i²) v_i^T A^T A v_i = λ_i/σ_i² = 1$
For i > r (where λ_i = 0), the v_i span the nullspace of A. The formula above does not define u_i for i > r; choose them as any orthonormal basis of the left nullspace of A, which completes U to an orthogonal matrix.
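Now, for any vector x, expand in the V-basis: $x = Σ_{i=1}^{n} c_i v_i$ with $c_i = v_i^T x$. Since $A v_i = σ_i u_i$ for i ≤ r and $A v_i = 0$ for i > r:
$A x = Σ_{i=1}^{n} c_i A v_i = Σ_{i=1}^{r} c_i σ_i u_i$
Equivalently, substituting $c_i = v_i^T x$:
$A x = Σ_{i=1}^{r} σ_i u_i (v_i^T x) = U Σ V^T x$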
Hence $A = U Σ V^T$. The SVD reveals that A acts by rotating (V^T), scaling (Σ), and rotating again (U).
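The derivation can be traced numerically. The sketch below is for illustration only: it forms A^T A explicitly, which production SVD routines avoid. It uses `numpy.linalg.eigh` on a small full-column-rank matrix assumed for this example, so every σ_i is nonzero and no basis completion is needed.

```python
import numpy as np

# Illustrative 3x2 matrix with full column rank (assumed for this sketch).
A = np.array([[3.0, 2.0],
              [2.0, 3.0],
              [1.0, 1.0]])

# Eigendecomposition of the symmetric matrix A^T A.
# eigh returns eigenvalues in ascending order, so reorder to descending.
lam, V = np.linalg.eigh(A.T @ A)
order = np.argsort(lam)[::-1]
lam, V = lam[order], V[:, order]

sigma = np.sqrt(lam)       # sigma_i = sqrt(lambda_i)
U1 = (A @ V) / sigma       # column i is u_i = A v_i / sigma_i

print(sigma)                                       # approx [5.196, 1.0]
print(np.allclose(U1.T @ U1, np.eye(2)))           # True: the u_i are orthonormal
print(np.allclose(U1 @ np.diag(sigma) @ V.T, A))   # True: A is recovered
```

Here `U1` holds only the first r = 2 left singular vectors; a third column spanning the left nullspace would be needed to obtain the full 3×3 matrix U.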
Geometric interpretation: Any linear transformation A acts in three steps:
1. V^T: rotates/reflects input to align with principal axes of the transformation
2. $Σ$: stretches/compresses along those axes (by σ_i), possibly zeroing dimensions
3. U: rotates/reflects the result to the final orientation
The unit sphere under A becomes an ellipsoid with semi-axes $σ_i u_i$.
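A minimal numerical check of the ellipsoid picture, assuming the 2×2 matrix $[[3, 2], [2, 3]]$ as a test case: sample the unit circle, map it through A, and compare the longest and shortest image vectors with the singular values.

```python
import numpy as np

A = np.array([[3.0, 2.0],
              [2.0, 3.0]])
U, s, Vt = np.linalg.svd(A)

# Sample unit vectors on the circle (stored as columns) and map them through A.
theta = np.linspace(0.0, 2.0 * np.pi, 2001)
circle = np.vstack([np.cos(theta), np.sin(theta)])
lengths = np.linalg.norm(A @ circle, axis=0)

# The longest and shortest image vectors have lengths sigma_1 and sigma_2.
print(lengths.max(), lengths.min())       # approximately 5 and 1
print(s)                                  # approximately [5, 1]

# Each right singular vector is mapped onto a semi-axis: A v_i = sigma_i u_i.
print(np.allclose(A @ Vt.T, U * s))       # True
```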
2. Forms of the SVD
Full SVD:
- U: m×m, $Σ$: m×n, V: n×n
- Σ has singular values on the diagonal and zeros elsewhere
Reduced (or "thin") SVD: For m ≥ n:
$A = U_1 Σ_1 V^T $
where $U_1$ is m×n (first n columns of U), $Σ_1$ is n×n diagonal, V is n×n. (For m < n, the reduced SVD takes first m columns of V.)
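Economy SVD (or "compact"): Keep only the first r = rank(A) singular triplets. (Terminology varies: some software uses "economy" for the reduced SVD above, e.g. MATLAB's `svd(A, "econ")`.)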
$A = U_r Σ_r V_r^T $
where U_r is m×r, $Σ_r$ is r×r, V_r is n×r. Drops dimensions where σ=0.
Rank-k approximation (truncated SVD): For any k < r:
$A_k = Σ_{i=1}^{k} σ_i u_i v_i^T$
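The different forms can be compared by their shapes. The sketch below assumes a hypothetical 4×3 matrix of rank 2 and an assumed relative tolerance of `1e-12` for deciding which singular values count as nonzero; both are choices made for this illustration.

```python
import numpy as np

# Hypothetical 4x3 matrix of rank 2 (third column = first + second).
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 2.0],
              [2.0, 0.0, 2.0]])

# Full SVD: U is 4x4. Reduced SVD: U1 is 4x3.
U, s, Vt = np.linalg.svd(A, full_matrices=True)
U1, s1, Vt1 = np.linalg.svd(A, full_matrices=False)

# Rank-r form: keep singular values above an assumed relative tolerance.
tol = 1e-12 * s1[0]
r = int(np.sum(s1 > tol))
Ur, sr, Vtr = U1[:, :r], s1[:r], Vt1[:r, :]

# Truncated SVD with k = 1: a single outer product sigma_1 u_1 v_1^T.
A_1 = sr[0] * np.outer(Ur[:, 0], Vtr[0, :])

print(U.shape, U1.shape, Ur.shape)         # (4, 4) (4, 3) (4, 2)
print(r)                                   # 2
print(np.allclose((Ur * sr) @ Vtr, A))     # True: rank-r form is still exact
print(np.linalg.matrix_rank(A_1))          # 1
```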
3. Eckart-Young Theorem
The Eckart-Young theorem states that the truncated SVD gives the best rank-k approximation to A in both the spectral norm (||·||₂) and the Frobenius norm (||·||_F). For k < r:
$min_{rank(B) ≤ k} ||A - B||₂ = σ_{k+1}$
$min_{rank(B) ≤ k} ||A - B||_F = sqrt(Σ_{i=k+1}^{r} σ_i²)$
In both norms a minimizer is $A_k = Σ_{i=1}^{k} σ_i u_i v_i^T$.
Common Pitfall: Truncated SVD of MEAN-CENTERED data gives PCA. If you forget to center, you get a different decomposition. Always center before SVD for PCA.
Proof sketch (Frobenius norm): $||A - B||_F² = ||U^T(A-B)V||_F² = ||Σ - U^T B V||_F²$. Since Σ is diagonal, the optimal U^T B V concentrates its rank on the k largest singular values, making the error the sum of squares of the remaining singular values.
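The error formulas can be checked numerically. The sketch below uses an arbitrary random 6×4 matrix with a fixed seed (an assumption for reproducibility). The final comparison against a random rank-k matrix is only a sanity check, not a proof of optimality.

```python
import numpy as np

rng = np.random.default_rng(0)           # fixed seed; the matrix itself is arbitrary
A = rng.standard_normal((6, 4))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2
# A_k = sum of the first k terms sigma_i u_i v_i^T.
A_k = (U[:, :k] * s[:k]) @ Vt[:k, :]

spec_err = np.linalg.norm(A - A_k, 2)
frob_err = np.linalg.norm(A - A_k, 'fro')

# s[k] is sigma_{k+1} because Python indexing starts at 0.
print(np.isclose(spec_err, s[k]))                           # True
print(np.isclose(frob_err, np.sqrt(np.sum(s[k:] ** 2))))    # True

# Sanity check: some other rank-k matrix B does not beat A_k.
B = rng.standard_normal((6, k)) @ rng.standard_normal((k, 4))
print(np.linalg.norm(A - B, 'fro') >= frob_err)             # True
```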
Applications:
- Data compression (e.g., image compression): store only k triplets (σ_i, u_i, v_i) instead of the full matrix
- Principal Component Analysis (PCA): columns of V_r are principal directions
- Latent Semantic Analysis (LSA): low-rank approximation of term-document matrices
- Noise reduction: truncate small singular values (assumed to be noise)
4. Pseudoinverse via SVD
The Moore-Penrose pseudoinverse generalizes matrix inverse to rectangular and/or singular matrices:
$A^+ = V Σ^+ U^T $
where $Σ^+$ is the $n × m$ matrix formed by transposing Σ and replacing each nonzero singular value by its reciprocal:
$Σ^+ = diag(1/σ_1, 1/σ_2, ..., 1/σ_r, 0, ..., 0)$ (size n×m)
Properties:
- $A A^+ A = A$, $A^+ A A^+ = A^+$, $(A A^+)^T = A A^+$, $(A^+ A)^T = A^+ A$
- For invertible A: A^+ = A^{-1}
- $x = A^+ b$ gives the minimum-norm least-squares solution to $Ax = b$
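A minimal sketch of the construction $A^+ = V Σ^+ U^T$, assuming a hypothetical rank-deficient 3×2 matrix and an assumed relative cutoff of `1e-12` for treating a singular value as zero. The result is compared with NumPy's own `numpy.linalg.pinv` and `numpy.linalg.lstsq` using the same cutoff.

```python
import numpy as np

# Hypothetical rank-1 example: the second column is twice the first.
A = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 6.0]])
b = np.array([1.0, 0.0, 1.0])

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Reciprocate only the singular values above the assumed cutoff.
rcond = 1e-12
s_inv = np.array([1.0 / x if x > rcond * s[0] else 0.0 for x in s])

# V Sigma^+ U^T, using the reduced factors.
A_pinv = (Vt.T * s_inv) @ U.T
x = A_pinv @ b

print(np.allclose(A_pinv, np.linalg.pinv(A, rcond=rcond)))   # True
print(np.allclose(A @ A_pinv @ A, A))                        # True
print(x)                                                     # approx [0.0571, 0.1143]
```

Because this A is rank-deficient, infinitely many x minimize $||Ax - b||$; the pseudoinverse selects the one of smallest norm, which here is $[4/70, 8/70]^T$.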
Connection to condition number:
$κ₂(A) = σ_max / σ_min$ (where σ_min is the smallest nonzero singular value)
A large condition number means A is close to singular (or rank-deficient), so solutions of $Ax = b$ are sensitive to small perturbations in A and b.
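For a concrete number, the sketch below computes κ₂ from the singular values of a nearly singular 2×2 matrix (a hypothetical example with almost parallel rows) and compares it with `numpy.linalg.cond`.

```python
import numpy as np

# Hypothetical nearly singular matrix: the two rows are almost identical.
A = np.array([[1.0, 1.0],
              [1.0, 1.0 + 1e-6]])

s = np.linalg.svd(A, compute_uv=False)
kappa = s[0] / s[-1]                       # sigma_max / sigma_min

print(kappa)                               # roughly 4e6
print(np.isclose(kappa, np.linalg.cond(A, 2)))   # True
```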
5. Computing the SVD
Practical algorithms do NOT form A^T A explicitly, because doing so squares the condition number. Instead they proceed in two stages:
- Bidiagonalization: Apply Householder reflections alternately from the left and right to reduce A to bidiagonal form B
- Golub-Reinsch / Demmel-Kahan: Iteratively apply implicit QR steps to B until it converges to the diagonal Σ, accumulating U and V along the way
The full algorithm is implemented in LAPACK's DGESVD (QR iteration) and DGESDD (divide-and-conquer, typically faster when singular vectors are requested).
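The loss of information from forming A^T A can be seen on a small Läuchli-type matrix, a standard textbook test case that is not taken from this subject's examples. With `delta = 1e-10`, the true singular values are $\sqrt{2 + δ²}$ and δ, but $1 + δ²$ rounds to exactly 1 in double precision, so the Gram matrix no longer contains δ at all.

```python
import numpy as np

delta = 1e-10
# Lauchli-type matrix: singular values are sqrt(2 + delta^2) and delta.
A = np.array([[1.0,   1.0],
              [delta, 0.0],
              [0.0,   delta]])

# Route 1: SVD applied to A directly.
s_direct = np.linalg.svd(A, compute_uv=False)

# Route 2: eigenvalues of the explicitly formed Gram matrix A^T A.
G = A.T @ A
lam = np.linalg.eigvalsh(G)[::-1]              # reorder to descending
s_gram = np.sqrt(np.clip(lam, 0.0, None))      # clip tiny negative round-off

print(np.array_equal(G, np.ones((2, 2))))  # True: 1 + delta**2 rounded to 1.0
print(s_direct[1])                         # close to 1e-10
print(s_gram[1])                           # unreliable: G carries no trace of delta
```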
Key Terms
- best rank-k approximation
- left singular vectors
- right singular vectors
- singular values
Worked Examples
Example 1: SVD of a 2×2 Matrix
Compute the SVD of:
$A = [[3, 2], [2, 3]]$
Solution:
Step 1: Compute A^T A:
$A^T A = [[3, 2], [2, 3]] [[3, 2], [2, 3]] = [[13, 12], [12, 13]]$
Step 2: Find eigenvalues of A^T A:
Characteristic polynomial: $det(A^T A - λI) = (13-λ)² - 144 = λ² - 26λ + 25 = (λ-1)(λ-25) = 0$
So $λ₁ = 25$, $λ₂ = 1$. Singular values: $σ₁ = 5$, $σ₂ = 1$.
Step 3: Right singular vectors (eigenvectors of A^T A):
For λ₁ = 25: solve $(A^T A - 25I)v = 0$ with $v = [x, y]^T$:
$[[-12, 12], [12, -12]] [x, y]^T = [0, 0]^T$
=> $x = y$. Normalized: $v₁ = [1/√2, 1/√2]^T$
For λ₂ = 1: solve $(A^T A - I)v = 0$:
$[[12, 12], [12, 12]] [x, y]^T = [0, 0]^T$
=> $x = -y$. Normalized: $v₂ = [1/√2, -1/√2]^T$
$V = [[1/√2, 1/√2], [1/√2, -1/√2]]$
Step 4: Left singular vectors:
$u₁ = (1/σ₁) A v₁ = (1/5) [[3, 2], [2, 3]] [1/√2, 1/√2]^T = (1/5) [5/√2, 5/√2]^T = [1/√2, 1/√2]^T$
$u₂ = (1/σ₂) A v₂ = (1/1) [[3, 2], [2, 3]] [1/√2, -1/√2]^T = [1/√2, -1/√2]^T$
$U = [[1/√2, 1/√2], [1/√2, -1/√2]]$, $Σ = [[5, 0], [0, 1]]$, $V = [[1/√2, 1/√2], [1/√2, -1/√2]]$
Verify: $U Σ V^T = [[3,2],[2,3]]$ ✓
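The hand computation can be cross-checked against a library routine. One caveat worth seeing in code: singular vectors are only determined up to a simultaneous sign flip of u_i and v_i, so the sketch below compares `numpy.linalg.svd` output with the hand-derived factors up to sign rather than entry by entry.

```python
import numpy as np

A = np.array([[3.0, 2.0],
              [2.0, 3.0]])
r = 1.0 / np.sqrt(2.0)

# Hand-computed factors from this example.
U_hand = np.array([[r,  r],
                   [r, -r]])
S_hand = np.diag([5.0, 1.0])
V_hand = U_hand.copy()
print(np.allclose(U_hand @ S_hand @ V_hand.T, A))    # True

# Library factors: same singular values, vectors equal up to sign.
U, s, Vt = np.linalg.svd(A)
print(np.allclose(s, [5.0, 1.0]))                    # True

# |u_i . u_i_hand| = 1 for each column means "equal up to sign".
col_dots = np.abs(np.sum(U * U_hand, axis=0))
print(np.allclose(col_dots, 1.0))                    # True
```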
Example 2: SVD of a Rectangular Matrix
$A = [[1, 0, 1], [0, 1, 1]]$
$A^T A = [[1, 0], [0, 1], [1, 1]] [[1, 0, 1], [0, 1, 1]] = [[1, 0, 1], [0, 1, 1], [1, 1, 2]]$
Eigenvalues: $det(A^T A - λI) = ... = -λ³ + 4λ² - 3λ = -λ(λ-1)(λ-3)$
So $λ = 3, 1, 0$. A 2×3 matrix has min(m, n) = 2 singular values: $σ₁ = √3$, $σ₂ = 1$. Both are nonzero, so rank = 2; the eigenvalue λ = 0 belongs to the nullspace direction.
Right singular vectors (eigenvectors of A^T A):
- λ=3: $v₁ = [1, 1, 2]^T/√6$
- λ=1: $v₂ = [1, -1, 0]^T/√2$
- λ=0: $v₃ = [1, 1, -1]^T/√3$ (nullspace of A)
Left singular vectors (for nonzero σ):
$u₁ = A v₁/√3 = [1/√2, 1/√2]^T$
$u₂ = A v₂/1 = [1/√2, -1/√2]^T$
Full SVD:
$U = [[1/√2, 1/√2], [1/√2, -1/√2]]$, $Σ = [[√3, 0, 0], [0, 1, 0]]$, $V = [[1/√6, 1/√2, 1/√3], [1/√6, -1/√2, 1/√3], [2/√6, 0, -1/√3]]$
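The factors of this rectangular example can be multiplied out numerically. The sketch below enters U, Σ, and V with the values derived in this example and checks the product, the orthogonality of V, and that v₃ lies in the nullspace of A.

```python
import numpy as np

A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])

U = np.array([[1.0,  1.0],
              [1.0, -1.0]]) / np.sqrt(2.0)
Sigma = np.array([[np.sqrt(3.0), 0.0, 0.0],
                  [0.0,          1.0, 0.0]])
# Columns of V are v1, v2, v3.
V = np.column_stack([
    np.array([1.0,  1.0,  2.0]) / np.sqrt(6.0),
    np.array([1.0, -1.0,  0.0]) / np.sqrt(2.0),
    np.array([1.0,  1.0, -1.0]) / np.sqrt(3.0),
])

print(np.allclose(U @ Sigma @ V.T, A))      # True: the factorization holds
print(np.allclose(V.T @ V, np.eye(3)))      # True: V is orthogonal
print(np.allclose(A @ V[:, 2], 0.0))        # True: v3 is in the nullspace of A
```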
Example 3: Low-Rank Approximation (Eckart-Young)
Given $A = [[3,2],[2,3]]$ from Example 1 with SVD $σ₁=5, σ₂=1$:
Rank-1 approximation:
$A₁ = σ₁ u₁ v₁^T = 5 [1/√2, 1/√2]^T [1/√2, 1/√2] = 5 [[1/2, 1/2], [1/2, 1/2]] = [[2.5, 2.5], [2.5, 2.5]] $
Error: $A - A₁ = [[0.5, -0.5], [-0.5, 0.5]]$
Frobenius error: $||A - A₁||_F = σ₂ = 1$ (as predicted by Eckart-Young)
Spectral error: $||A - A₁||₂ = σ₂ = 1$
Example 4: Pseudoinverse
For $A = [[3,2],[2,3],[1,1]]$ (3×2, rank 2):
SVD: $A^T A = [[14, 13], [13, 14]]$ has eigenvalues 27 and 1, so $σ₁ = √27 = 3√3 ≈ 5.196$ and $σ₂ = 1$.
$A^+ = V Σ^+ U^T$
where $Σ^+$ is 2×3 with $1/(3√3)$ and $1$ on its diagonal and zeros elsewhere.
For $b = [1, 0, 1]^T$, $x = A^+ b = [17/27, -10/27]^T ≈ [0.630, -0.370]^T$ is the least squares solution minimizing $||Ax - b||$.
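A numerical check of this example, using `numpy.linalg.pinv` for the pseudoinverse. Exact arithmetic gives singular values √27 and 1 and the solution $[17/27, -10/27]^T$. The last line tests the normal equations $A^T(b - Ax) = 0$, which characterize a least squares solution.

```python
import numpy as np

A = np.array([[3.0, 2.0],
              [2.0, 3.0],
              [1.0, 1.0]])
b = np.array([1.0, 0.0, 1.0])

s = np.linalg.svd(A, compute_uv=False)
x = np.linalg.pinv(A) @ b
residual = b - A @ x

print(s)                                   # approx [5.196, 1.0]
print(x)                                   # approx [0.6296, -0.3704]
print(np.allclose(A.T @ residual, 0.0))    # True: normal equations hold
```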
Practice Problems
1. Compute the full SVD of $A = [[0, 1], [1, 0]]$. How do the singular values relate to the eigenvalues?
2. For $A = [[2, 0], [0, 0], [0, 3]]$, find the SVD. What is the rank?
3. Show that if A is symmetric positive semidefinite, its SVD coincides with its eigendecomposition. (Hint: A = Q Λ Q^T = U Σ V^T.)
4. Using the SVD of $A = [[3, 2], [2, 3]]$ from Example 1, compute the pseudoinverse A^+ and verify $A A^+ A = A$.
5. For $A = [[2, 1], [1, 2]]$, compute the rank-1 approximation and the Frobenius norm of the error. Verify the Eckart-Young theorem.
6. Prove that $||A||₂ = σ_max$ (the largest singular value) for any matrix A.
Answers
1. `A^T A = I`, eigenvalues λ₁=λ₂=1, so σ₁=σ₂=1. V = I (or any orthogonal matrix). For the choice V = I: U = A V Σ^{-1} = A = [[0,1],[1,0]]. Σ = I. Note: eigenvalues of A are ±1, singular values are 1,1; for a symmetric matrix the singular values are the absolute values of the eigenvalues, so they are always nonnegative.
2. `A = [[2,0],[0,0],[0,3]]`. `A^T A = [[4,0],[0,9]]`. σ₁=3, σ₂=2. Right singular vectors: v₁=[0,1]^T, v₂=[1,0]^T. Left: u₁ = (1/3) A v₁ = (1/3)[0,0,3]^T = [0,0,1]^T, u₂ = (1/2) A v₂ = (1/2)[2,0,0]^T = [1,0,0]^T. u₃ = [0,1,0]^T completes the basis. Rank = 2.
3. If A = QΛQ^T is symmetric positive semidefinite (λ_i ≥ 0, sorted descending), then A^T A = A² has eigenvalues λ_i² and eigenvectors Q. So V = Q and Σ = Λ. For λ_i > 0, u_i = A q_i/λ_i = q_i; for λ_i = 0, q_i lies in the left nullspace of A (A is symmetric), so u_i = q_i is a valid choice. Thus U = V = Q and A = QΛQ^T is an SVD.
4. σ₁=5, σ₂=1. Σ^+ = diag(1/5, 1). A^+ = V Σ^+ U^T. Since U=V here: A^+ = [[1/√2,1/√2],[1/√2,-1/√2]] diag(1/5,1) [[1/√2,1/√2],[1/√2,-1/√2]] = (1/5)[[3,-2],[-2,3]], which equals A^{-1} because A is invertible. Verify: A A^+ = I, so A A^+ A = A.
5. `A^T A = [[5,4],[4,5]]`. Eigenvalues: 9, 1. σ₁=3, σ₂=1. V = [[1/√2,1/√2],[1/√2,-1/√2]], and U = V because A is symmetric positive definite. Rank-1: A₁ = 3 [1/√2,1/√2]^T [1/√2,1/√2] = 3[[1/2,1/2],[1/2,1/2]] = [[1.5,1.5],[1.5,1.5]]. Error: A - A₁ = [[0.5,-0.5],[-0.5,0.5]], so ||A-A₁||_F = 1 = σ₂. Eckart-Young says min ||A-B||_F over rank-1 B is σ₂=1, which matches.
6. `||A||₂ = sup_{||x||=1} ||Ax|| = sup_{||x||=1} ||U Σ V^T x|| = sup_{||y||=1} ||Σ y||` (letting y=V^T x, and using that U and V preserve norms). `||Σ y||² = Σ σ_i² y_i² ≤ σ_max² Σ y_i² = σ_max²` with equality when y = e₁, i.e. x = v₁. So `||A||₂ = σ_max`.
Summary
- Every matrix admits an SVD $A = U Σ V^T$: singular values on Σ's diagonal reveal the "gain" in each principal direction
- Singular values are square roots of eigenvalues of $A^T A$ (or $A A^T$); they are always nonnegative and sorted descending
- The Eckart-Young theorem proves truncated SVD gives the optimal low-rank approximation, which is foundational for PCA, compression, and denoising
- The pseudoinverse via SVD ($A^+ = V Σ^+ U^T$) solves least squares and minimum-norm problems for any matrix
- The condition number κ₂(A) = σ_max/σ_min quantifies sensitivity to perturbations
Pitfalls
- Computing A^T A explicitly to find singular values. Forming A^T A squares the condition number and can destroy information about small singular values. A matrix with σ_max ≈ 1 and σ_min = 10⁻⁸ has an eigenvalue of A^T A near 10⁻¹⁶, which is at the level of double-precision rounding error (about 2.2 × 10⁻¹⁶) and cannot be resolved reliably. Use SVD algorithms (e.g. Golub-Reinsch) that work on A directly.
- Forgetting to mean-center data before SVD for PCA. SVD of a data matrix X gives principal components only when X has zero-mean columns. If you skip centering and the mean is large relative to the spread, the first singular vector is dominated by the mean, not the direction of maximum variance. Always subtract column means first.
- Confusing left and right singular vectors. Columns of U (left) live in the output space ℝ^m and are eigenvectors of AA^T; columns of V (right) live in the input space ℝ^n and are eigenvectors of A^T A. In PCA with samples stored as rows of X, the principal directions are columns of V, not U. Mixing them up gives meaningless results.
- Thinking SVD requires a square matrix. SVD exists for EVERY matrix: rectangular, singular, or rank-deficient. This universality is what makes SVD so widely applicable. If someone says "I need a square matrix for SVD," they're confusing it with eigendecomposition.
- Truncating singular values too aggressively. Setting small singular values to zero is standard for denoising, but the cutoff must be chosen carefully. Truncating too much loses signal; truncating too little keeps noise. Use the singular value scree plot (σ_k vs. k) to identify the "elbow" where meaningful signal transitions to noise.
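The centering pitfall is easy to reproduce on synthetic data. The sketch below assumes hypothetical 2-D samples (stored as rows) with a large mean offset along the first coordinate and most of the variance along the second; the seed, sample size, and scales are arbitrary choices for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
# Synthetic data: mean near [10, 0], standard deviation 0.1 along x and 1.0 along y.
X = np.array([10.0, 0.0]) + rng.standard_normal((n, 2)) * np.array([0.1, 1.0])

# SVD without centering: the first right singular vector follows the mean.
_, _, Vt_raw = np.linalg.svd(X, full_matrices=False)

# SVD after subtracting column means: this is PCA.
Xc = X - X.mean(axis=0)
_, s_c, Vt_c = np.linalg.svd(Xc, full_matrices=False)

print(np.abs(Vt_raw[0]))    # close to [1, 0]: the mean direction
print(np.abs(Vt_c[0]))      # close to [0, 1]: the direction of maximum variance

# Squared singular values of centered data give the component variances.
variances = s_c ** 2 / (n - 1)
cov_eigs = np.linalg.eigvalsh(np.cov(X, rowvar=False))[::-1]
print(np.allclose(variances, cov_eigs))    # True
```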
Next Steps
Continue to 09-04 Spectral Theorem & Quadratic Forms for the Rayleigh quotient, Courant-Fischer min-max principle, and classification of quadratic forms.