09-04 — Spectral Theorem and Quadratic Forms
Phase: 9 — Matrix Decompositions & Advanced Linear Algebra Subject: 09-04 Prerequisites: 09-03 — SVD Next subject: 09-05 — Matrix Norms and Conditioning
Learning Objectives
- State and prove the Spectral Theorem for real symmetric matrices and its generalization to normal matrices
- Define and apply the Rayleigh quotient to bound and characterize eigenvalues
- Use the Courant-Fischer min-max principle to characterize all eigenvalues variationally
- Classify quadratic forms and matrices as positive definite, semidefinite, indefinite, or negative definite
- Apply Sylvester's Law of Inertia to understand invariant properties under congruence transformations
Core Content
1. The Spectral Theorem
Theorem (Real Symmetric): If A is an $n × n$ real symmetric matrix ($A = A^T$), then:
- All eigenvalues of
Aare real - Eigenvectors corresponding to distinct eigenvalues are orthogonal
- There exists an orthogonal matrix
Qsuch that $A = Q Λ Q^T$, where $Λ = diag(λ_1, ..., λ_n)$ contains the eigenvalues
Proof outline:
Reality of eigenvalues: Let $λ$ be an eigenvalue with eigenvector v (potentially complex). Then:
$v^* A v = λ v^* v (multiply Av = λv on left by v^*) $
But also $v^* A v$ is real (since A is symmetric):
$(v^* A v)^* = v^T A^* v̄ = v^* A^T v = v^* A v $
So $v^* A v$ equals its own conjugate — it's real. Since $v^* v > 0$, λ must be real.
Orthogonality: For distinct eigenvalues λ₁ ≠ λ₂ with eigenvectors v₁, v₂:
$λ₁ v₁^T v₂ = v₁^T A v₂ = v₁^T A^T v₂ = (A v₁)^T v₂ = λ₁ v₁^T v₂ $
Wait—let me redo this more carefully:
$λ₁ (v₁^T v₂) = (A v₁)^T v₂ = v₁^T A^T v₂ = v₁^T A v₂ = v₁^T (λ₂ v₂) = λ₂ (v₁^T v₂) $
So $(λ₁ - λ₂)(v₁^T v₂) = 0$, and since λ₁ ≠ λ₂, $v₁^T v₂ = 0$.
Diagonalization by orthogonal Q: If eigenvalues are distinct, the eigenvectors are orthogonal; normalize them and Q is orthogonal. For repeated eigenvalues, we can still find an orthonormal basis of each eigenspace (Gram-Schmidt within the eigenspace).
Extension: Normal matrices. A matrix is normal if $A A^T = A^T A$. Normal matrices are unitarily diagonalizable: $A = U Λ U^*$ (U unitary). The spectral theorem for real symmetric matrices is the special case where U can be real (orthogonal Q).
CRITICAL -- Foundational: Every real symmetric matrix has REAL eigenvalues and ORTHOGONAL eigenvectors: A = Q\Lambda Q^T. This is why symmetric matrices are so tractable.
2. The Rayleigh Quotient
For a symmetric matrix A and nonzero vector x, the Rayleigh quotient is:
$R(x) = (x^T A x) / (x^T x) $
Key properties:
- $λ_min ≤ R(x) ≤ λ_max$ for all x ≠ 0
- $R(v) = λ$ whenever v is an eigenvector with eigenvalue λ
- The gradient: $∇R(x) = 2(Ax - R(x)x) / (x^T x)$, so stationary points of R(x) occur at eigenvectors
Variational characterization of eigenvalues:
$λ_max = max_{x ≠ 0} R(x)
λ_min = min_{x ≠ 0} R(x)
$
The maximizing x is the eigenvector for λ_max; the minimizing x is the eigenvector for λ_min.
In fact, for any subspace S of dimension k:
$λ_k = min_{dim(S) = k} max_{x ∈ S, x ≠ 0} R(x) = max_{dim(S) = n-k+1} min_{x ∈ S, x ≠ 0} R(x)
$
This is the Courant-Fischer min-max principle — arguably the most important variational principle in linear algebra.
Interpretation: To find λ_k, consider all k-dimensional subspaces. In each subspace, find the maximum of R(x) — this is at least λ_k. Then minimize over all such subspaces — the minimum is exactly λ_k. The optimal subspace is span{v_1, ..., v_k}.
3. Quadratic Forms and Definiteness
A quadratic form is a function $Q(x) = x^T A x$ where A is symmetric. Every quadratic form can be represented by a symmetric matrix (if A isn't symmetric, use $(A+A^T)/2$).
Common Pitfall: Positive definite means ALL eigenvalues > 0, not just positive entries or positive determinant. Use Sylvester criterion or check eigenvalues.
Classification:
| Definiteness | Condition | Min/Max of Q(x) |
|---|---|---|
| Positive definite (PD) | All λ_i > 0 | Q(x) > 0 for all x ≠ 0 |
| Positive semidefinite (PSD) | All λ_i ≥ 0 | Q(x) ≥ 0 for all x |
| Negative definite (ND) | All λ_i < 0 | Q(x) < 0 for all x ≠ 0 |
| Negative semidefinite (NSD) | All λ_i ≤ 0 | Q(x) ≤ 0 for all x |
| Indefinite | Mixed signs | Q(x) takes both positive and negative values |
Tests for positive definiteness:
- Eigenvalue test: All eigenvalues > 0
- Sylvester's criterion: All leading principal minors > 0
- $Δ₁ = a_{11} > 0$
- $Δ₂ = det([a_{11}, a_{12}; a_{21}, a_{22}]) > 0$
- ... up to $Δ_n = det(A) > 0$
- Cholesky test: A has a Cholesky decomposition $A = L L^T$ with
L_{ii} > 0 - Pivot test: All pivots in Gaussian elimination (without row swaps) are positive
Example: Is $A = [[2, -1, 0], [-1, 2, -1], [0, -1, 2]]$ positive definite?
Leading principal minors: $Δ₁ = 2 > 0$, $Δ₂ = 4-1 = 3 > 0$, $Δ₃ = det(A) = 2(4-1) - (-1)(2-0) + 0 = 6 - 2 = 4 > 0$. Yes, PD.
4. Sylvester's Law of Inertia
For a real symmetric matrix A, two congruence transformations A ↦ S^T A S (S invertible) preserve the inertia — the triple $(n_+, n_-, n_0)$:
- $n_+$: number of positive eigenvalues
- $n_-$: number of negative eigenvalues
- $n_0$: number of zero eigenvalues
This means: you can't change the signs of eigenvalues by congruence, only their magnitudes. Diagonalization by congruence (not similarity) reveals inertia but not exact eigenvalues.
Geometric meaning: The "shape" of the quadratic form (number of dimensions where it curves up, down, or is flat) is invariant under invertible linear changes of variables.
Example: $A = [[1,0],[0,-1]]$ has inertia (1,1,0). For any invertible S, S^T A S also has inertia (1,1,0) — one positive, one negative eigenvalue, zero nullity.
Key Terms
- Courant-Fischer min-max principle
Worked Examples
Example 1: Rayleigh Quotient Bounds
For $A = [[3, 1], [1, 3]]$, eigenvalues are λ = 4, 2. The Rayleigh quotient for any x:
$R(x) = (3x₁² + 2x₁x₂ + 3x₂²) / (x₁² + x₂²) $
Check bounds: For $x = e₁ = [1, 0]^T$: $R = 3$. For $x = [1, 1]^T$: $R = (3+2+3)/2 = 4$. For $x = [1, -1]^T$: $R = (3-2+3)/2 = 2$. Indeed 2 ≤ R ≤ 4.
Maximum occurs at v₁ = [1, 1]^T/√2 (eigenvector for λ=4). Minimum occurs at v₂ = [1, -1]^T/√2 (eigenvector for λ=2).
Example 2: Courant-Fischer for λ₂
For the same A, λ₂ = 2. Using Courant-Fischer:
$λ₂ = min_{dim(S)=2} max_{x∈S, x≠0} R(x)
$
The only 2D subspace in ℝ² is ℝ² itself, so max over ℝ² = 4. But:
$λ₂ = max_{dim(S)=1} min_{x∈S} R(x)
$
Consider the 1D subspace spanned by [1, -1]^T: $min = 2$. Any other 1D subspace will have a higher minimum, so the max of these minima is 2. ✓
Example 3: Classifying Quadratic Forms
Classify $Q(x, y, z) = 2x² + 2y² + 2z² - 2xy - 2yz$.
Matrix form:
$A = [[2, -1, 0],
[-1, 2, -1],
[ 0, -1, 2]]
$
Leading principal minors: Δ₁ = 2 > 0, Δ₂ = 3 > 0, Δ₃ = 4 > 0. So A is positive definite.
Eigenvalues (for verification): Using the known formula for tridiagonal matrices: λ_k = 2 - 2cos(kπ/(n+1)). For n=3: λ = 2±√2, 2. All positive. ✓
Example 4: Sylvester's Law of Inertia
$A = [[2, 1],
[1, 2]]
$
Eigenvalues: λ = 3, 1. Inertia: (2, 0, 0).
Apply congruence with $S = [[1, 2], [0, 1]]$:
$S^T A S = [[1, 0], [2, 1]] [[2,1],[1,2]] [[1,2],[0,1]]
= [[1,0],[2,1]] [[2,5],[1,4]]
= [[2, 5], [5, 14]]
$
Eigenvalues of S^T A S: characteristic polynomial = λ² - 16λ + 3 = 0. Discriminant = 256-12 = 244 > 0, eigenvalues ≈ 15.81, 0.19. Both positive! Inertia remains (2, 0, 0). ✓
Quiz
Q1: What does the concept of Courant-Fischer min-max principle primarily refer to in this subject?
A) A computational error related to Courant-Fischer min-max principle B) A historical anecdote about Courant-Fischer min-max principle C) A visual representation of Courant-Fischer min-max principle D) The definition and application of Courant-Fischer min-max principle
Correct: D)
- If you chose A: This is incorrect. Courant-Fischer min-max principle is defined as: the definition and application of courant-fischer min-max principle. The other options describe different aspects that are not the primary focus.
- If you chose B: This is incorrect. Courant-Fischer min-max principle is defined as: the definition and application of courant-fischer min-max principle. The other options describe different aspects that are not the primary focus.
- If you chose C: This is incorrect. Courant-Fischer min-max principle is defined as: the definition and application of courant-fischer min-max principle. The other options describe different aspects that are not the primary focus.
- If you chose D: Courant-Fischer min-max principle is defined as: the definition and application of courant-fischer min-max principle. The other options describe different aspects that are not the primary focus. Correct!
Q2: What is the primary purpose of Common Pitfalls?
A) It is primarily a historical notation system B) It is used to common pitfalls in mathematical analysis C) It replaces all other methods in this domain D) It is used only in advanced research contexts
Correct: B)
- If you chose A: This is incorrect. Common Pitfalls serves the purpose described in the correct answer. The other options misrepresent its role.
- If you chose B: Common Pitfalls serves the purpose described in the correct answer. The other options misrepresent its role. Correct!
- If you chose C: This is incorrect. Common Pitfalls serves the purpose described in the correct answer. The other options misrepresent its role.
- If you chose D: This is incorrect. Common Pitfalls serves the purpose described in the correct answer. The other options misrepresent its role.
Q3: Based on the worked examples in this subject, what is the correct result?
A) 3, -1. Indeed -1 ≤ 2.2 ≤ 3. B) The inverse of the correct answer C) A different result from a common mistake D) An unrelated numerical value
Correct: A)
- If you chose A: The worked examples show that the result is 3, -1. Indeed -1 ≤ 2.2 ≤ 3.. The other options represent common errors. Correct!
- If you chose B: This is incorrect. The worked examples show that the result is 3, -1. Indeed -1 ≤ 2.2 ≤ 3.. The other options represent common errors.
- If you chose C: This is incorrect. The worked examples show that the result is 3, -1. Indeed -1 ≤ 2.2 ≤ 3.. The other options represent common errors.
- If you chose D: This is incorrect. The worked examples show that the result is 3, -1. Indeed -1 ≤ 2.2 ≤ 3.. The other options represent common errors.
Practice Problems
-
Prove that if A is symmetric and all its eigenvalues are positive, then
x^T A x > 0for all nonzero x. -
For $A = [[1, 2], [2, 1]]$, compute the Rayleigh quotient for x = [3, 1]^T and verify it lies between λ_min and λ_max.
-
Determine whether $A = [[1, 2, 0], [2, 4, 1], [0, 1, 5]]$ is positive definite using Sylvester's criterion.
-
Apply the Courant-Fischer theorem to find λ₂ of $A = [[4, 0, 0], [0, 2, 0], [0, 0, 1]]$ without computing eigenvectors.
-
Find the inertia of $A = [[0, 1], [1, 0]]$.
-
Show that the Rayleigh quotient gradient is zero exactly at eigenvectors of A.
Answers
1. Let QΛQ^T be the spectral decomposition. Then x^T A x = x^T Q Λ Q^T x = y^T Λ y = Σ λ_i y_i² where y = Q^T x. Since each λ_i > 0 and y ≠ 0 (since x ≠ 0 and Q is invertible), the sum is strictly positive. 2. `R([3,1]) = [3,1][[1,2],[2,1]][3,1]^T / (9+1) = [3,1][5,7]^T / 10 = (15+7)/10 = 2.2`. Eigenvalues: λ = 3, -1. Indeed -1 ≤ 2.2 ≤ 3. 3. Δ₁ = 1 > 0. Δ₂ = det([1,2],[2,4]) = 4-4 = 0 (NOT > 0). So A is NOT positive definite. (It's positive semidefinite? Check: 4-4=0, 4·5-1=19 > 0 but det A = ... let's check the 3rd minor: Δ₃ = 1(20-1) - 2(10-0) + 0 = 19-20 = -1 < 0. So A is indefinite.) 4. λ_max = 4 (the largest diagonal, which IS the eigenvalue for diagonal matrices). λ_min = 1. By Courant-Fischer, λ₂ is max over 2D subspaces of the min in that subspace. Choose S₂ = span{e₂, e₃}: min R(x) over S₂ = 1 (at e₃). Choose S₂ = span{e₁, e₂}: min = 2 (at e₂). Choose S₂ = span{e₁, e₃}: min = 1. So λ₂ = max(1, 2, 1) = 2. Correct: eigenvalues are 4, 2, 1. 5. Eigenvalues: λ = 1, -1. Inertia: (1, 1, 0) — one positive, one negative, no zero eigenvalues. 6. ∇R(x) = ∇(x^T A x / x^T x) = (2Ax · x^T x - x^T A x · 2x) / (x^T x)² = 2(Ax - R(x)x) / (x^T x). Setting ∇R(x) = 0 gives Ax = R(x)x, i.e., x is an eigenvector and R(x) is the corresponding eigenvalue.Summary
- The Spectral Theorem guarantees that real symmetric matrices have real eigenvalues, orthogonal eigenvectors, and diagonalization by an orthogonal matrix: $A = Q Λ Q^T$
- The Rayleigh quotient $R(x) = x^T A x / x^T x$ varies continuously between λ_min and λ_max, with stationary points at eigenvectors
- The Courant-Fischer min-max principle characterizes every eigenvalue variationally: $λ_k = min_{dim(S)=k} max_{x∈S} R(x)$
- A quadratic form is classified by eigenvalue signs: positive definite (all λ > 0), semidefinite (all λ ≥ 0 or ≤ 0), or indefinite
- Sylvester's Law of Inertia: under congruence
S^T A S, the counts of positive, negative, and zero eigenvalues are invariant
Pitfalls
-
Checking only the final determinant for positive definiteness. Sylvester's criterion requires ALL leading principal minors to be positive — not just det(A). A diagonal matrix diag(−1, −1) has det = 1 > 0 but is negative definite. Check Δ₁, Δ₂, ..., Δ_n in sequence.
-
Confusing congruence with similarity. Under congruence S^T A S, eigenvalues change in magnitude (only their signs are preserved per Sylvester's Law of Inertia). Under similarity P⁻¹ A P, eigenvalues are preserved exactly. These are fundamentally different transformations with different invariants.
-
Applying the Rayleigh quotient to non-symmetric matrices. R(x) = x^T A x / x^T x is only meaningful when A is symmetric. For a non-symmetric A, the quadratic form depends only on (A+A^T)/2, and the Rayleigh quotient stationary points are NOT the eigenvectors of A.
-
Assuming normal matrices must be symmetric. Normal (A A^T = A^T A) includes symmetric, skew-symmetric, and orthogonal matrices. But normal ≠ symmetric — a rotation matrix is normal (orthogonal) but not symmetric. The spectral theorem for normal matrices gives unitary diagonalization UΛU*, which may involve complex numbers.
-
Misapplying Courant-Fischer: mixing up min-max and max-min. The Courant-Fischer principle gives λ_k = min_{dim(S)=k} max_{x∈S} R(x) (also max_{dim(S)=n-k+1} min_{x∈S} R(x)). A common error is to swap the order: max-min gives different values. The "min-max" form constrains from above then minimizes; "max-min" constrains from below then maximizes.
Next Steps
Continue to 09-05 Matrix Norms & Conditioning to learn how to measure matrix sensitivity through norms and condition numbers.