📐 Concept diagram

09-04 — Spectral Theorem and Quadratic Forms

Phase: 9 — Matrix Decompositions & Advanced Linear Algebra Subject: 09-04 Prerequisites: 09-03 — SVD Next subject: 09-05 — Matrix Norms and Conditioning

Learning Objectives

State and prove the Spectral Theorem for real symmetric matrices and its generalization to normal matrices
Define and apply the Rayleigh quotient to bound and characterize eigenvalues
Use the Courant-Fischer min-max principle to characterize all eigenvalues variationally
Classify quadratic forms and matrices as positive definite, semidefinite, indefinite, or negative definite
Apply Sylvester's Law of Inertia to understand invariant properties under congruence transformations

Core Content

1. The Spectral Theorem

Theorem (Real Symmetric): If A is an $n × n$ real symmetric matrix ($A = A^T$), then:

All eigenvalues of A are real
Eigenvectors corresponding to distinct eigenvalues are orthogonal
There exists an orthogonal matrix Q such that $A = Q Λ Q^T$, where $Λ = diag(λ_1, ..., λ_n)$ contains the eigenvalues

Proof outline:

Reality of eigenvalues: Let $λ$ be an eigenvalue with eigenvector v (potentially complex). Then:

$v^* A v = λ v^* v   (multiply Av = λv on left by v^*)
$

But also $v^* A v$ is real (since A is symmetric):

$(v^* A v)^* = v^T A^* v̄ = v^* A^T v = v^* A v
$

So $v^* A v$ equals its own conjugate — it's real. Since $v^* v > 0$, λ must be real.

Orthogonality: For distinct eigenvalues λ₁ ≠ λ₂ with eigenvectors v₁, v₂:

$λ₁ v₁^T v₂ = v₁^T A v₂ = v₁^T A^T v₂ = (A v₁)^T v₂ = λ₁ v₁^T v₂
$

Wait—let me redo this more carefully:

$λ₁ (v₁^T v₂) = (A v₁)^T v₂ = v₁^T A^T v₂ = v₁^T A v₂ = v₁^T (λ₂ v₂) = λ₂ (v₁^T v₂)
$

So $(λ₁ - λ₂)(v₁^T v₂) = 0$, and since λ₁ ≠ λ₂, $v₁^T v₂ = 0$.

Diagonalization by orthogonal Q: If eigenvalues are distinct, the eigenvectors are orthogonal; normalize them and Q is orthogonal. For repeated eigenvalues, we can still find an orthonormal basis of each eigenspace (Gram-Schmidt within the eigenspace).

Extension: Normal matrices. A matrix is normal if $A A^T = A^T A$. Normal matrices are unitarily diagonalizable: $A = U Λ U^*$ (U unitary). The spectral theorem for real symmetric matrices is the special case where U can be real (orthogonal Q).

CRITICAL -- Foundational: Every real symmetric matrix has REAL eigenvalues and ORTHOGONAL eigenvectors: A = Q\Lambda Q^T. This is why symmetric matrices are so tractable.

2. The Rayleigh Quotient

For a symmetric matrix A and nonzero vector x, the Rayleigh quotient is:

$R(x) = (x^T A x) / (x^T x)
$

Key properties:

$λ_min ≤ R(x) ≤ λ_max$ for all x ≠ 0
$R(v) = λ$ whenever v is an eigenvector with eigenvalue λ
The gradient: $∇R(x) = 2(Ax - R(x)x) / (x^T x)$, so stationary points of R(x) occur at eigenvectors

Variational characterization of eigenvalues:

$λ_max = max_{x ≠ 0} R(x)
λ_min = min_{x ≠ 0} R(x)
$

The maximizing x is the eigenvector for λ_max; the minimizing x is the eigenvector for λ_min.

In fact, for any subspace S of dimension k:

$λ_k = min_{dim(S) = k} max_{x ∈ S, x ≠ 0} R(x) = max_{dim(S) = n-k+1} min_{x ∈ S, x ≠ 0} R(x)
$

This is the Courant-Fischer min-max principle — arguably the most important variational principle in linear algebra.

Interpretation: To find λ_k, consider all k-dimensional subspaces. In each subspace, find the maximum of R(x) — this is at least λ_k. Then minimize over all such subspaces — the minimum is exactly λ_k. The optimal subspace is span{v_1, ..., v_k}.

3. Quadratic Forms and Definiteness

A quadratic form is a function $Q(x) = x^T A x$ where A is symmetric. Every quadratic form can be represented by a symmetric matrix (if A isn't symmetric, use $(A+A^T)/2$).

Common Pitfall: Positive definite means ALL eigenvalues > 0, not just positive entries or positive determinant. Use Sylvester criterion or check eigenvalues.

Classification:

Definiteness	Condition	Min/Max of Q(x)
Positive definite (PD)	All λ_i > 0	Q(x) > 0 for all x ≠ 0
Positive semidefinite (PSD)	All λ_i ≥ 0	Q(x) ≥ 0 for all x
Negative definite (ND)	All λ_i < 0	Q(x) < 0 for all x ≠ 0
Negative semidefinite (NSD)	All λ_i ≤ 0	Q(x) ≤ 0 for all x
Indefinite	Mixed signs	Q(x) takes both positive and negative values

Tests for positive definiteness:

Eigenvalue test: All eigenvalues > 0
Sylvester's criterion: All leading principal minors > 0
$Δ₁ = a_{11} > 0$
$Δ₂ = det([a_{11}, a_{12}; a_{21}, a_{22}]) > 0$
... up to $Δ_n = det(A) > 0$
Cholesky test: A has a Cholesky decomposition $A = L L^T$ with L_{ii} > 0
Pivot test: All pivots in Gaussian elimination (without row swaps) are positive

Example: Is $A = [[2, -1, 0], [-1, 2, -1], [0, -1, 2]]$ positive definite?

Leading principal minors: $Δ₁ = 2 > 0$, $Δ₂ = 4-1 = 3 > 0$, $Δ₃ = det(A) = 2(4-1) - (-1)(2-0) + 0 = 6 - 2 = 4 > 0$. Yes, PD.

4. Sylvester's Law of Inertia

For a real symmetric matrix A, two congruence transformations A ↦ S^T A S (S invertible) preserve the inertia — the triple $(n_+, n_-, n_0)$: - $n_+$: number of positive eigenvalues - $n_-$: number of negative eigenvalues - $n_0$: number of zero eigenvalues

This means: you can't change the signs of eigenvalues by congruence, only their magnitudes. Diagonalization by congruence (not similarity) reveals inertia but not exact eigenvalues.

Geometric meaning: The "shape" of the quadratic form (number of dimensions where it curves up, down, or is flat) is invariant under invertible linear changes of variables.

Example: $A = [[1,0],[0,-1]]$ has inertia (1,1,0). For any invertible S, S^T A S also has inertia (1,1,0) — one positive, one negative eigenvalue, zero nullity.

Key Terms

Courant-Fischer min-max principle

Worked Examples

Example 1: Rayleigh Quotient Bounds

For $A = [[3, 1], [1, 3]]$, eigenvalues are λ = 4, 2. The Rayleigh quotient for any x:

$R(x) = (3x₁² + 2x₁x₂ + 3x₂²) / (x₁² + x₂²)
$

Check bounds: For $x = e₁ = [1, 0]^T$: $R = 3$. For $x = [1, 1]^T$: $R = (3+2+3)/2 = 4$. For $x = [1, -1]^T$: $R = (3-2+3)/2 = 2$. Indeed 2 ≤ R ≤ 4.

Maximum occurs at v₁ = [1, 1]^T/√2 (eigenvector for λ=4). Minimum occurs at v₂ = [1, -1]^T/√2 (eigenvector for λ=2).

Example 2: Courant-Fischer for λ₂

For the same A, λ₂ = 2. Using Courant-Fischer:

$λ₂ = min_{dim(S)=2} max_{x∈S, x≠0} R(x)
$

The only 2D subspace in ℝ² is ℝ² itself, so max over ℝ² = 4. But:

$λ₂ = max_{dim(S)=1} min_{x∈S} R(x)
$

Consider the 1D subspace spanned by [1, -1]^T: $min = 2$. Any other 1D subspace will have a higher minimum, so the max of these minima is 2. ✓

Example 3: Classifying Quadratic Forms

Classify $Q(x, y, z) = 2x² + 2y² + 2z² - 2xy - 2yz$.

Matrix form:

$A = [[2, -1,  0],
     [-1, 2, -1],
     [ 0, -1, 2]]
$

Leading principal minors: Δ₁ = 2 > 0, Δ₂ = 3 > 0, Δ₃ = 4 > 0. So A is positive definite.

Eigenvalues (for verification): Using the known formula for tridiagonal matrices: λ_k = 2 - 2cos(kπ/(n+1)). For n=3: λ = 2±√2, 2. All positive. ✓

Example 4: Sylvester's Law of Inertia

$A = [[2, 1],
     [1, 2]]
$

Eigenvalues: λ = 3, 1. Inertia: (2, 0, 0).

Apply congruence with $S = [[1, 2], [0, 1]]$:

$S^T A S = [[1, 0], [2, 1]] [[2,1],[1,2]] [[1,2],[0,1]]
        = [[1,0],[2,1]] [[2,5],[1,4]]
        = [[2, 5], [5, 14]]
$

Eigenvalues of S^T A S: characteristic polynomial = λ² - 16λ + 3 = 0. Discriminant = 256-12 = 244 > 0, eigenvalues ≈ 15.81, 0.19. Both positive! Inertia remains (2, 0, 0). ✓

Quiz

Q1: What does the concept of Courant-Fischer min-max principle primarily refer to in this subject?

A) A computational error related to Courant-Fischer min-max principle B) A historical anecdote about Courant-Fischer min-max principle C) A visual representation of Courant-Fischer min-max principle D) The definition and application of Courant-Fischer min-max principle

Correct: D)

If you chose A: This is incorrect. Courant-Fischer min-max principle is defined as: the definition and application of courant-fischer min-max principle. The other options describe different aspects that are not the primary focus.
If you chose B: This is incorrect. Courant-Fischer min-max principle is defined as: the definition and application of courant-fischer min-max principle. The other options describe different aspects that are not the primary focus.
If you chose C: This is incorrect. Courant-Fischer min-max principle is defined as: the definition and application of courant-fischer min-max principle. The other options describe different aspects that are not the primary focus.
If you chose D: Courant-Fischer min-max principle is defined as: the definition and application of courant-fischer min-max principle. The other options describe different aspects that are not the primary focus. Correct!

Q2: What is the primary purpose of Common Pitfalls?

A) It is primarily a historical notation system B) It is used to common pitfalls in mathematical analysis C) It replaces all other methods in this domain D) It is used only in advanced research contexts

Correct: B)

If you chose A: This is incorrect. Common Pitfalls serves the purpose described in the correct answer. The other options misrepresent its role.
If you chose B: Common Pitfalls serves the purpose described in the correct answer. The other options misrepresent its role. Correct!
If you chose C: This is incorrect. Common Pitfalls serves the purpose described in the correct answer. The other options misrepresent its role.
If you chose D: This is incorrect. Common Pitfalls serves the purpose described in the correct answer. The other options misrepresent its role.

Q3: Based on the worked examples in this subject, what is the correct result?

A) 3, -1. Indeed -1 ≤ 2.2 ≤ 3. B) The inverse of the correct answer C) A different result from a common mistake D) An unrelated numerical value

Correct: A)

If you chose A: The worked examples show that the result is 3, -1. Indeed -1 ≤ 2.2 ≤ 3.. The other options represent common errors. Correct!
If you chose B: This is incorrect. The worked examples show that the result is 3, -1. Indeed -1 ≤ 2.2 ≤ 3.. The other options represent common errors.
If you chose C: This is incorrect. The worked examples show that the result is 3, -1. Indeed -1 ≤ 2.2 ≤ 3.. The other options represent common errors.
If you chose D: This is incorrect. The worked examples show that the result is 3, -1. Indeed -1 ≤ 2.2 ≤ 3.. The other options represent common errors.

Practice Problems

Prove that if A is symmetric and all its eigenvalues are positive, then x^T A x > 0 for all nonzero x.
For $A = [[1, 2], [2, 1]]$, compute the Rayleigh quotient for x = [3, 1]^T and verify it lies between λ_min and λ_max.
Determine whether $A = [[1, 2, 0], [2, 4, 1], [0, 1, 5]]$ is positive definite using Sylvester's criterion.
Apply the Courant-Fischer theorem to find λ₂ of $A = [[4, 0, 0], [0, 2, 0], [0, 0, 1]]$ without computing eigenvectors.
Find the inertia of $A = [[0, 1], [1, 0]]$.
Show that the Rayleigh quotient gradient is zero exactly at eigenvectors of A.

Answers

1. Let QΛQ^T be the spectral decomposition. Then x^T A x = x^T Q Λ Q^T x = y^T Λ y = Σ λ_i y_i² where y = Q^T x. Since each λ_i > 0 and y ≠ 0 (since x ≠ 0 and Q is invertible), the sum is strictly positive. 2. `R([3,1]) = [3,1][[1,2],[2,1]][3,1]^T / (9+1) = [3,1][5,7]^T / 10 = (15+7)/10 = 2.2`. Eigenvalues: λ = 3, -1. Indeed -1 ≤ 2.2 ≤ 3. 3. Δ₁ = 1 > 0. Δ₂ = det([1,2],[2,4]) = 4-4 = 0 (NOT > 0). So A is NOT positive definite. (It's positive semidefinite? Check: 4-4=0, 4·5-1=19 > 0 but det A = ... let's check the 3rd minor: Δ₃ = 1(20-1) - 2(10-0) + 0 = 19-20 = -1 < 0. So A is indefinite.) 4. λ_max = 4 (the largest diagonal, which IS the eigenvalue for diagonal matrices). λ_min = 1. By Courant-Fischer, λ₂ is max over 2D subspaces of the min in that subspace. Choose S₂ = span{e₂, e₃}: min R(x) over S₂ = 1 (at e₃). Choose S₂ = span{e₁, e₂}: min = 2 (at e₂). Choose S₂ = span{e₁, e₃}: min = 1. So λ₂ = max(1, 2, 1) = 2. Correct: eigenvalues are 4, 2, 1. 5. Eigenvalues: λ = 1, -1. Inertia: (1, 1, 0) — one positive, one negative, no zero eigenvalues. 6. ∇R(x) = ∇(x^T A x / x^T x) = (2Ax · x^T x - x^T A x · 2x) / (x^T x)² = 2(Ax - R(x)x) / (x^T x). Setting ∇R(x) = 0 gives Ax = R(x)x, i.e., x is an eigenvector and R(x) is the corresponding eigenvalue.

Summary

The Spectral Theorem guarantees that real symmetric matrices have real eigenvalues, orthogonal eigenvectors, and diagonalization by an orthogonal matrix: $A = Q Λ Q^T$
The Rayleigh quotient $R(x) = x^T A x / x^T x$ varies continuously between λ_min and λ_max, with stationary points at eigenvectors
The Courant-Fischer min-max principle characterizes every eigenvalue variationally: $λ_k = min_{dim(S)=k} max_{x∈S} R(x)$
A quadratic form is classified by eigenvalue signs: positive definite (all λ > 0), semidefinite (all λ ≥ 0 or ≤ 0), or indefinite
Sylvester's Law of Inertia: under congruence S^T A S, the counts of positive, negative, and zero eigenvalues are invariant

Pitfalls

Checking only the final determinant for positive definiteness. Sylvester's criterion requires ALL leading principal minors to be positive — not just det(A). A diagonal matrix diag(−1, −1) has det = 1 > 0 but is negative definite. Check Δ₁, Δ₂, ..., Δ_n in sequence.
Confusing congruence with similarity. Under congruence S^T A S, eigenvalues change in magnitude (only their signs are preserved per Sylvester's Law of Inertia). Under similarity P⁻¹ A P, eigenvalues are preserved exactly. These are fundamentally different transformations with different invariants.
Applying the Rayleigh quotient to non-symmetric matrices. R(x) = x^T A x / x^T x is only meaningful when A is symmetric. For a non-symmetric A, the quadratic form depends only on (A+A^T)/2, and the Rayleigh quotient stationary points are NOT the eigenvectors of A.
Assuming normal matrices must be symmetric. Normal (A A^T = A^T A) includes symmetric, skew-symmetric, and orthogonal matrices. But normal ≠ symmetric — a rotation matrix is normal (orthogonal) but not symmetric. The spectral theorem for normal matrices gives unitary diagonalization UΛU*, which may involve complex numbers.
Misapplying Courant-Fischer: mixing up min-max and max-min. The Courant-Fischer principle gives λ_k = min_{dim(S)=k} max_{x∈S} R(x) (also max_{dim(S)=n-k+1} min_{x∈S} R(x)). A common error is to swap the order: max-min gives different values. The "min-max" form constrains from above then minimizes; "max-min" constrains from below then maximizes.

Next Steps

Continue to 09-05 Matrix Norms & Conditioning to learn how to measure matrix sensitivity through norms and condition numbers.

Progress

Phases