📐 Concept diagram

09-05 — Matrix Norms and Conditioning

Phase: 9 — Matrix Decompositions & Advanced Linear Algebra Subject: 09-05 Prerequisites: 09-04 — Spectral Theorem Next subject: 09-06 — Cholesky Decomposition

Learning Objectives

Define and compute vector norms: ℓ₁, ℓ₂ (Euclidean), ℓ_∞
Define induced matrix norms and compute them: ||A||₁, ||A||₂, ||A||_∞
Explain the condition number κ(A) and its role in quantifying sensitivity of linear systems
Analyze ill-conditioned systems and understand the forward/backward error relationship
Distinguish between numerical stability of algorithms and conditioning of problems

Core Content

1. Vector Norms

A vector norm ||·|| satisfies: positivity (||x|| ≥ 0, = 0 iff x = 0), homogeneity (||αx|| = |α| ||x||), and triangle inequality (||x + y|| ≤ ||x|| + ||y||).

For $x = [x₁, ..., x_n]^T$:

ℓ₁-norm (Manhattan/taxicab):

$||x||₁ = Σ_{i=1}^{n} |x_i|
$

ℓ₂-norm (Euclidean):

$||x||₂ = sqrt(Σ_{i=1}^{n} x_i²) = sqrt(x^T x)
$

ℓ_∞-norm (max/infinity):

$||x||_∞ = max_{1 ≤ i ≤ n} |x_i|
$

ℓ_p-norm (general):

||x||_p = (Σ |x_i|^p)^{1/p}  for p ≥ 1

As p → ∞, ℓ_p → ℓ_∞.

Norm equivalence in finite dimensions: For any two norms ||·||_a and ||·||_b on ℝ^n, there exist constants c, C > 0 such that $c||x||_a ≤ ||x||_b ≤ C||x||_a$. Specifically: - $||x||_∞ ≤ ||x||₂ ≤ √n ||x||_∞$ - $||x||_₂ ≤ ||x||₁ ≤ √n ||x||_₂$ - $||x||_∞ ≤ ||x||₁ ≤ n ||x||_∞$

2. Induced Matrix Norms

A matrix norm ||·|| is induced (or "operator norm") by a vector norm if:

$||A|| = sup_{x ≠ 0} ||Ax|| / ||x|| = sup_{||x||=1} ||Ax||
$

The induced norm measures the maximum "stretch" a matrix can apply to any unit vector.

Key induced matrix norms:

Common Pitfall: Different norms give different condition numbers. kappa_2 uses spectral norm; kappa_1 and kappa_inf can be much larger. Always specify which you are using.

||A||₁ (maximum absolute column sum):

$||A||₁ = max_{1 ≤ j ≤ n} Σ_{i=1}^{m} |a_{ij}|
$

(Derivation: ||A||₁ = sup_{||x||₁=1} ||Ax||₁. The unit sphere in ℓ₁ is a cross-polytope; the max is attained at a standard basis vector, giving the column sum.)

||A||_∞ (maximum absolute row sum):

$||A||_∞ = max_{1 ≤ i ≤ m} Σ_{j=1}^{n} |a_{ij}|
$

(Similarly, the ℓ_∞ unit sphere is a hypercube; max at a corner where x_j = ±1.)

||A||₂ (spectral norm — largest singular value):

$||A||₂ = σ_max(A) = sqrt(λ_max(A^T A))
$

(Derivation: ||A||₂² = sup_{||x||₂=1} x^T A^T A x = λ_max(A^T A) by Rayleigh quotient. ∴ ||A||₂ = σ_max.)

Frobenius norm (NOT induced, but commonly used):

$||A||_F = sqrt(Σ_{i,j} |a_{ij}|²) = sqrt(tr(A^T A)) = sqrt(Σ σ_i²)
$

Since it's not induced, it doesn't satisfy $||AB||_F ≤ ||A||_F ||B||_F$ in general.

Properties of induced norms: - Submultiplicativity: $||AB|| ≤ ||A|| · ||B||$ - $||Ax|| ≤ ||A|| · ||x||$ for the inducing vector norm - $||I|| = 1$ (for any induced norm) - $||A^{-1}|| ≥ 1/||A||$ (but can be much larger)

3. Condition Number

The condition number of a matrix A (with respect to a norm, usually ℓ₂) is:

CRITICAL -- Foundational: Condition number kappa(A) = ||A|| ||A^{-1}|| measures error amplification. Large kappa = ill-conditioned problem. This is a PROPERTY of the problem, not the solver.

κ(A) = ||A|| · ||A^{-1}||  (if A is invertible)

For the ℓ₂-norm:

$κ₂(A) = σ_max / σ_min
$

where σ_max, σ_min are the largest and smallest singular values.

Interpretation: κ(A) measures how much relative error in the input (b in Ax=b) can be amplified in the solution x.

Perturbation analysis:

If $Ax = b$ and $(A + δA)(x + δx) = b + δb$, then:

$||δx|| / ||x|| ≤ κ(A) · (||δA||/||A|| + ||δb||/||b||)  + higher-order terms
$

Relative error in x ≤ condition number × (relative error in A + relative error in b).

Key implications: - κ(A) = 1: perfectly conditioned (e.g., orthogonal matrices) - κ(A) large: ill-conditioned — small input perturbations cause large output changes - κ(A) → ∞: singular (not invertible) - log₁₀(κ(A)) tells you roughly how many decimal digits of accuracy you lose

Example: A system with κ(A) = 10⁸ and 16-digit floating-point precision loses about 8 digits, leaving only ~8 correct digits in the solution.

4. Ill-Conditioned Systems

Classic ill-conditioned example — the Hilbert matrix:

$H_{ij} = 1/(i + j - 1),  i, j = 1, ..., n
$

For n = 5: κ₂(H₅) ≈ 4.8 × 10⁵. For n = 10: κ₂ ≈ 1.6 × 10¹³.

Geometric interpretation for 2×2: Ill-conditioning means the rows (or columns) are nearly linearly dependent. The matrix maps the unit circle to a very elongated ellipse (σ_max ≫ σ_min).

Detection: Small pivot in Gaussian elimination, nearly zero determinant (but note: determinant doesn't directly measure conditioning!), large entries in A^{-1}, or direct computation of singular values.

5. Stability vs. Conditioning

Conditioning: Property of the PROBLEM. Ill-conditioned = the exact solution is inherently sensitive to small changes in data. No algorithm can fix this (without higher precision).

Stability: Property of the ALGORITHM. A stable algorithm gives the exact solution to a slightly perturbed problem ("backward error" is small). Combined with conditioning: $forward error ≤ condition number × backward error$.

Well-conditioned + stable algorithm = accurate answer
Well-conditioned + unstable algorithm = bad answer (fix the algorithm)
Ill-conditioned + any algorithm = potentially bad answer (fix the problem formulation, use higher precision, or regularize)

Key Terms

09 05 Matrix Norms And Conditioning
09-06 Cholesky Decomposition
Answer: a.
Answer: b.
Answer: d.
End-of-Subject Quiz
Hilbert matrix
induced

Worked Examples

Example 1: Computing Induced Matrix Norms

For:

$A = [ 3  -1 ]
    [-2   4 ]
$

$||A||₁ = max(|3|+|-2|, |-1|+|4|) = max(5, 5) = 5$

$||A||_∞ = max(|3|+|-1|, |-2|+|4|) = max(4, 6) = 6$

For ||A||₂, find singular values:

$A^T A = [3 -2; -1 4][3 -1; -2 4] = [13, -11; -11, 17]
$

Eigenvalues: $(13+17 ± √((30)² - 4(13·17-121)))/2 = (30 ± √(900 - 4(100)))/2 = (30 ± √500)/2 = 15 ± 5√5 ≈ 15 ± 11.18$ So $σ_max² ≈ 26.18$, $σ_max ≈ 5.117$. $||A||₂ ≈ 5.117$.

Example 2: Condition Number

For $A = [[1000, 999], [999, 998]]$:

$A^{-1} = [[ -998,  999],
          [  999, -1000]]
$

Check: $det = 1000·998 - 999² = -1$.

$||A||_∞ = max(1999, 1997) = 1999$. $||A^{-1}||_∞ = max(1997, 1999) = 1999$. $κ_∞(A) = 1999² ≈ 4 × 10⁶$.

This matrix is very ill-conditioned! Consider solving $Ax = [1, 1]^T$: Solution: $x = A^{-1}[1,1]^T = [1, -1]^T$.

Now perturb b slightly: $b' = [1.001, 0.999]^T$. $x' = A^{-1}[1.001, 0.999]^T = [-998·1.001 + 999·0.999, 999·1.001 - 1000·0.999]^T$ $= [-998.998 + 998.001, 999.999 - 999]^T = [-0.997, 0.999]^T$.

Relative change in b: ||δb||/||b|| ≈ 0.001/1 = 10^{-3}. Relative change in x: $||x' - x||/||x|| ≈ ||[-1.997, 1.999]||/||[1,-1]|| ≈ 2.83/1.414 ≈ 2.0$. Amplification by ~2000x — consistent with κ ≈ 4×10⁶ (order of magnitude).

Example 3: Orthogonal Matrices Have κ₂ = 1

For orthogonal Q: $Q^T Q = I$. Singular values of Q are all 1. So $κ₂(Q) = σ_max/σ_min = 1/1 = 1$. Orthogonal transformations are perfectly conditioned — they don't amplify errors.

Quiz

Q1: What does the concept of End-of-Subject Quiz primarily refer to in this subject?

A) The definition and application of End-of-Subject Quiz B) A visual representation of End-of-Subject Quiz C) A historical anecdote about End-of-Subject Quiz D) A computational error related to End-of-Subject Quiz

Correct: A)

If you chose A: End-of-Subject Quiz is defined as: the definition and application of end-of-subject quiz. The other options describe different aspects that are not the primary focus. Correct!
If you chose B: This is incorrect. End-of-Subject Quiz is defined as: the definition and application of end-of-subject quiz. The other options describe different aspects that are not the primary focus.
If you chose C: This is incorrect. End-of-Subject Quiz is defined as: the definition and application of end-of-subject quiz. The other options describe different aspects that are not the primary focus.
If you chose D: This is incorrect. End-of-Subject Quiz is defined as: the definition and application of end-of-subject quiz. The other options describe different aspects that are not the primary focus.

Q2: What is the primary purpose of Hilbert matrix?

A) It replaces all other methods in this domain B) It is primarily a historical notation system C) It is used to hilbert matrix in mathematical analysis D) It is used only in advanced research contexts

Correct: C)

If you chose A: This is incorrect. Hilbert matrix serves the purpose described in the correct answer. The other options misrepresent its role.
If you chose B: This is incorrect. Hilbert matrix serves the purpose described in the correct answer. The other options misrepresent its role.
If you chose C: Hilbert matrix serves the purpose described in the correct answer. The other options misrepresent its role. Correct!
If you chose D: This is incorrect. Hilbert matrix serves the purpose described in the correct answer. The other options misrepresent its role.

Q3: Which statement about Common Pitfalls is TRUE?

A) Common Pitfalls is not related to this subject B) Common Pitfalls is mentioned only as a historical footnote C) Common Pitfalls is an advanced topic beyond this subject's scope D) Common Pitfalls is a fundamental concept covered in this subject

Correct: D)

If you chose A: This is incorrect. Common Pitfalls is a fundamental concept covered in this subject. This subject covers Common Pitfalls as part of its core content.
If you chose B: This is incorrect. Common Pitfalls is a fundamental concept covered in this subject. This subject covers Common Pitfalls as part of its core content.
If you chose C: This is incorrect. Common Pitfalls is a fundamental concept covered in this subject. This subject covers Common Pitfalls as part of its core content.
If you chose D: Common Pitfalls is a fundamental concept covered in this subject. This subject covers Common Pitfalls as part of its core content. Correct!

Q4: Based on the worked examples in this subject, what is the correct result?

A) An unrelated numerical value B) A different result from a common mistake C) a.** D) The inverse of the correct answer

Correct: C)

If you chose A: This is incorrect. The worked examples show that the result is a.**. The other options represent common errors.
If you chose B: This is incorrect. The worked examples show that the result is a.**. The other options represent common errors.
If you chose C: The worked examples show that the result is a.**. The other options represent common errors. Correct!
If you chose D: This is incorrect. The worked examples show that the result is a.**. The other options represent common errors.

Practice Problems

Compute ||x||₁, ||x||₂, and $||x||_∞$ for $x = [3, -4, 0, 2]^T$.
For $A = [[2, 0], [0, 3]]$, compute ||A||₁, $||A||_∞$, ||A||₂, and ||A||_F.
Compute $κ_∞(A)$ for $A = [[1, 2], [3, 4]]$.
Show that for any induced matrix norm, $κ(A) ≥ 1$.
The Vandermonde matrix $V = [[1, a], [1, b]]$ becomes ill-conditioned as a → b. Compute κ₂(V) in terms of a and b (assuming a, b > 0).
Prove that $||AB||_F ≤ ||A||_F ||B||_F$ does NOT hold in general by finding a counterexample with 2×2 matrices.

Answers

1. ||x||₁ = 3+4+0+2 = 9. ||x||₂ = √(9+16+0+4) = √29 ≈ 5.385. ||x||_∞ = max(3,4,0,2) = 4. 2. ||A||₁ = max(2, 3) = 3. ||A||_∞ = max(2, 3) = 3. ||A||₂ = max singular value = 3. ||A||_F = √(4+0+0+9) = √13 ≈ 3.606. 3. A^{-1} = (-1/2)[[4,-2],[-3,1]] = [[-2,1],[1.5,-0.5]]. ||A||_∞ = max(1+2, 3+4) = 7. ||A^{-1}||_∞ = max(2+1, 1.5+0.5) = 3. κ_∞(A) = 7 × 3 = 21. 4. 1 = ||I|| = ||A A^{-1}|| ≤ ||A|| · ||A^{-1}|| = κ(A). So κ(A) ≥ 1 for any induced norm. 5. V = [[1, a], [1, b]]. V^{-1} = 1/(b-a) [[b, -a], [-1, 1]]. ||V||₂ = σ_max(V). V^T V = [[2, a+b], [a+b, a²+b²]]. As a→b, the smaller eigenvalue → 0, so σ_min → 0, κ₂ → ∞. More precisely, for a=b: V is singular (κ₂ = ∞). 6. Let A = [[1,0],[0,0]], B = [[0,0],[0,1]]. ||A||_F = 1, ||B||_F = 1, but AB = 0, ||AB||_F = 0 ≤ 1·1 = 1. This actually satisfies the inequality. Let A = B = [[1,1],[1,1]]. ||A||_F = 2. AB = [[2,2],[2,2]], ||AB||_F = 4. But ||A||_F · ||B||_F = 2·2 = 4. Still holds. Actually, the Frobenius norm DOES satisfy submultiplicativity (it's a special case by Cauchy-Schwarz applied to each entry). A correct counterexample would require non-submultiplicative norm. Let me try the entrywise max norm: ||A|| = max|a_{ij}|. A = B = [[1,1],[1,1]]: ||A||=||B||=1, ||AB||=2, so 2 ≰ 1·1. This is not induced and submultiplicativity fails.

Summary

Vector norms (ℓ₁, ℓ₂, ℓ_∞) measure vector magnitude and are all equivalent in finite dimensions
Induced matrix norms measure maximum stretch: ||A|| = sup_{||x||=1} ||Ax||. Key formulas: ||A||₁ = max column sum, ||A||_∞ = max row sum, ||A||₂ = σ_max
The condition number κ(A) = ||A||·||A^{-1}|| quantifies sensitivity: small input perturbations can be amplified by factor κ(A)
Ill-conditioned matrices (large κ) correspond to near-linear dependence; the Hilbert matrix is the classic example
Conditioning is a problem property (can't be fixed by algorithms); stability is an algorithm property — a stable algorithm on an ill-conditioned problem still gives large errors

Pitfalls

Confusing different norms for condition number. κ₁, κ₂, and κ_∞ give different values — κ₂ uses the spectral norm while κ₁ and κ_∞ can be much larger. Always specify which norm you are using.
Assuming small determinant means ill-conditioned. det(αI) = α^n can be arbitrarily small while κ = 1. Conditioning measures the spread of singular values, not the product.
Treating condition number as an algorithm property. Conditioning is intrinsic to the problem (the matrix). A stable algorithm on an ill-conditioned problem still produces large forward errors.
Attempting to fix ill-conditioning by scaling. κ(cA) = κ(A) for any scalar c ≠ 0. Uniform scaling does nothing. Non-uniform (diagonal) scaling is the idea behind preconditioning.
Computing κ by explicitly inverting A. Forming A^{-1} is expensive and numerically unstable. Use singular value decomposition or condition number estimators (e.g., numpy.linalg.cond) instead.

Next Steps

Continue to 09-06 Cholesky Decomposition for the efficient factorization of symmetric positive definite matrices.

Progress

Phases