Math graphic
📐 Concept diagram

14-06 — Convex Sets and Functions

Phase: Optimization | Subject: 14-06 Prerequisites: 14-01-optimization-fundamentals.md, 06-01-functions-of-several-variables.md Next subject: 14-07-convex-optimization-problems.md


Learning Objectives

By the end of this subject, you will be able to:

  1. Define convex sets and prove a set is convex from the definition
  2. Define convex functions and test convexity using first-order and second-order conditions
  3. Recognize operations that preserve convexity (intersection, affine composition, pointwise maximum)
  4. Explain why convexity is the central property in optimization theory
  5. Identify common convex and non-convex functions in ML

Core Content

Convex Sets

A set $C \subseteq \mathbb{R}^n$ is convex if for any $\mathbf{x}, \mathbf{y} \in C$ and any $\lambda \in [0, 1]$:

$$\lambda\mathbf{x} + (1-\lambda)\mathbf{y} \in C$$

In plain language: the line segment between any two points in $C$ is entirely contained in $C$.

⚠️ CRITICAL — This is the fundamental building block of optimization theory. Every guarantee in convex optimization — global optimality, zero duality gap, convergence proofs — traces back to this definition.

Examples of convex sets: - $\mathbb{R}^n$ (the whole space) - ${\mathbf{x} : |\mathbf{x}| \leq 1}$ (the unit ball) - ${\mathbf{x} : A\mathbf{x} = \mathbf{b}}$ (affine subspace — a line, plane, hyperplane) - ${\mathbf{x} : A\mathbf{x} \leq \mathbf{b}}$ (polyhedron — intersection of halfspaces) - $\mathbb{S}^n_+ = {X \in \mathbb{R}^{n \times n} : X = X^T, X \succeq 0}$ (positive semidefinite cone)

Non-convex sets: - ${0, 1}^n$ (binary hypercube — discrete, not continuous) - ${\mathbf{x} : |\mathbf{x}| = 1}$ (sphere surface — the chord goes inside, not on the surface) - ${\mathbf{x} : x_1x_2 \geq 1}$ (non-convex region defined by a non-convex inequality)

Proving a Set Is Convex

Take any $\mathbf{x}, \mathbf{y} \in C$ and $\lambda \in [0,1]$. Show $\lambda\mathbf{x} + (1-\lambda)\mathbf{y} \in C$ using the definition of $C$.

Example: Prove $C = {\mathbf{x} : |\mathbf{x}| \leq 1}$ is convex. Take $\mathbf{x}, \mathbf{y}$ with $|\mathbf{x}| \leq 1$, $|\mathbf{y}| \leq 1$. For $\lambda \in [0,1]$: $|\lambda\mathbf{x} + (1-\lambda)\mathbf{y}| \leq \lambda|\mathbf{x}| + (1-\lambda)|\mathbf{y}| \leq \lambda(1) + (1-\lambda)(1) = 1$ ✓

Convex Functions

A function $f: \mathbb{R}^n \to \mathbb{R}$ is convex if $\operatorname{dom} f$ is convex and for all $\mathbf{x}, \mathbf{y} \in \operatorname{dom} f$, $\lambda \in [0,1]$:

$$f(\lambda\mathbf{x} + (1-\lambda)\mathbf{y}) \leq \lambda f(\mathbf{x}) + (1-\lambda)f(\mathbf{y})$$

Geometric interpretation: the function lies below the chord connecting any two points.

$f$ is strictly convex if the inequality is strict ($<$) for $\mathbf{x} \neq \mathbf{y}$, $\lambda \in (0,1)$.

$f$ is concave if $-f$ is convex (function lies above the chord).

First-Order Condition (Gradient Characterization)

For differentiable $f$:

$$f(\mathbf{y}) \geq f(\mathbf{x}) + \nabla f(\mathbf{x})^T(\mathbf{y} - \mathbf{x}) \quad \forall \mathbf{x}, \mathbf{y}$$

The first-order Taylor approximation is a global underestimator of $f$. The tangent plane lies below the function everywhere.

⚠️ CRITICAL — This is how you prove optimality. If $\nabla f(\mathbf{x}^) = \mathbf{0}$, then $f(\mathbf{y}) \geq f(\mathbf{x}^)$ for all $\mathbf{y}$ — $\mathbf{x}^*$ is a global minimum. For convex functions, vanishing gradient $\implies$ global optimum.

Second-Order Condition (Hessian Characterization)

For twice-differentiable $f$:

$$f \text{ is convex } \iff \nabla^2 f(\mathbf{x}) \succeq 0 \quad \forall \mathbf{x} \in \operatorname{dom} f$$

The Hessian is positive semidefinite everywhere. This is usually the easiest way to test convexity — check eigenvalues.

$f$ is strictly convex if $\nabla^2 f(\mathbf{x}) \succ 0$ (PD), but this is sufficient, not necessary ($f(x)=x^4$ is strictly convex but $f''(0)=0$).

Operations That Preserve Convexity

Operation Rule
Nonnegative weighted sum If $f_i$ are convex and $w_i \geq 0$, then $\sum_i w_i f_i$ is convex
Affine composition If $f$ is convex, then $g(\mathbf{x}) = f(A\mathbf{x} + \mathbf{b})$ is convex
Pointwise maximum If $f_1, \ldots, f_m$ are convex, then $f(\mathbf{x}) = \max_i f_i(\mathbf{x})$ is convex
Composition with convex nondecreasing If $g$ is convex nondecreasing and $h$ is convex, $g \circ h$ is convex
Partial infimum If $f(\mathbf{x}, \mathbf{y})$ is jointly convex, then $g(\mathbf{x}) = \inf_{\mathbf{y}} f(\mathbf{x}, \mathbf{y})$ is convex

⚠️ Common Pitfall: The sum of convex functions is convex, but the product of convex functions is generally NOT convex. $f(x)=x^2$ and $g(x)=x^2$ are convex — their product $h(x)=x^4$ happens to also be convex, but that's coincidence, not a product rule. For a genuine counterexample: $f(x)=x^2$ and $g(x)=(x-1)^2$ are both convex, but their product $h(x)=x^2(x-1)^2$ is non-convex (its second derivative changes sign). Similarly, $f(x)=e^x$ and $g(x)=e^x$ are convex and $e^{2x}$ is convex, but again this is luck — not a general property.

ML Functions and Their Convexity Status

Function Convex? Notes
$|X\mathbf{w} - \mathbf{y}|_2^2$ (MSE loss) Quadratic with PSD Hessian $2X^TX$
$-\log p(y \mathbf{x};\mathbf{w})$ (NLL/cross-entropy) ✓ (in $\mathbf{w}$ for linear models)
$| \mathbf{w}|_1$ (L1 regularization) Convex, nonsmooth at 0
$| \mathbf{w}|_2^2$ (L2 regularization) Strongly convex
Neural network loss (in all params) Non-convex composition of convex functions
$\min_i f_i(\mathbf{x})$ (min of convex) Min breaks convexity; max preserves it


Key Terms

Worked Examples

Example 1: Proving a Function Is Convex

Prove $f(x) = -\ln x$ is convex on $(0, \infty)$.

Solution (second-order): $f'(x) = -1/x$, $f''(x) = 1/x^2 > 0$ for all $x > 0$. Since $f''(x) \geq 0$, $f$ is convex (in fact, strictly convex). ✓

Solution (definition): For $x, y > 0$, $\lambda \in [0,1]$: Need to show $-\ln(\lambda x + (1-\lambda)y) \leq -\lambda\ln x - (1-\lambda)\ln y$. Multiply by $-1$ and exponentiate: $\lambda x + (1-\lambda)y \geq x^\lambda y^{1-\lambda}$ — this is the weighted AM-GM inequality, which holds for all $x,y>0$. ✓

Click for answer $f''(x) = 1/x^2 \geq 0$ for all $x > 0$, proving convexity. Alternatively, follows from the weighted AM-GM inequality.

Example 2: Pointwise Maximum

Show that $f(\mathbf{x}) = \max{x_1^2 + x_2^2,\; (x_1-1)^2 + x_2^2}$ is convex.

Solution: Both $f_1(\mathbf{x}) = x_1^2 + x_2^2$ and $f_2(\mathbf{x}) = (x_1-1)^2 + x_2^2$ are convex quadratics (Hessian of each is $2I \succeq 0$). The pointwise maximum of convex functions is convex. ✓

Geometrically, $f$ is the maximum of two paraboloids centered at $(0,0)$ and $(1,0)$ — the function looks like two bowls glued together.

Click for answer Both components are convex (Hessian $= 2I$ for each). Pointwise maximum preserves convexity.

Example 3: Intersection of Convex Sets

Prove the feasible set of $x^2 + y^2 \leq 1$, $x \geq 0$ is convex.

Solution: $C_1 = {(x,y) : x^2+y^2 \leq 1}$ is convex (unit disk — proved earlier). $C_2 = {(x,y) : x \geq 0}$ is convex (halfspace). $C = C_1 \cap C_2$ — intersection of convex sets is convex. This is the right half of the unit disk. ✓

Click for answer The intersection of convex sets is convex. $C_1$ (unit disk) and $C_2$ (halfspace $x \geq 0$) are both convex, so their intersection (right half-disk) is convex.


Quiz

Q1: What does the concept of Nonnegative weighted sum primarily refer to in this subject?

A) A computational error related to Nonnegative weighted sum B) A historical anecdote about Nonnegative weighted sum C) A visual representation of Nonnegative weighted sum D) The definition and application of Nonnegative weighted sum

Correct: D)

Q2: Which of the following is the key formula discussed in this subject?

A) A simplified version of C \subseteq \mathbb{R}^n... B) An unrelated formula from a different topic C) The inverse operation of the formula in question D) C \subseteq \mathbb{R}^n

Correct: D)

Q3: What is the primary purpose of Affine composition?

A) It replaces all other methods in this domain B) It is used only in advanced research contexts C) It is used to affine composition in mathematical analysis D) It is primarily a historical notation system

Correct: C)

Q4: Which statement about Pointwise maximum is TRUE?

A) Pointwise maximum is mentioned only as a historical footnote B) Pointwise maximum is not related to this subject C) Pointwise maximum is an advanced topic beyond this subject's scope D) Pointwise maximum is a fundamental concept covered in this subject

Correct: D)

Q5: Based on the worked examples in this subject, what is the correct result?

A) The inverse of the correct answer B) An unrelated numerical value C) Pointwise Maximum D) A different result from a common mistake

Correct: C)

Q6: How are Pointwise maximum and Composition with convex nondecreasing related?

A) Pointwise maximum is a special case of Composition with convex nondecreasing B) Pointwise maximum is the inverse of Composition with convex nondecreasing C) Pointwise maximum and Composition with convex nondecreasing are closely related concepts D) Pointwise maximum and Composition with convex nondecreasing are completely unrelated topics

Correct: C)

Q7: What is a common pitfall when working with Partial infimum?

A) Partial infimum has no common misconceptions B) Partial infimum is always computed the same way in all contexts C) The main error with Partial infimum is using it when it is not needed D) A common mistake is confusing Partial infimum with a similar concept

Correct: D)

Q8: When should you apply Convex Sets?

A) Use Convex Sets only in pure mathematics contexts B) Avoid Convex Sets unless explicitly instructed C) Convex Sets is not practically useful D) Apply Convex Sets to solve problems in this subject's domain

Correct: D)

Practice Problems

  1. Prove that the set of probability vectors $\Delta_n = {\mathbf{p} \in \mathbb{R}^n : p_i \geq 0,\; \sum_i p_i = 1}$ is convex.

    Click for answer Take $\mathbf{p}, \mathbf{q} \in \Delta_n$, $\lambda \in [0,1]$. Let $\mathbf{r} = \lambda\mathbf{p} + (1-\lambda)\mathbf{q}$. $r_i = \lambda p_i + (1-\lambda)q_i \geq 0$ (nonnegative combination of nonnegative numbers). $\sum_i r_i = \lambda\sum_i p_i + (1-\lambda)\sum_i q_i = \lambda(1) + (1-\lambda)(1) = 1$. So $\mathbf{r} \in \Delta_n$ ✓ — this is the probability simplex.

  2. Is $f(x) = |x|$ convex? Prove it using the definition.

    Click for answer For $x, y \in \mathbb{R}$, $\lambda \in [0,1]$: $f(\lambda x + (1-\lambda)y) = |\lambda x + (1-\lambda)y|$ $\leq \lambda|x| + (1-\lambda)|y|$ (triangle inequality) $= \lambda f(x) + (1-\lambda)f(y)$ ✓ $|x|$ is convex (and subdifferentiable at 0 with subgradient in $[-1, 1]$).

  3. Determine if $f(x, y) = x^2/y$ is convex on ${(x, y) : y > 0}$.

    Click for answer Hessian: $\nabla^2 f = \begin{pmatrix} 2/y & -2x/y^2 \\ -2x/y^2 & 2x^2/y^3 \end{pmatrix} = \frac{2}{y^3}\begin{pmatrix} y^2 & -xy \\ -xy & x^2 \end{pmatrix}$. The matrix $\begin{pmatrix} y^2 & -xy \\ -xy & x^2 \end{pmatrix}$ has eigenvalues $0$ and $x^2+y^2 \geq 0$, so the Hessian is PSD for $y>0$. The function IS convex on this domain. (This is the "quadratic-over-linear" function, a standard convex function in Boyd & Vandenberghe.)

  4. Show that the set of PSD matrices $\mathbb{S}^n_+$ is a convex cone.

    Click for answer Take $A, B \succeq 0$, $\lambda \in [0,1]$. For any $\mathbf{v}$: $\mathbf{v}^T(\lambda A + (1-\lambda)B)\mathbf{v} = \lambda\mathbf{v}^TA\mathbf{v} + (1-\lambda)\mathbf{v}^TB\mathbf{v} \geq 0$ since both terms are nonnegative. So $\lambda A + (1-\lambda)B \succeq 0$, hence $\mathbb{S}^n_+$ is convex. Also, for $\alpha \geq 0$, $\alpha A \succeq 0$ (cone property).

  5. $f(x) = e^x$ and $g(x) = x^2$ are convex. Is $h(x) = f(g(x)) = e^{x^2}$ convex on $\mathbb{R}$?

    Click for answer $h'(x) = 2x e^{x^2}$, $h''(x) = 2e^{x^2} + 4x^2 e^{x^2} = 2e^{x^2}(1 + 2x^2) \geq 0$ for all $x$. Yes, $h$ is convex. This follows from the composition rule: $f$ is convex and nondecreasing on $[0,\infty)$, and $g(x) = x^2$ maps into $[0,\infty)$ and is convex — so $f \circ g$ is convex.


Summary

Key takeaways:


Pitfalls

  1. Assuming the product of convex functions is convex. $f(x)=x^2$ and $g(x)=(x-1)^2$ are both convex, but their product is non-convex (second derivative changes sign). There is no product rule for convexity — only nonnegative weighted sums, affine composition, and pointwise maximum are safe.

  2. Using $\min$ instead of $\max$ when combining convex functions. The pointwise maximum of convex functions is convex; the pointwise minimum is generally not. For example, $\min{x^2, (x-1)^2}$ has a dip in the middle and is non-convex.

  3. Checking convexity only at a single point. A function with $f''(x) > 0$ at one point is not necessarily convex. $f''(x)$ must be $\geq 0$ everywhere in the domain — one region of negative curvature destroys convexity.

  4. Confusing strict convexity ($f'' > 0$) with convexity ($f'' \geq 0$). $f(x)=x^4$ is strictly convex but $f''(0)=0$. Strict convexity does not require $f''>0$ everywhere — it's a sufficient but not necessary condition.

  5. Forgetting that convexity is domain-dependent. $f(x)=1/x$ is convex on $(0,\infty)$ but not on $\mathbb{R}\setminus{0}$ (it's concave on $(-\infty,0)$). Always specify the domain when claiming convexity.



Next Steps

Next up: 14-07-convex-optimization-problems.md — standard forms (LP, QP, SOCP, SDP), optimality conditions, and why convex problems are "easy."