📐 Concept diagram

14-07 — Convex Optimization Problems

Phase: Optimization | Subject: 14-07 Prerequisites: 14-06-convex-sets-functions.md Next subject: 14-08-constrained-optimization-theory.md

Learning Objectives

By the end of this subject, you will be able to:

Formulate an optimization problem in standard convex form
Classify problems as LP, QP, QCQP, SOCP, or SDP
Derive and apply the KKT conditions for convex problems
Explain why every local minimum of a convex problem is global
Recognize when a non-convex problem can be reformulated as convex

Core Content

Standard Form of a Convex Optimization Problem

$$\min_{\mathbf{x}} f_0(\mathbf{x}) \quad \text{s.t.} \quad f_i(\mathbf{x}) \leq 0,\; i = 1,\ldots,m,\quad A\mathbf{x} = \mathbf{b}$$

where $f_0, f_1, \ldots, f_m$ are convex functions. The equality constraints must be affine ($A\mathbf{x} = \mathbf{b}$) — not arbitrary equalities (a convex equality $h(\mathbf{x})=0$ with $h$ convex would require both $h(\mathbf{x}) \leq 0$ and $-h(\mathbf{x}) \leq 0$, forcing $h$ to be affine).

⚠️ CRITICAL: The equality constraints in a convex problem are ALWAYS affine. An equality constraint like $|\mathbf{x}|^2 = 1$ is NOT convex because the set ${\mathbf{x}: |\mathbf{x}|^2 = 1}$ (sphere surface) is non-convex.

The Hierarchy of Convex Problems

Problem Class	Form	ML Example
LP (Linear Program)	$\min \mathbf{c}^T\mathbf{x}$ s.t. $A\mathbf{x} \leq \mathbf{b}$	Resource allocation, optimal transport
QP (Quadratic Program)	$\min \frac{1}{2}\mathbf{x}^TQ\mathbf{x} + \mathbf{c}^T\mathbf{x}$ s.t. $A\mathbf{x} \leq \mathbf{b}$, $Q \succeq 0$	SVM primal, L2-regularized linear regression, Markowitz portfolio
QCQP	QP with quadratic constraints	---
SOCP (Second-Order Cone)	$\min \mathbf{c}^T\mathbf{x}$ s.t. $\|A_i\mathbf{x}+\mathbf{b}_i\| \leq \mathbf{c}_i^T\mathbf{x} + d_i$	Robust linear regression
SDP (Semidefinite Program)	$\min \operatorname{tr}(CX)$ s.t. $\operatorname{tr}(A_iX) = b_i$, $X \succeq 0$	Matrix completion, Max-Cut relaxation

Each successive class in the hierarchy is more expressive than the previous, but also more expensive to solve.

The Fundamental Theorem of Convex Optimization

Any local minimum of a convex optimization problem is a global minimum.

Proof: Suppose $\mathbf{x}^$ is a local minimum but not global. Then there exists feasible $\mathbf{y}$ with $f_0(\mathbf{y}) < f_0(\mathbf{x}^)$. By convexity of the feasible set, $\mathbf{z}_\lambda = \lambda\mathbf{y} + (1-\lambda)\mathbf{x}^*$ is feasible for all $\lambda \in [0,1]$. By convexity of $f_0$:

$f_0(\mathbf{z}_\lambda) \leq \lambda f_0(\mathbf{y}) + (1-\lambda)f_0(\mathbf{x}^) < f_0(\mathbf{x}^)$

For sufficiently small $\lambda$, $\mathbf{z}_\lambda$ is arbitrarily close to $\mathbf{x}^*$ (contradicting local optimality). ∎

KKT Conditions for Convex Problems

For a convex problem with differentiable $f_i$, a point $\mathbf{x}^$ is globally optimal if and only if there exist $\boldsymbol{\lambda}^ \geq \mathbf{0}$ and $\boldsymbol{\nu}^*$ satisfying:

Primal feasibility: $f_i(\mathbf{x}^) \leq 0$, $A\mathbf{x}^ = \mathbf{b}$
Dual feasibility: $\lambda_i^* \geq 0$
Stationarity: $\nabla f_0(\mathbf{x}^) + \sum_i \lambda_i^ \nabla f_i(\mathbf{x}^) + A^T\boldsymbol{\nu}^ = \mathbf{0}$
Complementary slackness: $\lambda_i^ f_i(\mathbf{x}^) = 0$ for all $i$

⚠️ CRITICAL — For convex problems, KKT is necessary AND sufficient. For non-convex problems, KKT is only necessary — a point can satisfy KKT and still not be optimal (it could be a saddle or local max).

Complementary slackness is the most important condition: for each inequality constraint, either $\lambda_i^ = 0$ (constraint inactive) or $f_i(\mathbf{x}^) = 0$ (constraint active/binding).

LP Example: Optimal Transport

The Kantorovich optimal transport problem (used in Wasserstein distances for ML):

$$\min_{P \in \mathbb{R}^{m \times n}} \sum_{i,j} P_{ij} C_{ij} \quad \text{s.t.} \quad P\mathbf{1} = \mathbf{a},\; P^T\mathbf{1} = \mathbf{b},\; P \geq 0$$

This is an LP with $mn$ variables, $m+n$ equality constraints. It's convex and solvable in polynomial time (though Sinkhorn regularization is preferred for large-scale ML).

SVM as a QP

The hard-margin SVM primal:

$$\min_{\mathbf{w}, b} \frac{1}{2}|\mathbf{w}|^2 \quad \text{s.t.} \quad y_i(\mathbf{w}^T\mathbf{x}_i + b) \geq 1,\; i=1,\ldots,N$$

This is a QP: quadratic objective ($Q = I \succeq 0$) with linear inequality constraints. Convex → global optimum, solved efficiently by specialized algorithms (SMO) or general QP solvers.

Key Terms

QCQP
SOCP

Worked Examples

Example 1: Solving a Simple LP

Solve: $\min_{x,y} -x - y$ s.t. $x + y \leq 2$, $x \geq 0$, $y \geq 0$.

Solution (geometric): The LP solution: At $(2,0)$: $f=-2$. At $(0,2)$: $f=-2$. At $(1,1)$: $f=-2$. The entire segment $x+y=2$ with $x,y \geq 0$ gives the same optimal value of $-2$. Any point on this edge is optimal.

This illustrates that LP solutions can be non-unique — any convex combination of optimal vertices is also optimal.

Click for answer

Optimal value: $-2$. All points on the edge $x+y=2$, $x,y \geq 0$ are optimal (infinitely many solutions).

Example 2: KKT for a QP

Consider $\min x^2$ s.t. $x \geq 1$. Apply KKT.

Solution: $f_0(x) = x^2$, $f_1(x) = 1 - x \leq 0$ (i.e., $x \geq 1$). $\nabla f_0 = 2x$, $\nabla f_1 = -1$.

KKT stationarity: $2x + \lambda(-1) = 0 \implies 2x = \lambda$. Complementary slackness: $\lambda(1-x) = 0$. Dual feasibility: $\lambda \geq 0$.

Case 1: $\lambda = 0 \implies 2x = 0 \implies x = 0$. But $1-0 \leq 0$ → $1 \leq 0$ ✗ (primal infeasible). Case 2: $1-x = 0 \implies x = 1$, then $\lambda = 2x = 2 \geq 0$ ✓.

Solution: $x^ = 1$, $\lambda^ = 2$, $f_0(x^*) = 1$.

This makes sense: the unconstrained minimum of $x^2$ is at $x=0$, but the constraint $x \geq 1$ pushes it to the boundary.

Click for answer

$x^* = 1$, $\lambda^* = 2$, $f(x^*) = 1$. The constraint is active — the optimum lies on the boundary.

Example 3: Convex Reformulation

The problem $\min |A\mathbf{x} - \mathbf{b}|_2$ (unconstrained) is convex. But what about $\min |A\mathbf{x} - \mathbf{b}|_2 + \lambda|\mathbf{x}|_0$?

The $\ell_0$ norm (count of nonzeros) makes this non-convex. But the convex relaxation replaces $|\mathbf{x}|_0$ with $|\mathbf{x}|_1$:

$$\min |A\mathbf{x} - \mathbf{b}|_2^2 + \lambda|\mathbf{x}|_1$$

This is the LASSO — a convex problem (sum of convex functions) that under certain conditions recovers the true sparse solution. This is the foundational idea behind compressed sensing and sparse recovery.

Click for answer

Replace $\|\mathbf{x}\|_0$ with $\|\mathbf{x}\|_1$ — the tightest convex relaxation of the cardinality function. This yields the LASSO, a convex QP variant.

Quiz

Q1: What does the concept of Standard Form Of A Convex Optimization Problem primarily refer to in this subject?

A) The definition and application of Standard Form Of A Convex Optimization Problem B) A visual representation of Standard Form Of A Convex Optimization Problem C) A computational error related to Standard Form Of A Convex Optimization Problem D) A historical anecdote about Standard Form Of A Convex Optimization Problem

Correct: A)

If you chose A: Standard Form Of A Convex Optimization Problem is defined as: the definition and application of standard form of a convex optimization problem. The other options describe different aspects that are not the primary focus. Correct!
If you chose B: This is incorrect. Standard Form Of A Convex Optimization Problem is defined as: the definition and application of standard form of a convex optimization problem. The other options describe different aspects that are not the primary focus.
If you chose C: This is incorrect. Standard Form Of A Convex Optimization Problem is defined as: the definition and application of standard form of a convex optimization problem. The other options describe different aspects that are not the primary focus.
If you chose D: This is incorrect. Standard Form Of A Convex Optimization Problem is defined as: the definition and application of standard form of a convex optimization problem. The other options describe different aspects that are not the primary focus.

Q2: Which of the following is the key formula discussed in this subject?

A) An unrelated formula from a different topic B) The inverse operation of the formula in question C) \min_{\mathbf{x}} f_0(\mathbf{x}) \quad \text{s.t.} \quad f_i(\mathbf{x}) \leq 0,\; i = 1,\ldots,m,\quad A\mathbf{x} = \mathbf{b} D) A simplified version of \min_{\mathbf{x}} f_0(\math...

Correct: C)

If you chose A: This is incorrect. The formula \min_{\mathbf{x}} f_0(\mathbf{x}) \quad \text{s.t.} \quad f_i(\mathbf{x}) \leq 0,\; i = 1,\ldots,m,\quad A\mathbf{x} = \mathbf{b} is central to this subject. The other options are either simplified versions or unrelated.
If you chose B: This is incorrect. The formula \min_{\mathbf{x}} f_0(\mathbf{x}) \quad \text{s.t.} \quad f_i(\mathbf{x}) \leq 0,\; i = 1,\ldots,m,\quad A\mathbf{x} = \mathbf{b} is central to this subject. The other options are either simplified versions or unrelated.
If you chose C: The formula \min_{\mathbf{x}} f_0(\mathbf{x}) \quad \text{s.t.} \quad f_i(\mathbf{x}) \leq 0,\; i = 1,\ldots,m,\quad A\mathbf{x} = \mathbf{b} is central to this subject. The other options are either simplified versions or unrelated. Correct!
If you chose D: This is incorrect. The formula \min_{\mathbf{x}} f_0(\mathbf{x}) \quad \text{s.t.} \quad f_i(\mathbf{x}) \leq 0,\; i = 1,\ldots,m,\quad A\mathbf{x} = \mathbf{b} is central to this subject. The other options are either simplified versions or unrelated.

Q3: What is the primary purpose of The Hierarchy Of Convex Problems?

A) It replaces all other methods in this domain B) It is used to the hierarchy of convex problems in mathematical analysis C) It is primarily a historical notation system D) It is used only in advanced research contexts

Correct: B)

If you chose A: This is incorrect. The Hierarchy Of Convex Problems serves the purpose described in the correct answer. The other options misrepresent its role.
If you chose B: The Hierarchy Of Convex Problems serves the purpose described in the correct answer. The other options misrepresent its role. Correct!
If you chose C: This is incorrect. The Hierarchy Of Convex Problems serves the purpose described in the correct answer. The other options misrepresent its role.
If you chose D: This is incorrect. The Hierarchy Of Convex Problems serves the purpose described in the correct answer. The other options misrepresent its role.

Q4: Which statement about The Fundamental Theorem Of Convex Optimization is TRUE?

A) The Fundamental Theorem Of Convex Optimization is not related to this subject B) The Fundamental Theorem Of Convex Optimization is mentioned only as a historical footnote C) The Fundamental Theorem Of Convex Optimization is a fundamental concept covered in this subject D) The Fundamental Theorem Of Convex Optimization is an advanced topic beyond this subject's scope

Correct: C)

If you chose A: This is incorrect. The Fundamental Theorem Of Convex Optimization is a fundamental concept covered in this subject. This subject covers The Fundamental Theorem Of Convex Optimization as part of its core content.
If you chose B: This is incorrect. The Fundamental Theorem Of Convex Optimization is a fundamental concept covered in this subject. This subject covers The Fundamental Theorem Of Convex Optimization as part of its core content.
If you chose C: The Fundamental Theorem Of Convex Optimization is a fundamental concept covered in this subject. This subject covers The Fundamental Theorem Of Convex Optimization as part of its core content. Correct!
If you chose D: This is incorrect. The Fundamental Theorem Of Convex Optimization is a fundamental concept covered in this subject. This subject covers The Fundamental Theorem Of Convex Optimization as part of its core content.

Q5: Based on the worked examples in this subject, what is the correct result?

A) 2$, $x,y \geq 0$ are optimal (infinitely many B) The inverse of the correct answer C) An unrelated numerical value D) A different result from a common mistake

Correct: A)

If you chose A: The worked examples show that the result is 2$, $x,y \geq 0$ are optimal (infinitely many. The other options represent common errors. Correct!
If you chose B: This is incorrect. The worked examples show that the result is 2$, $x,y \geq 0$ are optimal (infinitely many. The other options represent common errors.
If you chose C: This is incorrect. The worked examples show that the result is 2$, $x,y \geq 0$ are optimal (infinitely many. The other options represent common errors.
If you chose D: This is incorrect. The worked examples show that the result is 2$, $x,y \geq 0$ are optimal (infinitely many. The other options represent common errors.

Q6: How are The Fundamental Theorem Of Convex Optimization and Kkt Conditions For Convex Problems related?

A) The Fundamental Theorem Of Convex Optimization and Kkt Conditions For Convex Problems are completely unrelated topics B) The Fundamental Theorem Of Convex Optimization and Kkt Conditions For Convex Problems are closely related concepts C) The Fundamental Theorem Of Convex Optimization is a special case of Kkt Conditions For Convex Problems D) The Fundamental Theorem Of Convex Optimization is the inverse of Kkt Conditions For Convex Problems

Correct: B)

If you chose A: This is incorrect. Both The Fundamental Theorem Of Convex Optimization and Kkt Conditions For Convex Problems are covered in this subject as interconnected topics.
If you chose B: Both The Fundamental Theorem Of Convex Optimization and Kkt Conditions For Convex Problems are covered in this subject as interconnected topics. Correct!
If you chose C: This is incorrect. Both The Fundamental Theorem Of Convex Optimization and Kkt Conditions For Convex Problems are covered in this subject as interconnected topics.
If you chose D: This is incorrect. Both The Fundamental Theorem Of Convex Optimization and Kkt Conditions For Convex Problems are covered in this subject as interconnected topics.

Q7: What is a common pitfall when working with Lp Example: Optimal Transport?

A) The main error with Lp Example: Optimal Transport is using it when it is not needed B) Lp Example: Optimal Transport has no common misconceptions C) A common mistake is confusing Lp Example: Optimal Transport with a similar concept D) Lp Example: Optimal Transport is always computed the same way in all contexts

Correct: C)

If you chose A: This is incorrect. Students often confuse Lp Example: Optimal Transport with similar-sounding or related concepts. Pay attention to the precise definitions.
If you chose B: This is incorrect. Students often confuse Lp Example: Optimal Transport with similar-sounding or related concepts. Pay attention to the precise definitions.
If you chose C: Students often confuse Lp Example: Optimal Transport with similar-sounding or related concepts. Pay attention to the precise definitions. Correct!
If you chose D: This is incorrect. Students often confuse Lp Example: Optimal Transport with similar-sounding or related concepts. Pay attention to the precise definitions.

Q8: When should you apply Svm As A Qp?

A) Avoid Svm As A Qp unless explicitly instructed B) Apply Svm As A Qp to solve problems in this subject's domain C) Svm As A Qp is not practically useful D) Use Svm As A Qp only in pure mathematics contexts

Correct: B)

If you chose A: This is incorrect. Svm As A Qp is a practical tool used throughout this subject to solve relevant problems.
If you chose B: Svm As A Qp is a practical tool used throughout this subject to solve relevant problems. Correct!
If you chose C: This is incorrect. Svm As A Qp is a practical tool used throughout this subject to solve relevant problems.
If you chose D: This is incorrect. Svm As A Qp is a practical tool used throughout this subject to solve relevant problems.

Practice Problems

Formulate the problem of finding the point in a polyhedron closest to a given point $\mathbf{y}$ as a QP.

Click for answer
$\min_{\mathbf{x}} \|\mathbf{x} - \mathbf{y}\|_2^2$ s.t. $A\mathbf{x} \leq \mathbf{b}$. Expand: $\min \mathbf{x}^T\mathbf{x} - 2\mathbf{y}^T\mathbf{x} + \mathbf{y}^T\mathbf{y}$. The constant $\mathbf{y}^T\mathbf{y}$ doesn't affect the minimizer. This is a QP with $Q = 2I \succ 0$ (strictly convex).
Apply KKT to $\min (x-2)^2 + (y-1)^2$ s.t. $x + y \leq 1$.

Click for answer
$f_0 = (x-2)^2 + (y-1)^2$, constraint $x+y-1 \leq 0$. $\nabla f_0 = (2(x-2), 2(y-1))$. Stationarity: $2(x-2) + \lambda = 0$, $2(y-1) + \lambda = 0$. So $2(x-2) = 2(y-1) \implies x-2 = y-1 \implies x = y+1$. Complementary slackness: $\lambda(x+y-1) = 0$. If $\lambda = 0$: $x=2$, $y=1$. Check $2+1-1 = 2 > 0$ → infeasible. If $\lambda \neq 0$: $x+y=1$. With $x=y+1$: $(y+1)+y=1 \implies 2y=0 \implies y=0$, $x=1$. $\lambda = -2(x-2) = -2(1-2) = 2 \geq 0$ ✓. $x^* = 1$, $y^* = 0$, $f = (1-2)^2 + (0-1)^2 = 2$.
Is $\min |A\mathbf{x} - \mathbf{b}|_2^2$ s.t. $|\mathbf{x}|_2 \leq 1$ convex? If so, what class?

Click for answer
Yes — minimize a convex quadratic subject to a convex quadratic constraint $\|\mathbf{x}\|_2^2 - 1 \leq 0$ (the Hessian of the constraint function is $2I \succeq 0$). This is a QCQP (Quadratically Constrained Quadratic Program). This specific problem is also known as ridge regression with a constraint (as opposed to a penalty) — also called "trust-region subproblem."
Explain why $\min e^x$ is convex but $\min e^x$ s.t. $x^2 = 1$ is not a convex problem.

Click for answer
$e^x$ is convex. The constraint $x^2 = 1$ defines the set $\{-1, 1\}$, which is not convex (and is not affine). Equality constraints in convex problems must be affine ($A\mathbf{x} = \mathbf{b}$). The constraint $x^2 = 1$ is equivalent to $x^2 - 1 \leq 0$ AND $1 - x^2 \leq 0$, but the second constraint has $f_2(x) = 1-x^2$ which is NOT convex (concave). So this cannot be written as a convex problem.
For the LP $\min \mathbf{c}^T\mathbf{x}$ s.t. $A\mathbf{x} \leq \mathbf{b}$, $\mathbf{x} \geq 0$, write the KKT conditions and interpret complementary slackness.

Click for answer
With $\boldsymbol{\lambda} \geq 0$ for $A\mathbf{x} \leq \mathbf{b}$ and $\boldsymbol{\mu} \geq 0$ for $-\mathbf{x} \leq 0$: Stationarity: $\mathbf{c} + A^T\boldsymbol{\lambda} - \boldsymbol{\mu} = 0 \implies \boldsymbol{\mu} = \mathbf{c} + A^T\boldsymbol{\lambda}$. Complementary slackness: $\lambda_i(\mathbf{a}_i^T\mathbf{x} - b_i) = 0$, $\mu_i x_i = 0$. Interpretation: If $\mu_i > 0$, then $x_i = 0$ (variable at bound). If $x_i > 0$, then $\mu_i = 0$ → $(c + A^T\lambda)_i = 0$ → reduced cost is zero.

Summary

Key takeaways:

A convex problem minimizes a convex function over a convex set with affine equalities
Any local minimum is global — this is the defining advantage of convex optimization
Problem hierarchy: LP ⊂ QP ⊂ QCQP ⊂ SOCP ⊂ SDP — each is a superset of the previous
KKT conditions are necessary AND sufficient for convex problems
Complementary slackness: $\lambda_i^ f_i(\mathbf{x}^) = 0$ — either constraint is active or multiplier is zero
Many non-convex ML problems have convex relaxations (e.g., $\ell_0 \to \ell_1$, nuclear norm for rank)

Pitfalls

Writing non-affine equality constraints in a convex problem. $x^2 = 1$ defines the non-convex set ${-1, 1}$ and is equivalent to $x^2 \leq 1$ AND $-x^2 \leq 0$, but $-x^2$ is concave — not allowed. Equality constraints in convex optimization must be affine ($A\mathbf{x} = \mathbf{b}$).
Assuming KKT conditions are sufficient for non-convex problems. KKT is only necessary for general problems (under constraint qualifications). A saddle point or local maximum can satisfy KKT. For convex problems only, KKT is both necessary and sufficient.
Misclassifying a problem as convex. Having a convex objective is not enough — inequality constraint functions must also be convex, and equality constraints must be affine. A single non-convex constraint makes the whole problem non-convex.
Forgetting that LP solutions can be non-unique. When the objective is parallel to a constraint boundary, an entire edge or face may be optimal. Don't assume uniqueness unless the problem is strictly convex.
Confusing $\ell_1$ regularization (convex) with $\ell_0$ regularization (non-convex). $|\mathbf{x}|_1$ is the tightest convex relaxation of $|\mathbf{x}|_0$ on the unit ball, but they are not the same — $\ell_1$ achieves sparsity through geometry, not by counting nonzeros directly.

Next Steps

Next up: 14-08-constrained-optimization-theory.md — general constrained optimization, constraint qualifications, and the full KKT theory.

Progress

Phases

14-07 — Convex Optimization Problems

Learning Objectives

Core Content

Standard Form of a Convex Optimization Problem

The Hierarchy of Convex Problems

The Fundamental Theorem of Convex Optimization

KKT Conditions for Convex Problems

LP Example: Optimal Transport

SVM as a QP

Key Terms

Worked Examples

Example 1: Solving a Simple LP

Example 2: KKT for a QP

Example 3: Convex Reformulation

Quiz

Practice Problems

Summary

Pitfalls

Next Steps