Math graphic
📐 Concept diagram

14-07 — Convex Optimization Problems

Phase: Optimization | Subject: 14-07 Prerequisites: 14-06-convex-sets-functions.md Next subject: 14-08-constrained-optimization-theory.md


Learning Objectives

By the end of this subject, you will be able to:

  1. Formulate an optimization problem in standard convex form
  2. Classify problems as LP, QP, QCQP, SOCP, or SDP
  3. Derive and apply the KKT conditions for convex problems
  4. Explain why every local minimum of a convex problem is global
  5. Recognize when a non-convex problem can be reformulated as convex

Core Content

Standard Form of a Convex Optimization Problem

$$\min_{\mathbf{x}} f_0(\mathbf{x}) \quad \text{s.t.} \quad f_i(\mathbf{x}) \leq 0,\; i = 1,\ldots,m,\quad A\mathbf{x} = \mathbf{b}$$

where $f_0, f_1, \ldots, f_m$ are convex functions. The equality constraints must be affine ($A\mathbf{x} = \mathbf{b}$) — not arbitrary equalities (a convex equality $h(\mathbf{x})=0$ with $h$ convex would require both $h(\mathbf{x}) \leq 0$ and $-h(\mathbf{x}) \leq 0$, forcing $h$ to be affine).

⚠️ CRITICAL: The equality constraints in a convex problem are ALWAYS affine. An equality constraint like $|\mathbf{x}|^2 = 1$ is NOT convex because the set ${\mathbf{x}: |\mathbf{x}|^2 = 1}$ (sphere surface) is non-convex.

The Hierarchy of Convex Problems

Problem Class Form ML Example
LP (Linear Program) $\min \mathbf{c}^T\mathbf{x}$ s.t. $A\mathbf{x} \leq \mathbf{b}$ Resource allocation, optimal transport
QP (Quadratic Program) $\min \frac{1}{2}\mathbf{x}^TQ\mathbf{x} + \mathbf{c}^T\mathbf{x}$ s.t. $A\mathbf{x} \leq \mathbf{b}$, $Q \succeq 0$ SVM primal, L2-regularized linear regression, Markowitz portfolio
QCQP QP with quadratic constraints ---
SOCP (Second-Order Cone) $\min \mathbf{c}^T\mathbf{x}$ s.t. $|A_i\mathbf{x}+\mathbf{b}_i| \leq \mathbf{c}_i^T\mathbf{x} + d_i$ Robust linear regression
SDP (Semidefinite Program) $\min \operatorname{tr}(CX)$ s.t. $\operatorname{tr}(A_iX) = b_i$, $X \succeq 0$ Matrix completion, Max-Cut relaxation

Each successive class in the hierarchy is more expressive than the previous, but also more expensive to solve.

The Fundamental Theorem of Convex Optimization

Any local minimum of a convex optimization problem is a global minimum.

Proof: Suppose $\mathbf{x}^$ is a local minimum but not global. Then there exists feasible $\mathbf{y}$ with $f_0(\mathbf{y}) < f_0(\mathbf{x}^)$. By convexity of the feasible set, $\mathbf{z}_\lambda = \lambda\mathbf{y} + (1-\lambda)\mathbf{x}^*$ is feasible for all $\lambda \in [0,1]$. By convexity of $f_0$:

$f_0(\mathbf{z}_\lambda) \leq \lambda f_0(\mathbf{y}) + (1-\lambda)f_0(\mathbf{x}^) < f_0(\mathbf{x}^)$

For sufficiently small $\lambda$, $\mathbf{z}_\lambda$ is arbitrarily close to $\mathbf{x}^*$ (contradicting local optimality). ∎

KKT Conditions for Convex Problems

For a convex problem with differentiable $f_i$, a point $\mathbf{x}^$ is globally optimal if and only if there exist $\boldsymbol{\lambda}^ \geq \mathbf{0}$ and $\boldsymbol{\nu}^*$ satisfying:

  1. Primal feasibility: $f_i(\mathbf{x}^) \leq 0$, $A\mathbf{x}^ = \mathbf{b}$
  2. Dual feasibility: $\lambda_i^* \geq 0$
  3. Stationarity: $\nabla f_0(\mathbf{x}^) + \sum_i \lambda_i^ \nabla f_i(\mathbf{x}^) + A^T\boldsymbol{\nu}^ = \mathbf{0}$
  4. Complementary slackness: $\lambda_i^ f_i(\mathbf{x}^) = 0$ for all $i$

⚠️ CRITICAL — For convex problems, KKT is necessary AND sufficient. For non-convex problems, KKT is only necessary — a point can satisfy KKT and still not be optimal (it could be a saddle or local max).

Complementary slackness is the most important condition: for each inequality constraint, either $\lambda_i^ = 0$ (constraint inactive) or $f_i(\mathbf{x}^) = 0$ (constraint active/binding).

LP Example: Optimal Transport

The Kantorovich optimal transport problem (used in Wasserstein distances for ML):

$$\min_{P \in \mathbb{R}^{m \times n}} \sum_{i,j} P_{ij} C_{ij} \quad \text{s.t.} \quad P\mathbf{1} = \mathbf{a},\; P^T\mathbf{1} = \mathbf{b},\; P \geq 0$$

This is an LP with $mn$ variables, $m+n$ equality constraints. It's convex and solvable in polynomial time (though Sinkhorn regularization is preferred for large-scale ML).

SVM as a QP

The hard-margin SVM primal:

$$\min_{\mathbf{w}, b} \frac{1}{2}|\mathbf{w}|^2 \quad \text{s.t.} \quad y_i(\mathbf{w}^T\mathbf{x}_i + b) \geq 1,\; i=1,\ldots,N$$

This is a QP: quadratic objective ($Q = I \succeq 0$) with linear inequality constraints. Convex → global optimum, solved efficiently by specialized algorithms (SMO) or general QP solvers.



Key Terms

Worked Examples

Example 1: Solving a Simple LP

Solve: $\min_{x,y} -x - y$ s.t. $x + y \leq 2$, $x \geq 0$, $y \geq 0$.

Solution (geometric): The LP solution: At $(2,0)$: $f=-2$. At $(0,2)$: $f=-2$. At $(1,1)$: $f=-2$. The entire segment $x+y=2$ with $x,y \geq 0$ gives the same optimal value of $-2$. Any point on this edge is optimal.

This illustrates that LP solutions can be non-unique — any convex combination of optimal vertices is also optimal.

Click for answer Optimal value: $-2$. All points on the edge $x+y=2$, $x,y \geq 0$ are optimal (infinitely many solutions).

Example 2: KKT for a QP

Consider $\min x^2$ s.t. $x \geq 1$. Apply KKT.

Solution: $f_0(x) = x^2$, $f_1(x) = 1 - x \leq 0$ (i.e., $x \geq 1$). $\nabla f_0 = 2x$, $\nabla f_1 = -1$.

KKT stationarity: $2x + \lambda(-1) = 0 \implies 2x = \lambda$. Complementary slackness: $\lambda(1-x) = 0$. Dual feasibility: $\lambda \geq 0$.

Case 1: $\lambda = 0 \implies 2x = 0 \implies x = 0$. But $1-0 \leq 0$ → $1 \leq 0$ ✗ (primal infeasible). Case 2: $1-x = 0 \implies x = 1$, then $\lambda = 2x = 2 \geq 0$ ✓.

Solution: $x^ = 1$, $\lambda^ = 2$, $f_0(x^*) = 1$.

This makes sense: the unconstrained minimum of $x^2$ is at $x=0$, but the constraint $x \geq 1$ pushes it to the boundary.

Click for answer $x^* = 1$, $\lambda^* = 2$, $f(x^*) = 1$. The constraint is active — the optimum lies on the boundary.

Example 3: Convex Reformulation

The problem $\min |A\mathbf{x} - \mathbf{b}|_2$ (unconstrained) is convex. But what about $\min |A\mathbf{x} - \mathbf{b}|_2 + \lambda|\mathbf{x}|_0$?

The $\ell_0$ norm (count of nonzeros) makes this non-convex. But the convex relaxation replaces $|\mathbf{x}|_0$ with $|\mathbf{x}|_1$:

$$\min |A\mathbf{x} - \mathbf{b}|_2^2 + \lambda|\mathbf{x}|_1$$

This is the LASSO — a convex problem (sum of convex functions) that under certain conditions recovers the true sparse solution. This is the foundational idea behind compressed sensing and sparse recovery.

Click for answer Replace $\|\mathbf{x}\|_0$ with $\|\mathbf{x}\|_1$ — the tightest convex relaxation of the cardinality function. This yields the LASSO, a convex QP variant.


Quiz

Q1: What does the concept of Standard Form Of A Convex Optimization Problem primarily refer to in this subject?

A) The definition and application of Standard Form Of A Convex Optimization Problem B) A visual representation of Standard Form Of A Convex Optimization Problem C) A computational error related to Standard Form Of A Convex Optimization Problem D) A historical anecdote about Standard Form Of A Convex Optimization Problem

Correct: A)

Q2: Which of the following is the key formula discussed in this subject?

A) An unrelated formula from a different topic B) The inverse operation of the formula in question C) \min_{\mathbf{x}} f_0(\mathbf{x}) \quad \text{s.t.} \quad f_i(\mathbf{x}) \leq 0,\; i = 1,\ldots,m,\quad A\mathbf{x} = \mathbf{b} D) A simplified version of \min_{\mathbf{x}} f_0(\math...

Correct: C)

Q3: What is the primary purpose of The Hierarchy Of Convex Problems?

A) It replaces all other methods in this domain B) It is used to the hierarchy of convex problems in mathematical analysis C) It is primarily a historical notation system D) It is used only in advanced research contexts

Correct: B)

Q4: Which statement about The Fundamental Theorem Of Convex Optimization is TRUE?

A) The Fundamental Theorem Of Convex Optimization is not related to this subject B) The Fundamental Theorem Of Convex Optimization is mentioned only as a historical footnote C) The Fundamental Theorem Of Convex Optimization is a fundamental concept covered in this subject D) The Fundamental Theorem Of Convex Optimization is an advanced topic beyond this subject's scope

Correct: C)

Q5: Based on the worked examples in this subject, what is the correct result?

A) 2$, $x,y \geq 0$ are optimal (infinitely many B) The inverse of the correct answer C) An unrelated numerical value D) A different result from a common mistake

Correct: A)

Q6: How are The Fundamental Theorem Of Convex Optimization and Kkt Conditions For Convex Problems related?

A) The Fundamental Theorem Of Convex Optimization and Kkt Conditions For Convex Problems are completely unrelated topics B) The Fundamental Theorem Of Convex Optimization and Kkt Conditions For Convex Problems are closely related concepts C) The Fundamental Theorem Of Convex Optimization is a special case of Kkt Conditions For Convex Problems D) The Fundamental Theorem Of Convex Optimization is the inverse of Kkt Conditions For Convex Problems

Correct: B)

Q7: What is a common pitfall when working with Lp Example: Optimal Transport?

A) The main error with Lp Example: Optimal Transport is using it when it is not needed B) Lp Example: Optimal Transport has no common misconceptions C) A common mistake is confusing Lp Example: Optimal Transport with a similar concept D) Lp Example: Optimal Transport is always computed the same way in all contexts

Correct: C)

Q8: When should you apply Svm As A Qp?

A) Avoid Svm As A Qp unless explicitly instructed B) Apply Svm As A Qp to solve problems in this subject's domain C) Svm As A Qp is not practically useful D) Use Svm As A Qp only in pure mathematics contexts

Correct: B)

Practice Problems

  1. Formulate the problem of finding the point in a polyhedron closest to a given point $\mathbf{y}$ as a QP.

    Click for answer $\min_{\mathbf{x}} \|\mathbf{x} - \mathbf{y}\|_2^2$ s.t. $A\mathbf{x} \leq \mathbf{b}$. Expand: $\min \mathbf{x}^T\mathbf{x} - 2\mathbf{y}^T\mathbf{x} + \mathbf{y}^T\mathbf{y}$. The constant $\mathbf{y}^T\mathbf{y}$ doesn't affect the minimizer. This is a QP with $Q = 2I \succ 0$ (strictly convex).

  2. Apply KKT to $\min (x-2)^2 + (y-1)^2$ s.t. $x + y \leq 1$.

    Click for answer $f_0 = (x-2)^2 + (y-1)^2$, constraint $x+y-1 \leq 0$. $\nabla f_0 = (2(x-2), 2(y-1))$. Stationarity: $2(x-2) + \lambda = 0$, $2(y-1) + \lambda = 0$. So $2(x-2) = 2(y-1) \implies x-2 = y-1 \implies x = y+1$. Complementary slackness: $\lambda(x+y-1) = 0$. If $\lambda = 0$: $x=2$, $y=1$. Check $2+1-1 = 2 > 0$ → infeasible. If $\lambda \neq 0$: $x+y=1$. With $x=y+1$: $(y+1)+y=1 \implies 2y=0 \implies y=0$, $x=1$. $\lambda = -2(x-2) = -2(1-2) = 2 \geq 0$ ✓. $x^* = 1$, $y^* = 0$, $f = (1-2)^2 + (0-1)^2 = 2$.

  3. Is $\min |A\mathbf{x} - \mathbf{b}|_2^2$ s.t. $|\mathbf{x}|_2 \leq 1$ convex? If so, what class?

    Click for answer Yes — minimize a convex quadratic subject to a convex quadratic constraint $\|\mathbf{x}\|_2^2 - 1 \leq 0$ (the Hessian of the constraint function is $2I \succeq 0$). This is a QCQP (Quadratically Constrained Quadratic Program). This specific problem is also known as ridge regression with a constraint (as opposed to a penalty) — also called "trust-region subproblem."

  4. Explain why $\min e^x$ is convex but $\min e^x$ s.t. $x^2 = 1$ is not a convex problem.

    Click for answer $e^x$ is convex. The constraint $x^2 = 1$ defines the set $\{-1, 1\}$, which is not convex (and is not affine). Equality constraints in convex problems must be affine ($A\mathbf{x} = \mathbf{b}$). The constraint $x^2 = 1$ is equivalent to $x^2 - 1 \leq 0$ AND $1 - x^2 \leq 0$, but the second constraint has $f_2(x) = 1-x^2$ which is NOT convex (concave). So this cannot be written as a convex problem.

  5. For the LP $\min \mathbf{c}^T\mathbf{x}$ s.t. $A\mathbf{x} \leq \mathbf{b}$, $\mathbf{x} \geq 0$, write the KKT conditions and interpret complementary slackness.

    Click for answer With $\boldsymbol{\lambda} \geq 0$ for $A\mathbf{x} \leq \mathbf{b}$ and $\boldsymbol{\mu} \geq 0$ for $-\mathbf{x} \leq 0$: Stationarity: $\mathbf{c} + A^T\boldsymbol{\lambda} - \boldsymbol{\mu} = 0 \implies \boldsymbol{\mu} = \mathbf{c} + A^T\boldsymbol{\lambda}$. Complementary slackness: $\lambda_i(\mathbf{a}_i^T\mathbf{x} - b_i) = 0$, $\mu_i x_i = 0$. Interpretation: If $\mu_i > 0$, then $x_i = 0$ (variable at bound). If $x_i > 0$, then $\mu_i = 0$ → $(c + A^T\lambda)_i = 0$ → reduced cost is zero.


Summary

Key takeaways:


Pitfalls

  1. Writing non-affine equality constraints in a convex problem. $x^2 = 1$ defines the non-convex set ${-1, 1}$ and is equivalent to $x^2 \leq 1$ AND $-x^2 \leq 0$, but $-x^2$ is concave — not allowed. Equality constraints in convex optimization must be affine ($A\mathbf{x} = \mathbf{b}$).

  2. Assuming KKT conditions are sufficient for non-convex problems. KKT is only necessary for general problems (under constraint qualifications). A saddle point or local maximum can satisfy KKT. For convex problems only, KKT is both necessary and sufficient.

  3. Misclassifying a problem as convex. Having a convex objective is not enough — inequality constraint functions must also be convex, and equality constraints must be affine. A single non-convex constraint makes the whole problem non-convex.

  4. Forgetting that LP solutions can be non-unique. When the objective is parallel to a constraint boundary, an entire edge or face may be optimal. Don't assume uniqueness unless the problem is strictly convex.

  5. Confusing $\ell_1$ regularization (convex) with $\ell_0$ regularization (non-convex). $|\mathbf{x}|_1$ is the tightest convex relaxation of $|\mathbf{x}|_0$ on the unit ball, but they are not the same — $\ell_1$ achieves sparsity through geometry, not by counting nonzeros directly.



Next Steps

Next up: 14-08-constrained-optimization-theory.md — general constrained optimization, constraint qualifications, and the full KKT theory.