Math graphic
📐 Concept diagram

14-08 — Constrained Optimization Theory

Phase: Optimization | Subject: 14-08 Prerequisites: 14-07-convex-optimization-problems.md, 06-08-lagrange-multipliers.md Next subject: 14-09-lagrangian-duality.md


Learning Objectives

By the end of this subject, you will be able to:

  1. Formulate the Lagrangian and explain the role of Lagrange multipliers as "prices" on constraints
  2. State the full KKT conditions for general (possibly non-convex) constrained optimization
  3. Describe constraint qualifications and why they're needed
  4. Interpret complementary slackness geometrically and economically
  5. Apply KKT to solve small constrained problems analytically

Core Content

The General Constrained Problem

$$\min_{\mathbf{x}} f(\mathbf{x}) \quad \text{s.t.} \quad g_i(\mathbf{x}) \leq 0,\; i=1,\ldots,m,\quad h_j(\mathbf{x}) = 0,\; j=1,\ldots,p$$

No convexity assumed. $f$, $g_i$, $h_j$ are differentiable (for KKT).

The Lagrangian

The central construction of constrained optimization:

$$\mathcal{L}(\mathbf{x}, \boldsymbol{\lambda}, \boldsymbol{\nu}) = f(\mathbf{x}) + \sum_{i=1}^m \lambda_i g_i(\mathbf{x}) + \sum_{j=1}^p \nu_j h_j(\mathbf{x})$$

Interpretation: $\lambda_i$ is the "price" of violating constraint $i$. If you could relax constraint $i$ by a small amount $\epsilon$, the optimal value would improve by approximately $\lambda_i \epsilon$. This is the shadow price interpretation from economics.

⚠️ CRITICAL — The sign convention: $\lambda_i \geq 0$ for $g_i(\mathbf{x}) \leq 0$ (the standard form). If you write constraints as $g_i(\mathbf{x}) \geq 0$, the sign flips. Always convert to $\leq$ form before applying KKT.

The KKT Conditions (General)

For a local minimum $\mathbf{x}^$, under appropriate constraint qualifications, there exist $\boldsymbol{\lambda}^ \geq \mathbf{0}$ and $\boldsymbol{\nu}^*$ such that:

  1. Primal Feasibility: $$g_i(\mathbf{x}^) \leq 0,\quad h_j(\mathbf{x}^) = 0$$

  2. Dual Feasibility: $$\lambda_i^* \geq 0$$

  3. Stationarity: $$\nabla f(\mathbf{x}^) + \sum_{i=1}^m \lambda_i^ \nabla g_i(\mathbf{x}^) + \sum_{j=1}^p \nu_j^ \nabla h_j(\mathbf{x}^*) = \mathbf{0}$$

  4. Complementary Slackness: $$\lambda_i^ g_i(\mathbf{x}^) = 0 \quad \forall i$$

For NON-convex problems: KKT is necessary (under constraint qualifications) but NOT sufficient. A saddle point or local maximum can satisfy KKT.

For convex problems: KKT is necessary AND sufficient for global optimality.

Constraint Qualifications

KKT requires a "constraint qualification" (CQ) to hold at $\mathbf{x}^*$. CQs ensure the linearized feasible directions match the true feasible directions.

Slater's condition (for convex problems): There exists a strictly feasible point — $g_i(\mathbf{x}) < 0$ for all $i$. This is the simplest and most commonly satisfied CQ.

LICQ (Linear Independence Constraint Qualification): The gradients of active constraints are linearly independent. This is the standard CQ for general problems.

Why CQs matter: Without them, KKT can fail even at the optimum. Classic counterexample:

$$\min x \quad \text{s.t.} \quad x^2 \leq 0$$

The unique feasible point is $x=0$, so the optimum is $x^*=0$. But the gradient of the constraint at $x=0$ is $0$, so the stationarity condition reads $1 + \lambda \cdot 0 = 0$, which has no solution. LICQ fails (the active constraint gradient is zero), so KKT is inapplicable.



Key Terms

Worked Examples

Example 1: Full KKT Walkthrough

Solve $\min (x-3)^2 + (y-2)^2$ s.t. $x + y \leq 4$, $x \geq 0$, $y \geq 0$.

Solution: $f = (x-3)^2 + (y-2)^2$ $g_1 = x+y-4 \leq 0$, $g_2 = -x \leq 0$, $g_3 = -y \leq 0$

$\nabla f = (2(x-3), 2(y-2))$ $\nabla g_1 = (1,1)$, $\nabla g_2 = (-1,0)$, $\nabla g_3 = (0,-1)$

Stationarity: $2(x-3) + \lambda_1 - \lambda_2 = 0$, $2(y-2) + \lambda_1 - \lambda_3 = 0$

Complementary slackness: $\lambda_1(x+y-4)=0$, $\lambda_2(-x)=0$, $\lambda_3(-y)=0$

Unconstrained optimum: $(3,2)$ gives $x+y=5 > 4$ — infeasible. So constraint 1 must be active: $x+y=4$, $\lambda_1 \geq 0$.

Try $\lambda_2 = \lambda_3 = 0$ ($x,y > 0$): $2(x-3) + \lambda_1 = 0$, $2(y-2) + \lambda_1 = 0$ → $x-3 = y-2$ → $x = y+1$. With $x+y=4$: $(y+1)+y=4$ → $y=1.5$, $x=2.5$. Both positive ✓. $\lambda_1 = -2(x-3) = -2(2.5-3) = 1 \geq 0$ ✓.

Optimum: $x^=2.5$, $y^=1.5$, $f=(2.5-3)^2+(1.5-2)^2 = 0.25+0.25=0.5$, $\lambda_1^*=1$.

Click for answer $x^* = (2.5, 1.5)$, $f^* = 0.5$, $\lambda_1^* = 1$. The constraint $x+y \leq 4$ is active; $x \geq 0$ and $y \geq 0$ are inactive.

Example 2: Shadow Price Interpretation

In Example 1, $\lambda_1^* = 1$. If we relax the constraint to $x+y \leq 5$, what's the new optimum (approximately)?

Solution: The unconstrained optimum $(3,2)$ has $x+y=5$, so relaxing to $x+y \leq 5$ makes the constraint inactive — the optimum becomes $(3,2)$, $f=0$.

The change in optimal value is $0 - 0.5 = -0.5$ (improvement of 0.5). The shadow price predicts: $\lambda_1^* \times \Delta b_1 = 1 \times (5-4) = 1$ — but we got 0.5 improvement.

The discrepancy: the shadow price is a local linear approximation ($\partial f^/\partial b_1 = \lambda_1^$). For a change from $b=4$ to $b=5$, the constraint becomes entirely inactive at $b=5$ (where $x^+y^ = 5$ for the unconstrained opt), so the approximation breaks down for such a large change. For a small change from $b=4$ to $b=4.1$, the shadow price predicts $\Delta f \approx -0.1$, which is accurate.

Click for answer The shadow price $\lambda_1^* = 1$ is a derivative: $\partial f^*/\partial b$. It's accurate for small changes but can break for large ones where the active set changes.

Example 3: KKT Without CQ

$\min x$ s.t. $x^3 \leq 0$, $x \geq 0$. Find the optimum. Does KKT work?

Solution: Feasible set: $x \in [0,0]$ → only $x=0$. So $x^*=0$ (trivially optimal).

$g_1 = x^3 \leq 0$, $g_2 = -x \leq 0$. $\nabla g_1(0) = 0$, violates LICQ.

KKT stationarity: $1 + \lambda_1(0) + \lambda_2(-1) = 0 \implies 1 - \lambda_2 = 0 \implies \lambda_2 = 1 \geq 0$ ✓. Complementary slackness: $\lambda_1(0) = 0$ ✓, $\lambda_2(0) = 0$ ✓.

KKT does work here: $\lambda_1$ can be arbitrary (its gradient is zero, so it doesn't affect stationarity), and $\lambda_2=1$. LICQ failure doesn't always break KKT — it just means KKT might fail.

The classic counterexample where KKT fails: $\min x$ s.t. $x^2 \leq 0$. Here $\nabla g(0) = 0$, and stationarity: $1 + \lambda(0) = 0$ — no solution. KKT genuinely fails.

Click for answer $x^* = 0$ is trivially optimal. KKT can find it ($\lambda_2=1$, $\lambda_1$ arbitrary) because the other constraint provides a nonzero gradient. KKT only fails when ALL active constraint gradients are linearly dependent in a way that excludes the objective gradient.


Quiz

Q1: What does the concept of Lagrange multipliers primarily refer to in this subject?

A) A historical anecdote about Lagrange multipliers B) A computational error related to Lagrange multipliers C) The definition and application of Lagrange multipliers D) A visual representation of Lagrange multipliers

Correct: C)

Q2: Which of the following is the key formula discussed in this subject?

A) A simplified version of \min_{\mathbf{x}} f(\mathbf... B) \min_{\mathbf{x}} f(\mathbf{x}) \quad \text{s.t.} \quad g_i(\mathbf{x}) \leq 0,\; i=1,\ldots,m,\quad h_j(\mathbf{x}) = 0,\; j=1,\ldots,p C) An unrelated formula from a different topic D) The inverse operation of the formula in question

Correct: B)

Q3: What is the primary purpose of The General Constrained Problem?

A) It is used to the general constrained problem in mathematical analysis B) It is used only in advanced research contexts C) It is primarily a historical notation system D) It replaces all other methods in this domain

Correct: A)

Q4: Which statement about The Lagrangian is TRUE?

A) The Lagrangian is not related to this subject B) The Lagrangian is a fundamental concept covered in this subject C) The Lagrangian is mentioned only as a historical footnote D) The Lagrangian is an advanced topic beyond this subject's scope

Correct: B)

Q5: Based on the worked examples in this subject, what is the correct result?

A) Shadow Price Interpretation B) An unrelated numerical value C) A different result from a common mistake D) The inverse of the correct answer

Correct: A)

Q6: How are The Lagrangian and The Kkt Conditions (General) related?

A) The Lagrangian is a special case of The Kkt Conditions (General) B) The Lagrangian and The Kkt Conditions (General) are closely related concepts C) The Lagrangian is the inverse of The Kkt Conditions (General) D) The Lagrangian and The Kkt Conditions (General) are completely unrelated topics

Correct: B)

Q7: What is a common pitfall when working with Constraint Qualifications?

A) A common mistake is confusing Constraint Qualifications with a similar concept B) Constraint Qualifications has no common misconceptions C) The main error with Constraint Qualifications is using it when it is not needed D) Constraint Qualifications is always computed the same way in all contexts

Correct: A)

Q8: When should you apply Example 1: Full Kkt Walkthrough?

A) Use Example 1: Full Kkt Walkthrough only in pure mathematics contexts B) Avoid Example 1: Full Kkt Walkthrough unless explicitly instructed C) Example 1: Full Kkt Walkthrough is not practically useful D) Apply Example 1: Full Kkt Walkthrough to solve problems in this subject's domain

Correct: D)

Practice Problems

  1. Solve $\min x^2 + y^2$ s.t. $x + 2y = 3$ using KKT.

    Click for answer Equality constraint only: $h = x+2y-3 = 0$. Stationarity: $2x + \nu = 0$, $2y + 2\nu = 0$ → $x = -\nu/2$, $y = -\nu$. Feasibility: $-\nu/2 + 2(-\nu) = 3$ → $-\nu/2 - 2\nu = 3$ → $-5\nu/2 = 3$ → $\nu = -6/5$. $x = 3/5$, $y = 6/5$, $f = (3/5)^2 + (6/5)^2 = 9/25 + 36/25 = 45/25 = 9/5$.

  2. Apply KKT to the problem $\min -x$ s.t. $x \leq 1$, $x \geq 0$ and identify the active constraints.

    Click for answer $g_1 = x-1 \leq 0$, $g_2 = -x \leq 0$. Stationarity: $-1 + \lambda_1 - \lambda_2 = 0$ → $\lambda_1 - \lambda_2 = 1$. CS: $\lambda_1(x-1)=0$, $\lambda_2(-x)=0$. Try $x=1$: $\lambda_2=0$ → $\lambda_1=1 \geq 0$ ✓. $x^*=1$, $f^*=-1$. Try $x=0$: $\lambda_1=0$ → $-\lambda_2=1$ → $\lambda_2=-1$ ✗ (dual feasibility violated). So $x^*=1$, only $g_1$ active.

  3. Explain why the LICQ fails at the optimum of $\min x_1$ s.t. $x_1^2 + x_2^2 \leq 0$, $x_1 \geq 0$. Does KKT still find the optimum?

    Click for answer Feasible set: only $(0,0)$ (the first constraint forces $x_1=x_2=0$). Active constraints at $(0,0)$: $g_1 = x_1^2+x_2^2 \leq 0$ (active), $g_2 = -x_1 \leq 0$ (active). $\nabla g_1(0,0) = (0,0)$, $\nabla g_2 = (-1,0)$. Only one nonzero gradient among two active constraints — LICQ fails. KKT stationarity: $(1,0) + \lambda_1(0,0) + \lambda_2(-1,0) = 0$ → $1-\lambda_2=0$, $0=0$ → $\lambda_2=1$, $\lambda_1$ arbitrary. KKT works here because the other constraint provides enough gradient information.

  4. A linear program has no strictly feasible point (all constraints are equalities or active). Does Slater's condition hold?

    Click for answer No — Slater's condition requires strict feasibility: there exists $\mathbf{x}$ with $g_i(\mathbf{x}) < 0$ for all inequality constraints. If all inequalities are tight at every feasible point, Slater fails. For LPs, Slater failure means strong duality might not hold (though for LPs specifically, weaker conditions suffice). In general convex optimization, Slater failure can mean a nonzero duality gap.

  5. Interpret $\lambda_i^* = 0$ in economic terms.

    Click for answer $\lambda_i^* = 0$ means the constraint is "not binding" — you have slack. The shadow price is zero: slightly tightening or relaxing the constraint would not change the optimal value. The resource constrained by $g_i$ is not scarce at the optimum. In ML: if a regularization parameter's KKT multiplier is zero, the corresponding regularizer is not actively constraining the solution — you could remove or reduce it without changing the optimum.


Summary

Key takeaways:


Pitfalls

  1. Forgetting to check constraint qualifications. KKT conditions can fail to hold at the optimum when LICQ is violated — e.g., $\min x$ s.t. $x^2 \leq 0$ has optimum $x^*=0$ but no KKT multipliers exist because the constraint gradient vanishes. Always verify a CQ holds before applying KKT.

  2. Getting the sign convention for inequality constraints wrong. The standard form is $g_i(\mathbf{x}) \leq 0$ with $\lambda_i \geq 0$. If your constraint is $h(\mathbf{x}) \geq 0$, rewrite as $-h(\mathbf{x}) \leq 0$ before applying KKT. Mixing up the sign flips the direction of the penalty.

  3. Treating KKT as sufficient for non-convex problems. On $\min -x^2$ s.t. $x \in [-1, 1]$, the KKT point $x=0$ (with $\lambda_1=\lambda_2=0$) is a global maximum, not a minimum. KKT is only necessary for general problems — you need convexity for sufficiency.

  4. Misinterpreting shadow prices for large constraint changes. Shadow price $\lambda_i^$ is a local derivative: $\partial f^/\partial b_i$. Using it to predict the effect of large constraint relaxations can be wildly inaccurate when the active set changes.

  5. Neglecting complementary slackness when solving analytically. The condition $\lambda_i^ g_i(\mathbf{x}^) = 0$ is the most useful KKT condition in practice — it dramatically reduces the casework by telling you which constraints matter. Always check both cases ($\lambda_i = 0$ and $g_i = 0$) systematically.



Next Steps

Next up: 14-09-lagrangian-duality.md — the dual problem, weak and strong duality, and how duality provides certificates of optimality.