14-08 — Constrained Optimization Theory
Phase: Optimization | Subject: 14-08 Prerequisites: 14-07-convex-optimization-problems.md, 06-08-lagrange-multipliers.md Next subject: 14-09-lagrangian-duality.md
Learning Objectives
By the end of this subject, you will be able to:
- Formulate the Lagrangian and explain the role of Lagrange multipliers as "prices" on constraints
- State the full KKT conditions for general (possibly non-convex) constrained optimization
- Describe constraint qualifications and why they're needed
- Interpret complementary slackness geometrically and economically
- Apply KKT to solve small constrained problems analytically
Core Content
The General Constrained Problem
$$\min_{\mathbf{x}} f(\mathbf{x}) \quad \text{s.t.} \quad g_i(\mathbf{x}) \leq 0,\; i=1,\ldots,m,\quad h_j(\mathbf{x}) = 0,\; j=1,\ldots,p$$
No convexity assumed. $f$, $g_i$, $h_j$ are differentiable (for KKT).
The Lagrangian
The central construction of constrained optimization:
$$\mathcal{L}(\mathbf{x}, \boldsymbol{\lambda}, \boldsymbol{\nu}) = f(\mathbf{x}) + \sum_{i=1}^m \lambda_i g_i(\mathbf{x}) + \sum_{j=1}^p \nu_j h_j(\mathbf{x})$$
- $\lambda_i \geq 0$ — Lagrange multipliers for inequality constraints
- $\nu_j \in \mathbb{R}$ — Lagrange multipliers for equality constraints (unrestricted sign)
Interpretation: $\lambda_i$ is the "price" of violating constraint $i$. If you could relax constraint $i$ by a small amount $\epsilon$, the optimal value would improve by approximately $\lambda_i \epsilon$. This is the shadow price interpretation from economics.
⚠️ CRITICAL — The sign convention: $\lambda_i \geq 0$ for $g_i(\mathbf{x}) \leq 0$ (the standard form). If you write constraints as $g_i(\mathbf{x}) \geq 0$, the sign flips. Always convert to $\leq$ form before applying KKT.
The KKT Conditions (General)
For a local minimum $\mathbf{x}^$, under appropriate constraint qualifications, there exist $\boldsymbol{\lambda}^ \geq \mathbf{0}$ and $\boldsymbol{\nu}^*$ such that:
-
Primal Feasibility: $$g_i(\mathbf{x}^) \leq 0,\quad h_j(\mathbf{x}^) = 0$$
-
Dual Feasibility: $$\lambda_i^* \geq 0$$
-
Stationarity: $$\nabla f(\mathbf{x}^) + \sum_{i=1}^m \lambda_i^ \nabla g_i(\mathbf{x}^) + \sum_{j=1}^p \nu_j^ \nabla h_j(\mathbf{x}^*) = \mathbf{0}$$
-
Complementary Slackness: $$\lambda_i^ g_i(\mathbf{x}^) = 0 \quad \forall i$$
For NON-convex problems: KKT is necessary (under constraint qualifications) but NOT sufficient. A saddle point or local maximum can satisfy KKT.
For convex problems: KKT is necessary AND sufficient for global optimality.
Constraint Qualifications
KKT requires a "constraint qualification" (CQ) to hold at $\mathbf{x}^*$. CQs ensure the linearized feasible directions match the true feasible directions.
Slater's condition (for convex problems): There exists a strictly feasible point — $g_i(\mathbf{x}) < 0$ for all $i$. This is the simplest and most commonly satisfied CQ.
LICQ (Linear Independence Constraint Qualification): The gradients of active constraints are linearly independent. This is the standard CQ for general problems.
Why CQs matter: Without them, KKT can fail even at the optimum. Classic counterexample:
$$\min x \quad \text{s.t.} \quad x^2 \leq 0$$
The unique feasible point is $x=0$, so the optimum is $x^*=0$. But the gradient of the constraint at $x=0$ is $0$, so the stationarity condition reads $1 + \lambda \cdot 0 = 0$, which has no solution. LICQ fails (the active constraint gradient is zero), so KKT is inapplicable.
Key Terms
- LICQ
- Lagrange multipliers
Worked Examples
Example 1: Full KKT Walkthrough
Solve $\min (x-3)^2 + (y-2)^2$ s.t. $x + y \leq 4$, $x \geq 0$, $y \geq 0$.
Solution: $f = (x-3)^2 + (y-2)^2$ $g_1 = x+y-4 \leq 0$, $g_2 = -x \leq 0$, $g_3 = -y \leq 0$
$\nabla f = (2(x-3), 2(y-2))$ $\nabla g_1 = (1,1)$, $\nabla g_2 = (-1,0)$, $\nabla g_3 = (0,-1)$
Stationarity: $2(x-3) + \lambda_1 - \lambda_2 = 0$, $2(y-2) + \lambda_1 - \lambda_3 = 0$
Complementary slackness: $\lambda_1(x+y-4)=0$, $\lambda_2(-x)=0$, $\lambda_3(-y)=0$
Unconstrained optimum: $(3,2)$ gives $x+y=5 > 4$ — infeasible. So constraint 1 must be active: $x+y=4$, $\lambda_1 \geq 0$.
Try $\lambda_2 = \lambda_3 = 0$ ($x,y > 0$): $2(x-3) + \lambda_1 = 0$, $2(y-2) + \lambda_1 = 0$ → $x-3 = y-2$ → $x = y+1$. With $x+y=4$: $(y+1)+y=4$ → $y=1.5$, $x=2.5$. Both positive ✓. $\lambda_1 = -2(x-3) = -2(2.5-3) = 1 \geq 0$ ✓.
Optimum: $x^=2.5$, $y^=1.5$, $f=(2.5-3)^2+(1.5-2)^2 = 0.25+0.25=0.5$, $\lambda_1^*=1$.
Click for answer
$x^* = (2.5, 1.5)$, $f^* = 0.5$, $\lambda_1^* = 1$. The constraint $x+y \leq 4$ is active; $x \geq 0$ and $y \geq 0$ are inactive.Example 2: Shadow Price Interpretation
In Example 1, $\lambda_1^* = 1$. If we relax the constraint to $x+y \leq 5$, what's the new optimum (approximately)?
Solution: The unconstrained optimum $(3,2)$ has $x+y=5$, so relaxing to $x+y \leq 5$ makes the constraint inactive — the optimum becomes $(3,2)$, $f=0$.
The change in optimal value is $0 - 0.5 = -0.5$ (improvement of 0.5). The shadow price predicts: $\lambda_1^* \times \Delta b_1 = 1 \times (5-4) = 1$ — but we got 0.5 improvement.
The discrepancy: the shadow price is a local linear approximation ($\partial f^/\partial b_1 = \lambda_1^$). For a change from $b=4$ to $b=5$, the constraint becomes entirely inactive at $b=5$ (where $x^+y^ = 5$ for the unconstrained opt), so the approximation breaks down for such a large change. For a small change from $b=4$ to $b=4.1$, the shadow price predicts $\Delta f \approx -0.1$, which is accurate.
Click for answer
The shadow price $\lambda_1^* = 1$ is a derivative: $\partial f^*/\partial b$. It's accurate for small changes but can break for large ones where the active set changes.Example 3: KKT Without CQ
$\min x$ s.t. $x^3 \leq 0$, $x \geq 0$. Find the optimum. Does KKT work?
Solution: Feasible set: $x \in [0,0]$ → only $x=0$. So $x^*=0$ (trivially optimal).
$g_1 = x^3 \leq 0$, $g_2 = -x \leq 0$. $\nabla g_1(0) = 0$, violates LICQ.
KKT stationarity: $1 + \lambda_1(0) + \lambda_2(-1) = 0 \implies 1 - \lambda_2 = 0 \implies \lambda_2 = 1 \geq 0$ ✓. Complementary slackness: $\lambda_1(0) = 0$ ✓, $\lambda_2(0) = 0$ ✓.
KKT does work here: $\lambda_1$ can be arbitrary (its gradient is zero, so it doesn't affect stationarity), and $\lambda_2=1$. LICQ failure doesn't always break KKT — it just means KKT might fail.
The classic counterexample where KKT fails: $\min x$ s.t. $x^2 \leq 0$. Here $\nabla g(0) = 0$, and stationarity: $1 + \lambda(0) = 0$ — no solution. KKT genuinely fails.
Click for answer
$x^* = 0$ is trivially optimal. KKT can find it ($\lambda_2=1$, $\lambda_1$ arbitrary) because the other constraint provides a nonzero gradient. KKT only fails when ALL active constraint gradients are linearly dependent in a way that excludes the objective gradient.Quiz
Q1: What does the concept of Lagrange multipliers primarily refer to in this subject?
A) A historical anecdote about Lagrange multipliers B) A computational error related to Lagrange multipliers C) The definition and application of Lagrange multipliers D) A visual representation of Lagrange multipliers
Correct: C)
- If you chose A: This is incorrect. Lagrange multipliers is defined as: the definition and application of lagrange multipliers. The other options describe different aspects that are not the primary focus.
- If you chose B: This is incorrect. Lagrange multipliers is defined as: the definition and application of lagrange multipliers. The other options describe different aspects that are not the primary focus.
- If you chose C: Lagrange multipliers is defined as: the definition and application of lagrange multipliers. The other options describe different aspects that are not the primary focus. Correct!
- If you chose D: This is incorrect. Lagrange multipliers is defined as: the definition and application of lagrange multipliers. The other options describe different aspects that are not the primary focus.
Q2: Which of the following is the key formula discussed in this subject?
A) A simplified version of \min_{\mathbf{x}} f(\mathbf... B) \min_{\mathbf{x}} f(\mathbf{x}) \quad \text{s.t.} \quad g_i(\mathbf{x}) \leq 0,\; i=1,\ldots,m,\quad h_j(\mathbf{x}) = 0,\; j=1,\ldots,p C) An unrelated formula from a different topic D) The inverse operation of the formula in question
Correct: B)
- If you chose A: This is incorrect. The formula \min_{\mathbf{x}} f(\mathbf{x}) \quad \text{s.t.} \quad g_i(\mathbf{x}) \leq 0,\; i=1,\ldots,m,\quad h_j(\mathbf{x}) = 0,\; j=1,\ldots,p is central to this subject. The other options are either simplified versions or unrelated.
- If you chose B: The formula \min_{\mathbf{x}} f(\mathbf{x}) \quad \text{s.t.} \quad g_i(\mathbf{x}) \leq 0,\; i=1,\ldots,m,\quad h_j(\mathbf{x}) = 0,\; j=1,\ldots,p is central to this subject. The other options are either simplified versions or unrelated. Correct!
- If you chose C: This is incorrect. The formula \min_{\mathbf{x}} f(\mathbf{x}) \quad \text{s.t.} \quad g_i(\mathbf{x}) \leq 0,\; i=1,\ldots,m,\quad h_j(\mathbf{x}) = 0,\; j=1,\ldots,p is central to this subject. The other options are either simplified versions or unrelated.
- If you chose D: This is incorrect. The formula \min_{\mathbf{x}} f(\mathbf{x}) \quad \text{s.t.} \quad g_i(\mathbf{x}) \leq 0,\; i=1,\ldots,m,\quad h_j(\mathbf{x}) = 0,\; j=1,\ldots,p is central to this subject. The other options are either simplified versions or unrelated.
Q3: What is the primary purpose of The General Constrained Problem?
A) It is used to the general constrained problem in mathematical analysis B) It is used only in advanced research contexts C) It is primarily a historical notation system D) It replaces all other methods in this domain
Correct: A)
- If you chose A: The General Constrained Problem serves the purpose described in the correct answer. The other options misrepresent its role. Correct!
- If you chose B: This is incorrect. The General Constrained Problem serves the purpose described in the correct answer. The other options misrepresent its role.
- If you chose C: This is incorrect. The General Constrained Problem serves the purpose described in the correct answer. The other options misrepresent its role.
- If you chose D: This is incorrect. The General Constrained Problem serves the purpose described in the correct answer. The other options misrepresent its role.
Q4: Which statement about The Lagrangian is TRUE?
A) The Lagrangian is not related to this subject B) The Lagrangian is a fundamental concept covered in this subject C) The Lagrangian is mentioned only as a historical footnote D) The Lagrangian is an advanced topic beyond this subject's scope
Correct: B)
- If you chose A: This is incorrect. The Lagrangian is a fundamental concept covered in this subject. This subject covers The Lagrangian as part of its core content.
- If you chose B: The Lagrangian is a fundamental concept covered in this subject. This subject covers The Lagrangian as part of its core content. Correct!
- If you chose C: This is incorrect. The Lagrangian is a fundamental concept covered in this subject. This subject covers The Lagrangian as part of its core content.
- If you chose D: This is incorrect. The Lagrangian is a fundamental concept covered in this subject. This subject covers The Lagrangian as part of its core content.
Q5: Based on the worked examples in this subject, what is the correct result?
A) Shadow Price Interpretation B) An unrelated numerical value C) A different result from a common mistake D) The inverse of the correct answer
Correct: A)
- If you chose A: The worked examples show that the result is Shadow Price Interpretation. The other options represent common errors. Correct!
- If you chose B: This is incorrect. The worked examples show that the result is Shadow Price Interpretation. The other options represent common errors.
- If you chose C: This is incorrect. The worked examples show that the result is Shadow Price Interpretation. The other options represent common errors.
- If you chose D: This is incorrect. The worked examples show that the result is Shadow Price Interpretation. The other options represent common errors.
Q6: How are The Lagrangian and The Kkt Conditions (General) related?
A) The Lagrangian is a special case of The Kkt Conditions (General) B) The Lagrangian and The Kkt Conditions (General) are closely related concepts C) The Lagrangian is the inverse of The Kkt Conditions (General) D) The Lagrangian and The Kkt Conditions (General) are completely unrelated topics
Correct: B)
- If you chose A: This is incorrect. Both The Lagrangian and The Kkt Conditions (General) are covered in this subject as interconnected topics.
- If you chose B: Both The Lagrangian and The Kkt Conditions (General) are covered in this subject as interconnected topics. Correct!
- If you chose C: This is incorrect. Both The Lagrangian and The Kkt Conditions (General) are covered in this subject as interconnected topics.
- If you chose D: This is incorrect. Both The Lagrangian and The Kkt Conditions (General) are covered in this subject as interconnected topics.
Q7: What is a common pitfall when working with Constraint Qualifications?
A) A common mistake is confusing Constraint Qualifications with a similar concept B) Constraint Qualifications has no common misconceptions C) The main error with Constraint Qualifications is using it when it is not needed D) Constraint Qualifications is always computed the same way in all contexts
Correct: A)
- If you chose A: Students often confuse Constraint Qualifications with similar-sounding or related concepts. Pay attention to the precise definitions. Correct!
- If you chose B: This is incorrect. Students often confuse Constraint Qualifications with similar-sounding or related concepts. Pay attention to the precise definitions.
- If you chose C: This is incorrect. Students often confuse Constraint Qualifications with similar-sounding or related concepts. Pay attention to the precise definitions.
- If you chose D: This is incorrect. Students often confuse Constraint Qualifications with similar-sounding or related concepts. Pay attention to the precise definitions.
Q8: When should you apply Example 1: Full Kkt Walkthrough?
A) Use Example 1: Full Kkt Walkthrough only in pure mathematics contexts B) Avoid Example 1: Full Kkt Walkthrough unless explicitly instructed C) Example 1: Full Kkt Walkthrough is not practically useful D) Apply Example 1: Full Kkt Walkthrough to solve problems in this subject's domain
Correct: D)
- If you chose A: This is incorrect. Example 1: Full Kkt Walkthrough is a practical tool used throughout this subject to solve relevant problems.
- If you chose B: This is incorrect. Example 1: Full Kkt Walkthrough is a practical tool used throughout this subject to solve relevant problems.
- If you chose C: This is incorrect. Example 1: Full Kkt Walkthrough is a practical tool used throughout this subject to solve relevant problems.
- If you chose D: Example 1: Full Kkt Walkthrough is a practical tool used throughout this subject to solve relevant problems. Correct!
Practice Problems
-
Solve $\min x^2 + y^2$ s.t. $x + 2y = 3$ using KKT.
Click for answer
Equality constraint only: $h = x+2y-3 = 0$. Stationarity: $2x + \nu = 0$, $2y + 2\nu = 0$ → $x = -\nu/2$, $y = -\nu$. Feasibility: $-\nu/2 + 2(-\nu) = 3$ → $-\nu/2 - 2\nu = 3$ → $-5\nu/2 = 3$ → $\nu = -6/5$. $x = 3/5$, $y = 6/5$, $f = (3/5)^2 + (6/5)^2 = 9/25 + 36/25 = 45/25 = 9/5$. -
Apply KKT to the problem $\min -x$ s.t. $x \leq 1$, $x \geq 0$ and identify the active constraints.
Click for answer
$g_1 = x-1 \leq 0$, $g_2 = -x \leq 0$. Stationarity: $-1 + \lambda_1 - \lambda_2 = 0$ → $\lambda_1 - \lambda_2 = 1$. CS: $\lambda_1(x-1)=0$, $\lambda_2(-x)=0$. Try $x=1$: $\lambda_2=0$ → $\lambda_1=1 \geq 0$ ✓. $x^*=1$, $f^*=-1$. Try $x=0$: $\lambda_1=0$ → $-\lambda_2=1$ → $\lambda_2=-1$ ✗ (dual feasibility violated). So $x^*=1$, only $g_1$ active. -
Explain why the LICQ fails at the optimum of $\min x_1$ s.t. $x_1^2 + x_2^2 \leq 0$, $x_1 \geq 0$. Does KKT still find the optimum?
Click for answer
Feasible set: only $(0,0)$ (the first constraint forces $x_1=x_2=0$). Active constraints at $(0,0)$: $g_1 = x_1^2+x_2^2 \leq 0$ (active), $g_2 = -x_1 \leq 0$ (active). $\nabla g_1(0,0) = (0,0)$, $\nabla g_2 = (-1,0)$. Only one nonzero gradient among two active constraints — LICQ fails. KKT stationarity: $(1,0) + \lambda_1(0,0) + \lambda_2(-1,0) = 0$ → $1-\lambda_2=0$, $0=0$ → $\lambda_2=1$, $\lambda_1$ arbitrary. KKT works here because the other constraint provides enough gradient information. -
A linear program has no strictly feasible point (all constraints are equalities or active). Does Slater's condition hold?
Click for answer
No — Slater's condition requires strict feasibility: there exists $\mathbf{x}$ with $g_i(\mathbf{x}) < 0$ for all inequality constraints. If all inequalities are tight at every feasible point, Slater fails. For LPs, Slater failure means strong duality might not hold (though for LPs specifically, weaker conditions suffice). In general convex optimization, Slater failure can mean a nonzero duality gap. -
Interpret $\lambda_i^* = 0$ in economic terms.
Click for answer
$\lambda_i^* = 0$ means the constraint is "not binding" — you have slack. The shadow price is zero: slightly tightening or relaxing the constraint would not change the optimal value. The resource constrained by $g_i$ is not scarce at the optimum. In ML: if a regularization parameter's KKT multiplier is zero, the corresponding regularizer is not actively constraining the solution — you could remove or reduce it without changing the optimum.
Summary
Key takeaways:
- The Lagrangian $\mathcal{L} = f + \sum\lambda_i g_i + \sum\nu_j h_j$ combines the objective and constraints
- $\lambda_i \geq 0$ for $g_i \leq 0$ — the sign convention is critical
- KKT conditions: primal feasibility, dual feasibility, stationarity, complementary slackness
- For non-convex problems: KKT is necessary (under CQs) but not sufficient
- Constraint qualifications (LICQ, Slater) ensure the linearized feasible directions match reality
- $\lambda_i$ is the shadow price — the marginal cost of tightening constraint $i$
Pitfalls
-
Forgetting to check constraint qualifications. KKT conditions can fail to hold at the optimum when LICQ is violated — e.g., $\min x$ s.t. $x^2 \leq 0$ has optimum $x^*=0$ but no KKT multipliers exist because the constraint gradient vanishes. Always verify a CQ holds before applying KKT.
-
Getting the sign convention for inequality constraints wrong. The standard form is $g_i(\mathbf{x}) \leq 0$ with $\lambda_i \geq 0$. If your constraint is $h(\mathbf{x}) \geq 0$, rewrite as $-h(\mathbf{x}) \leq 0$ before applying KKT. Mixing up the sign flips the direction of the penalty.
-
Treating KKT as sufficient for non-convex problems. On $\min -x^2$ s.t. $x \in [-1, 1]$, the KKT point $x=0$ (with $\lambda_1=\lambda_2=0$) is a global maximum, not a minimum. KKT is only necessary for general problems — you need convexity for sufficiency.
-
Misinterpreting shadow prices for large constraint changes. Shadow price $\lambda_i^$ is a local derivative: $\partial f^/\partial b_i$. Using it to predict the effect of large constraint relaxations can be wildly inaccurate when the active set changes.
-
Neglecting complementary slackness when solving analytically. The condition $\lambda_i^ g_i(\mathbf{x}^) = 0$ is the most useful KKT condition in practice — it dramatically reduces the casework by telling you which constraints matter. Always check both cases ($\lambda_i = 0$ and $g_i = 0$) systematically.
Next Steps
Next up: 14-09-lagrangian-duality.md — the dual problem, weak and strong duality, and how duality provides certificates of optimality.