14-09 — Lagrangian Duality
Phase: Optimization | Subject: 14-09 Prerequisites: 14-08-constrained-optimization-theory.md Next subject: 14-10-stochastic-nonconvex-optimization.md
Learning Objectives
By the end of this subject, you will be able to:
- Construct the Lagrangian dual function and the dual problem
- Explain weak duality and prove why the dual provides a lower bound
- State conditions for strong duality (zero duality gap)
- Interpret the dual problem as finding the best lower bound
- Recognize duality in ML contexts (SVM dual, GANs, variational inference)
Core Content
The Dual Function
For the primal problem $\min f(\mathbf{x})$ s.t. $g_i(\mathbf{x}) \leq 0$, $h_j(\mathbf{x}) = 0$, define the Lagrangian dual function:
$$\theta(\boldsymbol{\lambda}, \boldsymbol{\nu}) = \inf_{\mathbf{x}} \mathcal{L}(\mathbf{x}, \boldsymbol{\lambda}, \boldsymbol{\nu}) = \inf_{\mathbf{x}} \left[f(\mathbf{x}) + \sum_i \lambda_i g_i(\mathbf{x}) + \sum_j \nu_j h_j(\mathbf{x})\right]$$
Key property: $\theta(\boldsymbol{\lambda}, \boldsymbol{\nu})$ is concave (even when the primal is non-convex!) because it's the pointwise infimum of affine functions in $(\boldsymbol{\lambda}, \boldsymbol{\nu})$.
Weak Duality
For any $\boldsymbol{\lambda} \geq \mathbf{0}$ and any $\boldsymbol{\nu}$, and any feasible $\mathbf{x}$:
$$\theta(\boldsymbol{\lambda}, \boldsymbol{\nu}) \leq f(\mathbf{x})$$
Proof: For feasible $\mathbf{x}$: $g_i(\mathbf{x}) \leq 0$, $h_j(\mathbf{x}) = 0$, $\lambda_i \geq 0$, so: $\mathcal{L}(\mathbf{x}, \boldsymbol{\lambda}, \boldsymbol{\nu}) = f(\mathbf{x}) + \underbrace{\sum\lambda_i g_i(\mathbf{x})}{\leq 0} + \underbrace{\sum\nu_j h_j(\mathbf{x})}{=0} \leq f(\mathbf{x})$
Since $\theta$ is the infimum over all $\mathbf{x}$: $\theta(\boldsymbol{\lambda}, \boldsymbol{\nu}) \leq \mathcal{L}(\mathbf{x}, \boldsymbol{\lambda}, \boldsymbol{\nu}) \leq f(\mathbf{x})$. ∎
The Dual Problem
Weak duality says $\theta$ is always a lower bound. The dual problem seeks the best (largest) lower bound:
$$\max_{\boldsymbol{\lambda} \geq \mathbf{0},\; \boldsymbol{\nu}} \theta(\boldsymbol{\lambda}, \boldsymbol{\nu})$$
Let $p^$ = primal optimal value, $d^$ = dual optimal value. Weak duality: $d^ \leq p^$.
The difference $p^ - d^ \geq 0$ is the duality gap.
⚠️ CRITICAL — The dual is ALWAYS a concave maximization problem, regardless of the primal's convexity. The dual variables $\boldsymbol{\lambda}$ are constrained to be nonnegative, making this a tractable problem even when the primal is non-convex.
Strong Duality
Strong duality holds when $d^ = p^$ (zero duality gap). For convex problems, strong duality holds under Slater's condition: there exists a strictly feasible point ($g_i(\mathbf{x}) < 0$ for all $i$).
When strong duality holds, you can solve either the primal or the dual — they give the same optimal value. The dual is often easier (fewer variables, simpler constraints).
Duality in Machine Learning
SVMs: The SVM primal is a QP in $\mathbf{w}$ ($n$ variables). The dual is a QP in $\boldsymbol{\alpha}$ ($N$ variables — one per data point). The dual reveals that only support vectors ($\alpha_i > 0$) matter, enables the kernel trick, and is what SMO actually solves.
GANs: The minimax game $\min_G \max_D V(D,G)$ is a primal-dual formulation. The discriminator plays the role of the dual variable — tightening the lower bound on the Jensen-Shannon divergence.
Variational Inference: The ELBO (Evidence Lower BOund) $\mathcal{L}(q) = \mathbb{E}q[\ln p(\mathbf{x},\mathbf{z})] - \mathbb{E}_q[\ln q(\mathbf{z})]$ is a dual function. Maximizing the ELBO is solving the dual of $\min{q} \text{KL}(q | p(\mathbf{z}|\mathbf{x}))$.
The Dual of a Linear Program
The LP $\min \mathbf{c}^T\mathbf{x}$ s.t. $A\mathbf{x} = \mathbf{b}$, $\mathbf{x} \geq \mathbf{0}$ has dual:
$$\max \mathbf{b}^T\mathbf{y} \quad \text{s.t.} \quad A^T\mathbf{y} \leq \mathbf{c}$$
Strong duality holds for LPs (always, no CQ needed). The dual LP has one variable per primal constraint and one constraint per primal variable — a perfect symmetry.
Key Terms
- Lagrangian dual function
- Strong duality
Worked Examples
Example 1: Dual of a Simple QP
Find the dual of $\min \frac{1}{2}x^2$ s.t. $x \geq a$.
Solution: $f(x) = \frac{1}{2}x^2$, $g(x) = a - x \leq 0$. $\mathcal{L}(x, \lambda) = \frac{1}{2}x^2 + \lambda(a - x)$, $\lambda \geq 0$.
For fixed $\lambda$, minimize over $x$: $\frac{\partial\mathcal{L}}{\partial x} = x - \lambda = 0 \implies x = \lambda$.
$\theta(\lambda) = \frac{1}{2}\lambda^2 + \lambda(a - \lambda) = \lambda a - \frac{1}{2}\lambda^2$
Dual: $\max_{\lambda \geq 0} \lambda a - \frac{1}{2}\lambda^2$. The optimum: $a - \lambda = 0 \implies \lambda = a$ (if $a \geq 0$).
If $a = 1$: $\lambda^ = 1$, $d^ = 1 - 0.5 = 0.5$. Primal: $x^ = 1$, $p^ = 0.5$. Strong duality holds ✓.
If $a = -1$: $\lambda^ = 0$ (constrained by $\lambda \geq 0$), $d^ = 0$. Primal: unconstrained minimum at $x=0$, $p^* = 0$. Strong duality holds ✓.
Click for answer
Dual: $\max_{\lambda \geq 0} [\lambda a - \frac{1}{2}\lambda^2]$. Strong duality holds (Slater satisfied for any finite $a$).Example 2: LP Duality
Primal: $\min 2x_1 + 3x_2$ s.t. $x_1 + x_2 \geq 5$, $x_1 \geq 0$, $x_2 \geq 0$.
Solution: Convert to standard form: $\min 2x_1 + 3x_2$ s.t. $-x_1 - x_2 \leq -5$, $-x_1 \leq 0$, $-x_2 \leq 0$. $\mathcal{L} = 2x_1 + 3x_2 + \lambda_1(-x_1-x_2+5) + \lambda_2(-x_1) + \lambda_3(-x_2)$. $= (2 - \lambda_1 - \lambda_2)x_1 + (3 - \lambda_1 - \lambda_3)x_2 + 5\lambda_1$
For the infimum over $x_1, x_2 \geq 0$: if $2-\lambda_1-\lambda_2 < 0$, infimum is $-\infty$ (drive $x_1 \to \infty$). So we need $2-\lambda_1-\lambda_2 \geq 0$ and $3-\lambda_1-\lambda_3 \geq 0$.
$\theta(\boldsymbol{\lambda}) = 5\lambda_1$ (when constraints hold), $-\infty$ otherwise.
Dual: $\max 5\lambda_1$ s.t. $\lambda_1 + \lambda_2 \leq 2$, $\lambda_1 + \lambda_3 \leq 3$, $\lambda_i \geq 0$.
Optimal: $\lambda_1 = 2$ (binding first constraint), $\lambda_2 = 0$, $\lambda_3 = 1$. $d^* = 10$.
Primal optimum: $x_1 = 5$, $x_2 = 0$, $p^* = 10$. Strong duality ✓.
Click for answer
Dual: $\max_{\boldsymbol{\lambda} \geq 0} 5\lambda_1$ s.t. $\lambda_1+\lambda_2 \leq 2$, $\lambda_1+\lambda_3 \leq 3$. $d^* = p^* = 10$.Example 3: Duality Gap with Non-Convex Problem
$\min x^3 - 3x$ s.t. $x \in [-2, 2]$.
Solution: This is non-convex ($x^3$ is not convex). The Lagrangian with constraints $-x-2 \leq 0$, $x-2 \leq 0$: $\mathcal{L} = x^3 - 3x + \lambda_1(-x-2) + \lambda_2(x-2)$.
Dual function: $\theta(\lambda_1, \lambda_2) = \inf_{x} [x^3 - 3x + (\lambda_2 - \lambda_1)x - 2(\lambda_1 + \lambda_2)]$.
For each fixed $(\lambda_1, \lambda_2)$, as $x \to -\infty$, $x^3 \to -\infty$ (dominates linear terms) → $\theta = -\infty$ for ALL $(\lambda_1, \lambda_2)$! The dual is $-\infty$ and provides a trivial bound.
Even though $p^*$ is finite (the function has a minimum on $[-2,2]$), the dual is useless. This is the cost of non-convexity — duality can collapse entirely.
Click for answer
The dual function is identically $-\infty$ because the cubic term dominates as $x \to -\infty$. Duality provides no useful bound for this non-convex problem.Quiz
Q1: What does the concept of Lagrangian dual function primarily refer to in this subject?
A) A historical anecdote about Lagrangian dual function B) A visual representation of Lagrangian dual function C) The definition and application of Lagrangian dual function D) A computational error related to Lagrangian dual function
Correct: C)
- If you chose A: This is incorrect. Lagrangian dual function is defined as: the definition and application of lagrangian dual function. The other options describe different aspects that are not the primary focus.
- If you chose B: This is incorrect. Lagrangian dual function is defined as: the definition and application of lagrangian dual function. The other options describe different aspects that are not the primary focus.
- If you chose C: Lagrangian dual function is defined as: the definition and application of lagrangian dual function. The other options describe different aspects that are not the primary focus. Correct!
- If you chose D: This is incorrect. Lagrangian dual function is defined as: the definition and application of lagrangian dual function. The other options describe different aspects that are not the primary focus.
Q2: Which of the following is the key formula discussed in this subject?
A) A simplified version of \min f(\mathbf{x})... B) \min f(\mathbf{x}) C) The inverse operation of the formula in question D) An unrelated formula from a different topic
Correct: B)
- If you chose A: This is incorrect. The formula \min f(\mathbf{x}) is central to this subject. The other options are either simplified versions or unrelated.
- If you chose B: The formula \min f(\mathbf{x}) is central to this subject. The other options are either simplified versions or unrelated. Correct!
- If you chose C: This is incorrect. The formula \min f(\mathbf{x}) is central to this subject. The other options are either simplified versions or unrelated.
- If you chose D: This is incorrect. The formula \min f(\mathbf{x}) is central to this subject. The other options are either simplified versions or unrelated.
Q3: What is the primary purpose of Strong duality?
A) It is used to strong duality in mathematical analysis B) It is primarily a historical notation system C) It replaces all other methods in this domain D) It is used only in advanced research contexts
Correct: A)
- If you chose A: Strong duality serves the purpose described in the correct answer. The other options misrepresent its role. Correct!
- If you chose B: This is incorrect. Strong duality serves the purpose described in the correct answer. The other options misrepresent its role.
- If you chose C: This is incorrect. Strong duality serves the purpose described in the correct answer. The other options misrepresent its role.
- If you chose D: This is incorrect. Strong duality serves the purpose described in the correct answer. The other options misrepresent its role.
Q4: Which statement about The Dual Function is TRUE?
A) The Dual Function is an advanced topic beyond this subject's scope B) The Dual Function is not related to this subject C) The Dual Function is a fundamental concept covered in this subject D) The Dual Function is mentioned only as a historical footnote
Correct: C)
- If you chose A: This is incorrect. The Dual Function is a fundamental concept covered in this subject. This subject covers The Dual Function as part of its core content.
- If you chose B: This is incorrect. The Dual Function is a fundamental concept covered in this subject. This subject covers The Dual Function as part of its core content.
- If you chose C: The Dual Function is a fundamental concept covered in this subject. This subject covers The Dual Function as part of its core content. Correct!
- If you chose D: This is incorrect. The Dual Function is a fundamental concept covered in this subject. This subject covers The Dual Function as part of its core content.
Q5: Based on the worked examples in this subject, what is the correct result?
A) $1 - y_i(\mathbf{w}^T\mathbf{x}_i + b) \leq 0$. B) The inverse of the correct answer C) A different result from a common mistake D) An unrelated numerical value
Correct: A)
- If you chose A: The worked examples show that the result is $1 - y_i(\mathbf{w}^T\mathbf{x}_i + b) \leq 0$.. The other options represent common errors. Correct!
- If you chose B: This is incorrect. The worked examples show that the result is $1 - y_i(\mathbf{w}^T\mathbf{x}_i + b) \leq 0$.. The other options represent common errors.
- If you chose C: This is incorrect. The worked examples show that the result is $1 - y_i(\mathbf{w}^T\mathbf{x}_i + b) \leq 0$.. The other options represent common errors.
- If you chose D: This is incorrect. The worked examples show that the result is $1 - y_i(\mathbf{w}^T\mathbf{x}_i + b) \leq 0$.. The other options represent common errors.
Q6: How are The Dual Function and Weak Duality related?
A) The Dual Function and Weak Duality are closely related concepts B) The Dual Function is a special case of Weak Duality C) The Dual Function and Weak Duality are completely unrelated topics D) The Dual Function is the inverse of Weak Duality
Correct: A)
- If you chose A: Both The Dual Function and Weak Duality are covered in this subject as interconnected topics. Correct!
- If you chose B: This is incorrect. Both The Dual Function and Weak Duality are covered in this subject as interconnected topics.
- If you chose C: This is incorrect. Both The Dual Function and Weak Duality are covered in this subject as interconnected topics.
- If you chose D: This is incorrect. Both The Dual Function and Weak Duality are covered in this subject as interconnected topics.
Q7: What is a common pitfall when working with The Dual Problem?
A) A common mistake is confusing The Dual Problem with a similar concept B) The main error with The Dual Problem is using it when it is not needed C) The Dual Problem is always computed the same way in all contexts D) The Dual Problem has no common misconceptions
Correct: A)
- If you chose A: Students often confuse The Dual Problem with similar-sounding or related concepts. Pay attention to the precise definitions. Correct!
- If you chose B: This is incorrect. Students often confuse The Dual Problem with similar-sounding or related concepts. Pay attention to the precise definitions.
- If you chose C: This is incorrect. Students often confuse The Dual Problem with similar-sounding or related concepts. Pay attention to the precise definitions.
- If you chose D: This is incorrect. Students often confuse The Dual Problem with similar-sounding or related concepts. Pay attention to the precise definitions.
Q8: When should you apply Duality In Machine Learning?
A) Apply Duality In Machine Learning to solve problems in this subject's domain B) Duality In Machine Learning is not practically useful C) Avoid Duality In Machine Learning unless explicitly instructed D) Use Duality In Machine Learning only in pure mathematics contexts
Correct: A)
- If you chose A: Duality In Machine Learning is a practical tool used throughout this subject to solve relevant problems. Correct!
- If you chose B: This is incorrect. Duality In Machine Learning is a practical tool used throughout this subject to solve relevant problems.
- If you chose C: This is incorrect. Duality In Machine Learning is a practical tool used throughout this subject to solve relevant problems.
- If you chose D: This is incorrect. Duality In Machine Learning is a practical tool used throughout this subject to solve relevant problems.
Practice Problems
-
Derive the dual of $\min \frac{1}{2}|\mathbf{w}|^2$ s.t. $y_i(\mathbf{w}^T\mathbf{x}_i + b) \geq 1$ (SVM primal).
Click for answer
Convert to $\leq$ form: $1 - y_i(\mathbf{w}^T\mathbf{x}_i + b) \leq 0$. $\mathcal{L} = \frac{1}{2}\|\mathbf{w}\|^2 + \sum_i \alpha_i(1 - y_i(\mathbf{w}^T\mathbf{x}_i + b))$, $\alpha_i \geq 0$. Stationarity: $\partial\mathcal{L}/\partial\mathbf{w} = \mathbf{w} - \sum\alpha_i y_i\mathbf{x}_i = 0 \implies \mathbf{w} = \sum\alpha_i y_i\mathbf{x}_i$. $\partial\mathcal{L}/\partial b = -\sum\alpha_i y_i = 0$. Substitute back: $\theta(\boldsymbol{\alpha}) = \sum\alpha_i - \frac{1}{2}\sum_{i,j}\alpha_i\alpha_j y_i y_j \mathbf{x}_i^T\mathbf{x}_j$. Dual: $\max_{\boldsymbol{\alpha} \geq 0} \sum\alpha_i - \frac{1}{2}\sum_{i,j}\alpha_i\alpha_j y_i y_j \mathbf{x}_i^T\mathbf{x}_j$ s.t. $\sum\alpha_i y_i = 0$. This is the SVM dual — a QP in $N$ variables with box constraints. -
Show that the dual function $\theta(\boldsymbol{\lambda}, \boldsymbol{\nu})$ is always concave, even for non-convex primals.
Click for answer
$\theta(\boldsymbol{\lambda}, \boldsymbol{\nu}) = \inf_{\mathbf{x}} [f(\mathbf{x}) + \sum\lambda_i g_i(\mathbf{x}) + \sum\nu_j h_j(\mathbf{x})]$. For each fixed $\mathbf{x}$, the function inside the infimum is affine in $(\boldsymbol{\lambda}, \boldsymbol{\nu})$. The pointwise infimum of affine functions is concave (epigraph is intersection of halfspaces). This holds regardless of $f$, $g_i$, $h_j$ — the dual function is always concave. -
Primal: $\min e^x$ s.t. $x \geq 0$. Compute the dual and verify strong duality.
Click for answer
$g(x) = -x \leq 0$. $\mathcal{L} = e^x - \lambda x$, $\lambda \geq 0$. $\partial\mathcal{L}/\partial x = e^x - \lambda = 0 \implies x = \ln\lambda$ (if $\lambda > 0$). If $\lambda = 0$: the infimum in the dual function is over ALL $x$, unconstrained. So $\theta(0) = \inf_x e^x = 0$ (limit, not attained). For $\lambda > 0$: $\theta(\lambda) = e^{\ln\lambda} - \lambda\ln\lambda = \lambda - \lambda\ln\lambda = \lambda(1 - \ln\lambda)$. Maximize over $\lambda \geq 0$: $d\theta/d\lambda = 1 - \ln\lambda - 1 = -\ln\lambda = 0 \implies \lambda = 1$. $d^* = 1(1-0) = 1$. Primal: $x^* = 0$, $p^* = 1$. Strong duality ✓. -
Explain why the dual of a non-convex problem might still be useful.
Click for answer
Even with a duality gap ($d^* < p^*$), the dual provides: (1) a lower bound on the optimal value (certificate of quality — if you find a primal feasible point with value close to $d^*$, you know you're near optimal); (2) a convex relaxation — the dual is always a convex problem, solvable with standard tools; (3) for some non-convex problems (e.g., certain QCQPs, trust-region subproblems), strong duality holds despite non-convexity. In ML, the dual perspective often reveals structure the primal hides. -
For the LP dual pair, show that if the primal is unbounded below, the dual must be infeasible.
Click for answer
By weak duality: for any dual feasible $(\boldsymbol{\lambda}, \boldsymbol{\nu})$, $\theta(\boldsymbol{\lambda}, \boldsymbol{\nu}) \leq p^*$. If $p^* = -\infty$ (primal unbounded), then $\theta(\boldsymbol{\lambda}, \boldsymbol{\nu}) \leq -\infty$ for all dual feasible points — impossible unless there are NO dual feasible points. So the dual must be infeasible. Similarly, if the dual is unbounded above, the primal must be infeasible. This is the duality relationship for LPs.
Summary
Key takeaways:
- The Lagrangian dual function $\theta(\boldsymbol{\lambda}, \boldsymbol{\nu}) = \inf_{\mathbf{x}} \mathcal{L}$ is always concave
- Weak duality: $d^ \leq p^$ — the dual provides a lower bound (always, no assumptions)
- Strong duality ($d^ = p^$) holds for convex problems under Slater's condition
- The dual problem is always a concave maximization — tractable even for non-convex primals
- Dual variables are shadow prices: $\lambda_i^*$ = marginal value of relaxing constraint $i$
- Duality reveals structure: SVM kernel trick, GAN minimax, VI's ELBO all come from dual formulations
Pitfalls
-
Assuming strong duality always holds. Strong duality ($d^ = p^$) is guaranteed for convex problems under Slater's condition, but non-convex problems can have arbitrarily large duality gaps. The dual of $\min x^3 - 3x$ on $[-2,2]$ is identically $-\infty$, providing zero useful information.
-
Forgetting the dual is always concave. The dual function $\theta(\boldsymbol{\lambda}, \boldsymbol{\nu}) = \inf_{\mathbf{x}} \mathcal{L}$ is a pointwise infimum of affine functions and therefore concave — even when the primal is non-convex. This means the dual problem is always a tractable concave maximization.
-
Building the dual without checking whether $\theta$ is finite. If the Lagrangian is unbounded below (common with non-convex primals), $\theta(\boldsymbol{\lambda}, \boldsymbol{\nu}) = -\infty$ and the dual is useless. The dual only provides a nontrivial bound when the Lagrangian infimum is finite.
-
Confusing the direction of weak duality. The dual provides a lower bound: $d^ \leq p^$. A common error is writing $d^ \geq p^$ or thinking the dual is an upper bound. The dual maximizes the lower bound; the primal minimizes the objective.
-
Thinking a zero duality gap means the problem is convex. Some non-convex problems (e.g., trust-region subproblems, certain QCQPs) exhibit strong duality despite non-convexity. A zero gap is a property of the specific problem instance, not a guarantee of convexity.
Next Steps
Next up: 14-10-stochastic-nonconvex-optimization.md — stochastic approximation, non-convex convergence theory, and why deep learning works despite non-convexity.