The basic idea in Lagrangian duality is to take the constraints into account by augmenting the objective function with a weighted sum of the constraint functions.
💡
By using the Lagrangian, we can extract useful information (such as lower bounds) about the original problem, even when it is neither convex nor easy to solve.
Standard form problem (not necessarily convex)
minimize f0(x)
subject to
fi(x) ≤ 0, i = 1, …, m
hi(x) = 0, i = 1, …, p
where x∈Rn, domain D, optimal value p∗
Lagrangian
L(x,λ,ν) = f0(x) + ∑_{i=1}^{m} λi fi(x) + ∑_{i=1}^{p} νi hi(x)
where L:Rn×Rm×Rp→R with dom L=D×Rm×Rp
💡
We call λ and ν the dual variables or Lagrange multipliers.
Lagrange dual function
g(λ,ν) = inf_{x∈D} L(x,λ,ν)
where g: Rm×Rp→R
Since the dual function is the point-wise infimum of a family of affine functions of (λ,ν), it is a concave function. Equivalently, −g is the point-wise supremum of a family of affine (hence convex) functions of (λ,ν): for each fixed x, L(x,λ,ν) is affine in (λ,ν), and negating it keeps it affine.
💡
When the Lagrangian is unbounded below in x, the dual function takes on the value −∞
Proof
We already know the following convexity-preserving operation:
If f(x,y) is convex in x for every y∈A, then
g(x) = sup_{y∈A} f(x,y)
is convex
Since −L(x,λ,ν) is affine, hence convex, in (λ,ν) for every x∈D, the result above shows that
sup_{x∈D} ( −L(x,λ,ν) ) = −inf_{x∈D} L(x,λ,ν) = −g(λ,ν)
is convex, i.e. g is concave.
lower bound property
If λ⪰0, then g(λ,ν)≤p∗
Proof
If xˉ is feasible and λ⪰0, then λi fi(xˉ) ≤ 0 and hi(xˉ) = 0, so
f0(xˉ) ≥ L(xˉ,λ,ν) ≥ inf_{x∈D} L(x,λ,ν) = g(λ,ν)
minimizing over all feasible xˉ gives p∗≥g(λ,ν)
Example
Least-norm solution of linear equations
minimize x^T x
subject to
Ax=b
Dual function
The Lagrangian is L(x,ν) = x^T x + ν^T(Ax − b)
To minimize L over x, set the gradient equal to zero:
∇x L(x,ν) = 2x + A^T ν = 0
Therefore,
x = −(1/2) A^T ν
Plug this into L to obtain g:
g(ν) = L(−(1/2)A^T ν, ν) = −(1/4) ν^T A A^T ν − b^T ν
a concave function of ν
lower bound property
p∗ ≥ −(1/4) ν^T A A^T ν − b^T ν
for all ν
💡
It might seem useless at first glance, but it is actually very useful. Suppose the problem above had a million equality constraints; computing the analytical solution directly would be impractical, so we would have to use an iterative approach, and then we need to decide when to stop. In this situation, we can use the lower bound as a stopping criterion.
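As a sanity check, here is a small NumPy sketch of this example on a random instance (the formula ν∗ = −2(AA^T)^{-1}b used below comes from maximizing the concave quadratic g and is not derived in the notes): it verifies that every ν gives a lower bound on p∗ and that the bound is tight at ν∗.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 8
A = rng.standard_normal((m, n))     # fat matrix, full row rank with probability 1
b = rng.standard_normal(m)

# primal optimum of: minimize x^T x  subject to  Ax = b
x_star = A.T @ np.linalg.solve(A @ A.T, b)      # least-norm solution A^T (A A^T)^{-1} b
p_star = x_star @ x_star

def g(nu):
    # dual function g(nu) = -(1/4) nu^T A A^T nu - b^T nu
    return -0.25 * nu @ (A @ A.T) @ nu - b @ nu

# lower bound property: every nu gives g(nu) <= p*
for _ in range(5):
    nu = rng.standard_normal(m)
    assert g(nu) <= p_star + 1e-9

# the bound is tight at nu* = -2 (A A^T)^{-1} b
nu_star = -2 * np.linalg.solve(A @ A.T, b)
print(p_star, g(nu_star))   # identical values
```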
(In the two-way partitioning problem, minimize x^T W x subject to xi^2 = 1, i = 1, …, n, we can interpret Wij as a measure of hostility between i and j: if Wij is large, we want to assign opposite signs to xi and xj.)
When the constraints are affine, say Ax ⪯ b and Cx = d, the dual function can be expressed via the conjugate of the objective, g(λ,ν) = −b^Tλ − d^Tν − f0∗(−A^Tλ − C^Tν), which is convenient when we already know the conjugate of f0.
The Lagrange dual problem
For each pair (λ,ν) with λ⪰0, the Lagrange dual function gives us a lower bound on the optimal value p∗ of the optimization problem. Thus we have a lower bound that depends on some parameters λ,ν.
The natural question is: What is the best lower bound that can be obtained from the Lagrange dual function?
This leads to the optimization problem
maximize g(λ,ν)
subject to
λ⪰0
This problem is called the Lagrange dual problem. The term dual feasible describes a pair (λ,ν) with λ⪰0 and g(λ,ν)>−∞.
💡
Note that the Lagrange dual problem is always a convex optimization problem, regardless of whether the original problem is convex, since the objective to be maximized is concave and the constraint is convex.
Example
standard form LP and its dual
standard form
minimize c^T x
subject to
Ax = b
x ⪰ 0
dual problem
maximize −b^T ν
subject to
A^T ν + c ⪰ 0
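As a quick numerical illustration (assuming scipy is available; the random instance below is constructed so that the primal is feasible and bounded), we can solve the standard form LP and its dual with scipy.optimize.linprog and check that the optimal values agree and that complementary slackness holds.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
m, n = 3, 6
A = rng.standard_normal((m, n))
b = A @ rng.uniform(0.5, 1.5, size=n)   # b = A x0 for some x0 > 0, so the primal is feasible
c = rng.uniform(0.1, 1.0, size=n)       # c > 0, so c^T x is bounded below on {x >= 0}

# primal: minimize c^T x  subject to  Ax = b, x >= 0
primal = linprog(c, A_eq=A, b_eq=b, bounds=[(0, None)] * n)

# dual: maximize -b^T nu  subject to  A^T nu + c >= 0,
# written for linprog as: minimize b^T nu  subject to  -A^T nu <= c, nu free
dual = linprog(b, A_ub=-A.T, b_ub=c, bounds=[(None, None)] * m)

p_star, d_star = primal.fun, -dual.fun
print(p_star, d_star)   # equal up to solver tolerance (strong duality for a feasible, bounded LP)

# complementary slackness: x_i (A^T nu + c)_i = 0 for every i
print(np.max(np.abs(primal.x * (A.T @ dual.x + c))))   # ~ 0
```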
Weak and strong duality
weak duality : d∗≤p∗ (where d∗ is the optimal value of the dual problem)
it always holds, for convex and non-convex problems alike
it can be used to find non-trivial lower bounds for difficult problems
strong duality : d∗=p∗
it does not hold in general
(usually) holds for convex problems
conditions that guarantee strong duality in convex problems are called constraint qualifications
Slater’s constraint qualification
One simple constraint qualification is Slater's condition
Suppose the problem is of the form
minimize f0(x)
subject to
fi(x) ≤ 0, i = 1, …, m
Ax = b
where f0, …, fm are convex.
Slater's condition holds if there exists an x ∈ relint D such that
fi(x) < 0, i = 1, …, m
Ax = b
Such a point is sometimes called strictly feasible.
Slater’s theorem states that strong duality holds if Slater’s condition holds and the problem is convex.
💡
Slater's theorem requires convexity of the problem; Slater's condition alone does not guarantee strong duality for non-convex problems.
Complementary slackness
Suppose strong duality holds, x∗ is primal optimal, and (λ∗,ν∗) is dual optimal. Then
f0(x∗) = g(λ∗,ν∗) = inf_{x∈D} L(x,λ∗,ν∗) ≤ L(x∗,λ∗,ν∗) = f0(x∗) + ∑_{i=1}^{m} λi∗ fi(x∗) + ∑_{i=1}^{p} νi∗ hi(x∗) ≤ f0(x∗)
Therefore, the two inequalities hold with equality.
So, x∗ minimizes L(x,λ∗,ν∗) and
λi∗ fi(x∗) = 0, i = 1, …, m
More specifically,
λi∗ > 0 ⇒ fi(x∗) = 0
fi(x∗) < 0 ⇒ λi∗ = 0
It is known as complementary slackness
Karush-Kuhn-Tucker (KKT) conditions
We now assume that the functions f0, …, fm, h1, …, hp are differentiable (and therefore have open domains).
Let x∗ and (λ∗,ν∗) be any primal and dual optimal points with zero duality gap. Since x∗ minimizes L(x,λ∗,ν∗) over x, it follows that its gradient must vanish at x∗
∇f0(x∗) + ∑_{i=1}^{m} λi∗ ∇fi(x∗) + ∑_{i=1}^{p} νi∗ ∇hi(x∗) = 0
Thus the KKT conditions are as follows
primal constraints
fi(x∗) ≤ 0, i = 1, …, m
hi(x∗) = 0, i = 1, …, p
dual constraints
λ∗⪰0
💡
Note that there is no constraint on ν.
complementary slackness
λi∗ fi(x∗) = 0, i = 1, …, m
stationarity
∇f0(x∗) + ∑_{i=1}^{m} λi∗ ∇fi(x∗) + ∑_{i=1}^{p} νi∗ ∇hi(x∗) = 0
If strong duality holds and x∗, λ∗, ν∗ are optimal, then they must satisfy the KKT conditions.
💡
In other words, if strong duality holds, the KKT conditions hold for any pair of primal optimal and dual optimal points.
💡
Note that even if x, λ, ν satisfy the KKT conditions, we cannot in general conclude that they are optimal for the primal and dual problems respectively.
KKT conditions for convex problems
When the primal problem is convex, the KKT conditions are also sufficient for the points to be primal and dual optimal. In other words, if fi are convex and hi are affine, and x~,λ~,ν~ are any points that satisfy the KKT conditions, then x~ and (λ~,ν~) are primal and dual optimal, with zero duality gap.
Since λ~i ≥ 0, L(x,λ~,ν~) is convex in x; the stationarity condition then says that its gradient vanishes at x~, so x~ minimizes L(x,λ~,ν~) over x. Therefore
g(λ~,ν~) = L(x~,λ~,ν~) = f0(x~) + ∑_{i=1}^{m} λ~i fi(x~) + ∑_{i=1}^{p} ν~i hi(x~) = f0(x~)
where the last equality uses complementary slackness and hi(x~) = 0.
This shows that x~ and (λ~,ν~) have zero duality gap, and therefore are primal and dual optimal.
💡
For any convex optimization problem with differentiable objective and constraint functions, any points that satisfy the KKT conditions are primal and dual optimal, and have zero duality gap
💡
For convex problems satisfying a constraint qualification such as Slater's condition, primal and dual optimality holds if and only if the KKT conditions hold
💡
But we cannot guarantee that we can always find (x~,λ~,ν~) satisfying the KKT conditions. We can only say that if we find such a point, it is primal and dual optimal.
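A minimal sketch of this fact on an equality constrained convex QP, minimize (1/2)x^T P x + q^T x subject to Ax = b (an instance chosen here for illustration, not taken from the notes): with no inequality constraints, the KKT conditions reduce to a linear system, and solving it produces primal and dual points with zero duality gap.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 6, 2
M = rng.standard_normal((n, n))
P = M.T @ M + np.eye(n)            # P is positive definite, so the objective is convex
q = rng.standard_normal(n)
A = rng.standard_normal((p, n))
b = rng.standard_normal(p)

# KKT conditions for: minimize (1/2) x^T P x + q^T x  subject to  Ax = b
#   stationarity:        P x + q + A^T nu = 0
#   primal feasibility:  A x = b
KKT = np.block([[P, A.T], [A, np.zeros((p, p))]])
rhs = np.concatenate([-q, b])
sol = np.linalg.solve(KKT, rhs)
x_star, nu_star = sol[:n], sol[n:]

# zero duality gap: p* = (1/2) x*^T P x* + q^T x*  equals
# g(nu*) = -(1/2)(q + A^T nu*)^T P^{-1} (q + A^T nu*) - b^T nu*
p_star = 0.5 * x_star @ P @ x_star + q @ x_star
w = q + A.T @ nu_star
g_star = -0.5 * w @ np.linalg.solve(P, w) - b @ nu_star
print(np.isclose(p_star, g_star))   # True
```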
KKT conditions for convex problems with Slater’s condition
If a convex optimization problem with differentiable objective and constraint functions satisfies Slater's condition, then the KKT conditions provide necessary and sufficient conditions for optimality. Slater's condition implies that the optimal duality gap is zero and that the dual optimum is attained, so if x is optimal we can always find (λ,ν) that, together with x, satisfy the KKT conditions.
💡
Of course, if we find (x,λ,ν) satisfying the KKT conditions, we can guarantee that x is optimal for the primal problem and (λ,ν) is optimal for the dual problem.
Example : Water-filling
Assume αi > 0 (the αi are given values)
minimize −∑_{i=1}^{n} log(xi + αi)
subject to
x ⪰ 0
1^T x = 1
💡
Note that this is a convex optimization problem that satisfies Slater's condition
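The KKT conditions for this problem give the classic water-filling solution xi = max(0, 1/ν∗ − αi), where ν∗ > 0 is chosen so that 1^T x = 1 (the derivation is not reproduced here). Below is a small NumPy sketch, with an illustrative function name and tolerance, that finds the water level 1/ν∗ by bisection.

```python
import numpy as np

def water_filling(alpha, total=1.0, tol=1e-10):
    """Water-filling: x_i = max(0, w - alpha_i) with the level w chosen so that sum(x) = total."""
    alpha = np.asarray(alpha, dtype=float)
    # sum_i max(0, w - alpha_i) is nondecreasing in w: it is 0 at w = min(alpha)
    # and at least `total` at w = min(alpha) + total, so we can bisect in between.
    lo, hi = alpha.min(), alpha.min() + total
    while hi - lo > tol:
        w = 0.5 * (lo + hi)
        if np.maximum(0.0, w - alpha).sum() < total:
            lo = w
        else:
            hi = w
    return np.maximum(0.0, w - alpha)

alpha = np.array([0.3, 0.7, 1.5, 2.0])
x = water_filling(alpha)
print(x, x.sum())   # x >= 0 and sums to 1
```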
Perturbation and sensitivity analysis
When strong duality holds, the optimal dual variables give very useful information about the sensitivity of the optimal value with respect to perturbations of the constraints.
💡
For some problems, it is easier to solve a dual problem rather than a primal problem. In this case, we can use the optimal dual variable to get information about the primal problem.
Unperturbed optimization problem and its dual
Primal problem
minimize f0(x)
subject to
fi(x) ≤ 0, i = 1, …, m
hi(x) = 0, i = 1, …, p
Dual problem
maximize g(λ,ν)
subject to
λ⪰0
Perturbed problem and its dual
Primal problem
minimize f0(x)
subject to
fi(x) ≤ ui, i = 1, …, m
hi(x) = vi, i = 1, …, p
Dual problem
maximize g(λ,ν)−uTλ−vTν
subject to
λ⪰0
💡
If ui > 0, the ith inequality constraint is relaxed (the feasible set grows); if ui < 0, it is tightened. Taking u = 0, v = 0 recovers the original problem.
where x is the optimization variable and u, v are parameters
We define p∗(u,v) as the optimal value of the perturbed problem
p∗(u,v) = inf { f0(x) ∣ x ∈ D, fi(x) ≤ ui, i = 1, …, m, hi(x) = vi, i = 1, …, p }
When the original problem is convex, the function p∗ is a convex function of u and v
💡
Whether p∗ is convex hinges on the convexity of the original problem; this is not obvious at first sight, since the optimal x itself changes with u and v.
Global sensitivity result
Assume that strong duality holds and that the dual optimum is attained. (This is the case if the original problem is convex, and Slater’s condition is satisfied)
Let (λ∗,ν∗) be optimal for the dual of the unperturbed problem. Then for all u and v we have
p∗(u,v) ≥ g(λ∗,ν∗) − u^T λ∗ − v^T ν∗ = p∗(0,0) − u^T λ∗ − v^T ν∗
💡
The inequality comes from weak duality for the perturbed problem: g(λ∗,ν∗) − u^Tλ∗ − v^Tν∗ is the value of the perturbed problem's dual function at (λ∗,ν∗), hence a lower bound on p∗(u,v). The equality is strong duality for the unperturbed problem.
We conclude that for any x feasible for the perturbed problem, we have
f0(x)≥p∗(0,0)−uTλ∗−vTν∗
Sensitivity interpretation
If λi∗ is large and we tighten the ith constraint (i.e. choose ui < 0), then the optimal value p∗(u,v) is guaranteed to increase greatly
If νi∗ is large and positive and we take vi<0, or if νi∗ is large and negative and we take vi>0, then the optimal value p∗(u,v) is guaranteed to increase greatly
If λi∗ is small, and we loosen the ith constraint (ui>0), then the optimal value p∗(u,v) will not decrease too much
If νi∗ is small and positive, and vi>0, or if νi∗ is small and negative and vi<0, then the optimal value p∗(u,v) will not decrease too much.
Local sensitivity
Suppose now that p∗(u,v) is differentiable at u=0,v=0. Then, provided strong duality holds, the optimal dual variables λ∗,ν∗ are related to the gradient of p∗ at u=0,v=0
λi∗ = −∂p∗(0,0)/∂ui
νi∗ = −∂p∗(0,0)/∂vi
Thus, when p∗(u,v) is differentiable at u=0,v=0, and strong duality holds, the optimal Lagrange multipliers are exactly the local sensitivities of the optimal value with respect to constraint perturbations.
Proof
Since strong duality holds and the dual optimum (λ∗,ν∗) is attained, the global sensitivity result gives, for u = t ei and v = 0,
p∗(t ei, 0) − p∗(0,0) ≥ −t λi∗
Dividing by t > 0 and letting t → 0 from above gives ∂p∗(0,0)/∂ui ≥ −λi∗; dividing by t < 0 and letting t → 0 from below gives the reverse inequality, so ∂p∗(0,0)/∂ui = −λi∗. The same argument with v = t ei gives νi∗ = −∂p∗(0,0)/∂vi.
The local sensitivity result gives us a quantitative measure of how active a constraint is at the optimum x∗. If fi(x∗)<0, then the constraint is inactive, and it follows that the constraint can be tightened or loosened a small amount without affecting the optimal value since by complementary slackness, the associated optimal Lagrange multiplier must be zero.
But now suppose that fi(x∗)=0. The ith optimal Lagrange multiplier tells us how active the constraint is.
If λi∗ is small : the constraint can be loosened or tightened a bit without much effect on the optimal value
If λi∗ is large : if the constraint is loosened or tightened a bit, the effect on the optimal value will be great
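Here is a small NumPy check of the global and local sensitivity results on the earlier least-norm problem, perturbed as minimize x^T x subject to Ax = b + v (only the equality constraints are perturbed); the formulas p∗(v) = (b+v)^T(AA^T)^{-1}(b+v) and ν∗ = −2(AA^T)^{-1}b follow from that example.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 8
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
K = np.linalg.inv(A @ A.T)

def p_star(v):
    # optimal value of the perturbed problem: minimize x^T x  subject to  Ax = b + v
    return (b + v) @ K @ (b + v)

nu_star = -2 * K @ b                       # dual optimum of the unperturbed problem

# global sensitivity: p*(v) >= p*(0) - v^T nu*  for every v
for _ in range(5):
    v = rng.standard_normal(m)
    assert p_star(v) >= p_star(np.zeros(m)) - v @ nu_star - 1e-9

# local sensitivity: dp*/dv_i at 0 equals -nu_i*, checked by central finite differences
h = 1e-6
grad = np.array([(p_star(h * e) - p_star(-h * e)) / (2 * h) for e in np.eye(m)])
print(np.allclose(grad, -nu_star, atol=1e-4))   # True
```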
Duality and problem reformulations
Equivalent formulations of a problem can lead to very different duals. Reformulating the primal problem can be useful when the dual is difficult to derive, or uninteresting.
Common reformulations
introduce new variables and equality constraints
make explicit constraints implicit or vice-versa
transform objective or constraint functions
→ replace f0(x) by ϕ(f0(x)) with ϕ convex, increasing
Introducing new variables and equality constraints
Consider an unconstrained problem of the form
minimize f0(Ax+b)
Since there are no constraints, the dual function is constant:
g = inf_x f0(Ax + b) = p∗
We have strong duality, but the dual is quite useless. Instead, introduce a new variable y and an equality constraint:
minimize f0(y)
subject to
Ax + b − y = 0
The dual function of the reformulated problem is
g(ν) = inf_{x,y} ( f0(y) + ν^T(Ax + b − y) ) = b^T ν − f0∗(ν) if A^T ν = 0, and −∞ otherwise
so the dual problem, maximize b^T ν − f0∗(ν) subject to A^T ν = 0, is no longer trivial.
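A small NumPy check of this reformulated dual for the special case f0(y) = ‖y‖² (whose conjugate is f0∗(ν) = ‖ν‖²/4); this specific instance is chosen here for illustration. Maximizing b^Tν − f0∗(ν) over {ν : A^Tν = 0} gives ν∗ = 2Pb, where P is the projector onto null(A^T), and the dual optimal value matches the least-squares optimal value.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 8, 3
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

# primal: minimize f0(Ax + b) with f0(y) = ||y||^2  (a least-squares problem)
x_star, *_ = np.linalg.lstsq(A, -b, rcond=None)
p_star = np.sum((A @ x_star + b) ** 2)

# reformulated dual: maximize b^T nu - ||nu||^2 / 4  subject to  A^T nu = 0.
# The maximizer over null(A^T) is nu* = 2 P b, with P the projector onto null(A^T).
P = np.eye(m) - A @ np.linalg.pinv(A)
nu_star = 2 * P @ b
d_star = b @ nu_star - 0.25 * nu_star @ nu_star

print(np.allclose(A.T @ nu_star, 0), np.isclose(p_star, d_star))   # True True
```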
Generalized inequalities
Lagrange duality extends to problems with generalized inequality constraints fi(x) ⪯_{Ki} 0, i = 1, …, m, where the Ki are proper cones; the dual variable λi associated with fi(x) ⪯_{Ki} 0 is now a vector, and the Lagrangian is L(x,λ,ν) = f0(x) + ∑_{i=1}^{m} λi^T fi(x) + ∑_{i=1}^{p} νi hi(x). Since the Lagrangian is affine in the dual variables (λ,ν), and the dual function is a point-wise infimum of the Lagrangian, the dual function is concave.
As in the case of scalar inequalities, the dual function gives lower bounds on p∗. Similar to the requirement λi ≥ 0 in the scalar case, the generalized inequality case requires non-negativity of the dual variables with respect to the dual cones:
λi ⪰_{Ki∗} 0, i = 1, …, m
where Ki∗ denotes the dual cone of Ki.
Weak duality follows immediately from the definition of the dual cone: if λi ⪰_{Ki∗} 0 and fi(x~) ⪯_{Ki} 0, then λi^T fi(x~) ≤ 0. Therefore, for any primal feasible point x~ and any λi ⪰_{Ki∗} 0, i = 1, …, m,
g(λ,ν) ≤ L(x~,λ,ν) ≤ f0(x~)
and minimizing over all feasible x~ gives g(λ,ν) ≤ p∗.
Strong duality holds when the primal problem is convex and satisfies an appropriate constraint qualification, such as Slater's condition, as in the scalar case.
Example : Semidefinite program
We consider a semi-definite program in inequality form
minimize c^T x
subject to
x1F1+⋯+xnFn⪯G
where Fi, G ∈ S^k. (Here f1 is affine, and K1 is S_+^k, the positive semidefinite cone.)
Since S_+^k is self-dual, the Lagrange multiplier is a matrix Z ∈ S^k with Z ⪰ 0. Then the Lagrangian is
L(x, Z) = c^T x + tr( (x1F1 + ⋯ + xnFn − G) Z )
which is affine in x, so the dual function is
g(Z) = inf_x L(x, Z) = −tr(GZ) if tr(Fi Z) + ci = 0, i = 1, …, n, and −∞ otherwise
The dual SDP is therefore
maximize −tr(GZ)
subject to
tr(Fi Z) + ci = 0, i = 1, …, n
Z ⪰ 0
Strong duality obtains if there is a strictly feasible point because it satisfies Slater’s condition. In other words, there exists an x with
x1F1+⋯+xnFn−G≺0
💡
Note that the inner product of two symmetric matrices can be defined as the trace of their product. See the dual generalized inequality part in the convex sets section.
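A sketch of this SDP and its dual using cvxpy (assumed to be installed together with an SDP-capable solver such as SCS); the instance is random and constructed so that both problems are feasible, and the primal constraint is written with an explicit PSD slack variable. Strong duality shows up as equal optimal values.

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
k, n = 4, 3
sym = lambda M: (M + M.T) / 2
F = [sym(rng.standard_normal((k, k))) for _ in range(n)]
S0 = rng.standard_normal((k, k))
G = S0 @ S0.T + np.eye(k)                        # G is positive definite, so x = 0 is strictly feasible
c = np.array([-np.trace(Fi) for Fi in F])        # makes Z = I dual feasible, so p* is finite

# primal SDP: minimize c^T x  subject to  x1 F1 + ... + xn Fn <= G  (PSD slack S)
x = cp.Variable(n)
S = cp.Variable((k, k), PSD=True)
primal = cp.Problem(cp.Minimize(c @ x),
                    [sum(x[i] * F[i] for i in range(n)) + S == G])
primal.solve()

# dual SDP: maximize -tr(GZ)  subject to  tr(Fi Z) + ci = 0,  Z PSD
Z = cp.Variable((k, k), PSD=True)
dual = cp.Problem(cp.Maximize(-cp.trace(G @ Z)),
                  [cp.trace(F[i] @ Z) + c[i] == 0 for i in range(n)])
dual.solve()

print(primal.value, dual.value)   # equal up to solver tolerance (strong duality)
```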