
5. Duality


Lagrangian

The basic idea in Lagrangian duality is to take the constraints into account by augmenting the objective function with a weighted sum of the constraint functions.

💡
By using the Lagrangian, we can obtain useful information about the original problem even when it is non-convex or hard to solve directly.

Standard form problem (not necessarily convex)

$$\text{minimize } f_0(x)$$

subject to

$$f_i(x) \le 0,\quad i = 1, \dots, m \\ h_i(x) = 0,\quad i = 1, \dots, p$$

where $x\in \R^n$, with domain $\mathcal D$ and optimal value $p^*$.

Lagrangian

$$\mathcal L(x, \lambda, \nu) = f_0(x) + \sum_{i = 1}^m \lambda_i f_i(x) + \sum_{i = 1}^p \nu_i h_i(x)$$

where $\mathcal L: \R^n\times \R^m \times \R^p \to \R$ with $\text{dom }\mathcal L = \mathcal D \times \R^m \times \R^p$.

💡
We call $\lambda$ and $\nu$ the dual variables or Lagrange multipliers associated with the problem.

Lagrange dual function

$$\begin{aligned}g(\lambda, \nu) &= \inf_{x\in \mathcal D} \mathcal L(x, \lambda, \nu) \\ &= \inf_{x\in \mathcal D}\bigg(f_0(x) + \sum_{i = 1}^m\lambda_i f_i(x) + \sum_{i = 1}^p \nu_i h_i(x)\bigg) \end{aligned}$$

where $g:\R^m\times \R^p \to \R$.

Since the dual function is the pointwise infimum of a family of affine functions of $(\lambda, \nu)$ (one affine function for each $x\in \mathcal D$), it is concave. Equivalently, $-g$ is the pointwise supremum of a family of affine functions of $(\lambda, \nu)$ and is therefore convex; the key observation is that for fixed $x$, $\mathcal L(x, \lambda, \nu)$ is affine in $(\lambda, \nu)$.

💡
When the Lagrangian is unbounded below in $x$, the dual function takes the value $-\infty$.
  • Proof

    We already know the following convexity-preserving operation:

    If $f(x, y)$ is convex in $x$ for every $y\in \mathcal A$, then

    $$g(x) = \sup_{y\in \mathcal A}f(x, y)$$

    is convex

    Since $\mathcal L(x, \lambda, \nu)$ is affine, hence concave, in $(\lambda, \nu)$ for every $x\in \mathcal D$,

    $$g(\lambda, \nu) = \inf_{x\in \mathcal D}\mathcal L(x, \lambda, \nu)$$

    is concave.

lower bound property

If $\lambda \succeq 0$, then $g(\lambda, \nu)\le p^*$.

  • Proof

    If $\bar x$ is feasible and $\lambda \succeq 0$, then

    $$f_0(\bar x) \ge \mathcal L(\bar x, \lambda , \nu) \ge \inf_{x\in \mathcal D}\mathcal L(x, \lambda, \nu) = g(\lambda, \nu)$$

    Minimizing over all feasible $\bar x$ gives $p^*\ge g(\lambda, \nu)$.

Example

Least-norm solution of linear equations

$$\text{minimize } x^Tx$$

subject to

$$Ax = b$$

Dual function

  • Lagrangian is $\mathcal L(x, \nu) = x^Tx + \nu^T(Ax - b)$
  • to minimize $\mathcal L$ over $x$, set the gradient equal to zero:
    $$\nabla_x\mathcal L(x, \nu) = 2x + A^T\nu = 0$$

    Therefore,

    $$x = - \frac{1}{2}A^T\nu$$
  • plug into $\mathcal L$ to obtain $g$:
    $$g(\nu) = \mathcal L\big(-\tfrac{1}{2}A^T\nu, \nu\big) = -\frac{1}{4}\nu^TAA^T\nu - b^T\nu$$

    which is a concave function of $\nu$

  • lower bound property
    $$p^* \ge -\frac{1}{4}\nu^TAA^T\nu - b^T\nu$$

    for all $\nu$.

💡
This bound might seem useless at first glance, but it is quite practical. Suppose the problem above had millions of equality constraints: computing the analytical solution directly would be infeasible, so we would have to use an iterative method and decide when to stop. The dual lower bound provides such a stopping criterion, since it bounds how far the current objective value can be from $p^*$.
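As a quick numerical illustration (not part of the original notes, assuming numpy is available), the sketch below checks the bound for this least-norm example: any $\nu$ gives a valid lower bound, and the maximizing $\nu^* = -2(AA^T)^{-1}b$ makes it tight.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 5, 20                          # underdetermined system: A is m x n
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

# Analytical least-norm solution: x* = A^T (A A^T)^{-1} b, so p* = x*^T x*
x_star = A.T @ np.linalg.solve(A @ A.T, b)
p_star = x_star @ x_star

def g(nu):
    # dual function g(nu) = -(1/4) nu^T A A^T nu - b^T nu
    return -0.25 * nu @ (A @ A.T) @ nu - b @ nu

nu = rng.standard_normal(m)           # an arbitrary nu still gives a lower bound
assert g(nu) <= p_star + 1e-9

nu_star = -2 * np.linalg.solve(A @ A.T, b)   # maximizer of g
print(p_star, g(nu), g(nu_star))      # g(nu) <= g(nu_star) == p_star
```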

Standard form LP

$$\text{minimize } c^Tx$$

subject to

$$Ax = b, \quad x\succeq 0$$

The Lagrangian is

$$\begin{aligned}\mathcal L(x, \lambda, \nu) &= c^Tx + \nu^T(Ax - b) - \lambda^Tx \\ &= -b^T\nu + (c+A^T\nu-\lambda)^Tx\end{aligned}$$

Since it is affine in $x$,

$$g(\lambda,\nu) = \inf_x \mathcal L(x, \lambda, \nu) = \begin{cases}-b^T\nu & A^T\nu-\lambda + c = 0 \\ -\infty &\text{otherwise}\end{cases}$$

Therefore $p^*\ge -b^T\nu$ if $A^T\nu + c\succeq 0$.
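A minimal sketch of this bound on a tiny standard-form LP (my own illustration, assuming scipy is available): every dual-feasible $\nu$ gives $-b^T\nu \le p^*$.

```python
import numpy as np
from scipy.optimize import linprog

# Tiny standard-form LP: minimize c^T x  s.t.  Ax = b, x >= 0
c = np.array([1.0, 2.0])
A = np.array([[1.0, 1.0]])
b = np.array([1.0])

res = linprog(c, A_eq=A, b_eq=b, bounds=[(0, None), (0, None)])
p_star = res.fun                       # here p* = 1, attained at x = (1, 0)

# Dual feasibility A^T nu + c >= 0 means nu >= -1 for this data
for nu in np.array([[-1.0], [-0.5], [0.0], [3.0]]):
    assert np.all(A.T @ nu + c >= 0)   # dual feasible
    assert -b @ nu <= p_star + 1e-9    # lower bound -b^T nu <= p*
# nu = -1 attains the bound: -b^T nu = 1 = p* (strong duality for this LP)
```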

Equality constrained norm minimization

$$\text{minimize } \|x\|$$

subject to

$$Ax = b$$

The dual function is

$$g(\nu) = \inf_x \big(\|x\|-\nu^TAx + b^T\nu\big) = \begin{cases}b^T\nu & \|A^T\nu\|_* \le 1 \\ -\infty & \text{otherwise}\end{cases}$$

where $\|v\|_* = \sup_{\|u\|\le 1} u^Tv$ is the dual norm of $\|\cdot \|$.

For intuition on the dual norm, see: https://math.stackexchange.com/questions/903484/dual-norm-intuition

Therefore $p^* \ge b^T\nu$ if $\|A^T\nu\|_* \le 1$.
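For the Euclidean norm, which is self-dual, this bound is easy to check numerically. The sketch below (not part of the original notes, numpy only) compares it with the closed-form minimum-norm solution.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 4, 10
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

# For the 2-norm the minimizer is x* = A^T (A A^T)^{-1} b, so p* = ||x*||_2
x_star = A.T @ np.linalg.solve(A @ A.T, b)
p_star = np.linalg.norm(x_star)

# The 2-norm is self-dual, so any nu with ||A^T nu||_2 <= 1 gives p* >= b^T nu
for _ in range(5):
    nu = rng.standard_normal(m)
    nu /= np.linalg.norm(A.T @ nu)        # rescale so ||A^T nu||_2 = 1
    assert b @ nu <= p_star + 1e-9

# The bound is tight at nu proportional to (A A^T)^{-1} b
nu_star = np.linalg.solve(A @ A.T, b)
nu_star /= np.linalg.norm(A.T @ nu_star)
print(p_star, b @ nu_star)                # equal up to rounding
```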

Two-way partitioning

$$\text{minimize } x^TWx$$

subject to

$$x_i^2 = 1, \quad i = 1, \dots, n$$

We can express the objective as

$$\sum_{i, j} W_{ij}x_ix_j$$

Therefore, we can interpret $W_{ij}$ as a measure of hostility between $i$ and $j$: if $W_{ij}$ is large, the objective encourages giving $x_i$ and $x_j$ opposite signs.

The dual function is

$$\begin{aligned}g(\nu) &= \inf_x \bigg(x^TWx + \sum_i \nu_i(x_i^2 - 1)\bigg) \\ &= \inf_x x^T(W + \text{diag}(\nu))x - 1^T\nu \\ &= \begin{cases}-1^T\nu & W + \text{diag}(\nu)\succeq 0 \\ -\infty & \text{otherwise}\end{cases}\end{aligned}$$

Therefore $p^* \ge -1^T\nu$ if $W + \text{diag}(\nu) \succeq 0$.
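One concrete choice is $\nu = -\lambda_{\min}(W)\mathbf 1$, which always makes $W + \text{diag}(\nu) \succeq 0$ and gives the bound $p^* \ge n\lambda_{\min}(W)$. The sketch below (my own illustration, not from the original notes) compares this bound with the exact optimum by brute force for a small $n$.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(2)
n = 8
W = rng.standard_normal((n, n))
W = (W + W.T) / 2                               # symmetric cost matrix

# nu = -lambda_min(W) * 1 makes W + diag(nu) >= 0, so p* >= -1^T nu = n * lambda_min(W)
lam_min = np.linalg.eigvalsh(W)[0]
bound = n * lam_min

# Brute-force p* over all x in {-1, +1}^n (only feasible for tiny n)
p_star = min(np.array(x) @ W @ np.array(x) for x in product([-1.0, 1.0], repeat=n))

print(bound, p_star)
assert bound <= p_star + 1e-9
```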

Lagrange dual and conjugate function

$$\text{minimize } f_0(x)$$

subject to

$$Ax\preceq b, \quad Cx = d$$

The dual function is

$$\begin{aligned}g(\lambda, \nu) &= \inf_{x\in \text{dom }f_0}\big(f_0(x) + (A^T\lambda + C^T\nu)^Tx - b^T\lambda - d^T\nu\big) \\ &= -f_0^*(-A^T\lambda - C^T\nu) - b^T\lambda - d^T\nu\end{aligned}$$

where $f^*(y) = \sup_{x\in \text{dom }f}(y^Tx - f(x))$ is the conjugate of $f$.

For an interpretation of conjugate functions, see: https://math.stackexchange.com/questions/2225932/how-to-physically-interpret-conjugate-functions

Example : entropy maximization

$$f_0(x) = \sum_{i = 1}^n x_i \log x_i, \qquad f_0^*(y) = \sum_{i = 1}^n e^{y_i - 1}$$
💡
In this case, we already know the conjugate of $f_0$.
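Since $f_0$ is separable, the conjugate can be checked coordinate-wise: $\sup_{x>0}(yx - x\log x) = e^{y-1}$. A small numerical check (my own sketch, assuming scipy is available):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def conj_numeric(y):
    # numerically evaluate sup_{x > 0} (y*x - x*log(x)) for one coordinate
    res = minimize_scalar(lambda x: -(y * x - x * np.log(x)),
                          bounds=(1e-12, 1e3), method="bounded")
    return -res.fun

for y in [-1.0, 0.0, 0.5, 2.0]:
    print(y, conj_numeric(y), np.exp(y - 1))    # the two values agree
```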

The Lagrange dual problem

For each pair $(\lambda, \nu)$ with $\lambda \succeq 0$, the Lagrange dual function gives us a lower bound on the optimal value $p^*$ of the optimization problem. Thus we have a lower bound that depends on the parameters $\lambda, \nu$.

The natural question is: What is the best lower bound that can be obtained from the Lagrange dual function?

This leads to the optimization problem

$$\text{maximize } g(\lambda, \nu)$$

subject to

$$\lambda \succeq 0$$

This problem is called the Lagrange dual problem. The term dual feasible describes a pair $(\lambda, \nu)$ with $\lambda \succeq 0$ and $g(\lambda ,\nu)>-\infty$.

💡
Note that the Lagrange dual problem is a convex optimization problem regardless of whether the original problem is convex, since the objective to be maximized is concave and the constraint set is convex.

Example

standard form LP and its dual

  1. standard form

    $$\text{minimize } c^Tx$$

    subject to

    $$Ax = b, \quad x\succeq 0$$

  2. dual problem

    $$\text{maximize } -b^T\nu$$

    subject to

    $$A^T\nu + c\succeq 0$$

Weak and strong duality

  1. weak duality: $d^* \le p^*$ (where $d^*$ is the optimal value of the dual problem)
    • it always holds, for convex and non-convex problems
    • it can be used to find non-trivial lower bounds for difficult problems
  2. strong duality: $d^* = p^*$
    • it does not hold in general
    • it (usually) holds for convex problems
    • conditions that guarantee strong duality in convex problems are called constraint qualifications

Slater’s constraint qualification

One simple constraint qualification is Slater's condition

Consider the following problem:

$$\text{minimize } f_0(x)$$

subject to

$$f_i(x) \le 0, \quad i = 1, \dots, m \\ Ax = b$$

where $f_0, \dots, f_m$ are convex functions.

Slater's condition holds if there exists an $x\in \text{relint }\mathcal D$ such that

$$f_i(x) < 0, \quad i = 1,\dots, m \\ Ax = b$$

Such a point is sometimes called strictly feasible.

Slater’s theorem states that strong duality holds if Slater’s condition holds and the problem is convex.

💡
Note that Slater's theorem requires the problem to be convex; Slater's condition alone does not give strong duality for non-convex problems.

Example

Inequality form LP

  1. Primal problem

    $$\text{minimize } c^Tx$$

    subject to

    $$Ax\preceq b$$

  2. dual function

    $$g(\lambda ) = \inf_x\big((c+A^T\lambda)^Tx - b^T\lambda\big) = \begin{cases}-b^T\lambda & A^T\lambda + c = 0 \\ -\infty & \text{otherwise}\end{cases}$$

  3. dual problem

    $$\text{maximize } -b^T\lambda$$

    subject to

    $$A^T\lambda + c = 0, \quad \lambda \succeq 0$$

    💡
    Strictly speaking, this is not the dual problem itself but an equivalent problem, obtained by making the implicit constraint of $g$ explicit.

By Slater's condition, $p^* = d^*$ if $A\bar x \prec b$ for some $\bar x$.

In fact, for LPs $p^* = d^*$ holds except when both the primal and the dual are infeasible.

💡
Strong duality holds if the feasible polyhedron has a non-empty interior.

Quadratic program

  1. Primal problem (assume $P\in S_{++}^n$)

    $$\text{minimize } x^TPx$$

    subject to

    $$Ax\preceq b$$

  2. dual function

    $$g(\lambda ) = \inf_x \big(x^TPx + \lambda ^T(Ax - b)\big) = -\frac{1}{4}\lambda ^TAP^{-1}A^T\lambda - b^T\lambda$$

    💡
    Since $P$ is positive definite, we can find the minimizer over $x$ from the first-order optimality condition: $2Px + A^T\lambda = 0$, i.e., $x = -\frac{1}{2}P^{-1}A^T\lambda$.

  3. dual problem

    $$\text{maximize } -\frac{1}{4}\lambda ^TAP^{-1}A^T\lambda - b^T\lambda$$

    subject to

    $$\lambda \succeq 0$$

By Slater's condition, $p^* = d^*$ if $A\bar x \prec b$ for some $\bar x$.

💡
In fact, $p^* = d^*$ always holds for this problem.
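The sketch below (not from the original notes, numpy only) checks weak duality for this QP: every value of the dual function at some $\lambda \succeq 0$ lower-bounds the objective at every primal-feasible point.

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 5, 8
M = rng.standard_normal((n, n))
P = M @ M.T + n * np.eye(n)                      # P positive definite
A = rng.standard_normal((m, n))
b = np.abs(rng.standard_normal(m)) + 0.5         # b > 0, so x = 0 is feasible

def g(lam):
    # g(lambda) = -(1/4) lambda^T A P^{-1} A^T lambda - b^T lambda
    return -0.25 * lam @ (A @ np.linalg.solve(P, A.T)) @ lam - b @ lam

def random_feasible_x():
    x = rng.standard_normal(n)
    Ax = A @ x
    t = min((b[i] / Ax[i] for i in range(m) if Ax[i] > 0), default=1.0)
    return min(1.0, t) * x                       # scaled so that A x <= b

for _ in range(100):
    lam = np.abs(rng.standard_normal(m))         # any lambda >= 0
    x = random_feasible_x()
    assert g(lam) <= x @ P @ x + 1e-9            # weak duality
```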

Geometric interpretation

For simplicity, consider a problem with one constraint $f_1(x) \le 0$. Then

$$g(\lambda) = \inf_{(u, t)\in \mathcal G}(t + \lambda u)$$

where $\mathcal G = \{(f_1(x), f_0(x)) \mid x\in \mathcal D\}$.

  • $\lambda u + t = g(\lambda)$ is a supporting hyperplane to $\mathcal G$
  • the hyperplane intersects the $t$-axis at $t = g(\lambda)$

Complementary slackness

Suppose that the primal and dual optimal values are attained and equal. Let $x^*$ be primal optimal and $(\lambda^*, \nu^*)$ be dual optimal. Then

$$\begin{aligned}f_0(x^*) = g(\lambda^*, \nu^*) &= \inf_x \bigg(f_0(x) + \sum_{i = 1}^m \lambda_i^*f_i(x) + \sum_{i = 1}^p\nu_i^*h_i(x)\bigg) \\ &\le f_0(x^*) + \sum_{i = 1}^m \lambda_i^*f_i(x^*) + \sum_{i = 1}^p\nu_i^*h_i(x^*) \\ &\le f_0(x^*)\end{aligned}$$

Therefore, the two inequalities hold with equality.

So $x^*$ minimizes $\mathcal L(x, \lambda^*, \nu^*)$ over $x$, and

$$\lambda_i^*f_i(x^*) = 0, \quad i = 1, \dots, m$$

More specifically,

$$\lambda_i^* >0 \Rightarrow f_i(x^*) = 0, \qquad f_i(x^*) < 0 \Rightarrow \lambda_i^* = 0$$

This condition is known as complementary slackness.

Karush-Kuhn-Tucker (KKT) conditions

We now assume that the functions $f_0, \dots, f_m, h_1, \dots, h_p$ are differentiable and therefore have open domains.

Let $x^*$ and $(\lambda^*, \nu^*)$ be any primal and dual optimal points with zero duality gap. Since $x^*$ minimizes $\mathcal L(x, \lambda^*, \nu^*)$ over $x$, its gradient must vanish at $x^*$:

$$\nabla f_0(x^*) + \sum_{i = 1}^m \lambda_i^* \nabla f_i(x^*) + \sum_{i = 1}^p \nu_i^*\nabla h_i(x^*) = 0$$

Thus the KKT conditions are as follows:

  1. primal constraints
    $$f_i(x^*) \le 0, \quad i = 1, \dots,m, \qquad h_i(x^*) = 0, \quad i = 1, \dots , p$$
  2. dual constraints
    $$\lambda^* \succeq 0$$
    💡
    Note that there is no sign constraint on $\nu^*$.
  3. complementary slackness
    $$\lambda_i^*f_i(x^*) = 0, \quad i =1, \dots, m$$
  4. stationarity
    $$\nabla f_0(x^*) + \sum_{i = 1}^m \lambda_i^* \nabla f_i(x^*) + \sum_{i = 1}^p \nu_i^*\nabla h_i(x^*) = 0$$

If strong duality holds and $x^*, \lambda^*, \nu^*$ are optimal, then they must satisfy the KKT conditions.

💡
In other words, if strong duality holds, the KKT conditions hold for any pair of primal optimal and dual optimal points.
💡
Note that the converse may fail in general: points $x, \lambda, \nu$ satisfying the KKT conditions are not guaranteed to be primal and dual optimal.

KKT conditions for convex problems

When the primal problem is convex, the KKT conditions are also sufficient for the points to be primal and dual optimal. In other words, if $f_i$ are convex and $h_i$ are affine, and $\tilde x, \tilde \lambda, \tilde \nu$ are any points that satisfy the KKT conditions, then $\tilde x$ and $(\tilde \lambda, \tilde \nu)$ are primal and dual optimal, with zero duality gap.

Since $\tilde \lambda_i \ge 0$, $\mathcal L(x, \tilde \lambda, \tilde \nu)$ is convex in $x$. Moreover, by the stationarity condition its gradient vanishes at $\tilde x$, so $\tilde x$ minimizes $\mathcal L(x, \tilde \lambda, \tilde \nu)$ over $x$.

Therefore,

$$\begin{aligned}g(\tilde \lambda, \tilde \nu) &= \mathcal L(\tilde x, \tilde \lambda, \tilde \nu) \\ &= f_0(\tilde x) + \sum_{i = 1}^m \tilde \lambda_i f_i(\tilde x) + \sum_{i = 1}^p \tilde \nu_ih_i(\tilde x) \\ &= f_0(\tilde x)\end{aligned}$$

This shows that $\tilde x$ and $(\tilde \lambda, \tilde \nu)$ have zero duality gap, and therefore are primal and dual optimal.

💡
For any convex optimization problem with differentiable objective and constraint functions, any points that satisfy the KKT conditions are primal and dual optimal, with zero duality gap.
💡
For convex problems in which strong duality holds, a set of points is primal and dual optimal if and only if it satisfies the KKT conditions.
💡
However, we cannot guarantee that points $(\tilde x, \tilde \lambda, \tilde \nu)$ satisfying the KKT conditions always exist. We can only say that if we find such points, they are primal and dual optimal.

KKT conditions for convex problems with Slater’s condition

If a convex optimization problem with differentiable objective and constraint functions satisfies Slater's condition, then the KKT conditions provide necessary and sufficient conditions for optimality: Slater's condition implies that the optimal duality gap is zero and the dual optimum is attained, so if $x$ is optimal we can always find $(\lambda, \nu)$ that, together with $x$, satisfy the KKT conditions.

💡
Conversely, if we find $(x, \lambda, \nu)$ satisfying the KKT conditions, then $x$ is optimal for the primal problem and $(\lambda, \nu)$ is optimal for the dual problem.

Example : Water-filling

Assume $\alpha_i > 0$ (the $\alpha_i$ are given constants).

$$\text{minimize } -\sum_{i = 1}^n \log (x_i + \alpha_i)$$

subject to

$$x\succeq 0, \quad 1^Tx = 1$$
💡
Note that this is a convex optimization problem that satisfies Slater's condition.

If $x$ is optimal, there exist $\lambda \in \R^n$ and $\nu \in \R$ such that:

  1. primal feasibility
    $$x\succeq 0, \quad 1^Tx = 1$$
  2. dual feasibility
    $$\lambda \succeq 0$$
  3. complementary slackness
    $$\lambda_i x_i = 0, \quad i = 1, \dots, n$$
  4. stationarity
    $$\begin{aligned}\nabla_x \mathcal L(x, \lambda, \nu) &= \nabla_x \bigg(-\sum_{i = 1}^n\log(x_i + \alpha_i) - \lambda^Tx + \nu(1^Tx - 1)\bigg) \\ &= \begin{bmatrix} - \frac{1}{x_1 + \alpha_1} - \lambda_1 + \nu\\ \vdots \\ - \frac{1}{x_n + \alpha_n} - \lambda_n + \nu\end{bmatrix} = 0\end{aligned}$$

    Therefore,

    $$- \frac{1}{x_i + \alpha_i} - \lambda_i + \nu = 0 \;\Rightarrow\; \nu = \frac{1}{x_i + \alpha_i} +\lambda_i, \quad i = 1, \dots, n$$

By using these conditions,

  1. if $\nu < 1/\alpha_i$:
    $\lambda_i = 0$ and $x_i = 1/\nu - \alpha_i$
  2. if $\nu \ge 1/\alpha_i$:
    $\lambda_i = \nu - 1/\alpha_i$ and $x_i = 0$

Therefore, we can express $x_i$ as $\max\{0, 1/\nu - \alpha_i\}$, and the equality constraint becomes

$$1^Tx = \sum_{i = 1}^n \max\{0, 1/\nu - \alpha_i\} = 1$$

Therefore, we can determine $\nu$ from this single scalar equation; a numerical sketch follows below.
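Here is a minimal sketch of that computation (my own illustration; `water_filling` is a hypothetical helper name): the KKT conditions reduce the problem to finding the scalar $\nu$ by bisection, after which $x_i = \max\{0, 1/\nu - \alpha_i\}$.

```python
import numpy as np

def water_filling(alpha, total=1.0, iters=100):
    """Solve: minimize -sum(log(x + alpha))  s.t.  x >= 0, 1^T x = total,
    using the KKT conditions: x_i = max(0, 1/nu - alpha_i)."""
    alpha = np.asarray(alpha, dtype=float)

    def excess(nu):
        # 1^T x(nu) - total; non-increasing in nu
        return np.maximum(0.0, 1.0 / nu - alpha).sum() - total

    lo, hi = 1e-12, 1.0 / alpha.min()      # excess(lo) > 0 >= excess(hi)
    for _ in range(iters):                 # bisection on the scalar nu
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0:
            lo = mid
        else:
            hi = mid
    nu = 0.5 * (lo + hi)
    return np.maximum(0.0, 1.0 / nu - alpha), nu

x, nu = water_filling(np.array([0.8, 0.2, 0.6, 1.5]))
print(x, x.sum())                          # x >= 0 and sums to 1
```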

Perturbation and sensitivity analysis

When strong duality holds, the optimal dual variables give very useful information about the sensitivity of the optimal value with respect to perturbations of the constraints.

💡
For some problems it is easier to solve the dual problem than the primal problem. In that case, we can use the optimal dual variables to obtain information about the primal problem.

Unperturbed optimization problem and its dual

  1. Primal problem

    $$\text{minimize } f_0(x)$$

    subject to

    $$f_i(x) \le 0, \quad i = 1, \dots, m \\ h_i (x) = 0, \quad i = 1, \dots, p$$

  2. Dual problem

    $$\text{maximize } g(\lambda, \nu)$$

    subject to

    $$\lambda \succeq 0$$

Perturbed problem and its dual

  1. Primal problem

    $$\text{minimize } f_0(x)$$

    subject to

    $$f_i(x) \le u_i, \quad i = 1, \dots, m \\ h_i(x) = v_i, \quad i = 1, \dots, p$$

  2. Dual problem

    $$\text{maximize } g(\lambda, \nu) - u^T\lambda - v^T\nu$$

    subject to

    $$\lambda \succeq 0$$

💡
Taking $u_i > 0$ relaxes the $i$th inequality constraint (enlarging the feasible set), while $u_i < 0$ tightens it.

where $x$ is the primal variable and $u, v$ are parameters.

We define $p^*(u, v)$ as the optimal value of the perturbed problem:

$$p^*(u, v) = \inf\{f_0(x) \mid x \in \mathcal D,\; f_i(x) \le u_i,\; i = 1, \dots, m,\; h_i(x) = v_i,\; i = 1, \dots, p\}$$

When the original problem is convex, the function $p^*$ is a convex function of $u$ and $v$.

💡
The convexity of $p^*$ hinges on the convexity of the original problem, since the perturbations $u$ and $v$ enter only through the constraints on $x$.

Global sensitivity result

Assume that strong duality holds and that the dual optimum is attained. (This is the case if the original problem is convex, and Slater’s condition is satisfied)

Let $(\lambda^*, \nu^*)$ be optimal for the dual of the unperturbed problem. Then for all $u$ and $v$ we have

$$\begin{aligned}p^*(u, v) &\ge g(\lambda^*, \nu^*) - u^T\lambda^* - v^T\nu^* \\&= p^*(0, 0) - u^T\lambda^* - v^T\nu^*\end{aligned}$$
💡
The first inequality is weak duality for the perturbed problem: $(\lambda^*, \nu^*)$ is feasible for its dual, whose objective value $g(\lambda^*, \nu^*) - u^T\lambda^* - v^T\nu^*$ is therefore a lower bound on $p^*(u, v)$.

We conclude that for any $x$ feasible for the perturbed problem, we have

$$f_0(x) \ge p^*(0, 0) - u^T\lambda^* - v^T\nu^*$$

Sensitivity interpretation

  1. If $\lambda_i^*$ is large and we tighten the $i$th constraint (i.e. choose $u_i< 0$), then the optimal value $p^*(u, v)$ is guaranteed to increase greatly
  2. If $\nu_i^*$ is large and positive and we take $v_i<0$, or if $\nu_i^*$ is large and negative and we take $v_i>0$, then the optimal value $p^*(u, v)$ is guaranteed to increase greatly
  3. If $\lambda_i^*$ is small and we loosen the $i$th constraint ($u_i > 0$), then the optimal value $p^*(u, v)$ will not decrease too much
  4. If $\nu_i^*$ is small and positive and $v_i>0$, or if $\nu_i^*$ is small and negative and $v_i <0$, then the optimal value $p^*(u, v)$ will not decrease too much

Local sensitivity

Suppose now that $p^*(u, v)$ is differentiable at $u= 0, v = 0$. Then, provided strong duality holds, the optimal dual variables $\lambda^*, \nu^*$ are related to the gradient of $p^*$ at $u = 0, v= 0$:

$$\lambda_i^* = - \frac{\partial p^*(0, 0)}{\partial u_i}, \qquad \nu_i^* = - \frac{\partial p^*(0, 0)}{\partial v_i}$$

Thus, when $p^*(u, v)$ is differentiable at $u = 0, v= 0$ and strong duality holds, the optimal Lagrange multipliers are exactly the local sensitivities of the optimal value with respect to constraint perturbations.

  • Proof

    Since strong duality holds and the dual optimum $(\lambda^*, \nu^*)$ is attained,

    $$p^*(u, v) \ge p^*(0, 0) - u^T\lambda^* - v^T\nu^*$$

    for all $u, v$.

    Take $u = te_i$ and $v = 0$; then

    $$p^*(te_i, 0) \ge p^*(0, 0) - t\lambda_i^*$$

    If $t > 0$,

    $$\frac{p^*(te_i, 0) - p^*(0, 0)}{t} \ge - \lambda_i^* \;\Rightarrow\; \lim_{t\to 0^+} \frac{p^*(te_i, 0) - p^*(0, 0)}{t} \ge - \lambda_i^* \;\Rightarrow\; \frac{\partial p^*(0, 0)}{\partial u_i}\ge -\lambda_i^*$$

    Similarly, if $t < 0$,

    $$\frac{\partial p^*(0, 0)}{\partial u_i}\le -\lambda_i^*$$

    Therefore,

    $$\frac{\partial p^*(0, 0)}{\partial u_i}= -\lambda_i^*$$

    The same method can be used to establish

    $$\frac{\partial p^*(0, 0)}{\partial v_i}=-\nu_i^*$$

The local sensitivity result gives us a quantitative measure of how active a constraint is at the optimum $x^*$. If $f_i(x^*) < 0$, the constraint is inactive and, by complementary slackness, the associated optimal Lagrange multiplier must be zero; it follows that the constraint can be tightened or loosened a small amount without affecting the optimal value.

But now suppose that $f_i(x^*)=0$. The $i$th optimal Lagrange multiplier tells us how active the constraint is (a small numerical check follows the list below):

  1. If $\lambda_i^*$ is small: the constraint can be loosened or tightened a bit without much effect on the optimal value
  2. If $\lambda_i^*$ is large: loosening or tightening the constraint a bit has a large effect on the optimal value
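As a sanity check (my own toy example, not from the original notes), the sketch below verifies $\lambda^* = -\partial p^*(0)/\partial u$ by finite differences on a one-dimensional problem with a known solution.

```python
import numpy as np

# Toy problem: minimize x^2  s.t.  1 - x <= u  (perturbed constraint)
# For u < 1 the constraint is active, x*(u) = 1 - u, so p*(u) = (1 - u)^2.
def p_star(u):
    return max(0.0, 1.0 - u) ** 2

# KKT at u = 0: 2x - lambda = 0 with x* = 1, so lambda* = 2.
h = 1e-6
dp_du = (p_star(h) - p_star(-h)) / (2 * h)   # central finite difference
print(-dp_du)                                 # approximately 2.0 = lambda*
```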

Duality and problem reformulations

Equivalent formulations of a problem can lead to very different duals. Reformulating the primal problem can be useful when the dual is difficult to derive, or uninteresting.

Common reformulations

  1. introduce new variables and equality constraints
  2. make explicit constraints implicit or vice versa
  3. transform objective or constraint functions

    → replace $f_0(x)$ by $\phi(f_0(x))$ with $\phi$ convex and increasing

Introducing new variables and equality constraints

Consider an unconstrained problem of the form

$$\text{minimize } f_0(Ax + b)$$

The dual function is constant

$$g = \inf_x f_0(Ax + b) = p^*$$

We have strong duality, but the dual is quite useless.

Now let’s reformulate the above problem as

$$\text{minimize } f_0(y)$$

subject to

$$Ax + b = y$$

The dual function of the transformed problem is

$$\begin{aligned}g(\nu) &= \inf_{x,y} \mathcal L(x, y, \nu) \\ &=\inf_{x,y} \big(f_0(y) +\nu^T(Ax + b - y)\big) \\ &=\begin{cases}-f_0^*(\nu) + \nu^Tb & A^T\nu = 0 \\-\infty & \text{otherwise}\end{cases} \end{aligned}$$

Implicit constraints

We can make some constraints implicit by modifying the objective function to be $+\infty$ when they are violated.

We consider the linear program

$$\text{minimize } c^Tx$$

subject to

$$Ax = b, \quad l\preceq x \preceq u$$

where $A\in \R^{p\times n}$ and $l \prec u$.

The dual problem is

$$\text{maximize } -b^T\nu - \lambda_1^Tu + \lambda_2^Tl$$

subject to

$$A^T\nu + \lambda_1 - \lambda_2 +c = 0, \quad \lambda_1 \succeq 0, \quad \lambda_2 \succeq 0$$

Reformulate the primal problem as

$$\text{minimize } f_0(x) = \begin{cases}c^Tx & l \preceq x \preceq u \\ \infty & \text{otherwise}\end{cases}$$

Then the dual function for the reformulated problem is

$$\begin{aligned}g(\nu) &= \inf_{l \preceq x \preceq u}\big(c^Tx + \nu^T(Ax-b)\big) \\ &= -b^T\nu - u^T(A^T\nu+c)^{-} + l^T(A^T\nu + c)^+\end{aligned}$$

where $y_i^+ = \max\{y_i, 0\}$ and $y_i^- = \max\{-y_i, 0\}$.

Therefore the dual problem is the unconstrained problem

$$\text{maximize } -b^T\nu - u^T(A^T\nu + c)^- + l^T(A^T\nu + c)^+$$
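The sketch below (not from the original notes, numpy only) checks that this unconstrained dual really does lower-bound the primal objective at feasible points.

```python
import numpy as np

rng = np.random.default_rng(4)
p, n = 3, 6
A = rng.standard_normal((p, n))
c = rng.standard_normal(n)
l, u = -np.ones(n), np.ones(n)

x_feas = rng.uniform(-1, 1, n)            # feasible by construction:
b = A @ x_feas                            # l <= x_feas <= u and A x_feas = b

def g(nu):
    # g(nu) = -b^T nu - u^T (A^T nu + c)^- + l^T (A^T nu + c)^+
    y = A.T @ nu + c
    return -b @ nu - u @ np.maximum(-y, 0) + l @ np.maximum(y, 0)

for _ in range(100):
    nu = rng.standard_normal(p)
    assert g(nu) <= c @ x_feas + 1e-9     # weak duality at a feasible point
```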

Generalized inequalities

We can extend the Lagrange duality to a problem with generalized inequality constraints.

$$\text{minimize } f_0(x)$$

subject to

$$f_i(x) \preceq_{K_i}0, \quad i = 1, \dots, m \\ h_i(x) = 0, \quad i = 1, \dots, p$$

where $K_i\subset \R^{k_i}$ are proper cones.

💡
There is no assumption about the convexity of the problem.

The Lagrange dual

With each generalized inequality $f_i(x) \preceq_{K_i}0$, we associate a Lagrange multiplier vector $\lambda_i\in \R^{k_i}$ and define the associated Lagrangian as

$$\mathcal L(x, \lambda_1, \cdots, \lambda_m, \nu) = f_0(x) + \sum_{i = 1}^m \lambda_i^Tf_i(x) + \sum_{i = 1}^p \nu_ih_i(x)$$

The dual function is defined exactly as in a problem with scalar inequalities

$$\begin{aligned}g(\lambda_1, \dots, \lambda_m , \nu) &= \inf_{x\in \mathcal D}\mathcal L(x, \lambda_1, \dots, \lambda_m, \nu)\\ &= \inf_{x\in \mathcal D}\bigg(f_0(x) + \sum_{i = 1}^m \lambda_i^Tf_i(x) + \sum_{i = 1}^p \nu_ih_i(x)\bigg)\end{aligned}$$

Since the Lagrangian is affine in the dual variables $(\lambda_1, \dots, \lambda_m, \nu)$, and the dual function is a pointwise infimum of the Lagrangian, the dual function is concave.

As in a problem with scalar inequalities, the dual function gives lower bounds on $p^*$. In the scalar case the requirement was $\lambda_i \ge 0$; for generalized inequalities the corresponding non-negativity requirement on the dual variables is

$$\lambda_i\succeq_{K_i^*} 0, \quad i = 1, \dots, m$$

where $K_i^*$ denotes the dual cone of $K_i$.

Weak duality follows immediately from the definition of the dual cone: if $\lambda_i\succeq_{K_i^*}0$ and $f_i(\tilde x) \preceq_{K_i}0$, then $\lambda_i^Tf_i(\tilde x) \le 0$. Therefore, for any primal feasible point $\tilde x$ and any $\lambda_i \succeq_{K_i^*}0$, $i = 1, \dots, m$,

$$\begin{aligned}f_0(\tilde x) &\ge f_0(\tilde x) + \sum_{i = 1}^m\lambda_i^Tf_i(\tilde x) + \sum_{i = 1}^p \nu_ih_i(\tilde x) \\ &\ge \inf_{x\in \mathcal D}\mathcal L(x, \lambda_1, \dots, \lambda_m, \nu) \\ &= g(\lambda_1, \dots, \lambda_m, \nu)\end{aligned}$$

Therefore,

$$p^* \ge g(\lambda_1, \dots, \lambda_m, \nu)$$

The Lagrange dual optimization problem is

$$\text{maximize } g(\lambda_1, \dots, \lambda_m, \nu)$$

subject to

$$\lambda_i \succeq_{K_i^*} 0, \quad i = 1, \dots, m$$

Slater’s condition and strong duality

Strong duality holds when the primal problem is convex and satisfies an appropriate constraint qualification, such as a generalized version of Slater's condition, just as in the scalar case.

Example : Semidefinite program

We consider a semi-definite program in inequality form

$$\text{minimize } c^Tx$$

subject to

$$x_1 F_1 + \cdots + x_nF_n \preceq G$$

where $F_i, G\in S^{k}$. (Here $f_1$ is affine, and $K_1 = S_{+}^k$, the positive semi-definite cone.)

Since $S_{+}^k$ is self-dual, the Lagrange multiplier is a matrix $Z \in S^k$ with the dual feasibility condition $Z \succeq 0$. The Lagrangian is

$$\begin{aligned}\mathcal L(x, Z) &= c^Tx + \text{tr}\big(Z(x_1F_1 + \cdots + x_nF_n - G)\big) \\ &= x_1(c_1 + \text{tr}(ZF_1)) + \cdots + x_n(c_n + \text{tr}(ZF_n)) - \text{tr}(ZG)\end{aligned}$$

Therefore, the dual function is

$$g(Z) = \inf_x \mathcal L(x, Z)=\begin{cases}-\text{tr}(ZG) & c_i + \text{tr}(ZF_i) = 0, \;\forall i \\-\infty & \text{otherwise}\end{cases}$$

The dual problem can be expressed as

$$\text{maximize } -\text{tr}(GZ)$$

subject to

$$c_i + \text{tr}(ZF_i) = 0, \quad i= 1, \dots, n \\ Z\succeq 0$$
💡
Note that $Z \succeq 0$ is the generalized inequality with respect to $S_{+}^k$.

Strong duality holds if the problem is strictly feasible, since this is exactly Slater's condition. In other words, there exists an $x$ with

$$x_1 F_1 + \cdots+ x_nF_n - G \prec 0$$
💡
Note that the standard inner product of two symmetric matrices is the trace of their product, $\langle X, Z\rangle = \text{tr}(XZ)$. See the discussion of dual generalized inequalities in the convex sets section.
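A minimal sketch of this primal–dual pair, assuming cvxpy with an SDP-capable solver such as SCS is installed (this code is my own illustration, not part of the original notes). The data are constructed so that the primal is strictly feasible and the dual is feasible, so strong duality holds and the two optimal values coincide.

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(5)
n, k = 3, 4
F = []
for _ in range(n):
    M = rng.standard_normal((k, k))
    F.append((M + M.T) / 2)                     # symmetric F_i
G = np.eye(k)                                   # x = 0 is strictly feasible

Z0 = rng.standard_normal((k, k))
Z0 = Z0 @ Z0.T                                  # Z0 >= 0 will be dual feasible
c = np.array([-np.trace(Z0 @ Fi) for Fi in F])  # ensures a bounded primal

# Primal SDP: minimize c^T x  s.t.  x_1 F_1 + ... + x_n F_n <= G
x = cp.Variable(n)
primal = cp.Problem(cp.Minimize(c @ x),
                    [sum(x[i] * F[i] for i in range(n)) << G])
primal.solve(solver=cp.SCS)

# Dual SDP: maximize -tr(GZ)  s.t.  c_i + tr(Z F_i) = 0,  Z >= 0
Z = cp.Variable((k, k), symmetric=True)
dual = cp.Problem(cp.Maximize(-cp.trace(G @ Z)),
                  [Z >> 0] + [c[i] + cp.trace(Z @ F[i]) == 0 for i in range(n)])
dual.solve(solver=cp.SCS)

print(primal.value, dual.value)                 # equal up to solver tolerance
```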
