
9. Optimality Conditions for Constrained Optimization


Introduction

In a constrained optimization problem, the constraints can be

  1. Linear or non-linear
  2. Equalities or inequalities

Generally, our objective function and constraints can be represented as follows

$$\min f(x)$$

$$\text{subject to}$$

$$g_i(x) = 0, \; i \in \mathcal E \\ h_i(x) \ge 0, \; i \in \mathcal I$$

As a result, constrained optimization is much harder than unconstrained optimization.

In addition, we have to change the definition of local optimum slightly as follows:

$x^*$ is a local minimum if

$$f(x^*) \le f(x), \quad \forall x \in B_\epsilon(x^*)$$

where $g_i(x) = 0, \; i \in \mathcal E$ and $h_i(x) \ge 0, \; i \in \mathcal I$, for some $\epsilon > 0$.

💡
You can think of this as the usual definition of a local minimum, with the condition changed so that we only look at points in the feasible region.

Optimality condition

  • Boundary point: a boundary point is optimal if there is no direction that is both feasible and a descent direction with respect to the active constraints.

    → We define $p$ to be a feasible direction at the point $x$ if there exists some $\epsilon > 0$ such that $x + \alpha p \in S$ for all $0 \le \alpha \le \epsilon$

  • However, for non-linear equality constraints, we may have to consider feasible curves rather than feasible directions

Linear Equality Constraints

Null space Approach

Note that all linear equality constraints can be written in the following form:

$$\min f(x)$$

$$\text{subject to}$$

$$Ax = b$$

Consider the usual iterative update $x \leftarrow x + \alpha p$. To preserve feasibility, the search direction must lie in the null space of $A$ (assume we know at least one feasible point).

💡
The idea is that all feasible points belong to the same equivalence class in the quotient space, so the only directions we can move in are elements of the kernel of $A$.

Reformulation

Assume we already know a feasible solution $\overline x$, i.e.

$$A\overline x = b$$

Take $p \in \mathcal N(A)$; then

$$A(\overline x + p) = b$$

Using this fact, we can get rid of the constraints:

$$\min_v \phi(v) = f(\overline x + Zv)$$

where the columns of $Z$ form a basis of $\mathcal N(A)$.

Therefore, the optimality conditions of unconstrained optimization for the function $\phi$ with respect to $v$ are as follows:

  1. $\nabla \phi(v) = Z^T\nabla f(x) = 0$
  2. $\nabla^2 \phi(v) = Z^T\nabla^2f(x) Z$ is positive semi-definite.
💡
These are also known as the reduced gradient and reduced Hessian, respectively.
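As a quick illustration, here is a minimal numerical sketch of the null-space approach for a made-up quadratic objective: `scipy.linalg.null_space` supplies a basis $Z$ of $\mathcal N(A)$, a least-squares solve supplies one feasible point $\overline x$, and the reduced problem in $v$ is handed to an off-the-shelf unconstrained solver. The specific $A$, $b$, and $f$ below are invented for the example.

```python
import numpy as np
from scipy.linalg import null_space
from scipy.optimize import minimize

# Made-up problem: minimize f(x) = ||x - c||^2  subject to  A x = b
A = np.array([[1.0, 1.0, 1.0]])   # a single linear equality constraint
b = np.array([1.0])
c = np.array([1.0, 2.0, 3.0])

def f(x):
    return np.sum((x - c) ** 2)

# One particular feasible point x_bar (here: the minimum-norm solution of A x = b)
x_bar = np.linalg.lstsq(A, b, rcond=None)[0]

# Columns of Z form a basis of the null space of A
Z = null_space(A)

# Reduced, unconstrained problem: phi(v) = f(x_bar + Z v)
phi = lambda v: f(x_bar + Z @ v)
res = minimize(phi, np.zeros(Z.shape[1]))

x_star = x_bar + Z @ res.x
print("x* =", x_star)
print("A x* =", A @ x_star, "(should equal b)")
print("reduced gradient (should be ~ 0):", Z.T @ (2 * (x_star - c)))
```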

Lagrange multiplier Approach

If $x^*$ is a local optimum, then there exists $\lambda^*$ such that

$$\nabla f(x^*) = A^T\lambda^*$$

In other words, the gradient is a linear combination of the rows of $A$.

  • The entries of $\lambda^*$ are known as Lagrange multipliers

How can we derive the formula above?

$$\min f(x)$$

$$\text{subject to}$$

$$Ax = b$$

Define the Lagrangian function

$$\mathcal L(x, \lambda) = f(x) - \lambda^T (Ax-b)$$

A stationary point $(x^*, \lambda^*)$ of this function satisfies $\nabla \mathcal L(x^*, \lambda^*) = 0$.

💡
A local optimum $x^*$ of the original problem, together with some $\lambda^*$, must be a stationary point of the Lagrangian function.
💡
In other words, at a stationary point $x^*$ the original problem and the Lagrangian agree, so instead of solving the original constrained problem we switch to the Lagrangian and solve an unconstrained problem.
💡
The minus sign in front of the Lagrangian term is there because we will later write inequality constraints with $\ge$. If you prefer $\le$, use a plus sign instead.

Therefore, the first-order conditions for the Lagrangian function are as follows:

$$\nabla f(x) - A^T\lambda = 0 \\ Ax - b = 0$$
💡
Intuitively, the rows of $A$ are exactly the gradients of the constraints, so this condition says that $\nabla f(x)$ is aligned with (a linear combination of) those gradients.
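For a quadratic objective $f(x) = \tfrac12 x^TQx - c^Tx$, the two first-order conditions above become a single linear system in $(x, \lambda)$, often called the KKT system. Below is a minimal sketch with made-up $Q$, $c$, $A$, $b$:

```python
import numpy as np

# Made-up problem data: minimize 1/2 x^T Q x - c^T x  subject to  A x = b
Q = np.array([[4.0, 1.0], [1.0, 3.0]])   # symmetric positive definite
c = np.array([1.0, 2.0])
A = np.array([[1.0, 1.0]])
b = np.array([1.0])

n, m = Q.shape[0], A.shape[0]

# First-order conditions:  Q x - A^T lam = c   and   A x = b
KKT = np.block([[Q, -A.T],
                [A, np.zeros((m, m))]])
rhs = np.concatenate([c, b])

sol = np.linalg.solve(KKT, rhs)
x_star, lam_star = sol[:n], sol[n:]

print("x* =", x_star)
print("lambda* =", lam_star)
print("grad f(x*) - A^T lambda* =", Q @ x_star - c - A.T @ lam_star)  # ~ 0
print("A x* - b =", A @ x_star - b)                                   # ~ 0
```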

Linear Inequality Constraints

Introduction

$$\min f(x)$$

$$\text{subject to}$$

$$Ax \ge b$$

  • An interior point: the optimality conditions are those of unconstrained optimization
  • A boundary point: a point at which some constraints are active

Note: the feasible region is a polyhedron/polytope, since the constraints are all linear.

If we knew which constraints are active at the optimum, then we could solve an equality-constrained problem instead.

💡
Inactive constraints are irrelevant to local optimality.

Necessary Conditions

Introduction

$$\min f(x)$$

$$\text{subject to}$$

$$Ax \ge b$$

Given a solution $x^*$, let $\hat A$ be the sub-matrix of rows of $A$ corresponding to the active constraints, and denote by $\hat b$ the corresponding right-hand side.

A local optimum must also be locally optimal for the restricted equality problem defined by the active constraints.

Necessary Conditions

$$\min f(x)$$

$$\text{subject to}$$

$$\hat Ax \ge \hat b$$

$$\exists \lambda^* \ge 0 \text{ such that } \nabla f(x^*) = \hat A^T\lambda^*$$
💡
At a local optimum, the Lagrange multipliers are non-negative.
💡
One way to see this is as attaching a penalty to $Ax - b$; another is to note that the direction $\hat A^T \lambda^*$ points toward the interior of the feasible region. Since $\nabla f(x^*)$ is parallel to this direction with the same orientation, moving toward the interior increases the objective value, which is consistent with $x^*$ being a local minimum.

Complementary Slackness

$$(\lambda^*)^T(Ax^* - b) = 0$$

With the condition above, we don't need to explicitly split the constraints into active and inactive ones; we just require the multipliers to be non-negative.

💡
The Lagrange multiplier of each inactive constraint must be zero.

Therefore, we can write the condition as

$$\exists \lambda^* \ge 0, \quad \nabla f(x^*) = A^T \lambda^*$$

For an inactive row $i$ of $A$, $\lambda_i^* = 0$ by complementary slackness.

💡
In other words, instead of separating active and inactive constraints, we only constrain the Lagrange multipliers to be non-negative; for inactive constraints, the multiplier is forced to be zero.

Putting the Pieces Together

If $x^*$ is a local optimum, there exists a vector $\lambda^*$ such that

  • First order condition
    1. $\nabla f(x^*) = A^T \lambda^*$ (this is equivalent to $Z^T \nabla f(x^*) = 0$)
    2. $\lambda^* \ge 0$ (Dual feasibility)
    3. $(\lambda^*)^T(Ax^* - b) = 0$ (Complementary slackness conditions)
  • Second order condition
    1. $Z^T \nabla^2f(x^*) Z$ is positive semi-definite

where $Z$ is a null-space matrix for the active constraints.

💡
For linear equality constraints, the conditions are the first item and the last item, applied to all constraints. Also, there is no sign restriction on $\lambda^*$.
💡
We can obtain the conditions for a linear equality by writing it as two opposite inequalities and applying the above.

Sufficient Conditions

  • First order condition
    1. $Ax^* \ge b$
    2. $\nabla f(x^*) = A^T \lambda^*$ (this is equivalent to $Z^T \nabla f(x^*) = 0$)
    3. $\lambda^* \ge 0$ (Dual feasibility)
    4. Strict complementary slackness

      → In the complementary slackness condition, if for each constraint exactly one of $\lambda^*_i = 0$ and $a_i^Tx^* - b_i = 0$ holds (never both), this is called strict complementary slackness.

  • Second order condition
    1. $Z^T \nabla^2f(x^*) Z$ is positive definite

where $Z$ is a null-space matrix for the active constraints.
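Here is a rough sketch of how one might check these conditions numerically at a candidate point, for a made-up quadratic problem: the active rows are detected, the multipliers on them are obtained by least squares, and the reduced Hessian is formed on the null space of the active constraints.

```python
import numpy as np
from scipy.linalg import null_space

# Made-up problem: minimize f(x) = 1/2 ||x - c||^2  subject to  A x >= b
c = np.array([2.0, 0.0])
A = np.array([[1.0, 1.0],
              [0.0, 1.0]])
b = np.array([3.0, 0.0])

x_star = np.array([2.5, 0.5])            # candidate local optimum
grad = x_star - c                        # gradient of f at x_star
hess = np.eye(2)                         # Hessian of f (identity for this f)

# Active set: rows where A x* - b is (numerically) zero
active = np.isclose(A @ x_star - b, 0.0)
A_hat = A[active]

# Multipliers on the active rows: solve A_hat^T lam = grad in the least-squares sense
lam_hat, *_ = np.linalg.lstsq(A_hat.T, grad, rcond=None)

# Reduced Hessian on the null space of the active constraints
Z = null_space(A_hat)
reduced_hess = Z.T @ hess @ Z

print("primal feasible:      ", np.all(A @ x_star - b >= -1e-9))
print("stationarity residual:", grad - A_hat.T @ lam_hat)             # ~ 0
print("multipliers >= 0:     ", np.all(lam_hat >= -1e-9))
print("reduced Hessian eigs: ", np.linalg.eigvalsh(reduced_hess))     # > 0 => sufficient
```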

Interpretation of KKT condition and Duality

Let our original problem be as follows:

$$\min_x f(x)$$

$$\text{subject to}$$

$$g_i(x) \ge b_i, \; \forall i$$

One way to remove these inequality constraints is to use a penalty function:

$$\min_x \bigg[f(x) - \sum_{i = 1}^m P_i\big(g_i(x) - b_i\big)\bigg]$$

where

$$P_i(z) = \begin{cases}-\infty & \text{if } z < 0 \\ 0 & \text{otherwise}\end{cases}$$

so that, since $P_i$ is subtracted, any infeasible point gets an objective value of $+\infty$. However, this penalty is non-differentiable at $z = 0$, and the problem cannot be fixed by replacing the infinite penalty with some large finite value. Therefore, we take a linear function as the penalty function instead:

$$P_i(z) = \lambda_i z$$

where $\lambda_i \ge 0$.

This is quite natural: if we choose an $x$ that satisfies the constraints, the term rewards us; if we choose an $x$ that violates them, it penalizes us.

💡
The size of the reward or penalty scales with the magnitude of $g_i(x) - b_i$ (weighted by $\lambda_i$).

The problem is that a fixed linear penalty could change the optimal value. If we instead take the worst case (maximum) over $\lambda_i \ge 0$, we recover the original penalty function exactly. Therefore, the original problem can be expressed as

$$\min_x \bigg[f(x) + \sum_{i = 1}^m \max_{\lambda_i \ge 0}\big[-\lambda_i\big(g_i(x) - b_i\big)\big]\bigg] = \min_x\max_{\lambda \ge 0} \bigg[f(x) - \sum_{i = 1}^m \lambda_i\big(g_i(x) - b_i\big)\bigg]$$

Therefore, this is actually a min-max problem, which means the order of the min and the max matters.

But what if we swap the order, like this?

$$\max_{\lambda \ge 0}\min_x\bigg[f(x) - \sum_{i = 1}^m \lambda_i\big[g_i(x) - b_i\big]\bigg]$$

In this case, the inner (min) player's problem is much simpler, since it is just an unconstrained problem.

💡
Moreover, if the problem is convex, the inner minimization becomes even simpler: just take the gradient with respect to $x$ and find the $x$ that makes it zero.

This reordered problem is called the dual problem, and the original problem is called the primal problem.
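To make the order swap concrete, consider a made-up one-dimensional example: $\min (x-2)^2$ subject to $x \ge 3$, whose primal optimum is $f(3) = 1$. In the sketch below the inner minimization of the dual is available in closed form ($x(\lambda) = 2 + \lambda/2$), and the outer maximization over $\lambda \ge 0$ is done on a coarse grid.

```python
import numpy as np

# Made-up primal problem: minimize f(x) = (x - 2)^2  subject to  g(x) = x - 3 >= 0
# Primal optimum: x* = 3, f(x*) = 1

# Lagrangian: L(x, lam) = (x - 2)^2 - lam * (x - 3)
# Inner minimization over x (unconstrained, convex): x(lam) = 2 + lam / 2
def dual(lam):
    x = 2.0 + lam / 2.0
    return (x - 2.0) ** 2 - lam * (x - 3.0)   # = lam - lam^2 / 4

# Outer maximization over lam >= 0 on a grid
lams = np.linspace(0.0, 5.0, 501)
vals = dual(lams)
i = np.argmax(vals)

print("dual optimum  q(lam*) =", vals[i], "at lam* =", lams[i])   # ~ 1 at lam* ~ 2
print("primal optimum f(x*)  =", 1.0)                              # the two values coincide here
```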

Duality

Let's consider the following general optimization problem:

$$\min_{x}f(x)$$

$$\text{subject to}$$

$$h_i(x) = 0, \; i = 1, \dots \\ g_j(x) \ge 0, \; j = 1, \dots$$

Similar to the penalty-function idea, we attach multipliers to the constraints,

$$\lambda_i h_i(x), \; i = 1, \dots \\ \mu_j g_j(x), \; j = 1, \dots$$

where $\mu_j \ge 0$,

and move the constraints into the objective function:

$$\min_x \bigg[f(x) + \max_{\lambda,\, \mu \ge 0}\Big(-\sum_{i}\lambda_ih_i(x) - \sum_j \mu_j g_j(x)\Big)\bigg] = \min_x\max_{\lambda,\, \mu \ge 0} \bigg[f(x) -\sum_{i}\lambda_ih_i(x) - \sum_j \mu_j g_j(x)\bigg]$$
💡
Note that there is no sign restriction on $\lambda_i$ (the multipliers of the equality constraints); only $\mu_j \ge 0$ is required.

To obtain the dual problem, we simply switch the order of the min and the max:

$$\max_{\lambda,\, \mu \ge 0}\min_x \bigg[f(x) -\sum_{i}\lambda_ih_i(x) - \sum_j \mu_j g_j(x)\bigg]$$

Note that

$$f(x) -\sum_{i}\lambda_ih_i(x) - \sum_j \mu_j g_j(x)$$

is called the Lagrangian of the optimization problem, and

$$\max_{\lambda,\, \mu \ge 0}\min_x \bigg[f(x) -\sum_{i}\lambda_ih_i(x) - \sum_j \mu_j g_j(x)\bigg]$$

is called the Lagrangian dual problem.

So why is this important? Because the dual actually gives a lower bound on the primal problem.

Take any feasible $x$ and any multipliers $\lambda, \mu$ with $\mu \ge 0$. Then

$$f(x) \ge f(x) - \sum_i \lambda_i h_i(x) - \sum_j \mu_jg_j(x)$$

since

$$h_i(x) = 0 \Rightarrow \sum_i \lambda_i h_i(x) = 0, \qquad g_j(x) \ge 0,\ \mu_j \ge 0 \Rightarrow \sum_j \mu_j g_j(x) \ge 0$$

In addition,

$$f(x) -\sum_{i}\lambda_ih_i(x) - \sum_j \mu_j g_j(x) \ge \min_x \bigg[f(x) -\sum_{i}\lambda_ih_i(x) - \sum_j \mu_j g_j(x)\bigg]$$

That means

$$\forall \lambda,\ \forall \mu \ge 0: \quad f(x) \ge \min_x \bigg[f(x) -\sum_{i}\lambda_ih_i(x) - \sum_j \mu_j g_j(x)\bigg]$$

Therefore, for every feasible $x$,

$$f(x) \ge \max_{\lambda,\, \mu \ge 0}\min_x \bigg[f(x) -\sum_{i}\lambda_ih_i(x) - \sum_j \mu_j g_j(x)\bigg]$$

In particular, the primal optimal value is bounded below by the dual optimal value; this is known as weak duality.
💡
Strictly speaking, the max and min here should be sup and inf, respectively.

Moreover, under certain assumptions, strong duality holds: the optimal values of the primal and dual problems coincide.

Figure: progress of iterative optimization: (a) the primal objective is gradually minimized while the dual objective is maximized; (b) if strong duality holds, the primal and dual optimal values approach each other and become equal.
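For linear programs, strong duality holds whenever the primal is feasible and bounded, so it is easy to check numerically. The sketch below solves a small made-up LP and its dual with `scipy.optimize.linprog` and compares the two optimal values.

```python
import numpy as np
from scipy.optimize import linprog

# Made-up primal LP: minimize c^T x  subject to  A x >= b,  x >= 0
c = np.array([2.0, 3.0])
A = np.array([[1.0, 1.0],
              [1.0, 2.0]])
b = np.array([3.0, 4.0])

# linprog expects "<=" constraints, so rewrite A x >= b as -A x <= -b
primal = linprog(c, A_ub=-A, b_ub=-b, bounds=(0, None))

# Dual LP: maximize b^T y  subject to  A^T y <= c,  y >= 0
# (maximization written as minimizing -b^T y)
dual = linprog(-b, A_ub=A.T, b_ub=c, bounds=(0, None))

print("primal optimal value:", primal.fun)    # 7.0
print("dual   optimal value:", -dual.fun)     # 7.0  -> strong duality
```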

Slater’s condition

For a convex optimization problem in the form

$$\min_x f(x)$$

$$\text{subject to}$$

$$y_i(x) \ge 0, \; \forall i \\ Ax = b$$

we have strong duality if it is strictly feasible, i.e., if there exists $x \in \operatorname{int}(D)$ such that

$$y_i(x) > 0, \; \forall i \\ Ax = b$$

In other words, there is at least one point in the interior of the domain (not on the boundary of the domain) at which all the inequality constraints hold strictly.

💡
If Slater's condition is satisfied, strong duality holds, so the primal and dual optimal values coincide exactly.
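For example (a made-up instance), take $\min_x x_1^2 + x_2^2$ subject to $y_1(x) = 1 - x_1^2 - x_2^2 \ge 0$. The point $x = (0, 0)$ lies in the interior of the domain and satisfies $y_1(x) = 1 > 0$, so the problem is strictly feasible, Slater's condition holds, and strong duality is guaranteed.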

KKT condition

  1. Stationarity condition
    $$\nabla_x \mathcal L(x^*,\lambda, \mu) = \nabla f(x^*) - \sum_i \lambda_i \nabla g_i(x^*) - \sum_j \mu_j \nabla h_j(x^*) = 0$$
    💡
    Note that this formula for the derivative holds for all dual variables, not just the optimal dual variables.
  2. Primal feasibility
    $$g_i(x^*) \ge 0, \; \forall i \qquad h_j(x^*) = 0, \; \forall j$$
  3. Dual feasibility
    $$\lambda_i \ge 0, \; \forall i$$
  4. Complementary slackness
    $$\lambda_i^* g_i(x^*) = 0, \; \forall i$$

In addition, the second-order optimality condition is as follows:

$$y^T\nabla_{xx}^2\mathcal L(x^*, \lambda, \mu)\, y \ge 0, \quad \forall y \in T_{x^*}S$$

where $S$ is determined by the active constraints.
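As a sanity check, the KKT conditions can be verified numerically at a candidate point. The sketch below uses a made-up convex problem, $\min x_1^2 + x_2^2$ subject to $g(x) = x_1 + x_2 - 1 \ge 0$, whose solution is $x^* = (\tfrac12, \tfrac12)$ with multiplier $\lambda^* = 1$.

```python
import numpy as np

# Made-up convex problem: minimize f(x) = x1^2 + x2^2  subject to  g(x) = x1 + x2 - 1 >= 0
f_grad = lambda x: 2.0 * x
g      = lambda x: x[0] + x[1] - 1.0
g_grad = lambda x: np.array([1.0, 1.0])

x_star   = np.array([0.5, 0.5])   # candidate optimum
lam_star = 1.0                    # candidate multiplier

checks = {
    "stationarity":            f_grad(x_star) - lam_star * g_grad(x_star),  # should be ~ 0
    "primal feasibility":      g(x_star),                                   # should be >= 0
    "dual feasibility":        lam_star,                                    # should be >= 0
    "complementary slackness": lam_star * g(x_star),                        # should be ~ 0
}
for name, value in checks.items():
    print(f"{name:25s}: {value}")
```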

Note

If you think about it, the earlier condition "$Z^T \nabla^2f(x^*) Z$ is positive semi-definite" is easily seen to be the same as

$$y^T\nabla_{xx}^2\mathcal L(x^*, \lambda, \mu)\,y \ge 0, \quad \forall y \in \mathcal N(A)$$

(note that $\nabla_{xx}^2 g_i(x) = 0$ here, since the constraints are linear).

Given that the rows of $A$ correspond to the gradients of the constraints, $\mathcal N(A)$ corresponds to the tangent space.

💡
In an unconstrained optimization problem we required this for every direction $y$; in a constrained problem we only need to consider the directions $y$ lying in the tangent space.

Conclusion

  1. Convex optimization problem

    The KKT conditions are a sufficient condition for an optimal point.

    If strong duality holds (e.g., Slater's condition is satisfied), the KKT conditions are a necessary and sufficient condition for an optimal point.

  2. Non-convex optimization problem

    The KKT conditions are not a sufficient condition for an optimal point. That means it is possible for a point to satisfy the KKT conditions without being an optimal point.

Non-linear Constraints

$$\min f(x)$$

$$\text{subject to}$$

$$g_i(x) = 0, \; i \in \mathcal E \\ h_i(x) \ge 0, \; i \in \mathcal I$$

The optimality conditions remain the same. However, the derivation is more complicated; in particular, we may have to consider feasible curves, not just feasible directions.

Necessary Conditions for Equality Constraints

$$\min f(x)$$

$$\text{subject to}$$

$$g_i(x) = 0, \quad i = 1, \dots, m$$

If $x^*$ is a local optimum and $Z(x^*)$ is a null-space matrix for the constraint Jacobian at $x^*$, then there exist Lagrange multipliers $\lambda^*$ such that

  1. First order
    $$\nabla_x \mathcal L(x^*, \lambda^*) = 0$$
  2. Second order
    $$Z(x^*)^T\nabla_{xx}^2\mathcal L(x^*, \lambda^*)\, Z(x^*)$$

    is positive semi-definite.

    Actually, it is equivalent to

    $$y^T\nabla_{xx}^2\mathcal L(x^*, \lambda^*)\, y \ge 0, \quad \forall y \in T_{x^*}S$$
    💡
    Each row of the Jacobian is the gradient of one constraint, so an element of its null space is orthogonal to all of those gradients; in other words, it lies in the tangent space.
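As a small check of the non-linear case, take a made-up problem $\min x_1 + x_2$ subject to $g(x) = x_1^2 + x_2^2 - 1 = 0$. The minimizer is $x^* = (-1/\sqrt2, -1/\sqrt2)$, and the first-order condition $\nabla f(x^*) = \lambda^* \nabla g(x^*)$ can be verified directly:

```python
import numpy as np

# Made-up problem: minimize f(x) = x1 + x2  subject to  g(x) = x1^2 + x2^2 - 1 = 0
f_grad = lambda x: np.array([1.0, 1.0])
g      = lambda x: x[0] ** 2 + x[1] ** 2 - 1.0
g_grad = lambda x: 2.0 * x

x_star = np.array([-1.0, -1.0]) / np.sqrt(2.0)   # known minimizer on the unit circle

# Lagrange multiplier from grad f(x*) = lam * grad g(x*)
lam_star = f_grad(x_star)[0] / g_grad(x_star)[0]  # = -1/sqrt(2)

print("constraint g(x*)      :", g(x_star))                                     # ~ 0
print("lambda*               :", lam_star)
print("stationarity residual :", f_grad(x_star) - lam_star * g_grad(x_star))    # ~ 0
```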