
9. Optimality Conditions for Constrained Optimization


Introduction

In a constrained optimization problem, the constraints can be

  1. Linear or non-linear
  2. Equalities or inequalities

Generally, our objective function and constraints can be represented as follows

$$\min f(x)$$

$$\text{subject to}$$

$$g_i(x) = 0, \; i \in \mathcal E \\ h_i(x) \ge 0, \; i \in \mathcal I$$

As a result, constrained optimization is much harder than unconstrained optimization.

In addition, we have to change the definition of local optimum slightly as follows:

$x^*$ is a local minimum if

$$f(x^*) \le f(x), \quad \forall x \in B_\epsilon(x^*)$$

where $g_i(x) = 0, \; i \in \mathcal E$ and $h_i(x) \ge 0, \; i \in \mathcal I$, for some $\epsilon > 0$.

💡
You can think of this as the usual definition of a local minimum, with the condition changed so that we only look at points in the feasible region.

Optimality condition

  • Boundary point: a boundary point is optimal if there is no direction that is both feasible and a descent direction with respect to the active constraints.

    → We define $p$ to be a feasible direction at the point $x$ if there exists some $\epsilon > 0$ such that $x + \alpha p \in S$ for all $0 \le \alpha \le \epsilon$

  • However, for non-linear equality constraints, we may have to consider feasible curves rather than feasible directions

Linear Equality Constraints

Null space Approach

Note that all linear equality constraints can be written in the following form:

$$\min f(x)$$

$$\text{subject to}$$

$$Ax = b$$

Consider the usual iterative update $x \leftarrow x + \alpha p$. To preserve feasibility, the search direction must lie in the null space of $A$ (assume we know at least one feasible point).

💡
The idea is that all feasible points belong to the same equivalence class in the quotient space, so the only directions we can move in are elements of the kernel of $A$.

Reformulation

Assume we already know a feasible solution $\overline x$, i.e.

$$A\overline x = b$$

Take $p \in \mathcal N(A)$; then

$$A(\overline x + p) = b$$

Using this fact, we can get rid of the constraints:

$$\min_v \phi(v) = f(\overline x + Zv)$$

where the columns of $Z$ form a basis of $\mathcal N(A)$.

Therefore, the optimality conditions of unconstrained optimization for the function $\phi$ with respect to $v$ are as follows:

  1. $\nabla \phi(v) = Z^T\nabla f(x) = 0$
  2. $\nabla^2 \phi(v) = Z^T\nabla^2f(x) Z$ is positive semi-definite.
💡
These are also known as the reduced gradient and reduced Hessian, respectively.
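As a quick illustration, here is a minimal numerical sketch of the null-space approach for a made-up quadratic objective: `scipy.linalg.null_space` supplies a basis $Z$ of $\mathcal N(A)$, a least-squares solve supplies one feasible point $\overline x$, and the reduced problem in $v$ is handed to an off-the-shelf unconstrained solver. The specific $A$, $b$, and $f$ below are invented for the example.

```python
import numpy as np
from scipy.linalg import null_space
from scipy.optimize import minimize

# Made-up problem: minimize f(x) = ||x - c||^2  subject to  A x = b
A = np.array([[1.0, 1.0, 1.0]])   # a single linear equality constraint
b = np.array([1.0])
c = np.array([1.0, 2.0, 3.0])

def f(x):
    return np.sum((x - c) ** 2)

# One particular feasible point x_bar (here: the minimum-norm solution of A x = b)
x_bar = np.linalg.lstsq(A, b, rcond=None)[0]

# Columns of Z form a basis of the null space of A
Z = null_space(A)

# Reduced, unconstrained problem: phi(v) = f(x_bar + Z v)
phi = lambda v: f(x_bar + Z @ v)
res = minimize(phi, np.zeros(Z.shape[1]))

x_star = x_bar + Z @ res.x
print("x* =", x_star)
print("A x* =", A @ x_star, "(should equal b)")
print("reduced gradient (should be ~ 0):", Z.T @ (2 * (x_star - c)))
```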

Lagrange multiplier Approach

If $x^*$ is a local optimum, then there exists $\lambda^*$ such that

$$\nabla f(x^*) = A^T\lambda^*$$

In other words, the gradient is a linear combination of the rows of $A$.

  • The entries of $\lambda^*$ are known as Lagrange multipliers

How can we derive the formula above?

$$\min f(x)$$

$$\text{subject to}$$

$$Ax = b$$

Define the Lagrangian function

$$\mathcal L(x, \lambda) = f(x) - \lambda^T (Ax-b)$$

A stationary point $(x^*, \lambda^*)$ of this function satisfies $\nabla \mathcal L(x^*, \lambda^*) = 0$.

💡
A local optimum $x^*$ of the original problem, together with some $\lambda^*$, must be a stationary point of the Lagrangian function.
💡
In other words, at a stationary point $x^*$ the original problem and the Lagrangian agree, so instead of solving the original constrained problem we switch to the Lagrangian and solve an unconstrained problem.
💡
The minus sign in front of the Lagrangian term is there because we will later write inequality constraints with $\ge$. If you prefer $\le$, use a plus sign instead.

Therefore, the first-order conditions for the Lagrangian function are as follows:

$$\nabla f(x) - A^T\lambda = 0 \\ Ax - b = 0$$
💡
Intuitively, the rows of $A$ are exactly the gradients of the constraints, so this condition says that $\nabla f(x)$ is aligned with (a linear combination of) those gradients.
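For a quadratic objective $f(x) = \tfrac12 x^TQx - c^Tx$, the two first-order conditions above become a single linear system in $(x, \lambda)$, often called the KKT system. Below is a minimal sketch with made-up $Q$, $c$, $A$, $b$:

```python
import numpy as np

# Made-up problem data: minimize 1/2 x^T Q x - c^T x  subject to  A x = b
Q = np.array([[4.0, 1.0], [1.0, 3.0]])   # symmetric positive definite
c = np.array([1.0, 2.0])
A = np.array([[1.0, 1.0]])
b = np.array([1.0])

n, m = Q.shape[0], A.shape[0]

# First-order conditions:  Q x - A^T lam = c   and   A x = b
KKT = np.block([[Q, -A.T],
                [A, np.zeros((m, m))]])
rhs = np.concatenate([c, b])

sol = np.linalg.solve(KKT, rhs)
x_star, lam_star = sol[:n], sol[n:]

print("x* =", x_star)
print("lambda* =", lam_star)
print("grad f(x*) - A^T lambda* =", Q @ x_star - c - A.T @ lam_star)  # ~ 0
print("A x* - b =", A @ x_star - b)                                   # ~ 0
```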

Linear Inequality Constraints

Introduction

$$\min f(x)$$

$$\text{subject to}$$

$$Ax \ge b$$

  • An interior point: the optimality conditions are those of unconstrained optimization
  • A boundary point: a point at which some constraints are active

Note: the feasible region is a polyhedron/polytope, since the constraints are all linear.

If we knew which constraints are active at the optimum, then we could solve an equality-constrained problem instead.

💡
Inactive constraints are irrelevant to local optimality.

Necessary Conditions

Introduction

$$\min f(x)$$

$$\text{subject to}$$

$$Ax \ge b$$

Given a solution $x^*$, let $\hat A$ be the sub-matrix of rows of $A$ corresponding to the active constraints, and denote by $\hat b$ the corresponding right-hand side.

A local optimum must also be locally optimal for the restricted equality problem defined by the active constraints.

Necessary Conditions

$$\min f(x)$$

$$\text{subject to}$$

$$\hat Ax \ge \hat b$$

$$\exists \lambda^* \ge 0 \text{ such that } \nabla f(x^*) = \hat A^T\lambda^*$$
💡
At a local optimum, the Lagrange multipliers are non-negative.
💡
One way to see this is as attaching a penalty to $Ax - b$; another is to note that the direction $\hat A^T \lambda^*$ points toward the interior of the feasible region. Since $\nabla f(x^*)$ is parallel to this direction with the same orientation, moving toward the interior increases the objective value, which is consistent with $x^*$ being a local minimum.

Complementary Slackness

$$(\lambda^*)^T(Ax^* - b) = 0$$

With the condition above, we don't need to explicitly split the constraints into active and inactive ones; we just require the multipliers to be non-negative.

💡
The Lagrange multiplier of each inactive constraint must be zero.

Therefore, we can write the condition as

$$\exists \lambda^* \ge 0, \quad \nabla f(x^*) = A^T \lambda^*$$

For an inactive row $i$ of $A$, $\lambda_i^* = 0$ by complementary slackness.

💡
In other words, instead of separating active and inactive constraints, we only constrain the Lagrange multipliers to be non-negative; for inactive constraints, the multiplier is forced to be zero.

Putting the Pieces Together

If $x^*$ is a local optimum, there exists a vector $\lambda^*$ such that

  • First order condition
    1. $\nabla f(x^*) = A^T \lambda^*$ (this is equivalent to $Z^T \nabla f(x^*) = 0$)
    2. $\lambda^* \ge 0$ (Dual feasibility)
    3. $(\lambda^*)^T(Ax^* - b) = 0$ (Complementary slackness conditions)
  • Second order condition
    1. $Z^T \nabla^2f(x^*) Z$ is positive semi-definite

where $Z$ is a null-space matrix for the active constraints.

💡
For linear equality constraints, the conditions are the first item and the last item, applied to all constraints. Also, there is no sign restriction on $\lambda^*$.
💡
We can obtain the conditions for a linear equality by writing it as two opposite inequalities and applying the above.

Sufficient Conditions

  • First order condition
    1. $Ax^* \ge b$
    2. $\nabla f(x^*) = A^T \lambda^*$ (this is equivalent to $Z^T \nabla f(x^*) = 0$)
    3. $\lambda^* \ge 0$ (Dual feasibility)
    4. Strict complementary slackness

      → In the complementary slackness condition, if for each constraint exactly one of $\lambda^*_i = 0$ and $a_i^Tx^* - b_i = 0$ holds (never both), this is called strict complementary slackness.

  • Second order condition
    1. $Z^T \nabla^2f(x^*) Z$ is positive definite

where $Z$ is a null-space matrix for the active constraints.
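Here is a rough sketch of how one might check these conditions numerically at a candidate point, for a made-up quadratic problem: the active rows are detected, the multipliers on them are obtained by least squares, and the reduced Hessian is formed on the null space of the active constraints.

```python
import numpy as np
from scipy.linalg import null_space

# Made-up problem: minimize f(x) = 1/2 ||x - c||^2  subject to  A x >= b
c = np.array([2.0, 0.0])
A = np.array([[1.0, 1.0],
              [0.0, 1.0]])
b = np.array([3.0, 0.0])

x_star = np.array([2.5, 0.5])            # candidate local optimum
grad = x_star - c                        # gradient of f at x_star
hess = np.eye(2)                         # Hessian of f (identity for this f)

# Active set: rows where A x* - b is (numerically) zero
active = np.isclose(A @ x_star - b, 0.0)
A_hat = A[active]

# Multipliers on the active rows: solve A_hat^T lam = grad in the least-squares sense
lam_hat, *_ = np.linalg.lstsq(A_hat.T, grad, rcond=None)

# Reduced Hessian on the null space of the active constraints
Z = null_space(A_hat)
reduced_hess = Z.T @ hess @ Z

print("primal feasible:      ", np.all(A @ x_star - b >= -1e-9))
print("stationarity residual:", grad - A_hat.T @ lam_hat)             # ~ 0
print("multipliers >= 0:     ", np.all(lam_hat >= -1e-9))
print("reduced Hessian eigs: ", np.linalg.eigvalsh(reduced_hess))     # > 0 => sufficient
```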

Interpretation of KKT condition and Duality

Let our original problem be as follows:

$$\min_x f(x)$$

$$\text{subject to}$$

$$g_i(x) \ge b_i, \; \forall i$$

One way to remove these inequality constraints is to use a penalty function:

$$\min_x \bigg[f(x) - \sum_{i = 1}^m P_i\big(g_i(x) - b_i\big)\bigg]$$

where

$$P_i(z) = \begin{cases}-\infty & \text{if } z < 0 \\ 0 & \text{otherwise}\end{cases}$$

so that, since $P_i$ is subtracted, any infeasible point gets an objective value of $+\infty$. However, this penalty is non-differentiable at $z = 0$, and the problem cannot be fixed by replacing the infinite penalty with some large finite value. Therefore, we take a linear function as the penalty function instead:

$$P_i(z) = \lambda_i z$$

where $\lambda_i \ge 0$.

This is quite natural: if we choose an $x$ that satisfies the constraints, the term rewards us; if we choose an $x$ that violates them, it penalizes us.

💡
The size of the reward or penalty scales with the magnitude of $g_i(x) - b_i$ (weighted by $\lambda_i$).

The problem is that a fixed linear penalty could change the optimal value. If we instead take the worst case (maximum) over $\lambda_i \ge 0$, we recover the original penalty function exactly. Therefore, the original problem can be expressed as

$$\min_x \bigg[f(x) + \sum_{i = 1}^m \max_{\lambda_i \ge 0}\big[-\lambda_i\big(g_i(x) - b_i\big)\big]\bigg] = \min_x\max_{\lambda \ge 0} \bigg[f(x) - \sum_{i = 1}^m \lambda_i\big(g_i(x) - b_i\big)\bigg]$$

Therefore, this is actually a min-max problem, which means the order of the min and the max matters.

But what if we swap the order, like this?

$$\max_{\lambda \ge 0}\min_x\bigg[f(x) - \sum_{i = 1}^m \lambda_i\big[g_i(x) - b_i\big]\bigg]$$

In this case, the inner (min) player's problem is much simpler, since it is just an unconstrained problem.

💡
Moreover, if the problem is convex, the inner minimization becomes even simpler: just take the gradient with respect to $x$ and find the $x$ that makes it zero.

This reordered problem is called the dual problem, and the original problem is called the primal problem.
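To make the order swap concrete, consider a made-up one-dimensional example: $\min (x-2)^2$ subject to $x \ge 3$, whose primal optimum is $f(3) = 1$. In the sketch below the inner minimization of the dual is available in closed form ($x(\lambda) = 2 + \lambda/2$), and the outer maximization over $\lambda \ge 0$ is done on a coarse grid.

```python
import numpy as np

# Made-up primal problem: minimize f(x) = (x - 2)^2  subject to  g(x) = x - 3 >= 0
# Primal optimum: x* = 3, f(x*) = 1

# Lagrangian: L(x, lam) = (x - 2)^2 - lam * (x - 3)
# Inner minimization over x (unconstrained, convex): x(lam) = 2 + lam / 2
def dual(lam):
    x = 2.0 + lam / 2.0
    return (x - 2.0) ** 2 - lam * (x - 3.0)   # = lam - lam^2 / 4

# Outer maximization over lam >= 0 on a grid
lams = np.linspace(0.0, 5.0, 501)
vals = dual(lams)
i = np.argmax(vals)

print("dual optimum  q(lam*) =", vals[i], "at lam* =", lams[i])   # ~ 1 at lam* ~ 2
print("primal optimum f(x*)  =", 1.0)                              # the two values coincide here
```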

Duality

Let's consider the following general optimization problem:

$$\min_{x}f(x)$$

$$\text{subject to}$$

$$h_i(x) = 0, \; i = 1, \dots \\ g_j(x) \ge 0, \; j = 1, \dots$$

Similar to the penalty-function idea, we attach multipliers to the constraints,

$$\lambda_i h_i(x), \; i = 1, \dots \\ \mu_j g_j(x), \; j = 1, \dots$$

where $\mu_j \ge 0$,

and move the constraints into the objective function:

$$\min_x \bigg[f(x) + \max_{\lambda,\, \mu \ge 0}\Big(-\sum_{i}\lambda_ih_i(x) - \sum_j \mu_j g_j(x)\Big)\bigg] = \min_x\max_{\lambda,\, \mu \ge 0} \bigg[f(x) -\sum_{i}\lambda_ih_i(x) - \sum_j \mu_j g_j(x)\bigg]$$
💡
Note that there is no sign restriction on $\lambda_i$ (the multipliers of the equality constraints); only $\mu_j \ge 0$ is required.

To obtain the dual problem, we simply switch the order of the min and the max:

$$\max_{\lambda,\, \mu \ge 0}\min_x \bigg[f(x) -\sum_{i}\lambda_ih_i(x) - \sum_j \mu_j g_j(x)\bigg]$$

Note that

$$f(x) -\sum_{i}\lambda_ih_i(x) - \sum_j \mu_j g_j(x)$$

is called the Lagrangian of the optimization problem, and

$$\max_{\lambda,\, \mu \ge 0}\min_x \bigg[f(x) -\sum_{i}\lambda_ih_i(x) - \sum_j \mu_j g_j(x)\bigg]$$

is called the Lagrangian dual problem.

So why is this important? Because the dual actually gives a lower bound on the primal problem.

Take any feasible $x$ and any multipliers $\lambda, \mu$ with $\mu \ge 0$. Then

$$f(x) \ge f(x) - \sum_i \lambda_i h_i(x) - \sum_j \mu_jg_j(x)$$

since

$$h_i(x) = 0 \Rightarrow \sum_i \lambda_i h_i(x) = 0, \qquad g_j(x) \ge 0,\ \mu_j \ge 0 \Rightarrow \sum_j \mu_j g_j(x) \ge 0$$

In addition,

$$f(x) -\sum_{i}\lambda_ih_i(x) - \sum_j \mu_j g_j(x) \ge \min_x \bigg[f(x) -\sum_{i}\lambda_ih_i(x) - \sum_j \mu_j g_j(x)\bigg]$$

That means

$$\forall \lambda,\ \forall \mu \ge 0: \quad f(x) \ge \min_x \bigg[f(x) -\sum_{i}\lambda_ih_i(x) - \sum_j \mu_j g_j(x)\bigg]$$

Therefore, for every feasible $x$,

$$f(x) \ge \max_{\lambda,\, \mu \ge 0}\min_x \bigg[f(x) -\sum_{i}\lambda_ih_i(x) - \sum_j \mu_j g_j(x)\bigg]$$

In particular, the primal optimal value is bounded below by the dual optimal value; this is known as weak duality.
💡
Strictly speaking, the max and min here should be sup and inf, respectively.

Moreover, under certain assumptions, strong duality holds: the optimal values of the primal and dual problems coincide.

Figure: progress of iterative optimization: (a) the primal objective is gradually minimized while the dual objective is maximized; (b) if strong duality holds, the primal and dual optimal values approach each other and become equal.
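For linear programs, strong duality holds whenever the primal is feasible and bounded, so it is easy to check numerically. The sketch below solves a small made-up LP and its dual with `scipy.optimize.linprog` and compares the two optimal values.

```python
import numpy as np
from scipy.optimize import linprog

# Made-up primal LP: minimize c^T x  subject to  A x >= b,  x >= 0
c = np.array([2.0, 3.0])
A = np.array([[1.0, 1.0],
              [1.0, 2.0]])
b = np.array([3.0, 4.0])

# linprog expects "<=" constraints, so rewrite A x >= b as -A x <= -b
primal = linprog(c, A_ub=-A, b_ub=-b, bounds=(0, None))

# Dual LP: maximize b^T y  subject to  A^T y <= c,  y >= 0
# (maximization written as minimizing -b^T y)
dual = linprog(-b, A_ub=A.T, b_ub=c, bounds=(0, None))

print("primal optimal value:", primal.fun)    # 7.0
print("dual   optimal value:", -dual.fun)     # 7.0  -> strong duality
```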

Slater’s condition

For a convex optimization problem in the form

$$\min_x f(x)$$

$$\text{subject to}$$

$$y_i(x) \ge 0, \; \forall i \\ Ax = b$$

we have strong duality if it is strictly feasible, i.e., if there exists $x \in \operatorname{int}(D)$ such that

$$y_i(x) > 0, \; \forall i \\ Ax = b$$

In other words, there is at least one point in the interior of the domain (not on the boundary of the domain) at which all the inequality constraints hold strictly.

💡
If Slater's condition is satisfied, strong duality holds, so the primal and dual optimal values coincide exactly.
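For example (a made-up instance), take $\min_x x_1^2 + x_2^2$ subject to $y_1(x) = 1 - x_1^2 - x_2^2 \ge 0$. The point $x = (0, 0)$ lies in the interior of the domain and satisfies $y_1(x) = 1 > 0$, so the problem is strictly feasible, Slater's condition holds, and strong duality is guaranteed.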

KKT condition

  1. Stationarity condition
    $$\nabla_x \mathcal L(x^*,\lambda, \mu) = \nabla f(x^*) - \sum_i \lambda_i \nabla g_i(x^*) - \sum_j \mu_j \nabla h_j(x^*) = 0$$
    💡
    Note that this formula for the derivative holds for all dual variables, not just the optimal dual variables.
  2. Primal feasibility
    $$g_i(x^*) \ge 0, \; \forall i \qquad h_j(x^*) = 0, \; \forall j$$
  3. Dual feasibility
    $$\lambda_i \ge 0, \; \forall i$$
  4. Complementary slackness
    $$\lambda_i^* g_i(x^*) = 0, \; \forall i$$

In addition, the second-order optimality condition is as follows:

$$y^T\nabla_{xx}^2\mathcal L(x^*, \lambda, \mu)\, y \ge 0, \quad \forall y \in T_{x^*}S$$

where $S$ is determined by the active constraints.
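As a sanity check, the KKT conditions can be verified numerically at a candidate point. The sketch below uses a made-up convex problem, $\min x_1^2 + x_2^2$ subject to $g(x) = x_1 + x_2 - 1 \ge 0$, whose solution is $x^* = (\tfrac12, \tfrac12)$ with multiplier $\lambda^* = 1$.

```python
import numpy as np

# Made-up convex problem: minimize f(x) = x1^2 + x2^2  subject to  g(x) = x1 + x2 - 1 >= 0
f_grad = lambda x: 2.0 * x
g      = lambda x: x[0] + x[1] - 1.0
g_grad = lambda x: np.array([1.0, 1.0])

x_star   = np.array([0.5, 0.5])   # candidate optimum
lam_star = 1.0                    # candidate multiplier

checks = {
    "stationarity":            f_grad(x_star) - lam_star * g_grad(x_star),  # should be ~ 0
    "primal feasibility":      g(x_star),                                   # should be >= 0
    "dual feasibility":        lam_star,                                    # should be >= 0
    "complementary slackness": lam_star * g(x_star),                        # should be ~ 0
}
for name, value in checks.items():
    print(f"{name:25s}: {value}")
```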

Note

If you think about it, the earlier condition "$Z^T \nabla^2f(x^*) Z$ is positive semi-definite" is easily seen to be the same as

$$y^T\nabla_{xx}^2\mathcal L(x^*, \lambda, \mu)\,y \ge 0, \quad \forall y \in \mathcal N(A)$$

(note that $\nabla_{xx}^2 g_i(x) = 0$ here, since the constraints are linear).

Given that the rows of $A$ correspond to the gradients of the constraints, $\mathcal N(A)$ corresponds to the tangent space.

💡
In an unconstrained optimization problem we required this for every direction $y$; in a constrained problem we only need to consider the directions $y$ lying in the tangent space.

Conclusion

  1. Convex optimization problem

    The KKT conditions are a sufficient condition for an optimal point.

    If strong duality holds (e.g., Slater's condition is satisfied), the KKT conditions are a necessary and sufficient condition for an optimal point.

  2. Non-convex optimization problem

    The KKT conditions are not a sufficient condition for an optimal point. That means it is possible for a point to satisfy the KKT conditions without being an optimal point.

Non-linear Constraints

$$\min f(x)$$

$$\text{subject to}$$

$$g_i(x) = 0, \; i \in \mathcal E \\ h_i(x) \ge 0, \; i \in \mathcal I$$

The optimality conditions remain the same. However, the derivation is more complicated; in particular, we may have to consider feasible curves, not just feasible directions.

Necessary Conditions for Equality Constraints

$$\min f(x)$$

$$\text{subject to}$$

$$g_i(x) = 0, \quad i = 1, \dots, m$$

If $x^*$ is a local optimum and $Z(x^*)$ is a null-space matrix for the constraint Jacobian at $x^*$, then there exist Lagrange multipliers $\lambda^*$ such that

  1. First order
    $$\nabla_x \mathcal L(x^*, \lambda^*) = 0$$
  2. Second order
    $$Z(x^*)^T\nabla_{xx}^2\mathcal L(x^*, \lambda^*)\, Z(x^*)$$

    is positive semi-definite.

    Actually, it is equivalent to

    $$y^T\nabla_{xx}^2\mathcal L(x^*, \lambda^*)\, y \ge 0, \quad \forall y \in T_{x^*}S$$
    💡
    Each row of the Jacobian is the gradient of one constraint, so an element of its null space is orthogonal to all of those gradients; in other words, it lies in the tangent space.
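As a small check of the non-linear case, take a made-up problem $\min x_1 + x_2$ subject to $g(x) = x_1^2 + x_2^2 - 1 = 0$. The minimizer is $x^* = (-1/\sqrt2, -1/\sqrt2)$, and the first-order condition $\nabla f(x^*) = \lambda^* \nabla g(x^*)$ can be verified directly:

```python
import numpy as np

# Made-up problem: minimize f(x) = x1 + x2  subject to  g(x) = x1^2 + x2^2 - 1 = 0
f_grad = lambda x: np.array([1.0, 1.0])
g      = lambda x: x[0] ** 2 + x[1] ** 2 - 1.0
g_grad = lambda x: 2.0 * x

x_star = np.array([-1.0, -1.0]) / np.sqrt(2.0)   # known minimizer on the unit circle

# Lagrange multiplier from grad f(x*) = lam * grad g(x*)
lam_star = f_grad(x_star)[0] / g_grad(x_star)[0]  # = -1/sqrt(2)

print("constraint g(x*)      :", g(x_star))                                     # ~ 0
print("lambda*               :", lam_star)
print("stationarity residual :", f_grad(x_star) - lam_star * g_grad(x_star))    # ~ 0
```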