9. Optimality Conditions for Constrained Optimization
Introduction
In a constrained optimization problem, the constraints can be
Linear or non-linear
Equalities or inequalities
Generally, our objective function and constraints can be represented as follows
$$\min f(x)$$
subject to
$$g_i(x) = 0,\ i \in E, \qquad h_i(x) \ge 0,\ i \in I$$
Therefore, it is much harder than unconstrained optimization.
In addition, we have to change the definition of local optimum slightly as follows:
$x^*$ is a local minimum if
$$f(x^*) \le f(x), \quad \forall x \in B_\epsilon(x^*) \text{ such that } g_i(x) = 0,\ i \in E \text{ and } h_i(x) \ge 0,\ i \in I$$
for some $\epsilon > 0$.
💡
You can think of this as the usual definition of a local minimum, with the condition changed so that only points in the feasible region are compared.
Optimality condition
Boundary point: a boundary point is optimal if there is no direction that is both feasible and a descent direction with respect to the active constraints.
→ We say p is a feasible direction at the point x if there exists some $\epsilon > 0$ such that $x + \alpha p \in S$ for all $0 \le \alpha \le \epsilon$.
However, for non-linear equality constraints we may have to consider feasible curves rather than feasible directions.
Linear Equality Constraints
Null space Approach
Note that any problem with only linear equality constraints can be written in the following form:
$$\min f(x)$$
subject to
$$Ax = b$$
Consider the usual iterative solution update x ← x + p. To maintain feasibility, the search direction p must lie in the null space of A. (Assume we know at least one feasible point.)
💡
This means we exploit the fact that all feasible points belong to the same equivalence class of the quotient space defined by the feasible region; in other words, the only admissible update directions are elements of the kernel of A.
Reformulation
Assume we already know a feasible solution x, i.e. $Ax = b$. Take $p \in N(A)$; then
$$A(x + p) = Ax + Ap = b$$
since $Ap = 0$.
Using this fact, we can eliminate the constraints:
$$\min_v \phi(v) = f(x + Zv)$$
where the columns of Z form a basis of N(A).
Therefore, the optimality conditions of unconstrained optimization for function ϕ with respect to v are as follows
$$\nabla \phi(v) = Z^T \nabla f(x) = 0$$
$$\nabla^2 \phi(v) = Z^T \nabla^2 f(x)\, Z \text{ is positive semi-definite}$$
💡
These are known as the reduced gradient and the reduced Hessian, respectively.
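Below is a minimal numerical sketch of this null-space reformulation, assuming an illustrative quadratic objective $f(x) = \frac{1}{2}x^T Q x - c^T x$ and made-up data for Q, c, A, b; it uses scipy.linalg.null_space for Z and an off-the-shelf unconstrained solver for the reduced problem.

```python
import numpy as np
from scipy.linalg import null_space
from scipy.optimize import minimize

# Illustrative problem: min 0.5*x^T Q x - c^T x  subject to  A x = b
Q = np.diag([1.0, 2.0, 3.0])
c = np.array([1.0, 1.0, 1.0])
A = np.array([[1.0, 1.0, 1.0]])
b = np.array([1.0])

f      = lambda x: 0.5 * x @ Q @ x - c @ x
grad_f = lambda x: Q @ x - c

# One particular feasible point (least-squares solution of A x = b)
x0 = np.linalg.lstsq(A, b, rcond=None)[0]

# Columns of Z span N(A), so x0 + Z v is feasible for every v
Z = null_space(A)

# Unconstrained reduced problem: phi(v) = f(x0 + Z v)
phi = lambda v: f(x0 + Z @ v)
res = minimize(phi, np.zeros(Z.shape[1]))
x_star = x0 + Z @ res.x

print("x* =", x_star)
print("reduced gradient Z^T grad f(x*):", Z.T @ grad_f(x_star))  # ~ 0
print("constraint residual A x* - b:", A @ x_star - b)           # ~ 0
```

At the reduced solution the reduced gradient is (numerically) zero and feasibility holds automatically, which is exactly the point of the reformulation.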
Lagrange multiplier Approach
If x∗ is a local optimum, then there exists λ∗ such that
$$\nabla f(x^*) = A^T \lambda^*$$
In other words, the gradient is a linear combination of the rows of A
The entries of λ are known as Lagrange multipliers.
How can we derive the formula above?
$$\min f(x)$$
subject to
$$Ax = b$$
Define the Lagrangian function
$$L(x, \lambda) = f(x) - \lambda^T (Ax - b)$$
A stationary point $(x^*, \lambda^*)$ of this function satisfies $\nabla L(x^*, \lambda^*) = 0$.
💡
A local optimum x* of the original problem must, together with some λ*, be a stationary point of the Lagrangian function.
💡
In other words, we use the fact that at a feasible stationary point x* the Lagrangian agrees with the original objective, so instead of the original constrained problem we switch to the Lagrangian and solve it as an unconstrained problem.
💡
Additionally, the reason for putting a minus sign in front of the Lagrangian term is so that inequality constraints can later be written in the ≥ form. If you want to use the ≤ form instead, use a plus sign.
Therefore, the first-order conditions for the Lagrangian function are as follows:
$$\nabla f(x) - A^T \lambda = 0, \qquad Ax - b = 0$$
💡
If you think about it, this is intuitive: the rows of A are exactly the gradients of the constraints, so the condition says that ∇f(x) lies in the span of (is aligned with) those constraint gradients.
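As a concrete sketch of this stationarity condition (reusing the same made-up quadratic problem as in the null-space snippet above, so the data are purely illustrative), the equations $\nabla f(x) - A^T \lambda = 0$ and $Ax = b$ become a single linear system in (x, λ):

```python
import numpy as np

# Illustrative problem again: min 0.5*x^T Q x - c^T x  subject to  A x = b
Q = np.diag([1.0, 2.0, 3.0])
c = np.array([1.0, 1.0, 1.0])
A = np.array([[1.0, 1.0, 1.0]])
b = np.array([1.0])

# Stationarity of L(x, lam) = f(x) - lam^T (A x - b) gives
#   Q x - c - A^T lam = 0   and   A x - b = 0,
# i.e. one linear (KKT) system in (x, lam):
n, m = Q.shape[0], A.shape[0]
K = np.block([[Q, -A.T],
              [A, np.zeros((m, m))]])
rhs = np.concatenate([c, b])
sol = np.linalg.solve(K, rhs)
x_star, lam_star = sol[:n], sol[n:]

print("x* =", x_star)                     # matches the null-space solution
print("lambda* =", lam_star)
print("residual grad f - A^T lam:", Q @ x_star - c - A.T @ lam_star)  # ~ 0
```

Solving the stationarity and feasibility equations together as one linear system is essentially what dedicated equality-constrained QP solvers do.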
Linear Inequality Constraints
Introduction
$$\min f(x)$$
subject to
$$Ax \ge b$$
At an interior point: the optimality conditions for unconstrained optimization apply.
At a boundary point: some constraints are active.
Note: the feasible region is a polyhedron/polytope, since the constraints are linear.
If we knew which constraints are active at the optimum, we could instead solve an equality-constrained problem.
💡
Inactive constraints are irrelevant to local optimality.
Necessary Conditions
Introduction
$$\min f(x)$$
subject to
$$Ax \ge b$$
Given a solution $x^*$, let $\hat{A}$ be the sub-matrix of rows of A corresponding to the active constraints, and denote by $\hat{b}$ the corresponding right-hand side.
A local optimum must also be a local optimum of the restricted equality-constrained problem with the active constraints.
Necessary Conditions
$$\min f(x)$$
subject to
$$\hat{A}x = \hat{b}$$
Hence there exists $\lambda^*$ such that
$$\nabla f(x^*) = \hat{A}^T \lambda^*, \qquad \lambda^* \ge 0$$
💡
At a local optimum the Lagrange multipliers are non-negative.
💡
You can interpret λ* as the penalty weight attached to the constraint Ax − b ≥ 0; alternatively, note that the direction $\hat{A}^T \lambda^*$ (with λ* ≥ 0) points toward the interior of the feasible region. Saying that ∇f(x*) is parallel to, and oriented the same way as, this direction means that moving toward the interior increases the objective value, which is consistent with x* being a local minimum.
Complementary Slackness
$$(\lambda^*)^T (Ax^* - b) = 0$$
With this condition we no longer need to explicitly split the constraints into active and inactive ones; we simply require the multipliers to be non-negative.
💡
The multiplier of each inactive constraint must be zero.
Therefore, we can write the condition as
$$\exists \lambda^* \ge 0 \text{ such that } \nabla f(x^*) = A^T \lambda^*$$
For an inactive row i of A, $\lambda_i^* = 0$ by complementary slackness.
💡
In other words, instead of splitting the constraints into active and inactive ones, we only require the Lagrange multipliers to be non-negative; for the inactive constraints, the corresponding Lagrange multipliers are simply forced to be zero.
Putting the Pieces Together
If x∗ is a local optimum, there exists a vector λ∗ such that
First order condition
$\nabla f(x^*) = A^T \lambda^*$ (this is equivalent to $Z^T \nabla f(x^*) = 0$)
$\lambda^* \ge 0$ (Dual feasibility)
$(\lambda^*)^T (Ax^* - b) = 0$ (Complementary slackness conditions)
Second order condition
$Z^T \nabla^2 f(x^*)\, Z$ is positive semi-definite
where Z is a null-space matrix for the active constraints.
💡
For linear equality constraints, the conditions are the first bullet and the last bullet, applied to all constraints. Also, there is no sign restriction on λ.
💡
We can also obtain the conditions for a linear equality constraint by writing it as two inequalities (≥ and ≤) and applying the conditions above.
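As a small sketch of how these first-order conditions can be checked numerically at a candidate point (the problem data below are made up for illustration): identify the active rows of $Ax \ge b$, solve for the multipliers of the active constraints in the least-squares sense, set the inactive multipliers to zero, and verify stationarity, dual feasibility, and complementary slackness.

```python
import numpy as np

# Illustrative problem: min f(x) = (x1-2)^2 + (x2-2)^2  subject to  A x >= b
A = np.array([[ 1.0,  0.0],    # x1 >= 0
              [ 0.0,  1.0],    # x2 >= 0
              [-1.0, -1.0]])   # x1 + x2 <= 2
b = np.array([0.0, 0.0, -2.0])

grad_f = lambda x: 2.0 * (x - np.array([2.0, 2.0]))

x_star = np.array([1.0, 1.0])  # candidate local optimum
tol = 1e-9

# Active constraints: rows where a_i^T x* = b_i
active = np.isclose(A @ x_star, b, atol=tol)
A_hat = A[active]

# Solve grad f(x*) = A_hat^T lambda_hat (least squares on the active set)
lam_hat, *_ = np.linalg.lstsq(A_hat.T, grad_f(x_star), rcond=None)

# Inactive constraints get a zero multiplier (complementary slackness)
lam = np.zeros(A.shape[0])
lam[active] = lam_hat

print("active rows:", np.where(active)[0])
print("lambda* =", lam)
print("stationarity residual:", grad_f(x_star) - A.T @ lam)    # ~ 0
print("dual feasibility:", bool(np.all(lam >= -tol)))          # True
print("complementary slackness:", lam @ (A @ x_star - b))      # ~ 0
```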
Sufficient Conditions
First order condition
$Ax^* \ge b$ (primal feasibility)
$\nabla f(x^*) = A^T \lambda^*$ (this is equivalent to $Z^T \nabla f(x^*) = 0$)
$\lambda^* \ge 0$ (Dual feasibility)
Strict complementary slackness
→ In the complementary slackness condition, the case where, for each constraint i, exactly one of $\lambda_i^* = 0$ and $a_i^T x^* - b_i = 0$ holds is called strict complementary slackness.
Second order condition
$Z^T \nabla^2 f(x^*)\, Z$ is positive definite
where Z is a null-space matrix for the active constraints.
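Continuing the same made-up example from the previous snippet, the second-order check amounts to projecting the Hessian onto the null space of the active-constraint matrix and testing positive definiteness:

```python
import numpy as np
from scipy.linalg import null_space

# Same made-up example: at x* = (1, 1) only the constraint x1 + x2 <= 2 is active
A_hat = np.array([[-1.0, -1.0]])        # active row of A
hess_f = 2.0 * np.eye(2)                # Hessian of (x1-2)^2 + (x2-2)^2

Z = null_space(A_hat)                   # null-space matrix of the active constraints
reduced_H = Z.T @ hess_f @ Z

print("reduced Hessian:\n", reduced_H)
print("positive definite:", bool(np.all(np.linalg.eigvalsh(reduced_H) > 0)))
```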
Interpretation of KKT condition and Duality
Let our original problem be as follows:
$$\min_x f(x)$$
subject to
$$g_i(x) \ge b_i, \quad \forall i$$
One way to remove the inequality constraints is to move them into the objective with a penalty function:
$$\min_x \Big[ f(x) - \sum_{i=1}^m P_i(g_i(x) - b_i) \Big]$$
where
$$P_i(u) = \begin{cases} -\infty & \text{if } u < 0 \\ 0 & \text{otherwise} \end{cases}$$
so that subtracting $P_i$ makes any infeasible point infinitely bad.
However, this penalty is discontinuous (hence non-differentiable) at u = 0, and the problem is not fixed by replacing the infinite value with some large finite one. Therefore, we take a linear function as the penalty:
$$P_i(u) = \lambda_i u, \quad \text{where } \lambda_i \ge 0$$
This is quite natural: if we choose an x that satisfies the constraints, the term acts as a reward (it can only decrease the objective), and if we choose an x that violates them, it acts as a penalty.
💡
The size of the reward or penalty is proportional to the magnitude of the slack or violation, i.e. to $|g_i(x) - b_i|$ (and to $\lambda_i$).
The problem is that this linear penalty could change the optimal value. However, if we take the maximum over $\lambda_i \ge 0$, it behaves exactly like the original penalty function. Therefore, the original problem can be expressed as
$$\min_x \max_{\lambda \ge 0} \Big[ f(x) - \sum_{i=1}^m \lambda_i (g_i(x) - b_i) \Big]$$
This is a min-max problem, which means the order of the two operations is very important.
But, what if we change the order like this?
$$\max_{\lambda \ge 0} \min_x \Big[ f(x) - \sum_{i=1}^m \lambda_i (g_i(x) - b_i) \Big]$$
In this case, the inner minimization is much simpler, since it is just an unconstrained problem.
💡
Moreover, if the objective function and the constraints make the problem convex, the inner problem becomes even simpler: just take the gradient with respect to x and find the x that makes it zero.
This order-swapped problem is called the dual problem, and the original problem is called the primal problem.
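Here is a tiny worked example (not from the original post) that makes the primal/dual pair concrete. Take $f(x) = x^2$ with the single constraint $x \ge 1$, so the primal optimal value is $p^* = 1$ at $x^* = 1$. The Lagrangian is
$$L(x, \lambda) = x^2 - \lambda (x - 1), \quad \lambda \ge 0$$
Minimizing over x gives $x = \lambda/2$, so the dual function is
$$q(\lambda) = \min_x L(x, \lambda) = -\frac{\lambda^2}{4} + \lambda$$
Maximizing over $\lambda \ge 0$ gives $\lambda^* = 2$ and $d^* = q(2) = 1 = p^*$, so strong duality holds; note also that complementary slackness $\lambda^*(x^* - 1) = 0$ is satisfied.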
Duality
Let's consider the following general optimization problem:
$$\min_x f(x)$$
subject to
$$h_i(x) = 0,\ i = 1, \dots \qquad g_j(x) \ge 0,\ j = 1, \dots$$
Similar to the penalty function, we multiply the constraints by multipliers,
$$\lambda_i h_i(x),\ i = 1, \dots \qquad \mu_j g_j(x),\ j = 1, \dots$$
where $\mu_j \ge 0$, and move these terms into the objective function, giving the Lagrangian
$$L(x, \lambda, \mu) = f(x) - \sum_i \lambda_i h_i(x) - \sum_j \mu_j g_j(x)$$
The primal problem then becomes $\min_x \max_{\lambda,\,\mu \ge 0} L(x, \lambda, \mu)$ (with the maximum taken over all $\lambda$ and all $\mu \ge 0$), and the dual problem swaps the order: $\max_{\lambda,\,\mu \ge 0} \min_x L(x, \lambda, \mu)$.
Mathematically, max and min have to switch to sup and inf respectively.
Under certain assumptions, strong duality actually holds, which means that the optimal values of the primal and dual problems coincide.
(Figure) Progress of iterative optimization: (a) gradual minimization of the primal function and maximization of the dual function; (b) if strong duality holds, the primal optimum and the dual optimum approach each other and become equal.
Slater’s condition
For a convex optimization problem in the form
$$\min_x f(x)$$
subject to
$$y_i(x) \ge 0,\ \forall i, \qquad Ax = b$$
we have strong duality if it is strictly feasible, i.e.
$$\exists x \in \mathrm{int}(D) \text{ such that } y_i(x) > 0,\ \forall i \text{ and } Ax = b$$
In other words, there is at least one point in the interior of the domain (not on the boundary of the domain) at which all the inequality constraints hold strictly.
💡
If Slater's condition is satisfied, strong duality holds, so the optimal values of the primal problem and the dual problem coincide exactly.
At a local optimum $x^*$, the (first-order) KKT conditions are as follows.
Stationarity
$$\nabla_x L(x^*, \lambda^*, \mu^*) = \nabla f(x^*) - \sum_i \lambda_i^* \nabla h_i(x^*) - \sum_j \mu_j^* \nabla g_j(x^*) = 0$$
Note that this derivative holds for all dual variables and not just for the optimal dual variables.
Primal feasibility
$$g_j(x^*) \ge 0,\ \forall j, \qquad h_i(x^*) = 0,\ \forall i$$
Dual feasibility
$$\mu_j^* \ge 0, \quad \forall j$$
Complementary slackness
$$\mu_j^* g_j(x^*) = 0, \quad \forall j$$
In addition, the second-order optimality condition is as follows:
$$y^T \nabla^2_{xx} L(x^*, \lambda^*, \mu^*)\, y \ge 0, \quad \forall y \in T_{x^*} S$$
where S is determined by the active constraints.
Note
If you look back at the earlier condition "$Z^T \nabla^2 f(x^*) Z$ is positive semi-definite", it is easy to see that it is the same as
$$y^T \nabla^2_{xx} L(x^*, \lambda, \mu)\, y \ge 0, \quad \forall y \in N(A)$$
(here $\nabla^2_{xx} g_i(x) = 0$ because the constraints are linear, so $\nabla^2_{xx} L = \nabla^2 f$).
Considering that the rows of A correspond to the gradients of the constraints, N(A) corresponds to the tangent space.
💡
In the unconstrained optimization problem we allowed y to range over the whole space, whereas in the constrained problem we only need to consider y lying in the tangent space.
Conclusion
Convex optimization problem
The KKT conditions are sufficient for optimality.
If strong duality holds (e.g. Slater's condition holds), the KKT conditions are both necessary and sufficient for optimality.
Non-convex optimization problem
The KKT conditions are not sufficient for optimality: a point may satisfy the KKT conditions and still not be an optimal point (they are only necessary, and even that requires suitable constraint qualifications).
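A simple illustration (not from the original post): minimize $f(x) = x^3$ subject to $x + 1 \ge 0$. At $x = 0$ the constraint is inactive, so taking $\lambda = 0$ satisfies every KKT condition ($\nabla f(0) = 0$, $\lambda \ge 0$, $\lambda \cdot (x + 1) = 0$), yet $x = 0$ is not even a local minimum, since $f(x) < 0$ for any small negative $x$; the true minimum over the feasible set is at $x = -1$. This shows why, for non-convex problems, a KKT point need not be optimal.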
Non-linear Constraints
$$\min f(x)$$
subject to
$$g_i(x) = 0,\ i \in E, \qquad h_i(x) \ge 0,\ i \in I$$
The optimality conditions remain the same. However, the derivation is more complicated; in particular, we may have to consider feasible curves, not just feasible directions.
Necessary Conditions for Equality Constraints
$$\min f(x)$$
subject to
$$g_i(x) = 0, \quad i = 1, \dots, m$$
If $x^*$ is a local optimum and $Z(x^*)$ is a null-space matrix for the constraint Jacobian at $x^*$, then there exist Lagrange multipliers $\lambda^*$ such that
First order
$$\nabla_x L(x^*, \lambda^*) = 0$$
Second order
$$Z(x^*)^T \nabla^2_{xx} L(x^*, \lambda^*)\, Z(x^*)$$
is positive semi-definite.
Actually, it is equivalent to
$$y^T \nabla^2_{xx} L(x^*, \lambda^*)\, y \ge 0, \quad \forall y \in T_{x^*} S$$
💡
If you think about the Jacobian, each of its rows is the gradient vector of one constraint. Hence any element of its null space is orthogonal to all of those gradient vectors; in other words, it is a vector lying in the tangent space.
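As a final minimal sketch (with made-up problem data), consider minimizing $x + y$ on the unit circle $x^2 + y^2 = 1$. The candidate $x^* = (-1/\sqrt{2}, -1/\sqrt{2})$ with multiplier $\lambda^* = -1/\sqrt{2}$ can be checked against both conditions numerically:

```python
import numpy as np
from scipy.linalg import null_space

# Made-up example: min f(x, y) = x + y  subject to  g(x, y) = x^2 + y^2 - 1 = 0
grad_f  = lambda p: np.array([1.0, 1.0])
grad_g  = lambda p: 2.0 * p                                        # Jacobian of g (one row)
hess_L  = lambda p, lam: np.zeros((2, 2)) - lam * 2.0 * np.eye(2)  # Hessian of f - lam*g

x_star   = np.array([-1.0, -1.0]) / np.sqrt(2.0)  # candidate local minimum
lam_star = -1.0 / np.sqrt(2.0)                    # corresponding multiplier

# First-order condition: grad f(x*) - lam* grad g(x*) = 0
print("stationarity residual:", grad_f(x_star) - lam_star * grad_g(x_star))

# Second-order condition: projected Hessian of the Lagrangian is positive (semi-)definite
Z = null_space(grad_g(x_star).reshape(1, -1))     # null-space matrix of the Jacobian
reduced_H = Z.T @ hess_L(x_star, lam_star) @ Z
print("reduced Hessian eigenvalues:", np.linalg.eigvalsh(reduced_H))
```

The stationarity residual is zero and the projected Hessian has a strictly positive eigenvalue, so this candidate satisfies both the first- and second-order conditions.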