
10. Feasible-Point Methods


Introduction

In this article, we examine methods that solve constrained optimization problems by attempting to remain feasible at every iteration. If all the constraints are linear, maintaining feasibility is straightforward. When nonlinear constraints are present, however, more elaborate procedures are required, and even though these methods strive to maintain feasibility at every iteration, they do not always achieve it.

Active-set Methods

$$\min f(x)$$

$$\text{subject to}$$

$$Ax \ge b$$
  • Bad news: an inequality-constrained problem cannot simply be reformulated as an unconstrained one.
  • Good news: we already have methods for equality-constrained problems.

How can we resolve this issue? A brute-force approach would be to solve the equality-constrained problem for every possible selection of active constraints and then pick the best of the resulting solutions. However, this is impractical: with $m$ inequality constraints there are up to $2^m$ possible active sets, so the approach breaks down even when the number of constraints is modest.

Active-set methods attempt to overcome this difficulty by moving sequentially from one choice of active constraints to another choice that is guaranteed to produce at least as good a solution.

The most commonly used active-set methods are feasible-point methods. An initial feasible point, if none is provided, can be obtained by solving an auxiliary linear program. At each iteration of the active-set method, we select a working set of constraints that are assumed to be active at the optimum.

💡
That is, we minimize $f$ by treating the constraints in the working set as equality constraints and solving that problem instead.
💡
Note that these are linear inequality constraints.
💡
In general, the working set at the current point is a subset of the constraints that are active at $x$, so that $x$ is a feasible point for the working set. There may also be constraints that are active at $x$ but not included in the working set. Hence the working set is not necessarily equal to the active set.
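
To make this distinction concrete, the short sketch below (hypothetical helper code using NumPy, not from the original post) computes the active set of the linear constraints $Ax \ge b$ at a point; a working set is then chosen as a subset of these indices:

```python
import numpy as np

def active_set(A, b, x, tol=1e-9):
    """Indices i of constraints a_i^T x >= b_i that hold with equality at x."""
    return np.where(np.abs(A @ x - b) < tol)[0]

# Constraints of the worked example below: 2x1 - x2 >= 0, -x1 - x2 >= -4, x2 >= 0
A = np.array([[2.0, -1.0], [-1.0, -1.0], [0.0, 1.0]])
b = np.array([0.0, -4.0, 0.0])
print(active_set(A, b, np.array([0.0, 0.0])))   # [0 2]: constraints 1 and 3 are active
```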

Algorithm

  • Initialize with some working set $W$ of active constraints
  • Iterate
    1. Optimality check
      • If no constraints are active and the current point is an unconstrained stationary point, then stop
      • If there are active constraints and the current point is optimal with respect to the active constraints, compute the Lagrange multipliers
      • If the multipliers are non-negative, stop. Otherwise, drop a constraint with a negative multiplier from $W$.
        💡
        That is, if every multiplier is non-negative, the direction opposite to the gradient never points into the interior of the feasible region (this reading is valid because the constraints are written with $\ge$). The point is then optimal with respect to the active constraints, and the objective increases as soon as we step into the interior, so we may conclude it is a local optimum.
    2. Search direction
      • Determine a descent direction $p$ that is feasible with respect to $W$
    3. Step length
      • Determine a step length $\alpha$ satisfying $f(x_k+\alpha p) < f(x_k)$ and $\alpha \le \bar\alpha$, where $\bar\alpha$ is the maximum feasible step length along $p$
    4. Update
      • $x_{k + 1} = x_k + \alpha p$. If a new constraint boundary is encountered, add it to $W$.
💡
In summary, we solve the problem as if it were an optimization problem constrained only by the working set, and then check the remaining (non-working) constraints to see whether any condition is violated.
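
The sketch below illustrates this loop for a convex quadratic objective $\tfrac12 x^TQx + c^Tx$ with linear constraints $Ax \ge b$. It is a minimal, hypothetical implementation (the function name, the NumPy usage, and the assumption that $Q$ is positive definite are mine, not the article's): the working-set subproblem is solved through its KKT system, the multipliers decide whether to stop or drop a constraint, and a ratio test supplies the maximum step length $\bar\alpha$.

```python
import numpy as np

def active_set_qp(Q, c, A, b, x, W, max_iter=50, tol=1e-10):
    """Primal active-set sketch for:  min 1/2 x^T Q x + c^T x  s.t.  A x >= b.

    x must be feasible and W a list of constraint indices active at x.
    (Hypothetical helper, not from the article; Q is assumed positive definite.)
    """
    W = list(W)
    for _ in range(max_iter):
        g = Q @ x + c                          # gradient of the objective at x
        Aw = A[W] if W else np.zeros((0, len(x)))
        n, m = len(x), len(W)

        # Working-set subproblem: min 1/2 p^T Q p + g^T p  s.t.  Aw p = 0  (KKT system)
        K = np.block([[Q, Aw.T], [Aw, np.zeros((m, m))]])
        rhs = np.concatenate([-g, np.zeros(m)])
        sol = np.linalg.solve(K, rhs)
        p, lam = sol[:n], -sol[n:]             # lam: multipliers of the working set

        if np.linalg.norm(p) < tol:
            # Optimality check: all multipliers non-negative means we are done
            if m == 0 or np.all(lam >= -tol):
                return x, W
            W.pop(int(np.argmin(lam)))         # drop a constraint with negative multiplier
        else:
            # Ratio test: largest feasible step along p, capped at the full step 1
            alpha, blocking = 1.0, None
            for i in range(len(b)):
                if i not in W and A[i] @ p < -tol:
                    step = (b[i] - A[i] @ x) / (A[i] @ p)
                    if step < alpha:
                        alpha, blocking = step, i
            x = x + alpha * p
            if blocking is not None:
                W.append(blocking)             # new constraint boundary encountered
    return x, W
```

Called on the worked example of the next section, `active_set_qp(np.diag([1.0, 2.0]), np.array([-3.0, -4.0]), A, b, np.zeros(2), [0, 2])` (with the constraint data `A`, `b` from the snippet above) reproduces the first iterations of the hand computation below.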

Example

$$\min_x \frac{1}{2}(x_1 - 3)^2 + (x_2 - 2)^2$$

$$\text{subject to}$$

$$\begin{align}2x_1 - x_2 &\ge 0 \\ -x_1 - x_2 &\ge -4 \\ x_2 &\ge 0\end{align}$$

starting at $x_1 = 0$ and $x_2 = 0$:

  1. Active constraints: 1 and 3
  2. Find $\lambda_1, \lambda_2$ such that
    $$\nabla f(0, 0) = \lambda_1\nabla g_1(0, 0) + \lambda_2 \nabla g_3(0, 0) \;\Rightarrow\; \begin{bmatrix}-3\\-4\end{bmatrix} = \lambda_1 \begin{bmatrix}2 \\ -1\end{bmatrix} + \lambda_2 \begin{bmatrix}0 \\ 1\end{bmatrix} \;\Rightarrow\; \lambda_1 = -\frac{3}{2},\; \lambda_2 = -\frac{11}{2}$$
  3. Since $\lambda_1, \lambda_2 < 0$, the current point is not an optimal point.
  4. Drop the $x_2 \ge 0$ constraint:
    $$\min_x \frac{1}{2}(x_1 - 3)^2 + (x_2 - 2)^2$$

    $$\text{subject to}$$

    $$2x_1 - x_2 = 0$$
  5. Find the null space of
    $$A = \begin{bmatrix}2 & -1\end{bmatrix}$$

    Therefore,

    $$\mathcal N(A) = \text{sp}\{(1, 2)\}$$
    💡
    The null-space direction is used as the search direction.
  6. Take
    $$(x_1, x_2) = (0, 0) + \delta(1, 2)$$
  7. Solve
    $$\min_{\delta}\frac{1}{2}(\delta - 3)^2 + (2\delta-2)^2 \;\Rightarrow\; \delta = \frac{11}{9} \;\Rightarrow\; x_1 = \frac{11}{9},\; x_2 = \frac{22}{9}$$

Since $(x_1, x_2)$ does not violate the other constraints, it remains in the feasible region.

  8. Check the optimality condition again:
    $$\nabla f\left(\tfrac{11}{9}, \tfrac{22}{9}\right) = \lambda_1\nabla g_1\left(\tfrac{11}{9}, \tfrac{22}{9}\right) \;\Rightarrow\; \begin{bmatrix}-\frac{16}{9}\\ \frac{8}{9}\end{bmatrix}= \lambda_1 \begin{bmatrix}2 \\ -1\end{bmatrix} \;\Rightarrow\; \lambda_1 = -\frac{8}{9}$$

    → This is also not an optimal point, so we drop this constraint as well.

  9. We are now left with no active constraints, which means the problem is locally unconstrained. Therefore, the search direction is simply the unconstrained Newton direction
    $$p = - \nabla^2 f(x_k)^{-1}\nabla f(x_k)$$
  10. Calculate the largest feasible step along $p$ by considering the remaining constraints
  11. Iterate the above procedure
(Figure: illustration of the active-set algorithm)
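
The multiplier and line-search computations in steps 2 and 7 can be checked numerically. The snippet below is an added verification (not part of the original post) and uses NumPy:

```python
import numpy as np

# Gradient of f(x) = 1/2 (x1 - 3)^2 + (x2 - 2)^2
grad_f = lambda x: np.array([x[0] - 3.0, 2.0 * (x[1] - 2.0)])

# Step 2: solve  grad f(0,0) = lambda_1 * (2,-1) + lambda_2 * (0,1)
G = np.array([[2.0, 0.0], [-1.0, 1.0]])          # columns: grad g1 and grad g3
print(np.linalg.solve(G, grad_f(np.zeros(2))))   # [-1.5 -5.5]

# Step 7: minimize f along (x1, x2) = delta * (1, 2), the null-space direction
# d/d(delta) [ 1/2 (delta - 3)^2 + (2 delta - 2)^2 ] = 9 delta - 11 = 0
delta = 11.0 / 9.0
x = delta * np.array([1.0, 2.0])
print(x, grad_f(x))                              # [1.222 2.444] and (-16/9, 8/9)
```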

Sequential Quadratic Programming (SQP)

Sequential quadratic programming is a popular and successful technique for solving non-linearly constrained problems such as

$$\min f(x)$$

$$\text{subject to}$$

$$g(x) = 0$$
💡
Recall that in the active-set method discussed above, the constraints were linear.

The main idea is to obtain a search direction by solving a quadratic program, that is, a problem with a quadratic objective function and linear constraints.

💡
This approach is a generalization of Newton’s method for unconstrained minimization

Idea

The Lagrangian for this problem is

$$\mathcal L(x, \lambda) = f(x) - \lambda^T g(x)$$

and the first-order optimality condition is

$$\nabla \mathcal L(x, \lambda) = 0$$

Then the formula for Newton’s method is

$$\begin{bmatrix} x_{k + 1} \\ \lambda_{k + 1} \end{bmatrix} = \begin{bmatrix} x_{k} \\ \lambda_{k} \end{bmatrix} + \begin{bmatrix} p_k \\ v_k \end{bmatrix}$$

where $p_k$ and $v_k$ are obtained as the solution to the linear system

$$\nabla^2 \mathcal L(x_k , \lambda_k)\begin{bmatrix} p_{k} \\ v_{k} \end{bmatrix} = -\nabla \mathcal L(x_k, \lambda_k) \;\Rightarrow\; \begin{bmatrix}\nabla^2_{xx}\mathcal L(x_k, \lambda_k) & -\nabla g(x_k) \\ -\nabla g(x_k)^T & 0\end{bmatrix}\begin{bmatrix} p_{k} \\ v_{k} \end{bmatrix} = \begin{bmatrix}-\nabla_x\mathcal L(x_k, \lambda_k) \\ g(x_k)\end{bmatrix}$$

(Here $\nabla g(x_k)$ denotes the matrix whose columns are the gradients of the individual constraints.)
💡
This derivation is quite intuitive if we regard minimizing the Lagrangian as an unconstrained optimization problem and apply Newton's method to its stationarity condition $\nabla \mathcal L = 0$.

Consider the following constrained minimization problem, where $(x_k, \lambda_k)$ is our Newton iterate at step $k$:

$$\min_{p_k}\; \frac{1}{2}p_k^T\nabla^2_{xx}\mathcal L(x_k, \lambda_k)\,p_k + p_k^T \nabla_x \mathcal L(x_k, \lambda_k)$$

$$\text{subject to}$$

$$\nabla g(x_k)^T p_k + g(x_k) = 0$$
💡
Note that the objective function is quadratic in $p_k$

This minimization problem has Lagrangian

$$\mathcal L_k(p_k, v_k) = \frac{1}{2}p_k^T\nabla^2_{xx}\mathcal L(x_k, \lambda_k)\,p_k + p_k^T \nabla_x\mathcal L(x_k, \lambda_k) - v_k^T\left(\nabla g(x_k)^T p_k + g(x_k)\right)$$

Note that the first-order optimality conditions for the above Lagrangian are

$$\begin{align}\nabla^2_{xx}\mathcal L(x_k, \lambda_k)\,p_k - \nabla g(x_k)\,v_k &= -\nabla_x \mathcal L(x_k, \lambda_k) \\ -\nabla g(x_k)^T p_k &= g(x_k)\end{align}$$

which is exactly the Newton system above. Thus solving $\nabla \mathcal L_k(p_k, v_k) = 0$ gives the same linear system as the $k$-th Newton step for the original problem.

The quadratic function is a Taylor series approximation to the Lagrangian at $(x_k, \lambda_k)$, and the constraints are a linear approximation to $g(x_k + p_k) = 0$.

💡
In the constrained case, the formulas for Newton’s method correspond to the constrained minimization of a quadratic approximation to the Lagrangian.
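
As a concrete illustration of these formulas, here is a minimal SQP/Newton sketch on a toy equality-constrained problem. The problem, starting point, and NumPy code are my own example (not from the original article); each iteration assembles the block KKT system above and updates both $x$ and $\lambda$.

```python
import numpy as np

# Toy problem (assumed for illustration):
#   min  (x1 - 2)^2 + x2^2    subject to    g(x) = x1^2 + x2^2 - 1 = 0
f_grad = lambda x: np.array([2.0 * (x[0] - 2.0), 2.0 * x[1]])
f_hess = lambda x: 2.0 * np.eye(2)
g      = lambda x: np.array([x[0]**2 + x[1]**2 - 1.0])
g_jac  = lambda x: np.array([[2.0 * x[0], 2.0 * x[1]]])    # Jacobian, i.e. grad g transposed
g_hess = lambda x: 2.0 * np.eye(2)                          # Hessian of the single constraint

x, lam = np.array([0.5, 0.5]), np.array([0.0])
for k in range(25):
    J = g_jac(x)
    grad_L = f_grad(x) - J.T @ lam                  # gradient_x of L(x, lam) = f - lam^T g
    if np.linalg.norm(grad_L) + np.abs(g(x)).sum() < 1e-10:
        break                                       # first-order optimality conditions hold
    hess_L = f_hess(x) - lam[0] * g_hess(x)         # Hessian_xx of the Lagrangian
    # Block KKT system from the text (the top-right block is -grad g = -J^T)
    K   = np.block([[hess_L, -J.T], [-J, np.zeros((1, 1))]])
    rhs = np.concatenate([-grad_L, g(x)])
    p, v = np.split(np.linalg.solve(K, rhs), [2])
    x, lam = x + p, lam + v

print(x, lam)   # approaches x* = (1, 0) with multiplier lambda* = -1
```

Near the solution the iterates converge quadratically; a practical implementation would add the safeguards listed in the next section (quasi-Newton Hessians, a line search or trust region, and careful treatment of inequality constraints).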

Issues

SQP is an important method, and there are many issues to be considered in order to obtain an efficient and reliable implementation:

  1. Efficient solution of the linear system at each Newton iteration

    → The block matrix structure can be exploited.

  2. Quasi-Newton approximations to $\nabla^2_{xx}\mathcal L(x_k, \lambda_k)$, as in the unconstrained case
  3. Trust regions or line searches to improve robustness
  4. Treatment of constraints (equality and inequality) during the iterative process
  5. Selection of a good starting guess for $\lambda$
