
10. Feasible-Point Methods


Introduction

In this article, we examine methods that solve constrained optimization problems by attempting to remain feasible at every iteration. If all the constraints are linear, maintaining feasibility is straightforward. When nonlinear constraints are present, however, more elaborate procedures are required, and even though these methods strive to maintain feasibility at every iteration, they do not always achieve it.

Active-set Methods

$$\min f(x)$$

$$\text{subject to}$$

$$Ax \ge b$$
  • Bad news: an inequality-constrained problem cannot simply be reformulated as an unconstrained one.
  • Good news: we already have methods for equality-constrained problems.

How can we resolve this issue? A brute-force approach would be to solve the equality-constrained problem for every possible selection of active constraints and then pick the best of the resulting solutions. However, this is impractical: with $m$ inequality constraints there are up to $2^m$ possible active sets, so the approach breaks down even when the number of constraints is modest.

Active-set methods attempt to overcome this difficulty by moving sequentially from one choice of active constraints to another choice that is guaranteed to produce at least as good a solution.

The most commonly used active-set methods are feasible-point methods. An initial feasible point, if none is provided, can be obtained by solving an auxiliary linear program. At each iteration of the active-set method, we select a working set of constraints that are assumed to be active at the optimum.

💡
That is, we minimize $f$ by treating the constraints in the working set as equality constraints and solving that problem instead.
💡
Note that these are linear inequality constraints.
💡
In general, the working set at the current point is a subset of the constraints that are active at $x$, so that $x$ is a feasible point for the working set. There may also be constraints that are active at $x$ but not included in the working set. Hence the working set is not necessarily equal to the active set.
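
To make this distinction concrete, the short sketch below (hypothetical helper code using NumPy, not from the original post) computes the active set of the linear constraints $Ax \ge b$ at a point; a working set is then chosen as a subset of these indices:

```python
import numpy as np

def active_set(A, b, x, tol=1e-9):
    """Indices i of constraints a_i^T x >= b_i that hold with equality at x."""
    return np.where(np.abs(A @ x - b) < tol)[0]

# Constraints of the worked example below: 2x1 - x2 >= 0, -x1 - x2 >= -4, x2 >= 0
A = np.array([[2.0, -1.0], [-1.0, -1.0], [0.0, 1.0]])
b = np.array([0.0, -4.0, 0.0])
print(active_set(A, b, np.array([0.0, 0.0])))   # [0 2]: constraints 1 and 3 are active
```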

Algorithm

  • Initialize with some working set $W$ of active constraints
  • Iterate
    1. Optimality check
      • If no constraints are active and the current point is an unconstrained stationary point, then stop
      • If there are active constraints and the current point is optimal with respect to the active constraints, compute the Lagrange multipliers
      • If the multipliers are non-negative, stop. Otherwise, drop a constraint with a negative multiplier from $W$.
        💡
        That is, if every multiplier is non-negative, the direction opposite to the gradient never points into the interior of the feasible region (this reading is valid because the constraints are written with $\ge$). The point is then optimal with respect to the active constraints, and the objective increases as soon as we step into the interior, so we may conclude it is a local optimum.
    2. Search direction
      • Determine a descent direction $p$ that is feasible with respect to $W$
    3. Step length
      • Determine a step length $\alpha$ satisfying $f(x_k+\alpha p) < f(x_k)$ and $\alpha \le \bar\alpha$, where $\bar\alpha$ is the maximum feasible step length along $p$
    4. Update
      • $x_{k + 1} = x_k + \alpha p$. If a new constraint boundary is encountered, add it to $W$.
💡
In summary, we solve the problem as if it were an optimization problem constrained only by the working set, and then check the remaining (non-working) constraints to see whether any condition is violated.
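
The sketch below illustrates this loop for a convex quadratic objective $\tfrac12 x^TQx + c^Tx$ with linear constraints $Ax \ge b$. It is a minimal, hypothetical implementation (the function name, the NumPy usage, and the assumption that $Q$ is positive definite are mine, not the article's): the working-set subproblem is solved through its KKT system, the multipliers decide whether to stop or drop a constraint, and a ratio test supplies the maximum step length $\bar\alpha$.

```python
import numpy as np

def active_set_qp(Q, c, A, b, x, W, max_iter=50, tol=1e-10):
    """Primal active-set sketch for:  min 1/2 x^T Q x + c^T x  s.t.  A x >= b.

    x must be feasible and W a list of constraint indices active at x.
    (Hypothetical helper, not from the article; Q is assumed positive definite.)
    """
    W = list(W)
    for _ in range(max_iter):
        g = Q @ x + c                          # gradient of the objective at x
        Aw = A[W] if W else np.zeros((0, len(x)))
        n, m = len(x), len(W)

        # Working-set subproblem: min 1/2 p^T Q p + g^T p  s.t.  Aw p = 0  (KKT system)
        K = np.block([[Q, Aw.T], [Aw, np.zeros((m, m))]])
        rhs = np.concatenate([-g, np.zeros(m)])
        sol = np.linalg.solve(K, rhs)
        p, lam = sol[:n], -sol[n:]             # lam: multipliers of the working set

        if np.linalg.norm(p) < tol:
            # Optimality check: all multipliers non-negative means we are done
            if m == 0 or np.all(lam >= -tol):
                return x, W
            W.pop(int(np.argmin(lam)))         # drop a constraint with negative multiplier
        else:
            # Ratio test: largest feasible step along p, capped at the full step 1
            alpha, blocking = 1.0, None
            for i in range(len(b)):
                if i not in W and A[i] @ p < -tol:
                    step = (b[i] - A[i] @ x) / (A[i] @ p)
                    if step < alpha:
                        alpha, blocking = step, i
            x = x + alpha * p
            if blocking is not None:
                W.append(blocking)             # new constraint boundary encountered
    return x, W
```

Called on the worked example of the next section, `active_set_qp(np.diag([1.0, 2.0]), np.array([-3.0, -4.0]), A, b, np.zeros(2), [0, 2])` (with the constraint data `A`, `b` from the snippet above) reproduces the first iterations of the hand computation below.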

Example

$$\min_x \frac{1}{2}(x_1 - 3)^2 + (x_2 - 2)^2$$

$$\text{subject to}$$

$$\begin{align}2x_1 - x_2 &\ge 0 \\ -x_1 - x_2 &\ge -4 \\ x_2 &\ge 0\end{align}$$

starting at $x_1 = 0$ and $x_2 = 0$:

  1. Active constraints: 1 and 3
  2. Find $\lambda_1, \lambda_2$ such that
    $$\nabla f(0, 0) = \lambda_1\nabla g_1(0, 0) + \lambda_2 \nabla g_3(0, 0) \;\Rightarrow\; \begin{bmatrix}-3\\-4\end{bmatrix} = \lambda_1 \begin{bmatrix}2 \\ -1\end{bmatrix} + \lambda_2 \begin{bmatrix}0 \\ 1\end{bmatrix} \;\Rightarrow\; \lambda_1 = -\frac{3}{2},\; \lambda_2 = -\frac{11}{2}$$
  3. Since $\lambda_1, \lambda_2 < 0$, the current point is not an optimal point.
  4. Drop the $x_2 \ge 0$ constraint:
    $$\min_x \frac{1}{2}(x_1 - 3)^2 + (x_2 - 2)^2$$

    $$\text{subject to}$$

    $$2x_1 - x_2 = 0$$
  5. Find the null space of
    $$A = \begin{bmatrix}2 & -1\end{bmatrix}$$

    Therefore,

    $$\mathcal N(A) = \text{sp}\{(1, 2)\}$$
    💡
    The null-space direction is used as the search direction.
  6. Take
    $$(x_1, x_2) = (0, 0) + \delta(1, 2)$$
  7. Solve
    $$\min_{\delta}\frac{1}{2}(\delta - 3)^2 + (2\delta-2)^2 \;\Rightarrow\; \delta = \frac{11}{9} \;\Rightarrow\; x_1 = \frac{11}{9},\; x_2 = \frac{22}{9}$$

Since $(x_1, x_2)$ does not violate the other constraints, it remains in the feasible region.

  8. Check the optimality condition again:
    $$\nabla f\left(\tfrac{11}{9}, \tfrac{22}{9}\right) = \lambda_1\nabla g_1\left(\tfrac{11}{9}, \tfrac{22}{9}\right) \;\Rightarrow\; \begin{bmatrix}-\frac{16}{9}\\ \frac{8}{9}\end{bmatrix}= \lambda_1 \begin{bmatrix}2 \\ -1\end{bmatrix} \;\Rightarrow\; \lambda_1 = -\frac{8}{9}$$

    → This is also not an optimal point, so we drop this constraint as well.

  9. We are now left with no active constraints, which means the problem is locally unconstrained. Therefore, the search direction is simply the unconstrained Newton direction
    $$p = - \nabla^2 f(x_k)^{-1}\nabla f(x_k)$$
  10. Calculate the largest feasible step along $p$ by considering the remaining constraints
  11. Iterate the above procedure
(Figure: illustration of the active-set algorithm)
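
The multiplier and line-search computations in steps 2 and 7 can be checked numerically. The snippet below is an added verification (not part of the original post) and uses NumPy:

```python
import numpy as np

# Gradient of f(x) = 1/2 (x1 - 3)^2 + (x2 - 2)^2
grad_f = lambda x: np.array([x[0] - 3.0, 2.0 * (x[1] - 2.0)])

# Step 2: solve  grad f(0,0) = lambda_1 * (2,-1) + lambda_2 * (0,1)
G = np.array([[2.0, 0.0], [-1.0, 1.0]])          # columns: grad g1 and grad g3
print(np.linalg.solve(G, grad_f(np.zeros(2))))   # [-1.5 -5.5]

# Step 7: minimize f along (x1, x2) = delta * (1, 2), the null-space direction
# d/d(delta) [ 1/2 (delta - 3)^2 + (2 delta - 2)^2 ] = 9 delta - 11 = 0
delta = 11.0 / 9.0
x = delta * np.array([1.0, 2.0])
print(x, grad_f(x))                              # [1.222 2.444] and (-16/9, 8/9)
```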

Sequential Quadratic Programming (SQP)

Sequential quadratic programming is a popular and successful technique for solving non-linearly constrained problems such as

$$\min f(x)$$

$$\text{subject to}$$

$$g(x) = 0$$
💡
Recall that in the active-set method discussed above, the constraints were linear.

The main idea is to obtain a search direction by solving a quadratic program, that is, a problem with a quadratic objective function and linear constraints.

💡
This approach is a generalization of Newton’s method for unconstrained minimization

Idea

The Lagrangian for this problem is

$$\mathcal L(x, \lambda) = f(x) - \lambda^T g(x)$$

and the first-order optimality condition is

$$\nabla \mathcal L(x, \lambda) = 0$$

Then the formula for Newton’s method is

$$\begin{bmatrix} x_{k + 1} \\ \lambda_{k + 1} \end{bmatrix} = \begin{bmatrix} x_{k} \\ \lambda_{k} \end{bmatrix} + \begin{bmatrix} p_k \\ v_k \end{bmatrix}$$

where $p_k$ and $v_k$ are obtained as the solution to the linear system

$$\nabla^2 \mathcal L(x_k , \lambda_k)\begin{bmatrix} p_{k} \\ v_{k} \end{bmatrix} = -\nabla \mathcal L(x_k, \lambda_k) \;\Rightarrow\; \begin{bmatrix}\nabla^2_{xx}\mathcal L(x_k, \lambda_k) & -\nabla g(x_k) \\ -\nabla g(x_k)^T & 0\end{bmatrix}\begin{bmatrix} p_{k} \\ v_{k} \end{bmatrix} = \begin{bmatrix}-\nabla_x\mathcal L(x_k, \lambda_k) \\ g(x_k)\end{bmatrix}$$

(Here $\nabla g(x_k)$ denotes the matrix whose columns are the gradients of the individual constraints.)
💡
This derivation is quite intuitive if we regard minimizing the Lagrangian as an unconstrained optimization problem and apply Newton's method to its stationarity condition $\nabla \mathcal L = 0$.

Consider the following constrained minimization problem, where $(x_k, \lambda_k)$ is our Newton iterate at step $k$:

$$\min_{p_k}\; \frac{1}{2}p_k^T\nabla^2_{xx}\mathcal L(x_k, \lambda_k)\,p_k + p_k^T \nabla_x \mathcal L(x_k, \lambda_k)$$

$$\text{subject to}$$

$$\nabla g(x_k)^T p_k + g(x_k) = 0$$
💡
Note that the objective function is quadratic in $p_k$

This minimization problem has Lagrangian

$$\mathcal L_k(p_k, v_k) = \frac{1}{2}p_k^T\nabla^2_{xx}\mathcal L(x_k, \lambda_k)\,p_k + p_k^T \nabla_x\mathcal L(x_k, \lambda_k) - v_k^T\left(\nabla g(x_k)^T p_k + g(x_k)\right)$$

Note that the first-order optimality conditions for the above Lagrangian are

$$\begin{align}\nabla^2_{xx}\mathcal L(x_k, \lambda_k)\,p_k - \nabla g(x_k)\,v_k &= -\nabla_x \mathcal L(x_k, \lambda_k) \\ -\nabla g(x_k)^T p_k &= g(x_k)\end{align}$$

which is exactly the Newton system above. Thus solving $\nabla \mathcal L_k(p_k, v_k) = 0$ gives the same linear system as the $k$-th Newton step for the original problem.

The quadratic function is a Taylor series approximation to the Lagrangian at $(x_k, \lambda_k)$, and the constraints are a linear approximation to $g(x_k + p_k) = 0$.

💡
In the constrained case, the formulas for Newton’s method correspond to the constrained minimization of a quadratic approximation to the Lagrangian.
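
As a concrete illustration of these formulas, here is a minimal SQP/Newton sketch on a toy equality-constrained problem. The problem, starting point, and NumPy code are my own example (not from the original article); each iteration assembles the block KKT system above and updates both $x$ and $\lambda$.

```python
import numpy as np

# Toy problem (assumed for illustration):
#   min  (x1 - 2)^2 + x2^2    subject to    g(x) = x1^2 + x2^2 - 1 = 0
f_grad = lambda x: np.array([2.0 * (x[0] - 2.0), 2.0 * x[1]])
f_hess = lambda x: 2.0 * np.eye(2)
g      = lambda x: np.array([x[0]**2 + x[1]**2 - 1.0])
g_jac  = lambda x: np.array([[2.0 * x[0], 2.0 * x[1]]])    # Jacobian, i.e. grad g transposed
g_hess = lambda x: 2.0 * np.eye(2)                          # Hessian of the single constraint

x, lam = np.array([0.5, 0.5]), np.array([0.0])
for k in range(25):
    J = g_jac(x)
    grad_L = f_grad(x) - J.T @ lam                  # gradient_x of L(x, lam) = f - lam^T g
    if np.linalg.norm(grad_L) + np.abs(g(x)).sum() < 1e-10:
        break                                       # first-order optimality conditions hold
    hess_L = f_hess(x) - lam[0] * g_hess(x)         # Hessian_xx of the Lagrangian
    # Block KKT system from the text (the top-right block is -grad g = -J^T)
    K   = np.block([[hess_L, -J.T], [-J, np.zeros((1, 1))]])
    rhs = np.concatenate([-grad_L, g(x)])
    p, v = np.split(np.linalg.solve(K, rhs), [2])
    x, lam = x + p, lam + v

print(x, lam)   # approaches x* = (1, 0) with multiplier lambda* = -1
```

Near the solution the iterates converge quadratically; a practical implementation would add the safeguards listed in the next section (quasi-Newton Hessians, a line search or trust region, and careful treatment of inequality constraints).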

Issues

SQP is an important method, and there are many issues to be considered in order to obtain an efficient and reliable implementation:

  1. Efficient solution of the linear system at each Newton iteration

    → The block matrix structure can be exploited.

  2. Quasi-Newton approximations to $\nabla^2_{xx}\mathcal L(x_k, \lambda_k)$, as in the unconstrained case
  3. Trust regions or line searches to improve robustness
  4. Treatment of constraints (equality and inequality) during the iterative process
  5. Selection of a good starting guess for $\lambda$
