
4. Guaranteeing Convergence

Guaranteeing Convergence

The auxiliary techniques that are used to guarantee convergence attempt to rein in the optimization method when it is in danger of getting out of control, and they also try to avoid intervening when the optimization method is performing effectively.

The term globalization strategy is used to distinguish the method used for the new estimate of the solution from the method for computing the search direction.

  1. Method for computing the search direction: derived from the Taylor series, which is a local approximation to the function.
  1. Selecting the new estimate of the solution: designed to guarantee global convergence, that is, convergence from any starting point.
    💡
    Note that this means convergence to a stationary point.

If the underlying optimization method produces good search directions, as is often the case with Newton’s method on well-conditioned problems, then the globalization strategies will act merely as a safety net. For a method that produces less effective search directions, such as a linear conjugate-gradient method, they can be a major contributor to the practical success of a method.

We will discuss two major types of globalization strategy:

  1. Line search method
  1. Trust-region method: Will not be discussed in this article

Line Search Method

Let $x_k$ be the current estimate of a minimizer of $f$, and let $p_k$ be the search direction at the point $x_k$. Then the new estimate of the solution is defined by the formula

$$x_{k + 1} = x_k + \alpha_k p_k$$

where the step length $\alpha_k$ is some scalar chosen so that

$$f(x_{k + 1}) < f(x_k)$$

Note that we assume that $p_k$ is a descent direction at $x_k$. Therefore,

$$p_k^T \nabla f(x_k) < 0$$

This should be guaranteed by the algorithm used to compute the search direction.

💡
Guaranteeing descent is a matter of choosing $p_k$; in the line search method, the remaining question is how to choose $\alpha_k$.

If $p_k$ is a descent direction, then $f(x_k + \alpha p_k) < f(x_k)$, at least for small positive values of $\alpha$.

The technique is called a line search because a search for a new point $x_{k + 1}$ is carried out along the line $y(\alpha) = x_k + \alpha p_k$.
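
To make the reduction to one dimension concrete, here is a minimal Python sketch (the names `f`, `grad_f`, `x_k`, and `p_k` are placeholders of mine for the objective, its gradient, the current iterate, and the search direction):

```python
import numpy as np

def phi(f, x_k, p_k, alpha):
    """The one-dimensional function searched along the line y(alpha) = x_k + alpha * p_k."""
    return f(x_k + alpha * p_k)

def is_descent_direction(grad_f, x_k, p_k):
    """Check the descent condition p_k^T grad f(x_k) < 0, i.e. phi'(0) < 0."""
    return float(p_k @ grad_f(x_k)) < 0.0

# Tiny usage example on f(x) = ||x||^2 with the steepest-descent direction
f = lambda x: float(x @ x)
grad_f = lambda x: 2.0 * x
x_k = np.array([1.0, -2.0])
p_k = -grad_f(x_k)
print(is_descent_direction(grad_f, x_k, p_k))  # True
print(phi(f, x_k, p_k, 0.1) < f(x_k))          # True for this small alpha
```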

Intuitively we would like to choose $\alpha_k$ as the solution to

$$\argmin_{\alpha > 0} f(x_k + \alpha p_k)$$
💡
It is too expensive to solve this problem exactly, so an approximate minimizer is accepted instead.

Note that we cannot always be sure whether the iterates converge to a stationary point.

💡
Whether they do depends on how the step size is determined.

An example of failing to converge to a stationary point

Consider the minimization problem

$$\min_{x} x^2$$

with initial value $x_0 = -3$. At each iteration, we use the search direction $p_k = 1$ with step length $\alpha_k = 2^{-k}$. Hence

$$x_{k + 1} = x_k + 2^{-k}$$

Therefore, $f(x_{k + 1}) < f(x_k)$. In addition, it is easy to show that $p_k$ is always a descent direction.

However, despite its simplicity, this algorithm does not converge to a stationary point. Since $x_k = -3 + \sum_{i=0}^{k-1} 2^{-i} = -1 - 2^{1-k}$, we have

$$\lim_{k \to \infty} x_k = -1$$

but $f'(-1) = -2 \ne 0$.

💡
This example shows that it is not enough just to reduce the function value at each iteration.
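
A few lines of Python reproduce this behavior (a sketch; the iteration count of 60 is an arbitrary choice of mine):

```python
x = -3.0                     # x_0
for k in range(60):
    p = 1.0                  # search direction p_k = 1
    alpha = 2.0 ** (-k)      # step length alpha_k = 2^{-k}
    assert 2 * x * p < 0     # p_k is a descent direction: f'(x_k) p_k < 0
    x = x + alpha * p        # f(x_{k+1}) < f(x_k) at every step

print(x)      # ~ -1.0: the iterates approach -1, not the minimizer 0
print(2 * x)  # f'(-1) = -2 != 0, so the limit is not a stationary point
```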

Guaranteeing convergence to a stationary point

One way to guarantee convergence is to make additional assumptions.

  1. Assumptions on the search direction $p_k$
    1. it produces sufficient descent
    1. it is gradient related
    💡
    These conditions can normally be guaranteed by making slight modifications to the method used to compute the search direction. Techniques for doing this are discussed in the context of specific methods.
  1. Assumptions on the step length $\alpha_k$
    1. it produces a sufficient decrease in the function $f$
    1. it is not too small (this is ensured by backtracking)
    💡
    The Armijo condition combined with backtracking is sometimes called the Armijo line search.

Sufficient decrease (Armijo condition)

This condition requires that

$$f(x_k + \alpha_k p_k) \le f(x_k) + \mu \alpha_k p_k^T \nabla f(x_k)$$

where $\mu$ is some scalar satisfying $0 < \mu < 1$.

💡
Note that $f(x_k) + \alpha_k p_k^T \nabla f(x_k)$ is just the first-order Taylor approximation of $f(x_k + \alpha_k p_k)$ around $x_k$.

This condition ensures that the function value decreases by a certain proportion at each iteration, guiding the algorithm towards the optimal solution.

If $\alpha$ is small, the linear approximation will be good, and the sufficient decrease condition will be satisfied. However, if $\alpha$ is large, the decrease predicted by the linear approximation may differ greatly from the actual decrease in $f$, and the condition can be violated.

💡
In other words, when updating $x_k$, the step size is accepted only as long as the function value actually decreases by a sufficient amount.
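
Written as code, the Armijo check is a one-liner (a sketch assuming NumPy vectors; `f`, `grad_f`, and the default `mu = 1e-4`, a common practical choice, are assumptions of mine):

```python
def armijo_satisfied(f, grad_f, x_k, p_k, alpha, mu=1e-4):
    """Sufficient decrease: f(x_k + alpha*p_k) <= f(x_k) + mu*alpha*p_k^T grad f(x_k)."""
    return f(x_k + alpha * p_k) <= f(x_k) + mu * alpha * (p_k @ grad_f(x_k))
```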

Backtracking

The Armijo condition alone does not forbid tiny step sizes.

Let $p_k$ be a search direction satisfying the sufficient descent condition. Define $\alpha_k$ to be the first element of the sequence

$$1, \frac{1}{2}, \frac{1}{4}, \dots, 2^{-i}, \dots$$

for which the Armijo condition is fulfilled. The Armijo condition combined with backtracking is sometimes called the Armijo line search.

💡
A more efficient alternative to backtracking is based on the Wolfe conditions.
💡
With backtracking, we do not take smaller steps than necessary.
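
A minimal backtracking routine might look like the following sketch (again assuming NumPy vectors; the function name, the default `mu = 1e-4`, and the cap on the number of halvings are my additions):

```python
def backtracking_line_search(f, grad_f, x_k, p_k, mu=1e-4, max_halvings=50):
    """Return the first alpha in 1, 1/2, 1/4, ... that satisfies the Armijo condition."""
    fx = f(x_k)
    slope = p_k @ grad_f(x_k)  # p_k^T grad f(x_k); negative for a descent direction
    alpha = 1.0
    for _ in range(max_halvings):
        if f(x_k + alpha * p_k) <= fx + mu * alpha * slope:
            return alpha       # Armijo fulfilled; alpha is not unnecessarily small
        alpha *= 0.5
    return alpha               # safeguard; rarely reached with a true descent direction
```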

Conclusion

By adding some assumptions and using the above techniques, we guarantee that

$$\nabla f(x_k) \to 0$$

Let $f$ be a real-valued function of $n$ variables. Let $x_0$ be a given initial point and define $\{x_k\}$ by $x_{k + 1} = x_k + \alpha_k p_k$, where $p_k$ is a vector of dimension $n$ and $\alpha_k \ge 0$ is a scalar. Assume that

  1. the set $S = \{x : f(x) \le f(x_0)\}$ is bounded

    → This ensures that the function takes on its minimum value at a finite point

    💡
    For example, this rules out cases such as $f(x) = e^x$.
  1. $\nabla f$ is Lipschitz continuous for all $x$
  1. the vectors $p_k$ satisfy a sufficient descent condition
  1. the search directions are gradient related and have bounded norm:
    $$\|p_k\| \le M, \quad \forall k \in \mathbb{N}$$
  1. the scalar $\alpha_k$ is chosen as the first element of the sequence $\{1, \frac{1}{2}, \frac{1}{4}, \dots\}$ to satisfy the sufficient decrease condition
    $$f(x_k + \alpha_k p_k) \le f(x_k) + \mu \alpha_k p_k^T \nabla f(x_k)$$

    where $0 < \mu < 1$

Then

$$\lim_{k \to \infty} \|\nabla f(x_k)\| = 0$$
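
As an illustration of this conclusion (my own example, not from the text), steepest descent combined with the `backtracking_line_search` sketch above drives the gradient norm toward zero on a small quadratic whose level set $S$ is bounded and whose gradient is Lipschitz continuous:

```python
import numpy as np

A = np.diag([1.0, 10.0])

def f(x):
    return 0.5 * float(x @ A @ x)   # convex quadratic: bounded level sets

def grad_f(x):
    return A @ x                    # Lipschitz continuous gradient

x = np.array([3.0, -4.0])            # arbitrary starting point x_0
for k in range(1000):
    g = grad_f(x)
    if np.linalg.norm(g) < 1e-8:    # stop once the gradient is essentially zero
        break
    p = -g                          # steepest descent: gradient related, sufficient descent
    alpha = backtracking_line_search(f, grad_f, x, p)
    x = x + alpha * p

print(k, np.linalg.norm(grad_f(x)))  # the gradient norm is driven toward 0
```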