The auxiliary techniques that are used to guarantee convergence attempt to rein in the optimization method when it is in danger of getting out of control, and they also try to avoid intervening when the optimization method is performing effectively.
The term globalization strategy is used to distinguish the method used for the new estimate of the solution from the method for computing the search direction.
Method for computing the search direction: derived from the Taylor series, which is only a local approximation to the function.
Selecting the new estimate of the solution: designed to guarantee global convergence, that is, convergence from any starting point.
If the underlying optimization method produces good search directions, as is often the case with Newton’s method on well-conditioned problems, then the globalization strategies will act merely as a safety net. For a method that produces less effective search directions, such as a linear conjugate-gradient method, they can be a major contributor to the practical success of a method.
We will discuss two major types of globalization strategy:
Line search method
Trust-region method: Will not be discussed in this article
Line Search Method
Let xk be the current estimate of a minimizer of f, and let pk be the search direction at the point xk. Then the new estimate of the solution is defined by the formula

$$x_{k+1} = x_k + \alpha_k p_k,$$

where the step length αk is some scalar chosen so that

$$f(x_{k+1}) < f(x_k).$$

Note that we assume that pk is a descent direction at xk. Therefore,

$$\nabla f(x_k)^{\mathsf T} p_k < 0.$$
This should be guaranteed by the algorithm used to compute the search direction.
If pk is a descent direction, then f(xk+αpk)<f(xk) at least for small positive values of α.
The technique is called a line search because a search for a new point xk+1 is carried out along the line y(α)=xk+αpk.
Intuitively we would like to choose αk as the solution to

$$\min_{\alpha > 0} f(x_k + \alpha p_k).$$
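In practice this one-dimensional problem is usually only solved approximately. As a rough sketch, it can be handed to a scalar minimizer; the objective f, the point x_k, the direction p_k, and the bracket alpha_max below are made-up example values, not anything prescribed above.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def exact_line_search(f, x_k, p_k, alpha_max=10.0):
    """Approximately solve min_{alpha > 0} f(x_k + alpha * p_k)."""
    phi = lambda alpha: f(x_k + alpha * p_k)      # f restricted to the line through x_k along p_k
    return minimize_scalar(phi, bounds=(0.0, alpha_max), method="bounded").x

# Example: a simple quadratic and the steepest-descent direction at x_k
f = lambda x: float(x @ x)
x_k = np.array([1.0, -2.0])
p_k = -2.0 * x_k                                  # negative gradient of f at x_k
alpha_k = exact_line_search(f, x_k, p_k)
print(alpha_k, x_k + alpha_k * p_k)               # alpha_k is about 0.5, landing near the minimizer 0
```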
Note, however, that even if f decreases at every iteration, we cannot always be sure that the iterates converge to a stationary point.
An example of not converging to a stationary point
Consider the one-dimensional minimization problem

$$\min_x \; f(x) = x^2$$

with initial value x0=−3. At each iteration, we use the search direction pk=1 with step length $\alpha_k = 2^{-k}$. Hence

$$x_{k+1} = x_k + 2^{-k}, \qquad \text{which gives} \qquad x_k = -1 - 2^{\,1-k}.$$

Every iterate satisfies xk<−1, so each step moves toward the minimizer and f(xk+1)<f(xk). In addition, it is easy to show that pk is always a descent direction, since f′(xk)=2xk<0 while pk=1>0. However, even though this simple algorithm decreases f at every iteration, it does not converge to a stationary point, since

$$\lim_{k \to \infty} x_k = -1,$$

but f′(−1)=−2≠0.
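A few lines of code make the failure concrete; this assumes the objective f(x) = x² reconstructed above.

```python
# Descent at every step, yet no convergence to a stationary point:
# f(x) = x^2, x0 = -3, p_k = 1, alpha_k = 2^(-k)
f = lambda x: x ** 2
df = lambda x: 2.0 * x

x = -3.0
for k in range(60):
    x = x + 2.0 ** (-k) * 1.0     # x_{k+1} = x_k + alpha_k * p_k, and f(x_{k+1}) < f(x_k)
print(x, df(x))                   # x has (numerically) reached -1, but f'(-1) = -2 != 0
```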
Guaranteeing convergence to a stationary point
One way to guarantee convergence is to make additional assumptions.
Assumptions on the search direction pk
it produces sufficient descent
it is gradient related
Assumptions on the step length αk
it produces a sufficient decrease in the function f
it is not too small (this is what backtracking enforces)
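The text above does not spell these conditions out as formulas. One common formalization, sketched below, checks the angle between pk and −∇f(xk) (sufficient descent), a lower bound on ‖pk‖ relative to ‖∇f(xk)‖ (gradient related), and an upper bound on ‖pk‖; the specific inequalities and constants are illustrative assumptions, not values from the text.

```python
import numpy as np

def direction_conditions_hold(grad_k, p_k, eps=1e-4, m=1e-6, M=1e6):
    """One common formalization of the assumptions on the search direction p_k.

    sufficient descent:  -grad_k . p_k >= eps * ||grad_k|| * ||p_k||
                         (p_k stays bounded away from being orthogonal to -grad_k)
    gradient related:    ||p_k|| >= m * ||grad_k||
    bounded norm:        ||p_k|| <= M
    The constants eps, m, M are illustrative choices.
    """
    g_norm, p_norm = np.linalg.norm(grad_k), np.linalg.norm(p_k)
    sufficient_descent = -(grad_k @ p_k) >= eps * g_norm * p_norm
    gradient_related = p_norm >= m * g_norm
    bounded_norm = p_norm <= M
    return sufficient_descent and gradient_related and bounded_norm
```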
Sufficient decrease (Armijo condition)
This condition requires that

$$f(x_k + \alpha p_k) \le f(x_k) + \mu \alpha \nabla f(x_k)^{\mathsf T} p_k,$$

where μ is some scalar satisfying 0<μ<1.
This condition ensures that the actual decrease in f is at least a fixed fraction μ of the decrease predicted by the first-order (linear) Taylor approximation at xk. If α is small, the linear approximation will be good, and the sufficient decrease condition will be satisfied. However, if α is large, the decrease predicted by the linear approximation may differ greatly from the actual decrease in f, and the condition can be violated.
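In code, the Armijo test is a single inequality. The sketch below mirrors the condition above; the default μ = 1e-4 is a common convention, not a value fixed by the text.

```python
import numpy as np

def armijo_satisfied(f, x_k, p_k, grad_k, alpha, mu=1e-4):
    """Sufficient decrease (Armijo) test:
    f(x_k + alpha * p_k) <= f(x_k) + mu * alpha * grad_k . p_k,
    where grad_k . p_k < 0 for a descent direction."""
    return f(x_k + alpha * p_k) <= f(x_k) + mu * alpha * (grad_k @ p_k)
```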
Backtracking
The Armijo condition alone does not forbid arbitrarily tiny step sizes, so we also need a rule that keeps αk from becoming too small.
Let pk be a search direction satisfying the sufficient descent condition. Define αk to be the first element of the sequence

$$1, \; \tfrac{1}{2}, \; \tfrac{1}{4}, \; \tfrac{1}{8}, \; \ldots, \; 2^{-i}, \; \ldots$$

that satisfies the Armijo condition: start with α=1 and keep halving the step, stopping once the Armijo condition is fulfilled. The combination of the Armijo condition with backtracking is sometimes called the Armijo line search.
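Putting the pieces together gives a minimal backtracking line search; this sketch assumes the halving sequence above and adds a hypothetical cap on the number of halvings as a practical safeguard.

```python
import numpy as np

def armijo_line_search(f, x_k, p_k, grad_k, mu=1e-4, max_halvings=50):
    """Backtracking: try alpha = 1, 1/2, 1/4, ... and return the first step
    length that satisfies the Armijo sufficient decrease condition."""
    fx = f(x_k)
    slope = grad_k @ p_k                  # negative for a descent direction
    alpha = 1.0
    for _ in range(max_halvings):
        if f(x_k + alpha * p_k) <= fx + mu * alpha * slope:
            return alpha
        alpha *= 0.5
    return alpha                          # safeguard; for a descent direction the loop exits earlier
```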
Conclusion
By adding some assumptions and using the above techniques, we guarantee that

$$\lim_{k \to \infty} \nabla f(x_k) = 0.$$

More precisely, the following result holds.
Let f be a real-valued function of n variables. Let x0 be a given initial point and define {xk} by $x_{k+1} = x_k + \alpha_k p_k$, where pk is a vector of dimension n and αk≥0 is a scalar. Assume that
the set S={x:f(x)≤f(x0)} is bounded
→ This ensures that the function takes on its minimum value at a finite point
∇f is Lipschitz continuous for all x
the vectors pk satisfy a sufficient descent condition
the search directions pk are gradient related and have bounded norm
the scalar αk is chosen as the first element of the sequence {1, 1/2, 1/4, …} that satisfies the sufficient decrease condition
Then

$$\lim_{k \to \infty} \nabla f(x_k) = 0,$$

that is, the gradients converge to zero and every limit point of {xk} is a stationary point of f.
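As a concrete (if simplistic) check of this conclusion, the sketch below runs steepest descent, whose directions satisfy the sufficient descent and gradient-related conditions, with the backtracking rule on a small quadratic; the particular objective, starting point, and tolerance are illustrative assumptions.

```python
import numpy as np

# Illustrative objective: f(x) = 0.5 * x^T A x - b^T x with A symmetric positive definite,
# so the level set S = {x : f(x) <= f(x0)} is bounded and grad f is Lipschitz continuous.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])
f = lambda x: 0.5 * x @ A @ x - b @ x
grad = lambda x: A @ x - b

x = np.array([5.0, -7.0])                     # x0
mu = 1e-4
for k in range(200):
    g = grad(x)
    if np.linalg.norm(g) < 1e-8:              # gradients shrink toward zero, as the result promises
        break
    p = -g                                    # steepest descent: sufficient descent, gradient related
    alpha = 1.0
    while f(x + alpha * p) > f(x) + mu * alpha * (g @ p):
        alpha *= 0.5                          # backtracking over 1, 1/2, 1/4, ...
    x = x + alpha * p

print(k, x, np.linalg.norm(grad(x)))          # gradient norm is (numerically) zero at the final iterate
```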