4. Guaranteeing Convergence
The auxiliary techniques used to guarantee convergence attempt to rein in the optimization method when it is in danger of getting out of control, while avoiding intervention when the method is performing effectively.
The term globalization strategy is used to distinguish the method used for the new estimate of the solution from the method for computing the search direction.
- Method for computing the search direction: derived from the Taylor series, which is a local approximation to the function.
- Selecting the new estimate of the solution: designed to guarantee global convergence, that is, convergence from any starting point. 💡Note that this means convergence to a stationary point.
If the underlying optimization method produces good search directions, as is often the case with Newton’s method on well-conditioned problems, then the globalization strategies will act merely
as a safety net. For a method that produces less effective search directions, such as a linear conjugate-gradient method, they can be a major contributor to the practical success of a method.
We will discuss two major types of globalization strategy:
- Line search method
- Trust-region method: Will not be discussed in this article
Line Search Method
Let xk be the current estimate of a minimizer of f, and let pk be the search direction at the point xk. Then the new estimate of the solution is defined by the formula
xk+1 = xk + αkpk,
where the step length αk is some scalar chosen so that
f(xk + αkpk) < f(xk).
Note that we assume that pk is a descent direction at xk; therefore,
∇f(xk)ᵀpk < 0.
This should be guaranteed by the algorithm used to compute the search direction.
If pk is a descent direction, then f(xk+αpk)<f(xk) at least for small positive values of α.
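To see why, use the first-order Taylor expansion f(xk+αpk) = f(xk) + α∇f(xk)ᵀpk + o(α). Since ∇f(xk)ᵀpk < 0 for a descent direction, the right-hand side is strictly smaller than f(xk) for all sufficiently small α > 0.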
The technique is called a line search because a search for a new point xk+1 is carried out along the line y(α) = xk + αpk.
Intuitively we would like to choose αk as the solution to
minα>0 f(xk + αpk).
However, it is usually too expensive to solve this problem exactly, so an approximate minimizer is accepted instead. Note that we can't always be sure whether the iterates converge to a stationary point or not.
An example of not converging to a stationary point
Consider the minimization problem
minimize f(x) = x^2
with initial value x0 = −3. At each iteration, we use the search direction pk = 1 with step length αk = 2^(−k). Hence
xk+1 = xk + 2^(−k), and by induction xk = −1 − 2^(1−k).
Therefore, f(xk+1) < f(xk). In addition, pk is always a descent direction, since xk < −1 < 0 implies f′(xk)pk = 2xk < 0.
However, this simple algorithm does not converge to a stationary point: the iterates satisfy
limk→∞ xk = −1,
but f′(−1) = −2 ≠ 0.
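This behaviour is easy to verify numerically. The sketch below (our own illustration of the example above, not code from the original text) performs the iteration and checks at every step that pk is a descent direction and that f decreases, yet the iterates stall at x = −1:

```python
# Reproduces the counterexample: f(x) = x^2, x0 = -3, p_k = 1, alpha_k = 2^(-k).
def f(x):
    return x ** 2

def f_prime(x):
    return 2.0 * x

x = -3.0
for k in range(50):
    p = 1.0                # fixed search direction
    alpha = 2.0 ** (-k)    # step length shrinks geometrically
    assert f_prime(x) * p < 0          # p is a descent direction at x_k ...
    assert f(x + alpha * p) < f(x)     # ... and the function value decreases ...
    x = x + alpha * p

print(x, f_prime(x))  # ... yet x -> -1 and f'(-1) = -2 != 0: not a stationary point
```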
Guaranteeing convergence to a stationary point
One way to guarantee convergence is to make additional assumptions.
- Assumptions on the search direction pk
  - it produces sufficient descent
  - it is gradient related
  💡These conditions can normally be guaranteed by making slight modifications to the method used to compute the search direction. Techniques for doing this are discussed in the context of specific methods.
- Assumptions on the step length αk
  - it produces a sufficient decrease in the function f
  - it is not too small (the technique that ensures this is called backtracking)
  💡Armijo condition + backtracking is sometimes called Armijo line search.
Sufficient decrease (Armijo condition)

This condition requires that
f(xk + αpk) ≤ f(xk) + μα∇f(xk)ᵀpk,
where μ is some scalar satisfying 0 < μ < 1.
This condition ensures that, at each iteration, the function value decreases by at least a fraction μ of the decrease predicted by the linear approximation, guiding the algorithm towards the solution.
If α is small, the linear approximation will be good, and the sufficient decrease condition will be satisfied. However, if α is large, the decrease predicted by linear approximation may differ greatly from the actual decrease in f, and the condition can be violated.
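The test itself is one line of code. The following sketch (function and parameter names are ours, and μ = 0.1 is just an illustrative choice) checks the sufficient decrease condition, using the f(x) = x^2 example from above:

```python
# A minimal sketch of the sufficient decrease (Armijo) test.
def armijo_satisfied(f, x_k, p_k, slope_k, alpha, mu=0.1):
    """slope_k is the directional derivative f'(x_k) * p_k (negative for a
    descent direction). Returns True if alpha gives sufficient decrease."""
    return f(x_k + alpha * p_k) <= f(x_k) + mu * alpha * slope_k

# Illustration with f(x) = x^2 at x_k = -3, p_k = 1 (slope_k = f'(-3)*1 = -6):
f = lambda x: x * x
print(armijo_satisfied(f, -3.0, 1.0, -6.0, alpha=1.0))  # True: decrease is large enough
print(armijo_satisfied(f, -3.0, 1.0, -6.0, alpha=8.0))  # False: the step overshoots and f increases
```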
Backtracking
The Armijo condition alone does not forbid arbitrarily tiny step sizes.
Let pk be a search direction satisfying the sufficient descent condition. Define αk to be the first element of the sequence
{1, 1/2, 1/4, 1/8, …}
for which the Armijo condition is fulfilled. The combination of the Armijo condition and backtracking is sometimes called Armijo line search.
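A direct translation of this rule might look like the following sketch (the names and the choice μ = 1e−4 are ours; any 0 < μ < 1 works):

```python
# A minimal backtracking (Armijo) line search sketch. Works for scalar or
# NumPy-array iterates; mu and max_halvings are illustrative choices.
def backtracking_line_search(f, f_k, slope_k, x_k, p_k, mu=1e-4, max_halvings=50):
    """Try alpha = 1, 1/2, 1/4, ... and return the first value satisfying
    f(x_k + alpha*p_k) <= f(x_k) + mu*alpha*slope_k, where slope_k = grad f(x_k)^T p_k."""
    alpha = 1.0
    for _ in range(max_halvings):
        if f(x_k + alpha * p_k) <= f_k + mu * alpha * slope_k:
            return alpha
        alpha *= 0.5
    return alpha  # give up shrinking after max_halvings halvings
```

Because pk is a descent direction, the loop is guaranteed (in exact arithmetic) to find an accepted step once α becomes small enough.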
Conclusion
By adding some assumptions and using the above techniques, we can guarantee the following convergence result.
Let f be a real-valued function of n variables. Let x0 be a given initial point and define {xk} by xk+1=xk+αkpk, where pk is a vector of dimension n and αk≥0 is a scalar. Assume that
- the set S={x:f(x)≤f(x0)} is bounded
→ This ensures that the function takes on its minimum value at a finite point
💡For example, we want to exclude cases such as f(x) = e^x, whose infimum is not attained at any finite point.
- ∇f is Lipschitz continuous for all x
- the vectors pk satisfy a sufficient descent condition
- the search directions pk are gradient related and have bounded norm: ∥pk∥ ≤ M for all k ∈ ℕ
- the scalar αk is chosen as the first element of the sequence {1, 1/2, 1/4, …} that satisfies the sufficient decrease condition f(xk + αkpk) ≤ f(xk) + μαk∇f(xk)ᵀpk, where 0 < μ < 1
Then limk→∞ ∇f(xk) = 0; that is, every limit point of the sequence {xk} is a stationary point of f.
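Putting the pieces together, here is a minimal sketch of steepest descent (pk = −∇f(xk), which trivially satisfies the descent and gradient-related assumptions) with backtracking on an illustrative quadratic; the objective, starting point, and tolerances are our own choices, not from the original text:

```python
import numpy as np

def f(x):
    return 0.5 * (x[0] ** 2 + 10.0 * x[1] ** 2)

def grad_f(x):
    return np.array([x[0], 10.0 * x[1]])

x = np.array([4.0, -2.0])
mu = 1e-4
for k in range(500):
    g = grad_f(x)
    if np.linalg.norm(g) < 1e-8:            # stop once (numerically) stationary
        break
    p = -g                                  # steepest descent: grad^T p = -||g||^2 < 0
    alpha = 1.0                             # backtracking over 1, 1/2, 1/4, ...
    while f(x + alpha * p) > f(x) + mu * alpha * (g @ p):
        alpha *= 0.5
    x = x + alpha * p

print(k, x, np.linalg.norm(grad_f(x)))      # the gradient norm is driven to ~0
```

Unlike the earlier counterexample, the step lengths here satisfy the sufficient decrease condition and are not allowed to shrink prematurely, so the iterates converge to the stationary point (0, 0).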