However, there are two reasons why this is not realistic:

1. It is unlikely that the calculated value of the gradient would ever be exactly zero, because of rounding errors in computer calculations.
2. No algorithm is guaranteed to find such a point in a finite amount of time.
As an alternative, we might consider replacing the above conditions by the test

∥∇f(x_k)∥ ≤ ϵ

for some small ϵ > 0.
Suppose that the objective function were rescaled by changing the units in which it is measured, so that f is multiplied by some constant (and hence so is its gradient). This minor change to the objective function could make the convergence test much more difficult, or much easier, to satisfy. To alleviate this difficulty, the convergence test could be made relative to the size of the objective:

∥∇f(x_k)∥ ≤ ϵ |f(x_k)|
However, this test is also flawed, especially when the optimal value of the objective function is zero: the right-hand side then shrinks to zero near the solution, making the test nearly impossible to pass. Thus we change it slightly as follows:

∥∇f(x_k)∥ ≤ ϵ (1 + |f(x_k)|)
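To see the effect concretely, here is a small sketch (the toy quadratic function and all constants are my own illustrative choices, not from the source). Rescaling f by c = 1000, as a change of units would, breaks the absolute test but leaves the relative test satisfied:

```python
import numpy as np

# A toy objective (illustrative assumption, not from the source).
def f(x):
    return 0.5 * np.dot(x, x) + 100.0

def grad(x):
    return x

x = np.array([1e-5, 1e-5])   # a near-stationary point
eps = 1e-4

for c in (1.0, 1000.0):      # c = 1000 mimics a change of units
    g = c * grad(x)          # the gradient scales along with f
    absolute = np.linalg.norm(g) <= eps                        # fails for c = 1000
    relative = np.linalg.norm(g) <= eps * (1 + abs(c * f(x)))  # passes for both
    print(c, absolute, relative)
```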
When Newton’s method is being used, it is also appropriate to ask that ∇²f(x_k) be positive semi-definite. Due to arithmetic error, we weaken this requirement and only demand that

∇²f(x_k) + ϵI

be positive semi-definite, where I is the identity matrix.
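In code, this weakened condition can be checked via the smallest eigenvalue of the Hessian: ∇²f(x_k) + ϵI is positive semi-definite exactly when λ_min(∇²f(x_k)) ≥ −ϵ. A minimal NumPy sketch (the function name and default tolerance are illustrative assumptions):

```python
import numpy as np

def weakly_psd(H, eps=1e-8):
    """True if H + eps*I is positive semi-definite, i.e. the smallest
    eigenvalue of the (symmetric) Hessian H is at least -eps."""
    # eigvalsh assumes a symmetric matrix and returns its eigenvalues
    # in ascending order, so index 0 is the smallest one.
    return np.linalg.eigvalsh(H)[0] >= -eps
```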
Since it is not possible to design a perfect convergence test for terminating an algorithm, it is common to insist that additional tests be satisfied before a point x_k is accepted as an approximate minimizer of the function f.
Combined Termination Rules

A typical combined test accepts x_k as an approximate minimizer only if all three of the following hold (a sketch implementing them follows the interpretation below):

1. ∥∇f(x_k)∥ ≤ ϵ₁ (1 + |f(x_k)|)
2. f(x_{k−1}) − f(x_k) ≤ ϵ₂ (1 + |f(x_k)|)
3. ∥x_{k−1} − x_k∥ ≤ ϵ₃ (1 + ∥x_k∥)
Interpretation
2nd rule: attempts to ensure that the sequence {f(x_k)} is converging
3rd rule: attempts to ensure that the sequence {x_k} is converging
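Here is a minimal NumPy sketch of the combined test; the function name, the default tolerances, and the choice of the infinity norm are my own assumptions, not from the source:

```python
import numpy as np

def accept(x_prev, x_curr, f_prev, f_curr, grad_curr,
           eps1=1e-6, eps2=1e-9, eps3=1e-6):
    """Return True iff all three combined termination rules hold."""
    scale_f = 1.0 + abs(f_curr)
    scale_x = 1.0 + np.linalg.norm(x_curr, np.inf)
    rule1 = np.linalg.norm(grad_curr, np.inf) <= eps1 * scale_f        # gradient ~ 0
    rule2 = (f_prev - f_curr) <= eps2 * scale_f                        # {f(x_k)} converging
    rule3 = np.linalg.norm(x_prev - x_curr, np.inf) <= eps3 * scale_x  # {x_k} converging
    return rule1 and rule2 and rule3
```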
Issue with Selecting a Norm
When we apply these termination rules, we must also select which norm to use. For example, if ∇f(x_k) = (γ, γ, …, γ)^T, then

∥∇f(x_k)∥₂ = √n γ  while  ∥∇f(x_k)∥_∞ = γ,

where n is the number of variables.
If n is large, the ℓ₂-norm of the gradient can be large even if γ is small. This can distort the convergence tests, so it is wise to use the infinity norm when large problems are solved.
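A quick numerical illustration (the values of n and γ are arbitrary choices):

```python
import numpy as np

n, gamma = 10**6, 1e-4           # many variables, each gradient entry small
grad = np.full(n, gamma)

print(np.linalg.norm(grad, 2))       # 0.1    = sqrt(n) * gamma -- looks large
print(np.linalg.norm(grad, np.inf))  # 0.0001 = gamma           -- stays small
```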
However, the infinity norm has problems of its own. It is not guaranteed to be differentiable for all x, and it emphasizes the larger components in a vector, so the smaller components may have poor relative accuracy.
Suppose, for example (with illustrative numbers),

x_{k−1} = (10⁴, 10⁻³)^T,  x_k = (10⁴, 10⁻⁶)^T,

and take ϵ₃ = 10⁻³. Then

∥x_k − x_{k−1}∥_∞ ≈ 10⁻³ ≤ ϵ₃ (1 + ∥x_k∥_∞) ≈ 10,

and so x_k would pass this test. Because the infinity norm captures only the largest component, the iteration can be stopped even though the second component is still changing by orders of magnitude, i.e., still converging.
→ This effect can be ameliorated by scaling the variables.
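A minimal sketch of one such scaling, assuming we know (or can estimate) a typical magnitude for each variable; the iterates reuse the illustrative example above, and the scaling vector is itself an assumption:

```python
import numpy as np

# Illustrative iterates from the example above.
x_prev = np.array([1e4, 1e-3])
x_curr = np.array([1e4, 1e-6])
eps3 = 1e-3

# Typical magnitudes of each variable (assumed known or estimated).
typical = np.array([1e4, 1e-3])
d = 1.0 / typical            # diagonal scaling factors

# Unscaled test: passes, masking the badly-resolved second component.
unscaled = (np.linalg.norm(x_curr - x_prev, np.inf)
            <= eps3 * (1 + np.linalg.norm(x_curr, np.inf)))

# Scaled test: the relative change in the second component (~1) now dominates.
scaled = (np.linalg.norm(d * (x_curr - x_prev), np.inf)
          <= eps3 * (1 + np.linalg.norm(d * x_curr, np.inf)))

print(unscaled)  # True  -- stops too early
print(scaled)    # False -- keeps iterating
```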