
8. Termination Rules


Ideally,

\nabla f(x_k) = 0

and

\nabla^2 f(x_k)

is positive semi-definite.
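As a toy illustration (a made-up quadratic with $A$ chosen by hand, not an example from the original text), both ideal conditions hold exactly at the minimizer of $f(x) = \frac{1}{2} x^T A x$:

```python
import numpy as np

# f(x) = 0.5 * x^T A x has gradient A x and constant Hessian A
A = np.array([[2.0, 0.0],
              [0.0, 1.0]])          # positive definite by construction
x_star = np.zeros(2)                # the unique minimizer

grad = A @ x_star                   # exactly zero at x_star
eigs = np.linalg.eigvalsh(A)        # all eigenvalues >= 0, so A is PSD
print(grad, eigs)                   # [0. 0.] [1. 2.]
```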

However, there are two reasons why this is not realistic in practice.

  1. It is unlikely that the calculated value of the gradient would ever be exactly zero because of rounding errors in computer calculations.
  2. No algorithm is guaranteed to find such a point in a finite amount of time.

As an alternative, we might consider replacing the above conditions by the test

\|\nabla f(x_k)\| \le \epsilon

for some small $\epsilon$.

Suppose that the objective function were changed by altering the units in which it is measured. This would scale the objective function, and hence its gradient, by a constant factor. This minor change would make the convergence test much more difficult (or much easier) to satisfy, even though the minimizer is unchanged. To alleviate this difficulty, the convergence test could be modified to

\|\nabla f(x_k)\| \le \epsilon \, |f(x_k)|
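A one-line check shows why this relative test is insensitive to the choice of units: if the objective is rescaled to $\hat f = c f$ for some constant $c > 0$, then $\nabla \hat f = c \, \nabla f$, and

\|\nabla \hat f(x_k)\| = c \, \|\nabla f(x_k)\| \le \epsilon \, c \, |f(x_k)| = \epsilon \, |\hat f(x_k)|

so the test passes for $\hat f$ exactly when it passes for $f$.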

However, this test is also flawed, especially when the optimal value of the objective function is zero: the right-hand side then tends to zero near the solution, and the test may become nearly impossible to satisfy. Thus we change it slightly as follows:

\|\nabla f(x_k)\| \le \epsilon \, (1 + |f(x_k)|)

When Newton’s method is being used, it is also appropriate to ask that $\nabla^2 f(x_k)$ be positive semi-definite. Due to arithmetic errors, we weaken this requirement and demand only that

\nabla^2 f(x_k) + \epsilon I

be positive semi-definite.
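In code, this weakened curvature test can be sketched as follows (a minimal NumPy sketch; the function name and the default tolerance are illustrative choices):

```python
import numpy as np

def curvature_ok(hess, eps=1e-8):
    """Return True if hess + eps * I is positive semi-definite.

    For a symmetric matrix H, H + eps * I is PSD exactly when the
    smallest eigenvalue of H is at least -eps.
    """
    H = 0.5 * (hess + hess.T)                # symmetrize against rounding errors
    return np.linalg.eigvalsh(H)[0] >= -eps  # eigvalsh sorts eigenvalues ascending
```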

Since it is not possible to design a perfect convergence test for terminating an algorithm, it is common to insist that additional tests be satisfied before a point $x_k$ is accepted as an approximate minimizer of the function $f$.

Combined Termination Rules

\|\nabla f(x_k)\| \le \epsilon_1 (1 + |f(x_k)|)

f(x_{k-1}) - f(x_k) \le \epsilon_2 (1 + |f(x_k)|)

\|x_{k-1} - x_k\| \le \epsilon_3 (1 + \|x_k\|)

\nabla^2 f(x_k) + \epsilon_4 I \text{ is positive semi-definite}

Interpretation

  1. 2nd rule: attempts to ensure that the sequence $\{f(x_k)\}$ is converging.
  2. 3rd rule: attempts to ensure that the sequence $\{x_k\}$ is converging.
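Putting the four rules together, a minimal Python/NumPy sketch might look like this (the function name and the default tolerances $\epsilon_1, \dots, \epsilon_4$ are illustrative choices, not values from the original text):

```python
import numpy as np

def accept_as_minimizer(f_prev, f_k, x_prev, x_k, grad_k, hess_k,
                        eps1=1e-6, eps2=1e-9, eps3=1e-9, eps4=1e-8):
    """Accept x_k as an approximate minimizer only if all four rules hold."""
    rule1 = np.linalg.norm(grad_k) <= eps1 * (1 + abs(f_k))       # gradient nearly zero
    rule2 = f_prev - f_k <= eps2 * (1 + abs(f_k))                 # {f(x_k)} converging
    rule3 = np.linalg.norm(x_prev - x_k) <= eps3 * (1 + np.linalg.norm(x_k))  # {x_k} converging
    H = 0.5 * (hess_k + hess_k.T)                                 # symmetrize
    rule4 = np.linalg.eigvalsh(H)[0] >= -eps4                     # hess + eps4 * I is PSD
    return rule1 and rule2 and rule3 and rule4
```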

Issue with Selecting a Norm

When we apply termination rules, we must also choose which norm to use. For example, if $\nabla f(x_k) = (\gamma, \dots, \gamma)^T$, then

\|\nabla f(x_k)\|_2 = \sqrt{n} \, |\gamma|

\|\nabla f(x_k)\|_\infty = |\gamma|

where $n$ is the number of variables.

If $n$ is large, then the $\ell_2$-norm of the gradient can be large even if $\gamma$ is small. This can distort the convergence tests, so it is wise to use the infinity norm when large problems are solved.
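A quick NumPy check of this effect, with $n$ and $\gamma$ picked arbitrarily:

```python
import numpy as np

g = np.full(10_000, 1e-4)          # gradient with n = 10,000 components, each gamma = 1e-4
print(np.linalg.norm(g, 2))        # 0.01 = sqrt(n) * |gamma|
print(np.linalg.norm(g, np.inf))   # 1e-4 = |gamma|
```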

However, the infinity norm also has its own problems.

  1. We cannot always guarantee that it is differentiable for all $x$.
  2. This norm emphasizes the larger components in a vector, so the smaller components may have poor relative accuracy.

    Suppose,

    x_{k-1} = (1.44453, \ 0.00093, \ 0.0000079)

    x_k = (1.44441, \ 0.00012, \ 0.0000011)

    and take $\epsilon_3 = 10^{-3}$, then

    \begin{align}
    \|x_{k-1} - x_k\|_\infty &= \|(0.00012, \ 0.00081, \ 0.0000068)\|_\infty \\
    &= 0.00081 \le \epsilon_3 \|x_k\|_\infty = 1.44441 \times 10^{-3}
    \end{align}

    and so $x_k$ would pass this test. Since the infinity norm captures only the largest component, the iteration can be stopped even though the smaller components are still changing significantly relative to their own size.

    → This effect can be ameliorated by scaling the variables, as in the sketch below.
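A minimal sketch of such scaling, reusing the numbers above (the scale vector is hand-picked here to reflect each variable's typical magnitude, which in practice must be supplied by the user or estimated):

```python
import numpy as np

x_prev = np.array([1.44453, 0.00093, 0.0000079])
x_k    = np.array([1.44441, 0.00012, 0.0000011])
scale  = np.array([1.0, 1e-3, 1e-5])     # hand-picked typical magnitudes

step = np.abs(x_prev - x_k) / scale      # componentwise relative step
print(np.linalg.norm(step, np.inf))      # 0.81, far above eps3 = 1e-3, so the scaled test fails
```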
