
8. Termination Rules


Ideally,

\nabla f(x_k) = 0

and

\nabla^2 f(x_k)

is positive semi-definite.
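As a toy illustration (a made-up quadratic with $A$ chosen by hand, not an example from the original text), both ideal conditions hold exactly at the minimizer of $f(x) = \frac{1}{2} x^T A x$:

```python
import numpy as np

# f(x) = 0.5 * x^T A x has gradient A x and constant Hessian A
A = np.array([[2.0, 0.0],
              [0.0, 1.0]])          # positive definite by construction
x_star = np.zeros(2)                # the unique minimizer

grad = A @ x_star                   # exactly zero at x_star
eigs = np.linalg.eigvalsh(A)        # all eigenvalues >= 0, so A is PSD
print(grad, eigs)                   # [0. 0.] [1. 2.]
```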

However, there are two reasons why this is not realistic in practice.

  1. It is unlikely that the calculated value of the gradient would ever be exactly zero because of rounding errors in computer calculations.
  2. No algorithm is guaranteed to find such a point in a finite amount of time.

As an alternative, we might consider replacing the above conditions by the test

\|\nabla f(x_k)\| \le \epsilon

for some small $\epsilon$.

Suppose that the objective function were changed by altering the units in which it is measured. This would scale the objective function, and hence its gradient, by a constant factor. This minor change would make the convergence test much more difficult (or much easier) to satisfy, even though the minimizer is unchanged. To alleviate this difficulty, the convergence test could be modified to

\|\nabla f(x_k)\| \le \epsilon \, |f(x_k)|
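A one-line check shows why this relative test is insensitive to the choice of units: if the objective is rescaled to $\hat f = c f$ for some constant $c > 0$, then $\nabla \hat f = c \, \nabla f$, and

\|\nabla \hat f(x_k)\| = c \, \|\nabla f(x_k)\| \le \epsilon \, c \, |f(x_k)| = \epsilon \, |\hat f(x_k)|

so the test passes for $\hat f$ exactly when it passes for $f$.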

However, this test is also flawed, especially when the optimal value of the objective function is zero: the right-hand side then tends to zero near the solution, and the test may become nearly impossible to satisfy. Thus we change it slightly as follows:

\|\nabla f(x_k)\| \le \epsilon \, (1 + |f(x_k)|)

When Newton’s method is being used, it is also appropriate to ask that $\nabla^2 f(x_k)$ be positive semi-definite. Due to arithmetic errors, we weaken this requirement and demand only that

\nabla^2 f(x_k) + \epsilon I

be positive semi-definite.
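In code, this weakened curvature test can be sketched as follows (a minimal NumPy sketch; the function name and the default tolerance are illustrative choices):

```python
import numpy as np

def curvature_ok(hess, eps=1e-8):
    """Return True if hess + eps * I is positive semi-definite.

    For a symmetric matrix H, H + eps * I is PSD exactly when the
    smallest eigenvalue of H is at least -eps.
    """
    H = 0.5 * (hess + hess.T)                # symmetrize against rounding errors
    return np.linalg.eigvalsh(H)[0] >= -eps  # eigvalsh sorts eigenvalues ascending
```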

Since it is not possible to design a perfect convergence test for terminating an algorithm, it is common to insist that additional tests be satisfied before a point $x_k$ is accepted as an approximate minimizer of the function $f$.

Combined Termination Rules

\|\nabla f(x_k)\| \le \epsilon_1 (1 + |f(x_k)|)

f(x_{k-1}) - f(x_k) \le \epsilon_2 (1 + |f(x_k)|)

\|x_{k-1} - x_k\| \le \epsilon_3 (1 + \|x_k\|)

\nabla^2 f(x_k) + \epsilon_4 I \text{ is positive semi-definite}

Interpretation

  1. 2nd rule: attempts to ensure that the sequence $\{f(x_k)\}$ is converging.
  2. 3rd rule: attempts to ensure that the sequence $\{x_k\}$ is converging.
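Putting the four rules together, a minimal Python/NumPy sketch might look like this (the function name and the default tolerances $\epsilon_1, \dots, \epsilon_4$ are illustrative choices, not values from the original text):

```python
import numpy as np

def accept_as_minimizer(f_prev, f_k, x_prev, x_k, grad_k, hess_k,
                        eps1=1e-6, eps2=1e-9, eps3=1e-9, eps4=1e-8):
    """Accept x_k as an approximate minimizer only if all four rules hold."""
    rule1 = np.linalg.norm(grad_k) <= eps1 * (1 + abs(f_k))       # gradient nearly zero
    rule2 = f_prev - f_k <= eps2 * (1 + abs(f_k))                 # {f(x_k)} converging
    rule3 = np.linalg.norm(x_prev - x_k) <= eps3 * (1 + np.linalg.norm(x_k))  # {x_k} converging
    H = 0.5 * (hess_k + hess_k.T)                                 # symmetrize
    rule4 = np.linalg.eigvalsh(H)[0] >= -eps4                     # hess + eps4 * I is PSD
    return rule1 and rule2 and rule3 and rule4
```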

Issue with Selecting a Norm

When we apply termination rules, we must also choose which norm to use. For example, if $\nabla f(x_k) = (\gamma, \dots, \gamma)^T$, then

\|\nabla f(x_k)\|_2 = \sqrt{n} \, |\gamma|

\|\nabla f(x_k)\|_\infty = |\gamma|

where $n$ is the number of variables.

If $n$ is large, then the $\ell_2$-norm of the gradient can be large even if $\gamma$ is small. This can distort the convergence tests, so it is wise to use the infinity norm when large problems are solved.
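A quick NumPy check of this effect, with $n$ and $\gamma$ picked arbitrarily:

```python
import numpy as np

g = np.full(10_000, 1e-4)          # gradient with n = 10,000 components, each gamma = 1e-4
print(np.linalg.norm(g, 2))        # 0.01 = sqrt(n) * |gamma|
print(np.linalg.norm(g, np.inf))   # 1e-4 = |gamma|
```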

However, the infinity norm also has its own problems.

  1. We cannot always guarantee that it is differentiable for all $x$.
  2. This norm emphasizes the larger components in a vector, so the smaller components may have poor relative accuracy.

    Suppose,

    x_{k-1} = (1.44453, \ 0.00093, \ 0.0000079)

    x_k = (1.44441, \ 0.00012, \ 0.0000011)

    and take $\epsilon_3 = 10^{-3}$, then

    \begin{align}
    \|x_{k-1} - x_k\|_\infty &= \|(0.00012, \ 0.00081, \ 0.0000068)\|_\infty \\
    &= 0.00081 \le \epsilon_3 \|x_k\|_\infty = 1.44441 \times 10^{-3}
    \end{align}

    and so $x_k$ would pass this test. Since the infinity norm captures only the largest component, the iteration can be stopped even though the smaller components are still changing significantly relative to their own size.

    → This effect can be ameliorated by scaling the variables, as in the sketch below.
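A minimal sketch of such scaling, reusing the numbers above (the scale vector is hand-picked here to reflect each variable's typical magnitude, which in practice must be supplied by the user or estimated):

```python
import numpy as np

x_prev = np.array([1.44453, 0.00093, 0.0000079])
x_k    = np.array([1.44441, 0.00012, 0.0000011])
scale  = np.array([1.0, 1e-3, 1e-5])     # hand-picked typical magnitudes

step = np.abs(x_prev - x_k) / scale      # componentwise relative step
print(np.linalg.norm(step, np.inf))      # 0.81, far above eps3 = 1e-3, so the scaled test fails
```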
