So far, all methods considered require at least first-order derivative information.
But what if
The function is not differentiable everywhere?
It is very time-consuming to perform the derivative calculation?
This is the motivation for derivative-free methods
→ search methods that do not use any derivative information
Compass Search
Intuition : Try a few points around the current solution
Better point found : move, and repeat
Otherwise : reduce the scope
A pattern of trial points, one along each dimension and direction ("compass")
Reduce the pattern size (step size) in case of no improvement
Example
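As a small illustration (my own hypothetical setup, not from the text), assume f(x, y) = x² + y², a current point (1, 1), and a pattern size of 0.5. One compass step evaluates the 2n trial points and either moves or shrinks the pattern:

```python
import numpy as np

def f(x):
    # Hypothetical objective for illustration: f(x, y) = x^2 + y^2
    return float(np.sum(x**2))

x = np.array([1.0, 1.0])   # current solution
delta = 0.5                # current pattern (step) size
n = x.size

# The 2n compass directions: +/- each coordinate axis
directions = [s * e for e in np.eye(n) for s in (+1.0, -1.0)]

trials = [x + delta * d for d in directions]
values = [f(t) for t in trials]

best = int(np.argmin(values))
if values[best] < f(x):
    x = trials[best]       # better point found: move, keep the pattern size
else:
    delta /= 2             # no improvement: reduce the scope
print(x, delta)
```

Here the trial point (0.5, 1.0) improves on f(1, 1) = 2, so the step is accepted and the pattern size stays at 0.5.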
Algorithm
Specify an initial guess x0, an initial pattern size Δ0>0, and a convergence tolerance Δtol
For k=0,1,…
If Δk<Δtol, stop
Evaluate f at the points xk and xk+Δkdi, for i=1,…,2n
→ each di has a 1 or −1 in one element and zeros elsewhere, i.e., the 2n coordinate directions ±e1,…,±en
If f(xk+Δkdi)<f(xk) for some i, set xk+1=xk+Δkdi and Δk+1=Δk
Otherwise, set Δk+1=Δk/2
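The algorithm above can be sketched as follows. This is a minimal implementation of the stated steps; the stopping rule, halving factor, and acceptance of the first improving point follow the pseudocode, while the objective in the usage line is my own hypothetical example.

```python
import numpy as np

def compass_search(f, x0, delta0=1.0, delta_tol=1e-6, max_iter=10_000):
    """Compass search: evaluate f at x_k +/- Delta_k * e_i for each
    coordinate i; move to an improving point, otherwise halve Delta_k."""
    x = np.asarray(x0, dtype=float)
    delta = float(delta0)
    fx = f(x)
    for _ in range(max_iter):
        if delta < delta_tol:          # stop when Delta_k < Delta_tol
            break
        improved = False
        for i in range(x.size):
            for sign in (+1.0, -1.0):  # the 2n compass directions +/- e_i
                trial = x.copy()
                trial[i] += sign * delta
                ft = f(trial)
                if ft < fx:            # better point: accept, keep Delta_k
                    x, fx = trial, ft
                    improved = True
                    break
            if improved:
                break
        if not improved:
            delta /= 2                 # unsuccessful iteration: shrink pattern
    return x, fx

# Hypothetical usage on a smooth quadratic with minimizer (1, 1):
x_star, f_star = compass_search(lambda x: np.sum((x - 1.0)**2), np.zeros(2))
```

Accepting the first improving trial point (rather than the best of all 2n) is one common variant; either choice satisfies the "better point found: move" rule.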
Convergence
If the assumptions about the objective function are satisfied, the compass search method is guaranteed to converge. A formal statement of the theorem is given below.
Let f be a real-valued function of n variables. Let x0 be a given initial point and determine {xk} using the compass search algorithm with initial pattern size Δ0. Assume that
the set S={x:f(x)≤f(x0)} is bounded
∇f is Lipschitz continuous, that is,
‖∇f(x)−∇f(y)‖ ≤ L‖x−y‖ for all x, y
for some constant 0<L<∞
Let {xk}k∈U be the subsequence of unsuccessful iterations, i.e., the iterations where a better point can't be found and Δk is reduced. Then
lim k→∞, k∈U ‖∇f(xk)‖ = 0
It is also possible to determine the rate of convergence of the compass search algorithm.
If additional assumptions are made, then it can be shown that at the unsuccessful iterations,
‖xk−x∗‖ ≤ cΔk
for some constant c that does not depend on k. This is similar to, but not the same as, a linear rate of convergence. There are several differences between this result and linear convergence.
We consider only unsuccessful iterations and ignore successful iterations.
The result shows only that {xk−x∗} is bounded by a linearly convergent sequence. At each unsuccessful iteration we divide Δk by 2, so the sequence {Δk} converges linearly to zero. This does not guarantee that {xk−x∗} itself converges linearly.
Nevertheless, it is similar to linear convergence, and this property is sometimes referred to as r-linear convergence.
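The r-linear behavior can be checked numerically. The sketch below (my own construction, reusing a simple compass-search loop on f(x) = ‖x‖², whose minimizer is x∗ = 0) records Δk and the error ‖xk−x∗‖ at each unsuccessful iteration; if the bound ‖xk−x∗‖ ≤ cΔk holds, the ratios error/Δk stay bounded by a fixed constant:

```python
import numpy as np

def f(x):
    return float(np.sum(x**2))   # minimizer x* = 0

x = np.array([0.7, -0.3])
delta, history = 1.0, []
while delta >= 1e-8:
    improved = False
    for i in range(x.size):
        for sign in (+1.0, -1.0):
            trial = x.copy()
            trial[i] += sign * delta
            if f(trial) < f(x):
                x, improved = trial, True
                break
        if improved:
            break
    if not improved:
        # Unsuccessful iteration: record Delta_k and the error ||x_k - x*||
        history.append((delta, np.linalg.norm(x)))
        delta /= 2

# ||x_k - x*|| <= c * Delta_k should hold for a fixed constant c
ratios = [err / d for d, err in history]
print(max(ratios))
```

For this quadratic the ratios remain below 1, consistent with the bound, even though the per-iteration error itself does not shrink monotonically.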