So far, all methods considered require at least first-order derivative information.
But what if
The function is not differentiable everywhere?
It is very time-consuming to perform the derivative calculation?
This is the motivation for derivative-free methods
→ search methods that do not use any derivative information
Compass Search
Intuition : Try a few points around the current solution
Better point found : move, and repeat
Otherwise : reduce the scope
A pattern of trial points, one along each dimension and direction ("compass")
Reduce the pattern size (step size) in case of no improvement
Example
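As a small illustration (my own hypothetical setup, not from the text), assume f(x, y) = x² + y², a current point (1, 1), and a pattern size of 0.5. One compass step evaluates the 2n trial points and either moves or shrinks the pattern:

```python
import numpy as np

def f(x):
    # Hypothetical objective for illustration: f(x, y) = x^2 + y^2
    return float(np.sum(x**2))

x = np.array([1.0, 1.0])   # current solution
delta = 0.5                # current pattern (step) size
n = x.size

# The 2n compass directions: +/- each coordinate axis
directions = [s * e for e in np.eye(n) for s in (+1.0, -1.0)]

trials = [x + delta * d for d in directions]
values = [f(t) for t in trials]

best = int(np.argmin(values))
if values[best] < f(x):
    x = trials[best]       # better point found: move, keep the pattern size
else:
    delta /= 2             # no improvement: reduce the scope
print(x, delta)
```

Here the trial point (0.5, 1.0) improves on f(1, 1) = 2, so the step is accepted and the pattern size stays at 0.5.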
Algorithm
Specify an initial guess x0, an initial pattern size Δ0>0, and a convergence tolerance Δtol
For k=0,1,…
If Δk<Δtol, stop
Evaluate f at the points xk and xk+Δkdi, for i=1,…,2n
→ each di has a 1 or −1 in one element and zeros elsewhere, i.e., the 2n coordinate directions ±e1,…,±en
If f(xk+Δkdi)<f(xk) for some i, set xk+1=xk+Δkdi and Δk+1=Δk
Otherwise, set Δk+1=Δk/2
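The algorithm above can be sketched as follows. This is a minimal implementation of the stated steps; the stopping rule, halving factor, and acceptance of the first improving point follow the pseudocode, while the objective in the usage line is my own hypothetical example.

```python
import numpy as np

def compass_search(f, x0, delta0=1.0, delta_tol=1e-6, max_iter=10_000):
    """Compass search: evaluate f at x_k +/- Delta_k * e_i for each
    coordinate i; move to an improving point, otherwise halve Delta_k."""
    x = np.asarray(x0, dtype=float)
    delta = float(delta0)
    fx = f(x)
    for _ in range(max_iter):
        if delta < delta_tol:          # stop when Delta_k < Delta_tol
            break
        improved = False
        for i in range(x.size):
            for sign in (+1.0, -1.0):  # the 2n compass directions +/- e_i
                trial = x.copy()
                trial[i] += sign * delta
                ft = f(trial)
                if ft < fx:            # better point: accept, keep Delta_k
                    x, fx = trial, ft
                    improved = True
                    break
            if improved:
                break
        if not improved:
            delta /= 2                 # unsuccessful iteration: shrink pattern
    return x, fx

# Hypothetical usage on a smooth quadratic with minimizer (1, 1):
x_star, f_star = compass_search(lambda x: np.sum((x - 1.0)**2), np.zeros(2))
```

Accepting the first improving trial point (rather than the best of all 2n) is one common variant; either choice satisfies the "better point found: move" rule.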
Convergence
If the assumptions about the objective function are satisfied, the compass search method is guaranteed to converge. A formal statement of the theorem is given below.
Let f be a real-valued function of n variables. Let x0 be a given initial point and determine {xk} using the compass search algorithm with initial pattern size Δ0. Assume that
the set S={x:f(x)≤f(x0)} is bounded
∇f is Lipschitz continuous, that is,
‖∇f(x)−∇f(y)‖ ≤ L‖x−y‖ for all x, y
for some constant 0<L<∞
Let {xk}k∈U be the subsequence of unsuccessful iterations, i.e., the iterations where a better point can't be found and Δk is reduced. Then
lim k→∞, k∈U ‖∇f(xk)‖ = 0
It is also possible to determine the rate of convergence of the compass search algorithm.
If additional assumptions are made, then it can be shown that at the unsuccessful iterations,
‖xk−x∗‖ ≤ cΔk
for some constant c that does not depend on k. This is similar to, but not the same as, a linear rate of convergence. There are several differences between this result and linear convergence.
We consider only unsuccessful iterations and ignore successful iterations.
The result shows only that {xk−x∗} is bounded by a linearly convergent sequence. At each unsuccessful iteration we divide Δk by 2, so the sequence {Δk} converges linearly to zero. This does not guarantee that {xk−x∗} itself converges linearly.
Nevertheless, it is similar to linear convergence, and this property is sometimes referred to as r-linear convergence.
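The r-linear behavior can be checked numerically. The sketch below (my own construction, reusing a simple compass-search loop on f(x) = ‖x‖², whose minimizer is x∗ = 0) records Δk and the error ‖xk−x∗‖ at each unsuccessful iteration; if the bound ‖xk−x∗‖ ≤ cΔk holds, the ratios error/Δk stay bounded by a fixed constant:

```python
import numpy as np

def f(x):
    return float(np.sum(x**2))   # minimizer x* = 0

x = np.array([0.7, -0.3])
delta, history = 1.0, []
while delta >= 1e-8:
    improved = False
    for i in range(x.size):
        for sign in (+1.0, -1.0):
            trial = x.copy()
            trial[i] += sign * delta
            if f(trial) < f(x):
                x, improved = trial, True
                break
        if improved:
            break
    if not improved:
        # Unsuccessful iteration: record Delta_k and the error ||x_k - x*||
        history.append((delta, np.linalg.norm(x)))
        delta /= 2

# ||x_k - x*|| <= c * Delta_k should hold for a fixed constant c
ratios = [err / d for d, err in history]
print(max(ratios))
```

For this quadratic the ratios remain below 1, consistent with the bound, even though the per-iteration error itself does not shrink monotonically.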