5. Methods for Unconstrained Optimization
---
Introduction
A principal advantage of Newton's method is that it converges rapidly when the current estimate of the variables is close to the solution. However, it also has disadvantages, and overcoming these disadvantages has led to many other techniques.
In particular, Newton's method can
- fail to converge
- converge to a point that is **not** a stationary point

However, these problems can be overcome by using strategies that guarantee progress toward the solution at every iteration, such as the line search and trust-region strategies.
The cost of Newton's method can also be a concern. It requires the derivation, computation, and storage of the second-derivative matrix, and the solution of a system of linear equations. Fortunately, large problems are frequently sparse, and taking advantage of sparsity can greatly reduce the computational cost of Newton's method.
We will introduce methods that are compromises on Newton's method, reducing one or more of these costs.
- Quasi-Newton method: the most widely used Newton-type method for problems of moderate size
- Steepest-descent method: an old and widely known method whose costs are low but whose performance is usually poor
Steepest-Descent Method
This method is the simplest Newton-type method for nonlinear optimization. The price for this simplicity is that the method is hopelessly inefficient at solving most problems.
Advantage
- does not require the computation of second derivatives

Disadvantage
- It has a linear rate of convergence, and is therefore slower than Newton's method
The steepest-descent method computes the search direction from
$$p_k = -\nabla f(x_k)$$
and then uses a line search to determine the new estimate $x_{k+1} = x_k + \alpha_k p_k$.
The formula for the search direction can be derived in two ways.
First derivation
If we approximate the Hessian by the identity matrix, the Newton direction $p_k = -[\nabla^2 f(x_k)]^{-1}\nabla f(x_k)$ reduces to $p_k = -\nabla f(x_k)$, so we can view the steepest-descent method as Newton's method with this crude Hessian approximation.
Second derivation
By using the first-order approximation
$$f(x_k + p) \approx f(x_k) + \nabla f(x_k)^T p$$
The intuitive idea is to minimize this approximation to obtain the search direction. However, this approximation does not have a finite minimum in general. Instead, the search direction is computed by minimizing a scaled version of this approximation.
The solution is $p_k = -\nabla f(x_k)$.
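As an illustration (not part of the original text), here is a minimal sketch of the steepest-descent method with a backtracking line search; the function names, tolerances, and test problem are illustrative choices:

```python
import numpy as np

def steepest_descent(f, grad, x0, tol=1e-6, max_iter=10_000):
    """Minimize f by steepest descent with a backtracking (Armijo) line search."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:          # stop near a stationary point
            break
        p = -g                               # steepest-descent direction p_k = -grad f(x_k)
        fx, alpha = f(x), 1.0
        # shrink alpha until a sufficient-decrease condition holds
        while f(x + alpha * p) > fx + 1e-4 * alpha * (g @ p):
            alpha *= 0.5
        x = x + alpha * p                    # line-search step x_{k+1} = x_k + alpha_k p_k
    return x

# example: a simple quadratic with minimizer at (1, 2)
f = lambda x: (x[0] - 1.0)**2 + 10.0 * (x[1] - 2.0)**2
grad = lambda x: np.array([2.0 * (x[0] - 1.0), 20.0 * (x[1] - 2.0)])
print(steepest_descent(f, grad, np.zeros(2)))   # approx. [1. 2.]
```

Even on this mildly ill-conditioned quadratic the iterates zigzag toward the solution, which is the linear convergence mentioned above.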
Quasi-Newton Methods
Quasi-Newton methods are among the most widely used methods for nonlinear optimization, in particular when the Hessian is hard to compute.
There are many different quasi-Newton methods, but they are all based on approximating the Hessian $\nabla^2 f(x_k)$ by another matrix $B_k$ that is available at a lower cost. Then the search direction is obtained by solving
$$B_k p_k = -\nabla f(x_k)$$
Therefore, this is equivalent to minimizing the quadratic model
$$q(p) = f(x_k) + \nabla f(x_k)^T p + \tfrac{1}{2}\, p^T B_k p$$
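The equivalence is a standard one-line check: when $B_k$ is positive definite, the quadratic model is minimized where its gradient with respect to $p$ vanishes,
$$\nabla_p q(p) = \nabla f(x_k) + B_k p = 0 \quad\Longrightarrow\quad B_k p_k = -\nabla f(x_k).$$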
Advantage
- $B_k$ can be found using only first-derivative information
- the search direction can be computed using only $O(n^2)$ operations
Disadvantage
- The methods do **not** converge quadratically, but they can converge superlinearly
- They still require matrix storage, so they are not normally used to solve large problems.
These methods are generalizations of a method for one-dimensional problems called the secant method. The secant method uses the approximation
$$f''(x_k) \approx \frac{f'(x_k) - f'(x_{k-1})}{x_k - x_{k-1}}$$
Similarly, in several dimensions,
$$\nabla^2 f(x_k)\,(x_k - x_{k-1}) \approx \nabla f(x_k) - \nabla f(x_{k-1})$$
From this, we can obtain the condition used to define the quasi-Newton approximations $B_k$:
$$B_k (x_k - x_{k-1}) = \nabla f(x_k) - \nabla f(x_{k-1})$$
We will call this the secant condition.
Suppose $f$ is a quadratic function, $f(x) = \tfrac{1}{2} x^T Q x - c^T x$. In this case $\nabla f(x) = Qx - c$, so
$$\nabla f(x_k) - \nabla f(x_{k-1}) = Q\,(x_k - x_{k-1}),$$
so that the Hessian matrix $Q$ satisfies the secant condition. Intuitively, we are asking that the approximation $B_k$ mimic the behavior of the Hessian matrix when it multiplies $x_k - x_{k-1}$.
Although this interpretation is precise only for quadratic functions, it holds in an approximate way for general nonlinear functions. In addition, we only need the approximation $B_k$ to imitate the effect of the Hessian matrix along one particular direction, $x_k - x_{k-1}$.
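A quick numerical check of the quadratic case; the matrix $Q$, the vector $c$, and the two iterates below are made-up data chosen only for illustration:

```python
import numpy as np

# illustrative data: a small symmetric Q and two iterates
Q = np.array([[4.0, 1.0], [1.0, 3.0]])
c = np.array([1.0, 2.0])
grad = lambda x: Q @ x - c          # gradient of f(x) = 0.5 x^T Q x - c^T x

x_prev = np.array([0.0, 0.0])
x_curr = np.array([1.0, -1.0])

lhs = Q @ (x_curr - x_prev)              # Hessian times the step
rhs = grad(x_curr) - grad(x_prev)        # change in the gradient
print(np.allclose(lhs, rhs))             # True: Q satisfies the secant condition
```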
For simplicity, let $s_k = x_{k+1} - x_k$ and $y_k = \nabla f(x_{k+1}) - \nabla f(x_k)$. The secant condition then becomes
$$B_{k+1} s_k = y_k$$
An example of a quasi-Newton approximation is given by the symmetric rank-one formula discussed below.
Properties of quasi-Newton methods
- The secant condition $B_{k+1} s_k = y_k$ will be satisfied regardless of how $B_k$ is chosen
- The new approximation $B_{k+1}$ is obtained by modifying the old approximation $B_k$. To start a quasi-Newton method, some initial approximation $B_0$ must be specified. Often $B_0 = I$ is used, but it is reasonable and often advantageous to supply a better initial approximation if one can be obtained with little effort.
- The new approximation $B_{k+1}$ must be close to $B_k$
- The search direction can be computed in $O(n^2)$ operations. Normally the cost of solving a system of linear equations is $O(n^3)$, but in this case it is possible to derive formulas that update a Cholesky factorization of $B_k$, so the system can be solved at the lower cost (see the sketch below).
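The following is a minimal sketch of that last point, not the full updating scheme: it only shows how a search direction is obtained from an existing Cholesky factorization (in a real implementation the factors themselves would be updated in $O(n^2)$ work each iteration rather than recomputed). The function name and test data are illustrative:

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def search_direction(B_k, grad_k):
    """Solve B_k p_k = -grad f(x_k) via a Cholesky factorization of B_k."""
    factor = cho_factor(B_k)            # B_k = L L^T (O(n^3) if done from scratch)
    return cho_solve(factor, -grad_k)   # two triangular solves: O(n^2)

B_k = np.array([[2.0, 0.5], [0.5, 1.0]])   # illustrative positive-definite approximation
grad_k = np.array([1.0, -1.0])
p_k = search_direction(B_k, grad_k)
print(np.allclose(B_k @ p_k, -grad_k))     # True
```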
All quasi-Newton methods have the form
$$B_{k+1} = B_k + [\text{update}]$$
where the update term represents a modification of the old approximation $B_k$, and so a formula for a quasi-Newton approximation is often referred to as an update formula.
Symmetric Rank-one update
In this algorithm, we want to find a $B_{k+1}$ that is close to $B_k$ in the sense that their difference has rank one.
- Formula
$$B_{k+1} = B_k + \frac{(y_k - B_k s_k)(y_k - B_k s_k)^T}{(y_k - B_k s_k)^T s_k}$$
- It satisfies the secant condition.
- This is the only rank-one update that preserves symmetry.
- We **cannot** guarantee that $B_{k+1}$ is positive definite.
- It is very cheap to compute.
Derivation
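A standard way to obtain the formula (sketched here, since the original derivation is not reproduced): assume a symmetric rank-one correction $B_{k+1} = B_k + a\, u u^T$ and impose the secant condition,
$$B_{k+1} s_k = B_k s_k + a\,(u^T s_k)\, u = y_k,$$
so $u$ must be proportional to $y_k - B_k s_k$. Substituting $u = y_k - B_k s_k$ and solving for the scalar $a$ gives $a = 1 / \big((y_k - B_k s_k)^T s_k\big)$, which yields the formula above.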
BFGS
In this algorithm, we want to find a $B_{k+1}$ that is close to $B_k$ in terms of the Frobenius norm of their difference.
Symmetry is not the only property that can be imposed. Since the Hessian matrix at the solution will normally be positive definite, it is reasonable to ask that the matrices $B_k$ be positive definite as well.
There is no rank-one update formula that maintains both symmetry and positive definiteness of the Hessian approximation. However, there are infinitely many rank-two formulas that do this. The most widely used formula, and the one considered to be most effective, is the BFGS update formula
$$B_{k+1} = B_k - \frac{B_k s_k s_k^T B_k}{s_k^T B_k s_k} + \frac{y_k y_k^T}{y_k^T s_k}$$
Broyden class
$$B_{k+1} = B_k - \frac{B_k s_k s_k^T B_k}{s_k^T B_k s_k} + \frac{y_k y_k^T}{y_k^T s_k} + \phi\,(s_k^T B_k s_k)\, v_k v_k^T,$$
where
$$v_k = \frac{y_k}{y_k^T s_k} - \frac{B_k s_k}{s_k^T B_k s_k}$$
- $\phi = 0$ : BFGS
- $\phi = 1$ : Davidon, Fletcher, and Powell (DFP) update, older and less efficient than BFGS
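As an illustration, here is a minimal sketch of the Broyden-class update based on the formula above; the function name, the random test data, and the choice $B_0 = I$ are illustrative. Setting $\phi = 0$ or $\phi = 1$ recovers the BFGS and DFP updates, and the check confirms that the secant condition holds either way:

```python
import numpy as np

def broyden_class_update(B, s, y, phi=0.0):
    """One Broyden-class update of the Hessian approximation B.

    phi = 0 gives the BFGS update, phi = 1 the DFP update.  A practical code
    would also check that y^T s > 0 before updating, to keep B positive definite.
    """
    Bs = B @ s
    sBs = s @ Bs
    ys = y @ s
    v = y / ys - Bs / sBs
    return (B
            - np.outer(Bs, Bs) / sBs
            + np.outer(y, y) / ys
            + phi * sBs * np.outer(v, v))

# quick check that the update satisfies the secant condition B_{k+1} s_k = y_k
rng = np.random.default_rng(0)
B = np.eye(3)                              # B_0 = I, as suggested in the text
s = rng.standard_normal(3)
y = s + 0.1 * rng.standard_normal(3)       # made-up step / gradient-change pair
for phi in (0.0, 1.0):
    B_new = broyden_class_update(B, s, y, phi)
    print(phi, np.allclose(B_new @ s, y))  # True for both BFGS and DFP
```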