## Speed of convergence of Newton's method.

We have:

$$x_{k+1} = x_k - \left[\nabla^2 f(x_k)\right]^{-1} \nabla f(x_k) \qquad (13.1)$$

The Taylor series of $\nabla f$ around $x_k$ is:

$$\nabla f(x) = \nabla f(x_k) + \nabla^2 f(x_k)\,(x - x_k) + O\!\left(\|x - x_k\|^2\right) \qquad (13.2)$$

If we set $x = x^*$ (the optimum, where $\nabla f(x^*) = 0$) in Equation 13.2, we obtain:

$$0 = \nabla f(x_k) + \nabla^2 f(x_k)\,(x^* - x_k) + O\!\left(\|x^* - x_k\|^2\right)$$

If we multiply the left and right side of the previous equation by $\left[\nabla^2 f(x_k)\right]^{-1}$, we obtain, using 13.1:

$$0 = (x_k - x_{k+1}) + (x^* - x_k) + O\!\left(\|x^* - x_k\|^2\right) = x^* - x_{k+1} + O\!\left(\|x^* - x_k\|^2\right)$$

By using the definition of the error $e_k := x_k - x^*$:

$$\|e_{k+1}\| \leq c\,\|e_k\|^2$$

with $c > 0$.
This is the definition of quadratic convergence. Newton's method has quadratic convergence speed.
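The doubling of correct digits predicted by quadratic convergence can be observed numerically. The sketch below runs Newton's iteration (13.1) in one dimension on the strictly convex test function $f(x) = x^2 + e^x$ (an illustrative choice, not from the text) and prints the ratio $e_{k+1}/e_k^2$, which settles near a constant $c$:

```python
import math

# Minimize f(x) = x**2 + exp(x) (strictly convex, so f'' > 0 everywhere)
# with Newton's method (13.1): x_{k+1} = x_k - f'(x_k) / f''(x_k).
def fp(x):  return 2.0 * x + math.exp(x)   # f'(x)
def fpp(x): return 2.0 + math.exp(x)       # f''(x)

xs = [1.0]
for k in range(10):
    xs.append(xs[-1] - fp(xs[-1]) / fpp(xs[-1]))

x_star = xs[-1]                            # fully converged value, used as reference
errs = [abs(x - x_star) for x in xs[:5]]

# Quadratic convergence: e_{k+1} <= c * e_k**2, so the ratio below
# stays roughly constant while the errors collapse.
for e_k, e_k1 in zip(errs, errs[1:]):
    print(f"e_k = {e_k:.3e}   e_(k+1)/e_k^2 = {e_k1 / e_k**2:.3f}")
```

The error sequence shrinks from $\sim 1$ to machine precision in about five iterations, which is the practical signature of quadratic convergence.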

### Note

If the objective function is locally quadratic and if we have exactly $H_k = \nabla^2 f(x_k)$, then we will find the optimum in one iteration of the algorithm.
Unfortunately, we usually don't have the exact Hessian $\nabla^2 f(x_k)$, but only an approximation $H_k$ of it. For example, if this approximation is constructed from several "BFGS updates", it becomes close to the real value of the Hessian only after $n$ updates (where $n$ is the dimension of the search space). It means we will have to wait at least $n$ iterations before going "in one step" to the optimum. In fact, the algorithm then becomes "only" super-linearly convergent.
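The one-iteration claim for a quadratic objective is easy to verify. For $f(x) = \frac{1}{2}x^T A x - b^T x$ the Hessian is exactly $A$, so a single Newton step from any starting point lands on the optimum $x^* = A^{-1}b$ (the matrix and vectors below are arbitrary illustrative values):

```python
import numpy as np

# Quadratic objective f(x) = 1/2 x^T A x - b^T x with exact Hessian A.
A = np.array([[4.0, 1.0],
              [1.0, 3.0]])              # symmetric positive definite
b = np.array([1.0, 2.0])

x0 = np.array([10.0, -7.0])             # arbitrary starting point
grad = A @ x0 - b                       # gradient of f at x0
x1 = x0 - np.linalg.solve(A, grad)      # one Newton step (13.1)

x_star = np.linalg.solve(A, b)          # true optimum
print(np.allclose(x1, x_star))          # True: optimum reached in one iteration
```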

## How to improve Newton's method: Zoutendijk theorem.

We have seen that Newton's method is very fast (quadratic convergence) but has no global convergence property ($\nabla^2 f(x_k)$ can be negative definite, in which case the Newton step is not even a descent direction). We must search for an alternative method which has the global convergence property. One way to prove the global convergence of a method is to use the Zoutendijk theorem.
Let us define a general method:
1. find a search direction $s_k$
2. search for the minimum in this direction using 1D-search techniques and find $\alpha_k$ with

$$x_{k+1} = x_k + \alpha_k s_k \qquad (13.3)$$

3. Increment $k$. Stop if $\|\nabla f(x_k)\| \approx 0$, otherwise go to step 1.
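The three steps above can be sketched as a loop. This is a minimal stand-in, not the text's algorithm: the direction is plain steepest descent and the 1D-search is simple backtracking on the sufficient-decrease condition, applied to the illustrative function $f(x) = x^2 + e^x$:

```python
import math

# Generic descent loop: direction, 1D-search, stopping test.
def f(x):   return x * x + math.exp(x)       # illustrative objective
def fp(x):  return 2.0 * x + math.exp(x)     # its gradient

x, eps = 1.0, 1e-8
while abs(fp(x)) > eps:                      # step 3: stop when gradient ~ 0
    s = -fp(x)                               # step 1: a search direction
    alpha = 1.0                              # step 2: 1D-search by backtracking
    while f(x + alpha * s) > f(x) + 1e-4 * alpha * fp(x) * s:
        alpha *= 0.5                         # shrink until sufficient decrease
    x = x + alpha * s                        # x_{k+1} = x_k + alpha_k s_k (13.3)

print(x)                                     # approximate minimizer
```

Any direction/line-search pair fits this skeleton; the Wolfe conditions below constrain step 2 so that global convergence can be proven.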

The 1D-search must respect the Wolfe conditions. If we define $g_k = \nabla f(x_k)$, we have:

$$f(x_k + \alpha_k s_k) \leq f(x_k) + c_1\,\alpha_k\,g_k^T s_k \qquad (13.4)$$

$$\nabla f(x_k + \alpha_k s_k)^T s_k \geq c_2\,g_k^T s_k \qquad (13.5)$$

with $0 < c_1 < c_2 < 1$.

The objective of the Wolfe conditions is to give a lower bound (Equation 13.5) and an upper bound (Equation 13.4) on the value of $\alpha_k$, so that the 1D-search algorithm becomes easier: see Figure 13.1. Equation 13.4 expresses that the objective function must be sufficiently reduced. Equation 13.5 prevents steps that are too small. The parameter $c_2$ defines the precision of the 1D-line search:
• exact line search: $c_2$ close to 0
• inexact line search: $c_2$ close to 1 (typically $c_2 = 0.9$)
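The two conditions are direct one-liners to check. The sketch below tests candidate steps on the illustrative function $f(x) = x^2 + e^x$, with the common illustrative constants $c_1 = 10^{-4}$, $c_2 = 0.9$ (not values from the text); note how a tiny step passes the sufficient-decrease test (13.4) but fails the curvature test (13.5), which is exactly how (13.5) rules out steps that are too small:

```python
import math

def f(x):   return x * x + math.exp(x)
def fp(x):  return 2.0 * x + math.exp(x)

def wolfe_ok(x, s, alpha, c1=1e-4, c2=0.9):
    armijo    = f(x + alpha * s) <= f(x) + c1 * alpha * fp(x) * s   # (13.4)
    curvature = fp(x + alpha * s) * s >= c2 * fp(x) * s             # (13.5)
    return armijo, curvature

x = 1.0
s = -fp(x)                     # steepest-descent direction at x = 1
print(wolfe_ok(x, s, 1e-6))    # (True, False): step too small, (13.5) rejects it
print(wolfe_ok(x, s, 0.25))    # (True, True): acceptable step
```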
We must also define $\theta_k$, which is the angle between the steepest descent direction ($-g_k$) and the current search direction ($s_k$):

$$\cos\theta_k = \frac{-g_k^T s_k}{\|g_k\|\;\|s_k\|} \qquad (13.6)$$

Under the following assumptions:
• $f$ bounded below
• $f$ is continuously differentiable in a neighborhood $\mathcal{N}$ of the level set $\mathcal{L} = \{x : f(x) \leq f(x_0)\}$
• $\nabla f$ is Lipschitz continuous: there exists a constant $L > 0$ such that

$$\|\nabla f(x) - \nabla f(\tilde{x})\| \leq L\,\|x - \tilde{x}\| \quad \forall\, x, \tilde{x} \in \mathcal{N} \qquad (13.7)$$

We have:

Zoutendijk Theorem: $$\sum_{k \geq 0} \cos^2\theta_k\,\|g_k\|^2 < \infty \qquad (13.8)$$

From Equation 13.5, we have:

$$g_{k+1}^T s_k \geq c_2\,g_k^T s_k$$

We add $-g_k^T s_k$ on both sides:

$$(g_{k+1} - g_k)^T s_k \geq (c_2 - 1)\,g_k^T s_k \qquad (13.9)$$

From Equation 13.7, we have:

$$\|g_{k+1} - g_k\| \leq L\,\|x_{k+1} - x_k\|$$

Using Equation 13.3: $\|x_{k+1} - x_k\| = \alpha_k \|s_k\|$. We multiply both sides by $\|s_k\|$ and use the Cauchy-Schwarz inequality:

$$(g_{k+1} - g_k)^T s_k \leq \|g_{k+1} - g_k\|\;\|s_k\| \leq L\,\alpha_k\,\|s_k\|^2 \qquad (13.10)$$

Combining Equations 13.9 and 13.10, we obtain:

$$\alpha_k \geq \frac{(c_2 - 1)}{L}\,\frac{g_k^T s_k}{\|s_k\|^2} \qquad (13.11)$$

We can replace, in Equation 13.4,

the $\alpha_k$ by its lower bound from Equation 13.11 (since $g_k^T s_k < 0$ for a descent direction, the inequality is preserved). We obtain:

$$f_{k+1} \leq f_k - c_1\,\frac{(1 - c_2)}{L}\,\frac{(g_k^T s_k)^2}{\|s_k\|^2}$$
If we define $c = c_1 (1 - c_2)/L$ and if we use the definition of $\cos\theta_k$ (see Eq. 13.6), we have:

$$c > 0 \qquad (13.12)$$

$$f_{k+1} \leq f_k - c\,\cos^2\theta_k\,\|g_k\|^2 \qquad (13.13)$$

Summing Equation 13.13 on $k$, we have:

$$c \sum_{j=0}^{k} \cos^2\theta_j\,\|g_j\|^2 \leq f_0 - f_{k+1} \qquad (13.14)$$

We know that $f$ is bounded below, so $f_0 - f_{k+1}$ is bounded above; we also know from Equation 13.12 that $c > 0$. The partial sums on the left side of Equation 13.14 are therefore bounded, so the series converges and is finite: $\sum_{k=0}^{\infty} \cos^2\theta_k\,\|g_k\|^2 < \infty$. Thus, we have:

$$\lim_{k \to \infty} \cos^2\theta_k\,\|g_k\|^2 = 0,$$

which concludes the proof.

### Angle test.

To make an algorithm globally convergent, we can use what is called an "angle test". It consists of always choosing a search direction such that $\cos\theta_k \geq \delta > 0$ for some fixed $\delta$. This means that the search direction does not tend to become perpendicular to the gradient. Using the Zoutendijk theorem (see Equation 13.8), we obtain:

$$\lim_{k \to \infty} \|g_k\| = 0,$$

which means that the algorithm is globally convergent.
The "angle test" on Newton's method prevents quadratic convergence. We must not use it.
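An angle test is only a few lines in practice. The sketch below is an illustrative stand-alone helper (the threshold $\delta = 0.01$ and the fallback to steepest descent are our own choices, not prescribed by the text): it computes $\cos\theta_k$ from (13.6) and replaces a near-perpendicular candidate direction by $-g_k$:

```python
import math

def dot(a, b):  return sum(x * y for x, y in zip(a, b))
def norm(a):    return math.sqrt(dot(a, a))

# Angle test: keep the candidate direction s only if cos(theta_k) (13.6)
# is at least delta; otherwise fall back to steepest descent -g.
def angle_test(g, s, delta=0.01):
    cos_theta = -dot(g, s) / (norm(g) * norm(s))
    return s if cos_theta >= delta else [-gi for gi in g]

g = [1.0, 0.0]
print(angle_test(g, [0.0, 1.0]))    # perpendicular to -g: replaced by -g
print(angle_test(g, [-1.0, 0.5]))   # genuine descent direction: kept as-is
```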

### The "steepest descent" trick.

If, regularly (let's say every $n$ iterations), we make a "steepest descent" step ($s_k = -g_k$), we will have, for this step, $\cos\theta_k = 1$. It will then be impossible to have $\cos\theta_k \to 0$ on these steps. So, using Zoutendijk, the only possibility left is that $\|g_k\| \to 0$. The algorithm is now globally convergent.
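A minimal sketch of this trick, again on the illustrative function $f(x) = x^2 + e^x$ (the period $n = 5$, the damped fixed step length on the steepest-descent iterations, and the use of the Newton direction as the "fast" direction are all our own illustrative choices):

```python
import math

def fp(x):  return 2.0 * x + math.exp(x)    # gradient of f(x) = x^2 + exp(x)
def fpp(x): return 2.0 + math.exp(x)        # second derivative (> 0)

x, n = 1.0, 5
for k in range(30):
    if k % n == 0:
        s = -fp(x)                  # forced steepest-descent step: cos(theta_k) = 1
        alpha = 0.25                # damped fixed step stands in for a Wolfe 1D-search
    else:
        s = -fp(x) / fpp(x)         # fast (Newton) direction the rest of the time
        alpha = 1.0
    x = x + alpha * s

print(abs(fp(x)))                   # gradient driven to ~0, as Zoutendijk predicts
```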

Frank Vanden Berghen 2004-04-19