Next: Parallel results on the Up: Numerical Results of CONDOR. Previous: Random objective functions Contents

Hock and Schittkowski set

The tests problems are arbitrary and have been chosen by A.R.Conn, K. Scheinberg and Ph.L. Toint. to test their DFO algorithm. The performances of DFO are thus expected to be, at least, good. We list the number of function evaluations that each algorithm took to solve the problem. We also list the final function values that each algorithm achieved. We do not list the CPU time, since it is not relevant in our context. The ``*'' indicates that an algorithm terminated early because the limit on the number of iterations was reached. The default values for all the parameters of each algorithm is used. The stopping tolerance of DFO was set to 1e-4, for the other algorithms the tolerance was set to appropriate comparable default values. The comparison between the algorithms is based on the number of function evaluations needed to reach the SAME precision. For the most fair comparison with DFO, the stopping criteria ( $\rho_{end}$ ) of CONDOR has been chosen so that CONDOR is always stopping with a little more precision on the result than DFO. This precision is sometime insufficient to reach the true optima of the objective function. In particular, in the case of the problems GROWTHLS and HEART6LS, the CONDOR algorithm can find a better optimum after some more evaluations (for a smaller $\rho_{end}$ ). All algorithms were implemented in Fortran 77 in double precision except COBYLA which is implemented in Fortran 77 in single precision and CONDOR which is written in C++ (in double precision). The trust region minimization subproblem of the DFO algorithm is solved by NPSOL [GMSM86], a fortran 77 non-linear optimization package that uses an SQP approach. For CONDOR, the number in parenthesis indicates the number of function evaluation needed to reach the optimum without being assured that the value found is the real optimum of the function. For example, for the WATSON problem, we find the optimum after (580) evaluations. CONDOR still continues to sample the objective function, searching for a better point. It's loosing 87 evaluations in this search. The total number of evaluation (reported in the first column) is thus 580+87=667.

$\begin{sidewaysfigure} % latex2html id marker 5970 \hspace{-.7cm} \begin{tabu... ...ONDOR, UOBYQA, DFO, PDS, LANCELOT and COBYLA on one CPU.} \end{sidewaysfigure}$
CONDOR and UOBYQA are both based on the same ideas and have nearly the same behavior. Small differences can be due to the small difference between algorithms of Chapter 4&5 and the algorithms used inside UOBYQA.

PDS stands for ``Parallel Direct Search'' [DT91]. The number of function evaluations is high and so the method doesn't seem to be very attractive. On the other hand, these evaluations can be performed on several CPU's reducing considerably the computation time.

Lancelot [CGT92] is a code for large scale optimization when the number of variable is

and the objective function is easy to evaluate (less than

.). Its model is build using finite differences and BFGS update. This algorithm has not been design for the kind of application we are interested in and is thus performing accordingly.

COBYLA [Pow94] stands for ``Constrained Optimization by Linear Approximation'' by Powell. It is, once again, a code designed for large scale optimization. It is a derivative free method, which uses linear polynomial interpolation of the objective function.

DFO [CST97,CGT98] is an algorithm by A.R.Conn, K. Scheinberg and Ph.L. Toint. It's very similar to UOBYQA and CONDOR. It has been specially designed for small dimensional problems and high-computing-load objective functions. In other words, it has been designed for the same kind of problems that CONDOR. DFO also uses a model build by interpolation. It is using a Newton polynomial instead of a Lagrange polynomial. When the DFO algorithm starts, it builds a linear model (using only

evaluations of the objective function;

is the dimension of the search space) and then directly uses this simple model to guide the research into the space. In DFO, when a point is ``too far'' from the current position, the model could be invalid and could not represent correctly the local shape of the objective function. This ``far point'' is rejected and replaced by a closer point. This operation unfortunately requires an evaluation of the objective function. Thus, in some situation, it is preferable to lower the degree of the polynomial which is used as local model (and drop the ``far'' point), to avoid this evaluation. Therefore, DFO is using a polynomial of degree oscillating between 1 and a ''full'' 2. In UOBYQA and CONDOR, we use the Moré and Sorenson algorithm [MS83,CGT00c] for the computation of the trust region step. It is very stable numerically and give very high precision results. On the other hand, DFO uses a general purpose tool (NPSOL [GMSM86]) which gives high quality results but that cannot be compared to the Moré and Sorenson algorithm when precision is critical. An other critical difference between DFO and CONDOR/UOBYQA is the formula used to update the local model. In DFO, the quadratical model built at each iteration is not defined uniquely. For a unique quadratical model in

variables one needs at least $\frac{1}{2}(n+1)(n+2)=N$ points and their function values. ``In DFO, models are often build using many fewer points and such models are not uniquely defined'' (citation from [CGT98]). The strategy used inside DFO is to select the model with the smallest Frobenius norm of the Hessian matrix. This update is highly numerically instable [Pow04]. Some recent research at this subject have maybe found a solution [Pow04] but this is still ``work in progress''. The model DFO is using can thus be very inaccurate.

In CONDOR and in UOBYQA the validity of the model is checked using the two Equations 6.5 and 6.6, which are restated here for clarity:

	$\begin{displaymath}\begin{array}{c} \text{\em All the interpolation points must}... ...\boldsymbol{x}_{(k)}\Vert \leq 2 \rho \;\;\;\;\;\; j=1,\ldots,N\end{displaymath}$
	$\begin{displaymath}\begin{array}{c}\text{\em Powell's} \\ \text{\em heuristic} ... ...\Vert d \Vert \leq \rho \} \leq \epsilon \;\;\;\; j=1,\ldots,N\end{displaymath}$

The first equation (6.5) is also used in DFO. The second equation (6.6) is NOT used in DFO. This last equation allows us to ''keep far points'' inside the model, still being assured that it is valid. It allows us to have a ``full'' polynomial of second degree for a ``cheap price''. The DFO algorithm cannot use equation 6.6 to check the validity of its model because the variable $\epsilon$ (which is computed in UOBYQA and in CONDOR as a by-product of the computation of the ``Moré and Sorenson Trust Region Step'') is not cheaply available. In DFO, the trust region step is calculated using an external tool: NPSOL [GMSM86]. $\epsilon$ is difficult to obtain and is not used.

UOBYQA and CONDOR are always using a full quadratical model. This enables us to compute Newton's steps. The Newton's steps have a proven quadratical convergence speed [DS96]. Unfortunately, some evaluations of the objective function are lost to build the quadratical model. So, we only obtain *near* quadratic speed of convergence. We have Q-superlinear convergence (see original paper of Powell [Pow00]). (In fact the convergence speed is often directly proportional to the quality of the approximation

of the real Hessian matrix of

). Usually, the price (in terms of number of function evaluations) to construct a good quadratical model is very high but using equation (6.6), UOBYQA and CONDOR are able to use very few function evaluations to update the local quadratical model.

When the dimension of the search space is greater than 25, the time needed to start, building the first quadratic, is so important (

evaluations) that DFO may becomes attractive again. Especially, if you don't want the optimum of the function but only a small improvement in a small time. If several CPU's are available, then CONDOR once again imposes itself. The function evaluations needed to build the first quadratic are parallelized on all the CPU's without any loss of efficiency when the number of CPU increases (the maximum number of CPU is

). This first construction phase has a great parallel efficiency, as opposed to the rest of the optimization algorithm where the efficiency becomes soon very low (with the number of CPU increasing). In contrast to CONDOR, the DFO algorithm has a very short initialization phase and a long research phase. This last phase can't be parallelized very well. Thus, when the number of CPU's is high, the most promising algorithm for parallelization is CONDOR. A parallel version of CONDOR has been implemented. Very encouraging experimental results on the parallel code are given in the next section.

When the local model is not convex, no second order convergence proof (see [CGT00d]) is available. It means that, when using a linear model, the optimization process can prematurely stop. This phenomenon *can* occur with DFO which uses from time to time a simple linear model. CONDOR is very robust and always converges to a local optimum (extensive numerical tests have been made [VB04]).

From the numerical results, the CONDOR algorithm (on 1 CPU) outperforms the DFO algorithm when the dimension of the search space is greater than two. This result can be explained by the fact that, most of the time, DFO uses a simple linear approximation (with few or no second degree terms) of the objective function to guide its search. This poor model gives ``sufficiently'' good search directions when

. But when

, the probability to choose a bad search direction is higher. The high instability of the least-Frobenius-norm update of the model used in DFO can also give poor model, degrading the speed of the algorithm.

Next: Parallel results on the Up: Numerical Results of CONDOR. Previous: Random objective functions Contents

Frank Vanden Berghen 2004-04-19