CONDOR, an new Parallel, Constrained
extension of Powell's UOBYQA algorithm.
Experimental results and comparison
with the DFO algorithm.

Frank Vanden Berghen, Hugues Bersini

IRIDIA, Université Libre de Bruxelles
50, av. Franklin Roosevelt
1050 Brussels, Belgium
Tel: +32 2 650 27 29, Fax: 32 2 650 27 15
fvandenb@iridia.ulb.ac.be, bersini@ulb.ac.be

Date: August 1, 2004

PDF version of this page is available here.
Abstract:

This paper presents an algorithmic extension of Powell's UOBYQA algorithm ("Unconstrained Optimization BY Quadratical Approximation"). We start by summarizing the original algorithm of Powell and by presenting it in a more comprehensible form. Thereafter, we report comparative numerical results between UOBYQA, DFO and a parallel, constrained extension of UOBYQA that will be called in the paper CONDOR ("COnstrained, Non-linear, Direct, parallel Optimization using trust Region method for high-computing load function"). The experimental results are very encouraging and validate the approach. They open wide possibilities in the field of noisy and high-computing-load objective functions optimization (from two minutes to several days) like, for instance, industrial shape optimization based on CFD (Computation Fluid Dynamic) codes or PDE (partial differential equations) solvers. Finally, we present a new, easily comprehensible and fully stand-alone implementation in C++ of the parallel algorithm.

Introduction

Powell's UOBYQA algorithm ([34] or [35]) is a new algorithm for unconstrained, direct optimization that take into account the curvature of the objective function, leading to a high convergence speed. UOBYQA is the direct successor of COBYLA [31]. Classical quasi-newton methods also use curvature information ([27,2,1,15,18]) but they need explicit gradient information, usually obtained by finite difference. In the field of aerodynamical shape optimization, the objective functions are based on expensive simulation of CFD (computation fluid dynamic) codes (see [41,28,13,29,30]) or PDE (partial differential equations) solvers. For such applications, choosing an appropriate step size for approximating the derivatives by finite differences is quite delicate: function evaluation is expensive and can be very noisy. For such type of application, finite difference quasi-newton methods need to be avoided. Indeed, even if actual derivative information were available, quasi-Newton methods might be a poor choice because adversely affected by function inaccuracies (see [17]). Instead, direct optimization methods [16] are relatively insensitive to the noise. Unfortunately, they usually require a great amount of function evaluations.

UOBYQA and CONDOR sample the search space, making evaluations in a way that reduces the influence of the noise. They both construct a full quadratical model based on Lagrange Interpolation technique [14,37,8,42,43,33]. The curvature information is obtained from the quadratical model. This technique is less sensitive to the noise and leads to high quality local quadratical models which directly guide the search to the nearest local optimum. These quadratical models are built using the least number of evaluations (possibly reusing old evaluations).

DFO [12,11] is an algorithm by A.R.Conn, K. Scheinberg and Ph.L. Toint. It's very similar to UOBYQA and CONDOR. It has been specially designed for small dimensional problems and high-computing-load objective functions. In other words, it has been designed for the same kind of problems that CONDOR. DFO also uses a model build by interpolation. It is using a Newton polynomial instead of a Lagrange polynomial. When the DFO algorithm starts, it builds a linear model (using only

evaluations of the objective function;

is the dimension of the search space) and then directly uses this simple model to guide the research into the space. In DFO, when a point is "too far" from the current position, the model could be invalid and could not represent correctly the local shape of the objective function. This "far point" is rejected and replaced by a closer point. This operation unfortunately requires an evaluation of the objective function. Thus, in some situation, it is preferable to lower the degree of the polynomial which is used as local model (and drop the "far" point), to avoid this evaluation. Therefore, DFO is using a polynomial of degree oscillating between 1 and a "full" 2. In UOBYQA and CONDOR, we use the Moré and Sorenson algorithm [26,9] for the computation of the trust region step. It is very stable numerically and give very high precision results. On the other hand, DFO uses a general purpose tool (NPSOL [20]) which gives high quality results but that cannot be compared to the Moré and Sorenson algorithm when precision is critical. An other critical difference between DFO and CONDOR/UOBYQA is the formula used to update the local model. In DFO, the quadratical model built at each iteration is not defined uniquely. For a unique quadratical model in

variables one needs at least $\frac{1}{2}(n+1)(n+2)=N$ points and their function values. "In DFO, models are often build using many fewer points and such models are not uniquely defined" (citation from [11]). The strategy used inside DFO is to select the model with the smallest Frobenius norm of the Hessian matrix. This update is highly numerically instable [36]. Some recent research at this subject have maybe found a solution [36] but this is still "work in progress". The model DFO is using can thus be very inaccurate.

In contrast to UOBYQA and CONDOR, DFO uses linear or quadratical models to guide the search, thus requiring less function evaluations to build the local models. Based on our experimental results, we surprisingly discovered that CONDOR used less function evaluations than DFO to reach an optimum point, despite the fact that the cost to build a local model is higher (see section 5 presenting numerical results). This is most certainly due to an heuristic (see section 5.1 at this subject) used inside UOBYQA and CONDOR which allows to build quadratical models at very "low price".

The algorithm used inside UOBYQA is thus a good choice to reduce the number of function evaluations in the presence of noisy and high computing load objective functions. Since description of this algorithm in the literature is hard to find and rather unclear, a first objective of the paper is to provide an updated and more accessible version of it.

When concerned with CPU time to reach the local optimum, computer parallelization of the function evaluations is always an interesting road to pursue. Indeed, PDS (Parallel direct search) largely exploits this parallelization to reduce the optimization time. We take a similar road by proposing an extension of the original UOBYQA that can use several CPU's in parallel: CONDOR. Our experimental results show that this addition makes CONDOR the fastest available algorithm for noisy, high computing load objective functions (fastest in terms of number of function evaluations).

In substance, this paper proposes a new, simpler and clearer, parallel implementation in C++ of UOBYQA: the CONDOR optimizer. A version of CONDOR allowing constraints is discussed in [40]. The paper is structured in the following way:

Section 1: The introduction.
Section 2: Basic description of the UOBYQA algorithm with hints to possible parallelization.
Section 3: New, more in depth, comprehensible presentation of UOBYQA with a more precise description of the parallel extension.
Section 4: In depth description of this parallel extension.
Section 5: Experimental results: comparison between CONDOR, the original Powell's UOBYQA, DFO, LANCELOT, COBYLA, PDS.
Section 5: How to get the code and conclusions.

Basic description of Powell's UOBYQA algorithm

Let

be the dimension of the search space. Let

be the objective function to minimize. We want to find $x^* \in \Re^n$ which satisfies:

$\displaystyle f(x^*)= \min_x f(x)$

(1)

In the following algorithm, $\rho$ is the usual trust region radius. We do not allow $\rho$ to increase because this would necessitate expensive decrease later. We will introduce $\Delta$ , another trust region radius that satisfies $\Delta \geq \rho$ . The advantage of $\Delta$ is to allow the length of the steps to exceed $\rho$ and to increase the efficiency of the algorithm.
Let $x_{start}$ be the starting point of the algorithm. Let $\rho_{start}$ and $\rho_{end}$ be the initial and final value of the trust region radius $\rho$ .

$\textstyle \parbox{11.5cm}{{\bf \underline{Definition:}} The local approximati... ...ert \leq \rho $\ where $\kappa$\ is a given constant independent of $x$. \\ }$

Basically, Powell's UOBYQA algorithm does the following (for a more detailed explanation, see section 3 or [34]):

Create an interpolation polynomial of degree 2 which interpolates the objective function around $x_{start}$ . All the points in the interpolation set $\mbox{$Y$}$ (used to build ) are separated by a distance of approximatively $\rho_{start}$ . Set the best point of the objective function known so far. Set $\rho_0=\rho_{start}$ . In the following algorithm, is the quadratical approximation of around : where is an approximation of the gradient of evaluated at and is an approximation of the Hessian matrix of evaluated at .
Set $\Delta_k=\rho_k$
Inner loop: solve the problem for a given precision of .
1. 1. Solve $\displaystyle s_k=\min_{s \in \Re^n} q_k(s)$ subject to $\Vert s\Vert _2 < \Delta_k$ .
  2. If $\Vert s_k \Vert< \frac{1}{2}\rho_k$ , then break and go to step 3(b) because, in order to do such a small step, we need to be sure that the model is valid.
  3. Evaluate the function at the new position . Update (like described in next section, 4(a)viii. to 4(a)x. ) the trust region radius $\Delta_k$ and the current best point using classical trust region technique. Include the new inside the interpolation set $\mbox{$Y$}$ . Update to interpolate on the new $\mbox{$Y$}$ .
  4. If some progress has been achieved (for example, $\Vert s_k \Vert>2 \rho$ or there was a reduction $f(x_{k+1})<f(x_k)$ ), increment and go back to step 3(a)i, otherwise continue.
2. Test the validity of in , like described in [34].
  - Model is invalid:
    Improve the quality of the model : Remove the worst point of the interpolation set $\mbox{$Y$}$ and replace it (one evaluation required!) with a new point $x_{new}$ such that: $\Vert x_{new} - x_k \Vert < \rho$ and the precision of is substantially increased.
  - Model is valid:
    If $\Vert s_k \Vert>\rho_k$ go back to step 3(a), otherwise continue.
Reduce $\rho$ since the optimization steps are becoming very small, the accuracy needs to be raised.
If $\rho=\rho_{end}$ stop, otherwise increment k and go back to step 2.

Basically, $\rho$ is the distance (Euclidian distance) which separates the points where the function is sampled. When the iterations are unsuccessful, the trust region radius $\Delta_k$ decreases, preventing the algorithm to achieve more progress. At this point, loop 3(a)i to 3(a)iv is exited and a function evaluation is required to increase the quality of the model (step 3(b)). When the algorithm comes close to an optimum, the step size becomes small. Thus, the inner loop (steps 3(a)i. to 3(a)iv.) is usually exited from step 3(a)ii, allowing to skip step 3(b) (hoping the model is valid), and directly reducing $\rho$ in step 4.

The most inner loop (steps 3(a)i. to 3(a)iv.) tries to get from

good search directions without doing any extra evaluation to maintain the quality of

(The evaluations that are performed on step 3(a)i) have another goal). Only inside step 3(b), evaluations are performed to increase this quality (called a "model step") and only at the condition that the model has been proven to be invalid (to spare evaluations!).

Notice the update mechanism of $\rho$ in step 4. This update occurs only when the model has been validated in the trust region $\mbox{$B$}$ $_k(\rho)$ (when the loop 3(a) to 3(b) is exited). The function cannot be sampled at point too close to the current point

without being assured that the model is valid in $\mbox{$B$}$ $_k(\rho)$ . This mechanism protects us against noise.

The different evaluations of

are used to:

(a): guide the search to the minimum of (see inner loop in the steps 3(a)i. to 3(a)iv.). To guide the search, the information gathered until now and available in is exploited.
(b): increase the quality of the approximator (see step 3(b)). To avoid the degeneration of , the search space needs to be additionally explored.

(a) and (b) are antagonist objectives like frequently encountered in the exploitation/exploration paradigm. The main idea of the parallelization of the algorithm is to perform the exploration on distributed CPU's. Consequently, the algorithm will have better models

of

available and choose better search direction, leading to a faster convergence.

UOBYQA and CONDOR are inside the class of algorithm which are proven to be globally convergent to a local (maybe global) optimum: They are both using conditional models as described in [12,8].

The UOBYQA algorithm in depth

We will now detail the UOBYQA algorithm [34] and a part of its parallel extension. As a result of this parallel extension, the points 3, 4(a)i, 4(b), 9 constitute an original contribution of the authors. When only one CPU is available, these points are simply skipped. The point 4(a)v is also original and has been added to make the algorithm more robust against noise in the evaluation of the objective function. These points will be detailed in the next section. The other points of the algorithm belong to the original UOBYQA.
Let

and

, be the absolute and relative error on the evaluation of the objective function. These constants are given by the user. By default, they are null.

Set $\Delta= \rho$ , $\rho=\rho_{start}$ and generate a first interpolation set $\mbox{$Y$}$ $= \{ \boldsymbol{x}_{(1)}, \ldots, \boldsymbol{x}_{(N)} \}$ around $x_{start}$ (with ). This set is "poised", meaning that the Vandermonde determinant of $\mbox{$Y$}$ is non-null (see [14,37]). The set $\mbox{$Y$}$ is generated using the algorithm described in [34].
In what follows, the index is always the index of the best point of the set $\mbox{$Y$}$ $= \{ \boldsymbol{x}_{(1)}, \ldots, \boldsymbol{x}_{(N)} \}$ . The points in $\mbox{$Y$}$ will be noted in bold with parenthesis around their subscript. Let $x_{(base)} := \boldsymbol{x}_{(k)}$ . Set $F_{old}:=f(x_{(base)})$ . Apply a translation of $-x_{(base)}$ to all the dataset $\{ \boldsymbol{x}_{(1)}, \ldots, \boldsymbol{x}_{(N)} \}$ and generate the quadratical polynomial , which intercepts all the points in the dataset $\mbox{$Y$}$ . The translation is achieved to increase the quality of the interpolation. is built using Multivariate Lagrange Interpolation. It means that $q_k(s)= \sum_{i=1}^N f(\boldsymbol{x}_{(i)}) P_i(s)$ where the are the Lagrange polynomials associated to the dataset $\mbox{$Y$}$ . The have the following property: $P_i( \boldsymbol{x}_{(j)} )= \delta_{(i,j)}$ where $\delta_{(i,j)}$ is the Kronecker delta (see [14,37] about multivariate Lagrange polynomial interpolation). The complete procedure is given in [34].
Parallel extension: Start the "parallel computations" on the different computer nodes. See next section for more details.
1. 1. Parallel extension: Check the results of the parallel computation and use them to increase the quality of . See next section for more details.
  2. Calculate the "Trust region step" : is the solution of:
    
    $\displaystyle \min_{s \in \Re^n} q(\boldsymbol{x}_{(k)}+s) = \min_{s \in \Re^n} q_k(s) \; \text{ subject to } \; \Vert s \Vert _2 < \Delta$
    
    This is a quadratic program with a non-linear constraint. It's solved using Moré and Sorenson algorithm (see [26,9]). The original implementation of the UOBYQA algorithm uses a special tri-diagonal decomposition of the Hessian to obtain high speed (see [32]). CONDOR uses a direct, simpler, implementation of the Moré and Sorenson algorithm.
  3. If $\displaystyle \Vert s \Vert< \frac{\rho}{2}$ , then break and go to step 4(b): the model needs to be validated before doing such a small step.
  4. Let $R:=q(\boldsymbol{x}_{(k)})-q(\boldsymbol{x}_{(k)}+s^*) \geq 0$ , the predicted reduction of the objective function.
  5. One original addition to the algorithm is the following:
    Let $noise: = \frac{1}{2} \max [ noise_a*(1+noise_r), noise_r \vert f(\boldsymbol{x}_{(k)})\vert ]$ .
    If , break and go to step 4(b).
  6. Evaluate the objective function at point $x= x_{(base)}+\boldsymbol{x}_{(k)}+s^*$ . The result of this evaluation is stored in the variable $F_{new}$ .
  7. Compute the agreement between and the model :
    
    $\displaystyle r=\frac{F_{old}-F_{new}}{R}$
  8. Update the trust region radius $\Delta$ :
    
    $\begin{displaymath} \begin{cases}\max[\Delta, \frac{5}{4} \Vert s\Vert, \rho+\V... ...\frac{1}{2} \Vert s\Vert & \text{ if } r \leq 0.1 \end{cases} \end{displaymath}$
    
    If $\displaystyle (\Delta < 1.5 \rho)$ , set $\Delta:=\rho$ .
  9. Store $\boldsymbol{x}_{(k)}+s^*$ inside the interpolation dataset $\mbox{$Y$}$ . To do so, first, choose the worst point $\boldsymbol{x}_{(t)}$ of the dataset (The exact, detailed algorithm, is given in [34]). This is the point which gives the highest contribution to following bound on the interpolation error [33]:
    
    $\begin{displaymath}\begin{array}{c} \text{\em Interpolation } \\ \text{\em erro... ..._{j=1}^N \vert P_j(y)\vert \Vert y-\boldsymbol{x}_{(j)}\Vert^3\end{displaymath}$ (2)
    
    Where is a bound on the third derivative of : $\vert \phi'''(\alpha) \vert \leq M$ where $\phi(\alpha)= f(y+\alpha \bar{d}), \;\; \alpha \in \Re, \bar{d} \in \Re^n$ and $\Vert \bar{d} \Vert =1$ , and where are the Lagrange Polynomials used to construct .(see [14,37] about multivariate Lagrange polynomial interpolation).
    Secondly, replace the point $\boldsymbol{x}_{(t)}$ by $\boldsymbol{x}_{(k)}+s^*$ and recalculate the new quadratic which interpolates the new dataset.
    $\;$ The is $\Vert \boldsymbol{x}_{(t)} - (\boldsymbol{x}_{(k)}+s^*) \Vert$
  10. Update the index of the best point in the dataset.
    Set $F_{new} := \min [ F_{old}, F_{new} ]$ .
  11. Update the value (the bound on the third derivative of ) using:
    
    $\displaystyle M_{new}= \max \bigg[ M_{old}, \frac{\vert q_k(x)-f(x) \vert }{ \... ...sum_{j=1}^N \vert P_j(x) \vert \Vert x - \boldsymbol{x}_{(j)} \Vert^3 } \bigg]$ (3)
  12. If there is an improvement in the quality of the solution ( $F_{new}<F_{old}$ ) OR if $(\Vert s^* \Vert > 2 \rho)$ OR if $ModelStep > 2 \rho$ then go back to point 4(a)i, otherwise, continue.
2. Parallel extension: Check the results of the parallel computation and use them to increase the quality of . See next section for more details.
3. The validity of our model in , a ball of radius around now needs to be checked based on equations (6) and (7).
  - Model is invalid:
    Improve the quality of our model . This is called a "model improvement step". Remove the worst point $\boldsymbol{x}_{(j)}$ of the dataset and replace it by a better point. This better point is computed using an algorithm described in [34]. If a new function evaluation has been made, the value of must also be updated. Possibly, an update of the index of the best point in the dataset $\mbox{$Y$}$ and $F_{old}$ is required. Once this is finished, go back to step 4(a).
  - Model is valid:
    If $\Vert s^* \Vert>\rho$ go back to step 4(a), otherwise continue.
If $\rho=\rho_{end}$ , the algorithm is nearly finished. Go to step 8, otherwise continue to the next step.
Update of trust region radius $\rho$ .

$\displaystyle \rho_{new} = \begin{cases}\rho_{end} & \text{ if } \rho_{end} < ... ...eq 250 \rho_{end} \\ 0.1 \rho & \text{ if } 250 \rho_{end} < \rho \end{cases}$ (4)

Set $\displaystyle \Delta:=\max[\frac{\rho}{2}, \rho_{new}]$ . Set $\rho:=\rho_{new}$ .
Set $x_{(base)} := x_{(base)} + \boldsymbol{x}_{(k)}$ . Apply a translation of $- \boldsymbol{x}_{(k)}$ to , to the set of Newton polynomials which defines and to the whole dataset $\mbox{$Y$}$ $= \{ \boldsymbol{x}_{(1)}, \ldots, \boldsymbol{x}_{(N)} \}$ . Go back to step 4.
The iterations are now complete but one more value of may be required before termination. Indeed, it is known from step 4(a)iii and step 4(a)v of the algorithm that the value of could not have been computed. Compute .
- if $F_{new}<F_{old}$ , the solution of the optimization problem is $x_{(base)}+\boldsymbol{x}_{(k)}+s^*$ and the value of at this point is $F_{new}$ .
- if $F_{new}> F_{old}$ , the solution of the optimization problem is $x_{(base)}+\boldsymbol{x}_{(k)}$ and the value of at this point is $F_{old}$ .
Parallel extension: Stop the parallel computations if necessary.

The aim of the parallelization is to evaluate

at positions which could substantially increase the quality of the approximator

. The way to choose such positions is explained in section 4.

The parallel extension of UOBYQA

We will use a client-server approach. The main node, the server, will run two concurrent processes:

The main process on the main computer is the classical non-parallelized version of the algorithm, described in the previous section. There is an exchange of information with the second/parallel process on steps 4(a)i and 4(b) of the original algorithm.
The goal of the second/parallel process on the main computer is to increase the quality of the model by using client computers to sample at specific interpolation sites.

In an ideal scenario:

The main process will always stay inside the most inner loop 4(a)i to 4(a)xii. Hoping that the evaluation on the client computers always provide a valid local model , progress will constantly be achieved.
The main process exits the inner loop at step 4(a)iii: Near an optimum, the model is ideally valid and $\rho$ can be decreased.

The client nodes are performing the following:

Wait to receive from the second/parallel process on the server a sampling site (a point).
Evaluate the objective function at this site and return immediately the result to the server.
Go to step 1.

Several strategies have been tried to select good sampling sites. We describe here the most promising one. The second/parallel task is the following:

A.: Make a local copy $q_{(copy)}(s)$ of (and of the associated Lagrange Polynomials $P_{j}(x)$ )
B.: Make a local copy $\mbox{$J$}$ $_{(copy)}$ of the dataset $\mbox{$J$}$ $= \{ \boldsymbol{x}_{(1)}, \ldots, \boldsymbol{x}_{(N)} \}$ .
C.: Find the index of the point inside $\mbox{$J$}$ $_{(copy)}$ the further away from $\boldsymbol{x}_{(k)}$ .
D.: Replace $\boldsymbol{x}_{(j)}$ by a better point $\boldsymbol{x}_{(j)}+d$ which will increase the quality of the approximation of . The computation of this point is detailed below.
E.: Ask for an evaluation of the objective function at point $\boldsymbol{x}_{(j)}+d$ using a free client computer to perform the evaluation. If there is still a client idle, go back to step C.
F.: Wait for a node to finish its evaluation of the objective function . Most of the time, the second/parallel task will be blocked here without consuming any resources.
G.: Update $q_{(copy)}(x)$ using the newly received evaluation. Update $\mbox{$J$}$ $_{(copy)}$ . go to step C.

In the parallel/second process we are always working on a copy of

, $\mbox{$J$}$ and $P_{j,(copy)}(x)$ to avoid any side effect with the main process which is guiding the search. The communication and exchange of information between these two processes are done only at steps 4(a)i and 4(b) of the main process described in the previous section. Each time the main process checks the results of the parallel computations the following is done:

i.: Wait for the parallel/second task to enter the step F described above and block the parallel task inside this step F for the time needed to perform the points ii and iii below.
ii.: Update of using all the points calculated in parallel, discarding the points that are too far away from $\boldsymbol{x}_{(k)}$ (at a distance greater than $\rho$ )(The points are inside $\mbox{$J$}$ $_{(copy)}$ ). This update is performed using technique described in [34]. We will possibly have to update the index of the best point in the dataset $\mbox{$J$}$ and $F_{old}$ .
iii.: Perform operations described in point A & B of the parallel/second task algorithm above: "Copy $q_{(copy)}(x)$ from .
Copy $\mbox{$J$}$ $_{(copy)}$ from $\mbox{$J$}$ $= \{ \boldsymbol{x}_{(1)}, \ldots, \boldsymbol{x}_{(N)} \}$ ".

In step D. of the parallel algorithm, we must find a point which increase substantially the quality of the local approximation $q_{(copy)}(x)$ of

. In the following, the discovery of this better point is explained. The equation (2) is used. We will restate it here for clarity:

$\begin{displaymath}\begin{array}{c} \text{\em Interpolation Error} \\ \text{\... ..._{j,(copy)}(y)\vert \Vert y-\boldsymbol{x}_{(j),(copy)}\Vert^3\end{displaymath}$

where

and $P_{j,(copy)}$ have the same signification as for equation (2). Note also that we are working on a copy of

$\mbox{$J$}$ and

. In the remaining of the current section, we will drop the $_{(copy)}$ subscript for easier notation.
This equation has a special structure. The contribution to the interpolation error of the point $\boldsymbol{x}_{(j)}$ to be dropped is easily separable from the contribution of the other points of the dataset $\mbox{$J$}$ , it is:

error due to $\displaystyle \boldsymbol{x}_{(j)}= \frac{1}{6} M \vert P_j(y) \vert \Vert y - \boldsymbol{x}_{(j)} \Vert^3$

(5)

If

is inside the ball of radius $\rho$ around

(

is the best point found until now in the second/parallel task), then an upper bound of equation (5) can be found:

$\displaystyle \frac{1}{6}M \max_{y}$	$\displaystyle \{ \vert P_{j}(y)\vert \Vert y- x_k \Vert^3 : \Vert y- x_k \Vert \leq \rho \}$
	$\displaystyle \approx \frac{1}{6}M \Vert \boldsymbol{x}_{(j)}- x_k \Vert^3 \max_{d} \{ \vert P_{j}(x_k+d)\vert : \Vert d \Vert \leq \rho \}$

We are ignoring the dependence of the other Newton polynomials in the hope of finding a useful technique and cheap to implement. $\boldsymbol{x}_{(j)}$ is thus replaced in ${\mbox{$J$}_{(copy)}}$ by

where

is the solution of the following problem:

$\displaystyle \max_{d} \{ \vert P_{j}(x_k+d)\vert : \Vert d \Vert \leq \rho \}$

The algorithm used to solve this problem is described in [34].

Numerical Results

Results on one CPU

We will now compare CONDOR with UOBYQA [34], DFO [12,11], PDS [16], LANCELOT [7] and COBYLA [31] on a part of the Hock and Schittkowski test set [23]. The test functions and the starting points are extracted from SIF files obtained from CUTEr, a standard test problem database for non-linear optimization (see [22]). We are thus in perfect standard conditions. The tests problems are arbitrary and have been chosen by A.R.Conn, K. Scheinberg and Ph.L. Toint. to test their DFO algorithm. The performances of DFO are thus expected to be, at least, good. We list the number of function evaluations that each algorithm took to solve the problem. We also list the final function values that each algorithm achieved. We do not list the CPU time, since it is not relevant in our context. The "*" indicates that an algorithm terminated early because the limit on the number of iterations was reached. The default values for all the parameters of each algorithm is used. The stopping tolerance of DFO was set to $10^{-4}$ , for the other algorithms the tolerance was set to appropriate comparable default values. The comparison between the algorithms is based on the number of function evaluations needed to reach the SAME precision. For the most fair comparison with DFO, the stopping criteria ( $\rho_{end}$ ) of CONDOR has been chosen so that CONDOR is always stopping with a little more precision on the result than DFO. This precision is some time insufficient to reach the true optima of the objective function. In particular, in the case of the problems GROWTHLS and HEART6LS, the CONDOR algorithm can find a better optimum after some more evaluations (for a smaller $\rho_{end}$ ). All algorithms were implemented in Fortran 77 in double precision except COBYLA which is implemented in Fortran 77 in single precision and CONDOR which is written in C++ (in double precision). The trust region minimization subproblem of the DFO algorithm is solved by NPSOL [20], a fortran 77 non-linear optimization package that uses an SQP approach. For CONDOR, the number in parenthesis indicates the number of function evaluation needed to reach the optimum without being assured that the value found is the real optimum of the function. For example, for the WATSON problem, we find the optimum after (580) evaluations. CONDOR still continues to sample the objective function, searching for a better point. It's loosing 87 evaluations in this search. The total number of evaluation (reported in the first column) is thus 580+87=667.

$\begin{sidewaysfigure} % latex2html id marker 381 \hspace{-2.7cm} \begin{tabu... ...ONDOR, UOBYQA, DFO, PDS, LANCELOT and COBYLA on one CPU.} \end{sidewaysfigure}$
CONDOR and UOBYQA are both based on the same algorithm and have nearly the same behavior.

PDS stands for "Parallel Direct Search" [16]. The number of function evaluations is high and so the method doesn't seem to be very attractive. On the other hand, these evaluations can be performed on several CPU's reducing considerably the computation time.

Lancelot [7] is a code for large scale optimization when the number of variable is

and the objective function is easy to evaluate (less than

.). Its model is build using finite differences and BFGS update. This algorithm has not been design for the kind of application we are interested in and is thus performing accordingly.

COBYLA [31] stands for "Constrained Optimization by Linear Approximation" by Powell. It is, once again, a code designed for large scale optimization. It is a derivative free method, which uses linear polynomial interpolation of the objective function.

DFO [12,11] is an algorithm by A.R.Conn, K. Scheinberg and Ph.L. Toint. It has already been described in section 1. In CONDOR and in UOBYQA the validity of the model is checked using two equations:

$\begin{displaymath}\begin{array}{c} \text{\em All the interpolation } \\ \tex... ...\boldsymbol{x}_{(k)}\Vert \leq 2 \rho\;\;\;\;\;\; j=1,\ldots,N\end{displaymath}$

(6)

$\begin{displaymath}\begin{array}{c}\text{\em Powell's} \\ \text{\em heuristic} ... ...\Vert d \Vert \leq \rho \} \leq \epsilon \;\;\;\; j=1,\ldots,N\end{displaymath}$

(7)

using notation of section 3. See [34] to know how to compute $\epsilon$ . The first equation (6) is also used in DFO. The second equation (7) (which is similar to equation (2)) is NOT used in DFO. This last equation allows us to "keep far points" inside the model, still being assured that it is valid. It allows us to have a "full" polynomial of second degree for a "cheap price". The DFO algorithm cannot use equation 7 to check the validity of its model because the variable $\epsilon$ (which is computed in UOBYQA and in CONDOR as a by-product of the computation of the "Moré and Sorenson Trust Region Step") is not cheaply available. In DFO, the trust region step is calculated using an external tool: NPSOL [20]. $\epsilon$ is difficult to obtain and is not used.

UOBYQA and CONDOR are always using a full quadratic model. This enables us to compute Newton's steps. The Newton's steps have a proven quadratical convergence speed [15]. Unfortunately, some evaluations of the objective function are lost to build the quadratical model. So, we only obtain *near* quadratic speed of convergence. We have Q-superlinear convergence (see original paper of Powell [34]). (In fact the convergence speed is often directly proportional to the quality of the approximation

of the real Hessian matrix of

). Usually, the price (in terms of number of function evaluations) to construct a good quadratical model is very high but using equation (7), UOBYQA and CONDOR are able to use very few function evaluations to update the local quadratical model.

When the dimension of the search space is greater than 25, the time needed to start, building the first quadratic, is so important (

evaluations) that DFO may becomes attractive again. Especially, if you don't want the optimum of the function but only a small improvement in a small time. If several CPU's are available, then CONDOR once again imposes itself. The function evaluations needed to build the first quadratic are parallelized on all the CPU's without any loss of efficiency when the number of CPU increases (the maximum number of CPU is

). This first construction phase has a great parallel efficiency, as opposed to the rest of the optimization algorithm where the efficiency becomes soon very low (with the number of CPU increasing). In contrast to CONDOR, the DFO algorithm has a very short initialization phase and a long research phase. This last phase can't be parallelized very well. Thus, when the number of CPU's is high, the most promising algorithm for parallelization is CONDOR. A parallel version of CONDOR has been implemented. Very encouraging experimental results on the parallel code are given in the next section.

When the local model is not convex, no second order convergence proof (see [10]) is available. It means that, when using a linear model, the optimization process can prematurely stop. This phenomenon *can* occur with DFO which uses from time to time a simple linear model. CONDOR is very robust and always converges to a local optimum (extensive numerical tests have been made [40]).

Parallel results

We are using the same test conditions as for the previous section (standard objective functions with standard starting points).

Since the objective function is assumed to be time-expensive to evaluate, we can neglect the time spent inside the optimizer and inside the network transmissions. To be able to make this last assumption (negligible network transmissions times), a wait loop of 1 second is embedded inside the code used to evaluate the objective function (only 1 second: to be in the worst case possible).

**Figure 2:** Improvement due to parallelism
$\begin{figure} \hspace{-.8cm} \begin{tabular}{\vert l\vert c@{\hspace{-0.4cm}}... ...947 & \multicolumn{3}{\vert c}{} \\ \cline{1-5} \end{tabular} \end{figure}$

Table 2 indicates the number of function evaluations performed on the master CPU (to obtain approximatively the total number of function evaluations cumulated over the master and all the slaves, multiply the given number on the list by the number of CPU's). The CPU time is thus directly proportional to the numbers listed in columns 3 to 5 of the table 2.

Suppose a function evaluation takes 1 hour. The parallel/second process on the main computer has asked 59 minutes ago to a client to perform one such evaluation. We are at step 4(a)i of the main algorithm. We see that there are no new evaluation available from the client computers. Should we go directly to step 4(a)ii and use later this new information, or wait 1 minute? The response is clear: wait a little. This bad situation occurs very often in our test examples since every function evaluation takes exactly the same time (1 second). But what's the best strategy when the objective function is computing, randomly, from 40 to 80 minutes at each evaluation (this is for instance the case for objective functions which are calculated using CFD techniques)? The response is still to investigate. Currently, the implemented strategy is: never wait. Despite, this simple strategy, the current algorithm gives already some non-negligible improvements.

Noisy optimization

We will assume that objective functions derived from CFD codes have usually a simple shape but are subject to high-frequency, low amplitude noise. This noise prevents us to use simple finite-differences gradient-based algorithms. Finite-difference is highly sensitive to the noise. Simple Finite-difference quasi-Newton algorithms behave so badly because of the noise, that most researchers choose to use optimization techniques based on GA,NN,... [41,13,29,30]. The poor performances of finite-differences gradient-based algorithms are either due to the difficulty in choosing finite-difference step sizes for such a rough function, or the often cited tendency of derivative-based methods to converge to a local optimum [4]. Gradient-based algorithms can still be applied but a clever way to retrieve the derivative information must be used. One such algorithm is DIRECT [21,24,5] which is using a technique called implicit filtering. This algorithm makes the same assumption about the noise (low amplitude, high frequency) and has been successful in many cases [5,6,38]. For example, this optimizer has been used to optimize the cost of fuel and/or electric power for the compressor stations in a gas pipeline network [6]. This is a two-design-variables optimization problem. You can see in the right of figure 5 a plot of the objective function. Notice the simple shape of the objective function and the small amplitude, high frequency noise. Another family of optimizers is based on interpolation techniques. DFO, UOBYQA and CONDOR belongs to this last family. DFO has been used to optimize (minimize) a measure of the vibration of a helicopter rotor blade [4]. This problem is part of the Boeing problems set [3]. The blade are characterized by 31 design variables. CONDOR will soon be used in industry on a daily basis to optimize the shape of the blade of a centrifugal impeller [28]. All these problems (gas pipeline, rotor blade and impeller blade) have an objective function based on CFD code and are both solved using gradient-based techniques. In particular, on the rotor blade design, a comparative study between DFO and other approaches like GA, NN,... has demonstrated the clear superiority of gradient-based techniques approach combined with interpolation techniques [4].

We will now illustrate the performances of CONDOR in two simple cases which have sensibly the same characteristics as the objective functions encountered in optimization based on CFD codes. The functions, the amplitude of the artificial noise applied to the objective functions (uniform noise distribution) and all the parameters of the tests are summarized in table 5.3. In this table "NFE" stands for Number of Function Evaluations. Each columns represents 50 runs of the optimizer.

**Figure 3:** Noisy optimization.
$\begin{figure} \hspace{-1cm}\begin{tabular}{\vert l\vert c\vert c\vert c\vert c... ...\footnotesize 1e-2 & \footnotesize 1e-1 \\ \hline \end{tabular} \end{figure}$

**Figure 4:** On the left: A typical run for the optimization of the noisy Rosenbrock function. On the right:Four typical runs for the optimization of the simple noisy quadratic (noise=1e-4).
$\begin{figure} \centering\epsfig{figure=figures/noise1.eps, width=6cm, height=3.7cm}\epsfig{figure=figures/noise2.eps, width=6cm, height=3.7cm} \end{figure}$

**Figure 5:** Typical shape of objective function derived from CFD analysis.
$\begin{figure}\centering \epsfig{figure=figures/noise3.eps,width=5cm, height=4... ...m]{ \epsfig{figure=figures/noise4b.eps,width=6cm, height=4cm}} \end{figure}$

A typical run for the optimization of the noisy Rosenbrock function is given in the left of figure 4. Four typical runs for the optimization of the simple noisy quadratic in four dimension are given in the right of figure 4. The noise on this four runs has an amplitude of 1e-4. In these conditions, CONDOR stops in average after 100 evaluations of the objective function but we can see in figure 4 that we usually already have found a quasi-optimum solution after only 45 evaluations.

As expected, there is a clear relationship between the noise applied on the objective function and the average best value found by the optimizer. From the table 5.3 we can see the following: When you have a noise of $10^{n+2}$ , the difference between the best value of the objective function found by the optimizer AND the real value of the objective function at the optimum is around $10^{n}$ . In other words, in our case, if you apply a noise of $10^{-2}$ , you will get a final value of the objective function around $10^{-4}$ . Obviously, this strange result only holds for this simple objective function (the simple quadratic) and these particular testing conditions. Nevertheless, the robustness against noise is impressive.

If this result can be generalized, it will have a great impact in the field of CFD shape optimization. This simply means that if you want a gain of magnitude $10^{n}$ in the value of the objective function, you have to compute your objective function with a precision of at least $10^{n+2}$ . This gives you an estimate of the precision at which you have to calculate your objective function. Usually, the more precision, the longer the evaluations are running. We are always tempted to lower the precision to gain in time. If this strange result can be generalized, we will be able to adjust tightly the precision and we will thus gain a precious time.

Conclusions

Given the search space comprised between 2 and 20 and given some noise of small amplitude and high frequency on the objective function evaluation, among the best optimizer available are UOBYQA and its parallel, constrained extension CONDOR. When several CPU's are used, the experimental results tend to show that CONDOR becomes the fastest optimizer in its category(fastest in terms of number of function evaluations).

Some improvements are still possible:

Add the possibility to start with a linear model, using a stable update inspired by [36].
Use a better strategy for the parallel case (see end of section 5.2)
Currently the trust region is a simple ball (this is linked to the L2-norm $\Vert s \Vert _2$ used in step 4(a)ii of the algorithm). It would be interesting to have a trust region which reflects the underlying geometry of the model and not give undeserved weight to certain directions (for example, using a H-norm) (see [9]). This improvement will have a small effect provided the variables have already been correctly normalized.

Some research can also be made in the field of kriging models (see [4]). These models need very few "model improvement steps" to obtain a good validity. The validity of the approximation can also easily be checked.

The code of the optimizer is a complete C/C++ stand-alone package written in pure structural programmation style. There is no call to fortran, external, unavailable, copyrighted, expensive libraries. You can compile it under UNIX or Windows. The only library needed is the standard TCP/IP network transmission library based on sockets (only in the case of the parallel version of the code) which is available on almost every platform. You don't have to install any special library such as MPI or PVM to build the executables. The client on different platforms/OS'es can be mixed together to deliver a huge computing power. The full description of the algorithm code can be found in [39].

The code has been highly optimized (with extended use of memcpy function, special fast matrix manipulation, fast pointer arithmetics, and so on...). However, BLAS libraries [25] have not been used to allow a full Object-Oriented approach. Anyway, the dimension of the problems is rather low so BLAS is nearly un-useful. OO style programming allows a better comprehension of the code for the possible reader.

A small C++ SIF-file reader has also been implemented (to be able to use the problems coded in SIF from the CUTEr database, [33]). An AMPL interface [19] has also been implemented.

The fully stand-alone code is currently available at the homepage of the first author: http://www.applied-mathematics.net.

Bibliography

1: Aemdesign.
URL: http://www.aemdesign.com/FSQPapplref.htm.
2: Paul T. Boggs and Jon W. Tolle.
Sequential Quadratic Programming.
Acta Numerica, pages 1-000, 1996.
3: Andrew J. Booker, A.R. Conn, J.E. Dennis Jr., Paul D. Frank, Michael Trosset, Virginia Torczon, and Michael W. Trosset.
Global modeling for optimization: Boeing/ibm/rice collaborative project 1995 final report.
Technical Report ISSTECH-95-032, Boeing Information Support Services, Research and technology, Box 3707, M/S 7L-68, Seattle, Washington 98124, December 1995.
4: Andrew J. Booker, J.E. Dennis Jr., Paul D. Frank, David B. Serafini, Virginia Torczon, and Michael W. Trosset.
Optimization using surrogate objectives on a helicopter test example.
Computational Methods in Optimal Design and Control, pages 49-58, 1998.
5: D. M. Bortz and C. T. Kelley.
The Simplex Gradient and Noisy Optimization Problems.
Technical Report CRSC-TR97-27, North Carolina State University, Department of Mathematics, Center for Research in Scientific Computation Box 8205, Raleigh, N. C. 27695-8205, September 1997.
6: R. G. Carter, J. M. Gablonsky, A. Patrick, C. T. Kelley, and O. J. Eslinger.
Algorithms for Noisy Problems in Gas Transmission Pipeline Optimization.
Optimization and Engineering, 2:139-157, 2001.
7: Andrew R. Conn, Nicholas I.M. Gould, and Philippe L. Toint.
LANCELOT: a Fortran package for large-scale non-linear optimization (Release A).
Springer Verlag, HeidelBerg, Berlin, New York, springer series in computational mathematics edition, 1992.
8: Andrew R. Conn, Nicholas I.M. Gould, and Philippe L. Toint.
Trust-region Methods.
SIAM Society for Industrial & Applied Mathematics, Englewood Cliffs, New Jersey, mps-siam series on optimization edition, 2000.
Chapter 9: conditional model, pp. 307-323.
9: Andrew R. Conn, Nicholas I.M. Gould, and Philippe L. Toint.
Trust-region Methods.
SIAM Society for Industrial & Applied Mathematics, Englewood Cliffs, New Jersey, mps-siam series on optimization edition, 2000.
The ideal Trust Region: pp. 236-237.
10: Andrew R. Conn, Nicholas I.M. Gould, and Philippe L. Toint.
Trust-region Methods.
SIAM Society for Industrial & Applied Mathematics, Englewood Cliffs, New Jersey, mps-siam series on optimization edition, 2000.
Note on convex models, pp. 324-337.
11: Andrew R. Conn, Nicholas I.M. Gould, and Philippe L. Toint.
A Derivative Free Optimization Algorithm in Practice.
Technical report, Department of Mathematics, University of Namur, Belgium, 98.
Report No. 98/11.
12: Andrew R. Conn, K. Scheinberg, and Philippe L. Toint.
Recent progress in unconstrained nonlinear optimization without derivatives.
Mathematical Programming, 79:397-414, 1997.
13: R. Cosentino, Z. Alsalihi, and R. Van Den Braembussche.
Expert System for Radial Impeller Optimisation.
In Fourth European Conference on Turbomachinery, ATI-CST-039/01, Florence,Italy, 2001.
14: Carl De Boor and A. A Ron.
On multivariate polynomial interpolation.
Constr. Approx., 6:287-302, 1990.
15: J.E. Dennis Jr. and Robert B. Schnabel.
Numerical Methods for unconstrained Optimization and nonlinear Equations.
SIAM Society for Industrial & Applied Mathematics, Englewood Cliffs, New Jersey, classics in applied mathematics, 16 edition, 1996.
16: J.E. Dennis Jr. and V. Torczon.
Direct search methods on parallel machines.
SIAM J. Optimization, 1(4):448-474, 1991.
17: J.E. Dennis Jr. and H.F. Welaker.
Inaccurracy in quasi-Newton methods: local improvement theroems.
Mathematical programming Study, 22:70-85, 1984.
18: R. Fletcher.
Practical Methods of optimization.
a Wiley-Interscience publication, Great Britain, 1987.
19: Robert Fourer, David M. Gay, and Brian W. Kernighan.
AMPL: A Modeling Language for Mathematical Programming.
Duxbury Press / Brooks/Cole Publishing Company, 2002.
20: P.E. Gill, W. Murray, M.A. Saunders, and Wright M.H.
Users's guide for npsol (version 4.0): A fortran package for non-linear programming.
Technical report, Department of Operations Research, Stanford University, Stanford, CA94305, USA, 1986.
Report SOL 862.
21: P. Gilmore and C. T. Kelley.
An implicit filtering algorithm for optimization of functions with many local minima.
SIAM Journal of Optimization, 5:269-285, 1995.
22: Nicholas I. M. Gould, Dominique Orban, and Philippe L. Toint.
CUTEr (and SifDec ), a Constrained and Unconstrained Testing Environment, revisited.
Technical report, Cerfacs, 2001.
Report No. TR/PA/01/04.
23: W. Hock and K. Schittkowski.
Test Examples for Nonlinear Programming Codes.
Lecture Notes en Economics and Mathematical Systems, 187, 1981.
24: C. T. Kelley.
Iterative Methods for Optimization, volume 18 of Frontiers in Applied Mathematics.
Philadelphia, 1999.
25: C. L. Lawson, R. J. Hanson, D. Kincaid, and F. T. Krogh.
Basic Linear Algebra Subprograms for FORTRAN usage.
ACM Trans. Math. Soft., 5:308-323, 1979.
26: J.J. Moré and D.C. Sorensen.
Computing a trust region step.
SIAM journal on scientif and statistical Computing, 4(3):553-572, 1983.
27: Eliane R. Panier and André L. Tits.
On combining feasibility, Descent and Superlinear Convergence in Inequality Contrained Optimization.
Mathematical Programming, 59:261-276, 1995.
28: S. Pazzi, F. Martelli, V. Michelassi, Frank Vanden Berghen, and Hugues Bersini.
Intelligent Performance CFD Optimisation of a Centrifugal Impeller.
In Fifth European Conference on Turbomachinery, Prague, CZ, March 2003.
29: Stéphane Pierret and René Van den Braembussche.
Turbomachinery blade design using a Navier-Stokes solver and artificial neural network.
Journal of Turbomachinery, ASME 98-GT-4, 1998.
publication in the transactions of the ASME: '' Journal of Turbomachinery ''.
30: C. Poloni.
Multi Objective Optimisation Examples: Design of a Laminar Airfoil and of a Composite Rectangular Wing.
Genetic Algorithms for Optimisation in Aeronautics and Turbomachinery, 2000.
von Karman Institute for Fluid Dynamics.
31: M.J.D. Powell.
A direct search optimization method that models the objective and constraint functions by linar interpolation.
In Advances in Optimization and Numerical Analysis, Proceedings of the sixth Workshop on Optimization and Numerical Analysis, Oaxaca, Mexico, volume 275, pages 51-67, Dordrecht, NL, 1994. Kluwer Academic Publishers.
32: M.J.D. Powell.
The use of band matrices for second derivative approximations in trust region algorithms.
Technical report, Department of Applied Mathematics and Theoretical Physics, University of Cambridge, England, 1997.
Report No. DAMTP1997/NA12.
33: M.J.D. Powell.
On the Lagrange function of quadratic models that are defined by interpolation.
Technical report, Department of Applied Mathematics and Theoretical Physics, University of Cambridge, England, 2000.
Report No. DAMTP2000/10.
34: M.J.D. Powell.
UOBYQA: Unconstrained Optimization By Quadratic Approximation.
Technical report, Department of Applied Mathematics and Theoretical Physics, University of Cambridge, England, 2000.
Report No. DAMTP2000/14.
35: M.J.D. Powell.
UOBYQA: Unconstrained Optimization By Quadratic Approximation.
Mathematical Programming, B92:555-582, 2002.
36: M.J.D. Powell.
On updating the inverse of a KKT matrix.
Technical report, Department of Applied Mathematics and Theoretical Physics, University of Cambridge, England, 2004.
Report No. DAMTP2004/01.
37: Thomas Sauer and Yuan Xu.
On multivariate lagrange interpolation.
Math. Comp., 64:1147-1170, 1995.
38: D. E. Stoneking, G. L. Bilbro, R. J. Trew, P. Gilmore, and C. T. Kelley.
Yield optimization Using a gaAs Process Simulator Coupled to a Physical Device Model.
IEEE Transactions on Microwave Theory and Techniques, 40:1353-1363, 1992.
39: Frank Vanden Berghen.
Intermediate Report on the development of an optimization code for smooth, continuous objective functions when derivatives are not available.
Technical report, IRIDIA, Université Libre de Bruxelles, Belgium, 2003.
Available at http://www.applied-mathematics.net/optimization/.
40: Frank Vanden Berghen.
Optimization algorithm for Non-Linear, Constrained, Derivative-free optimization of Continuous, High-computing-load Functions.
Technical report, IRIDIA, Université Libre de Bruxelles, Belgium, 2004.
Available at http://iridia.ulb.ac.be/ $\sim$ fvandenb/work/thesis/.
41: J. F. Wanga, J. Periaux, and Sefriouib M.
Parallel evolutionary algorithms for optimization problems in aerospace engineering.
Journal of Computational and Applied Mathematics, 149, issue 1:155-169, December 2002.
42: D. Winfield.
Function and functional optimization by interpolation in data tables.
PhD thesis, Harvard University, Cambridge, USA, 1969.
43: D. Winfield.
Function minimization by interpolation in a data table.
Journal of the Institute of Mathematics and its Applications, 12:339-347, 1973.

Frank Vanden Berghen 2004-08-16