Next: The final choice of Up: A short review of Previous: Non-Linear constraints   Contents

Subsections

Primal-dual interior point

The material of this section is based on the following references: [NW99,BV04,CGT00f].

Duality

Consider an optimization problem in this form:

 Subject to: (8.9)

We assume its domain is . We define the Lagrangian     associated with the problem 8.9 (see section 13.3 about the Lagrangian function):

 (8.10)

We refer to as the Lagrange multiplier or dual variables associated with the inequality constraint . We define the Lagrange dual function (or just dual function) as the minimum value of the Lagrangian over :

 (8.11)

Lower Bounds on optimal value

The dual function yields lower bounds on the optimal value of the problem 8.9: for any , we have

 (8.12)

The proof follow. Suppose is a feasible point of the problem 8.9. Then we have:

 (8.13)

since each term is non-negative. And therefore:

 (8.14)

Hence

 (8.15)

Since holds for every feasible point , the inequality 8.12 follows.
When the feasible domain is convex and when the objective function is convex, the problem is said to be convex. In this case, we have . The proof will be skipped. When the problem is not convex, we will define the Duality gap .

The Lagrange dual problem of a linear problem in inequality form

Lets calculate the Lagrange dual of an inequality form LP:

 Subject to: (8.16)

The Lagrangian is

 (8.17)

So the dual function is

 (8.18)

The infinitum of a linear function is , except in the special case when its identically zero, so the dual function is:

 (8.19)

The dual variable is dual feasible if and . The Lagrange dual of the LP 8.16 is to maximize over all . We can reformulate this by explicitly including the dual feasibility conditions as constraints, as in:

 Subject to: (8.20)

which is an LP in standard form.
Since the feasible domain is convex and the objective function is also convex, we have a convex problem. The solution of this dual problem is thus equal to the solution of the primal problem.

A primal-dual Algorithm

Consider an optimization problem in this form:

 Subject to:

where . We will now compute the dual problem:

 Since the problem is convex, we have ( are the Lagrange multipliers associated to the constraints and are the Lagrange multipliers associated to the constraints ): Subject to: Subject to:

The associated KKT conditions are:
 (8.21) (8.22) (8.23) the complementarity condition for the constraint 0 (8.24)

Primal dual methods find solutions of this system by applying variants of Newton's method (see section 13.6 for Newton's method for non-linear equations) to the three equalities 8.21-8.23 and modifying the search direction and step length so that inequalities are satisfied strictly at every iteration. The nonnegativity condition is the source of all the complications in the design and analysis of interior-point methods. Let's rewrite equations 8.21-8.24 in a slightly different form:
 0 (8.25) 0 (8.26)

where and . Primal dual methods generate iterates that satisfy the bounds 8.26 strictly. This property is the origin of the term interior-point.
As mentioned above, the search direction procedure has its origin in Newton's method for the set of nonlinear equations 8.25 (see section 13.6 for Newton's method for non-linear equations). We obtain the search direction by solving the following system of linear equations:

 (8.27)

where is the Jacobian of . If the current point is strictly feasible, we have:

 (8.28)

A full step along this direction is not permissible, since it would violate the bound . To avoid this difficulty, we perform a line search along the Newton direction so that the new iterate is

 (8.29)

for some line search parameter . Unfortunately, we often can only take a small step along this direction (= the affine scaling direction). To overcome this difficult, primal-dual methods modify the basic Newton procedure in two important ways:
1. They bias the search direction toward the interior of the nonnegative orthant , so that we can move further along this direction before one components of becomes negative.
2. They keep the components of from moving "too close" to the boundary of the nonnegative orthant.
We will consider these two modifications in turn in the next subsections.

Central path

The central path is an arc of strictly feasible points that is parameterized by a scalar . We can define the central path as

 (8.30)

Where each point solves the following system, which is a perturbation of the original system 8.25
 0 (8.31) 0 (8.32)

A plot of for a typical problem, projected into the space of primal variables , is shown in figure 8.5.

The equation 8.31 approximate 8.25 more and more closely as goes to zero. Primal-Dual algorithms take Newton's steps toward points on for which . Since these steps are biased toward the interior of the nonnegative orthant , it is usually possible to take longer steps along them than along pure Newton's steps for , before violating the positivity condition. To describe the biased search direction, we introduce a duality measure defined by

 (8.33)

which measure the average value of the pairwise products . We also define a centering parameter . Applying Newton's method to the system 8.31, we obtain:

 (8.34)

If , the equations 8.34 define a centering direction, a Newton step toward the point . Centering direction are usually biased strongly toward the interior of the nonnegative orthant and make little progress in reducing the duality measure . However, by moving closer to , they set the scene for substantial progress on the next iterations. At the other extreme, the value gives the standard Newton's step.
The choice of centering parameter and step length are crucial to the performance of the method. Techniques for controlling these parameters, directly and indirectly, give rise to a wide variety of methods with varying theoretical properties.

Link between Barrier method and Interior point method

In this section we want to find the solution of:

 Subject to: (8.35)

There exist a value of the dual variable and a value of the primal variable which satisfy the KKT conditions:
 0 (8.36) 0 (8.37) 0 (8.38) 0 (8.39)

Equations 8.36-8.39 are comparable to equations 8.21-8.24. There are the base equations for the primal-dual iterations. In the previous section, we motivated the following perturbation of these equations:
 0 (8.40) 0 (8.41) (8.42) 0 (8.43)

Let's now consider the following equivalent optimization problem (barrier problem):

 and (8.44)

For a given , at the optimum of equation 8.44, we have:

 (8.45)

If we define

 (8.46)

We will now proof that is dual feasible. First it's clear that because because is inside the feasible domain. We now have to proof that . Let's compute the value of the dual function of 8.35 at :

 Using equation 8.46: (8.47)

When , we will have which concludes the proof. is the duality gap.
Let's define the minimum value of the original problem 8.35, and the solution of for a given value of . From equation 8.47, we have the interesting following result: . That is: is no more than suboptimal.
What's the link between equations 8.40-8.43, which are used inside the primal-dual algorithm and the barrier method? We see that if we combine equations 8.45 and 8.46, we obtain 8.40. The equations 8.42 and 8.46 are the same, except that . The "perturbation parameter" in primal-dual algorithm is simply the barrier parameter in barrier algorithms. In fact, barrier method and primal-dual methods are very close. The main difference is: In primal-dual methods, we update the primal variables AND the dual variable at each iteration. In barrier methods, we only update the primal variables.

A final note on primal-dual algorithms.

The primal-dual algorithms are usually following this scheme:
1. Set
2. Solve approximatively the barrier problem 8.44 for a given value of the centering parameter (see equation 8.34 about ) using as starting point the solution of the previous iteration.
3. Update the barrier parameter . For example:
Increase the iteration counter
If not finished, go to step 2.
The principal difficulties in primal-dual methods are:
• The choice of centering parameter which is crucial to the performance of the method. If the decrease is too slow, we will loose our time evaluating the objective function far from the real optimum. For "approach 2", where these evaluations are very expensive (in time), it's a major drawback.
• In step 2 above, we have to solve approximatively an optimization problem. What are the tolerances? What's the meaning of approximatively?
• The feasible space should be convex. Extension to any feasible space is possible but not straight forward.
• The starting point should be feasible. Extension to any starting point is possible but not straight forward.
• The "switching mechanism" from the unconstrained steps when no constrained are violated to the constrained case is difficult to implement (if we want to keep the actual mechanism for step computation when no constraints are active).
These kinds of algorithm are useful when the number of constraints are huge. In this case, identifying the set of active constraints can be very time consuming (it's a combinatorial problem which can really explode). Barrier methods and primal-dual methods completely avoids this problem. In fact, most recent algorithms for linear programming and for quadratic programming are based on primal-dual methods for this reason. The main field of application of primal-dual methods is thus optimization in very large dimensions (can be more than 10000) and with many constraints (can be more than 6000000). Since we have only a reduced amount of constraints, we can use an active set method without any problem.

SQP Methods

The material of this section is based on the following references: [NW99,Fle87].
SQP stands for Sequential quadratic Programming''. We want to find the solution of:

 Subject to: (8.48)

As in the case of the Newton's method in unconstrained optimization, we will do a quadratical approximation of the function to optimize and search for the minimum of this quadratic. The function to be approximated will be the Lagrangian function .

 (8.49)

With : the Jacobian matrix of the constraints evaluated at : we have: or
and : the Hessian Matrix of the Langrangian function: =

The full Hessian of is thus :
 (8.50)

If we are on the boundary of the constraint, we will have , thus we can write:

 on the constraints boundaries: (8.51)

We want to find which minimize , subject to the constraints .
From equation 8.49 and using equation 8.51, we obtain:

 (8.52)

Using a first order approximation of the constraints around , we have the

 (8.53)

DEFINITION: a Quadratic Program (QP) is a function which finds the minimum of a quadratic subject to linear constraints.
Note that .
We can now define the SQP algorithm:
1. solve the QP subproblem described on equation 8.53 to determine and let be the vector of the Lagrange multiplier of the linear constraints obtained from the QP.
2. set
3. Increment . Stop if . Otherwise, go to step 1.
This algorithm is the generalization of the Newton's method'' for the constrained case. It has the same properties. It has, near the optimum (and under some special conditions), quadratical convergence.
A more robust implementation of the SQP algorithm adjust the length of the steps and performs "second order correction steps" (or "SOC steps").

A small note about the H matrix in the SQP algorithm.

The inclusion of the second order constraint terms in the subproblem is important in that otherwise second order convergence for nonlinear constraints would not be obtained. This is well illustrated by the following problem:

 (8.54)

in which the objective function is linear so that it is only the curvature of the constraint which causes a solution to exist. In this case the sequence followed by the SQP algorithm is only well-defined if the constraint curvature terms are included.
We can obtain a good approximation of this matrix using an extension of the BFGS update to the constrained case.

A final note about the SQP algorithm.

The following table compares the number of function evaluations and gradient evaluations required to solve Colville's (1969) [Col73] first three problems and gives some idea of the relative performances of the algorithms.
 Extrapolated Multiplier SQP method Problem barrier function penalty function (Powell, 1978a) [Pow77] TP1 177 47 6 TP2 245 172 17 TP3 123 73 3
The excellent results of the SQP method are very attractive.

Next: The final choice of Up: A short review of Previous: Non-Linear constraints   Contents
Frank Vanden Berghen 2004-04-19