Next: The final choice of Up: A short review of Previous: Non-Linear constraints Contents

Subsections

Duality
- Lower Bounds on optimal value
- The Lagrange dual problem of a linear problem in inequality form
A primal-dual Algorithm
Central path
Link between Barrier method and Interior point method
A final note on primal-dual algorithms.
SQP Methods
A small note about the H matrix in the SQP algorithm.
A final note about the SQP algorithm.

Primal-dual interior point

The material of this section is based on the following references: [NW99,BV04,CGT00f].

Duality

Consider an optimization problem in this form:

$\displaystyle \min f(x) \; \;$ Subject to: $\displaystyle \; \; c_i(x) \geq 0, \; i=1,\ldots,m$

(8.9)

We assume its domain is $\displaystyle {\cal D}= \left( \bigcap_{i=1}^m \text{{\bf dom}} c_i(x) \right) \bigcap \left( \text{{\bf dom}} f(x) \right)$ . We define the Lagrangian $\L : \Re^n \times$ $\mbox{$\cal R$}$ $^m \rightarrow \Re$ associated with the problem 8.9 (see section 13.3 about the Lagrangian function):

$\displaystyle \L (x,\lambda)=f(x)- \sum_{i=1}^m \lambda_i c_i(x)$

(8.10)

We refer to $\lambda_i$ as the Lagrange multiplier or dual variables associated with the $i^{th}$ inequality constraint $c_i(x) \geq 0$ . We define the Lagrange dual function (or just dual function) $g: \Re^m \rightarrow \Re$ as the minimum value of the Lagrangian over

:

$\displaystyle g(\lambda)= \inf_{x \in {\cal D}} \L (x,\lambda)$

(8.11)

Lower Bounds on optimal value

The dual function yields lower bounds on the optimal value

of the problem 8.9: for any $\lambda > 0$ , we have

$\displaystyle g(\lambda) \leq f(x^*)$

(8.12)

The proof follow. Suppose $\bar{x}$ is a feasible point of the problem 8.9. Then we have:

$\displaystyle \sum_{i=1}^m \lambda_i c_i(\bar{x}) \geq 0$

(8.13)

since each term is non-negative. And therefore:

$\displaystyle \L (\bar{x},\lambda)=f(\bar{x})-\sum_{i=1}^m \lambda_i c_i(\bar{x}) \leq f(\bar{x})$

(8.14)

Hence

$\displaystyle g(\lambda)=\inf_{x \in {\cal D}} \L (x,\lambda) \leq \L (\bar{x},\lambda) \leq f(\bar{x})$

(8.15)

Since $g(\lambda)\leq f(\bar{x})$ holds for every feasible point $\bar{x}$ , the inequality 8.12 follows.
When the feasible domain is convex and when the objective function is convex, the problem is said to be convex. In this case, we have $\max_{\lambda} g(\lambda) = \min_x f(x)$ . The proof will be skipped. When the problem is not convex, we will define the Duality gap $\displaystyle = \min_x f(x) - \max_{\lambda} g(\lambda) \geq 0$ .

The Lagrange dual problem of a linear problem in inequality form

Lets calculate the Lagrange dual of an inequality form LP:

$\displaystyle \min c^t x \; \;$ Subject to: $\displaystyle \; \; A x \leq b \Leftrightarrow b-Ax \geq 0$

(8.16)

The Lagrangian is

$\displaystyle \L (x,\lambda)=c^t x - \lambda^t (b - Ax)$

(8.17)

So the dual function is

$\displaystyle g(\lambda)= \inf_{x} \L (x,\lambda) = -b^t \lambda + \inf_x (A^t \lambda+c)^t x$

(8.18)

The infinitum of a linear function is $-\infty$ , except in the special case when its identically zero, so the dual function is:

$\displaystyle g(\lambda)= \begin{cases}-b^t \lambda \; & A^t \lambda +c =0 \\ -\infty & otherwise. \end{cases}$

(8.19)

The dual variable $\lambda$ is dual feasible if $\lambda > 0$ and $A^t \lambda + c = 0$ . The Lagrange dual of the LP 8.16 is to maximize

over all $\lambda > 0$ . We can reformulate this by explicitly including the dual feasibility conditions as constraints, as in:

$\displaystyle \max b^t \lambda \; \;$ Subject to: $\displaystyle \; \; \begin{cases}A^t \lambda +c =0 \\ \lambda \geq 0 \end{cases}$

(8.20)

which is an LP in standard form.
Since the feasible domain is convex and the objective function is also convex, we have a convex problem. The solution of this dual problem is thus equal to the solution of the primal problem.

A primal-dual Algorithm

Consider an optimization problem in this form:

$\displaystyle \min c^t x \; \;$ Subject to: $\displaystyle \; \; Ax=b, x \geq 0,$

where $c, x \in \Re^n, b \in \Re^m, A \in \Re^{m \times n}$ . We will now compute the dual problem:

$\displaystyle \L (x)=$	$\displaystyle c^t x-\lambda^t (Ax-b)-s^t x$
$\displaystyle g(\lambda,s)=$	$\displaystyle \inf_{x \in {\cal D}} \L (x,\lambda)$
Since the problem is convex, we have ( $\lambda \in \Re^m$ are the Lagrange multipliers associated to the constraints and $s \in \Re^n$ are the Lagrange multipliers associated to the constraints $x \geq 0$ ):
$\displaystyle =$	$\displaystyle \inf_{x \in {\cal D}} x^t (c-A^t \lambda-s) + b^t \lambda \min_x c^t \;$ Subject to: $\displaystyle \begin{array}{c} Ax=b \\ x \geq 0 \end{array} =$	$\displaystyle \max_{\lambda,s} g(\lambda,s) \nonumber$
$\displaystyle =$	$\displaystyle \max_{\lambda} b^t \lambda \; \;$ Subject to: $\displaystyle \begin{array}{c} A^t \lambda + s=c \\ s \geq 0, \end{array}$

The associated KKT conditions are:

$\displaystyle A^t \lambda +s$	$\displaystyle =$	$\displaystyle c,$	(8.21)
$\displaystyle Ax$	$\displaystyle =$	$\displaystyle b,$	(8.22)
$\displaystyle x_i s_i$	$\displaystyle =$	$\displaystyle 0, \; \; i=1,\ldots,n \;\;$	(8.23)
		$\displaystyle \hspace{1.5cm} ($ the complementarity condition for the constraint $\displaystyle x \geq 0)$
$\displaystyle (x,s)$	$\displaystyle \geq$	0	(8.24)

Primal dual methods find solutions $(x^*,\lambda^*,s^*)$ of this system by applying variants of Newton's method (see section 13.6 for Newton's method for non-linear equations) to the three equalities 8.21-8.23 and modifying the search direction and step length so that inequalities $(x,s) \geq 0$ are satisfied strictly at every iteration. The nonnegativity condition is the source of all the complications in the design and analysis of interior-point methods. Let's rewrite equations 8.21-8.24 in a slightly different form:

$\begin{displaymath}F(x,\lambda,s)= \left[ \begin{array}{c} A^t \lambda +s -c \\ Ax-b \\ X S e \end{array}\right]\end{displaymath}$	$\displaystyle =$	0	(8.25)
$\displaystyle (x,s)$	$\displaystyle \geq$	0	(8.26)

where $X=diag(x_1,\ldots,x_n), S=diag(s_1,\ldots,s_n)$ and $e=(1,\ldots,1)^t$ . Primal dual methods generate iterates $(x^k, \lambda^k, s^k)$ that satisfy the bounds 8.26 strictly. This property is the origin of the term interior-point.
As mentioned above, the search direction procedure has its origin in Newton's method for the set of nonlinear equations 8.25 (see section 13.6 for Newton's method for non-linear equations). We obtain the search direction $(\Delta x,\Delta \lambda, \Delta s)$ by solving the following system of linear equations:

$\displaystyle J(x,\lambda, s) \left[ \begin{array}{l} \Delta x \\ \Delta \lambda \\ \Delta s \end{array} \right]= -F(x,\lambda,s)$

(8.27)

where

is the Jacobian of

. If the current point is strictly feasible, we have:

$\displaystyle \left[ \begin{array}{ccc} 0 & A^t & I \\ A & 0 & 0 \\ S & 0 & ... ...rray} \right]=\left[ \begin{array}{c} 0 \\ 0 \\ -X S e \end{array} \right]$

(8.28)

A full step along this direction is not permissible, since it would violate the bound

. To avoid this difficulty, we perform a line search along the Newton direction so that the new iterate is

$\displaystyle (x,\lambda,s)+\alpha (\Delta x , \Delta \lambda , \Delta s)$

(8.29)

for some line search parameter $\alpha \in (0,1]$ . Unfortunately, we often can only take a small step along this direction (= the affine scaling direction). To overcome this difficult, primal-dual methods modify the basic Newton procedure in two important ways:

They bias the search direction toward the interior of the nonnegative orthant $(x,s) \geq 0$ , so that we can move further along this direction before one components of becomes negative.
They keep the components of from moving "too close" to the boundary of the nonnegative orthant.

We will consider these two modifications in turn in the next subsections.

Central path

The central path $\cal C$ is an arc of strictly feasible points that is parameterized by a scalar $\tau >0$ . We can define the central path as

$\displaystyle {\cal C}= \{ (x_\tau,\lambda_\tau,s_\tau) \vert \tau >0 \}$

(8.30)

Where each point $(x_\tau,\lambda_\tau,s_\tau) \in {\cal C}$ solves the following system, which is a perturbation of the original system 8.25

$\begin{displaymath}F(x,\lambda,s)= \left[ \begin{array}{c} A^t \lambda +s -c \\ Ax-b \\ X S e - \tau \end{array}\right]\end{displaymath}$	$\displaystyle =$	0	(8.31)
$\displaystyle (x,s)$	$\displaystyle \geq$	0	(8.32)

A plot of $\cal C$ for a typical problem, projected into the space of primal variables

, is shown in figure 8.5.

**Figure 8.5:** Central Path
$\begin{figure} \centering\epsfig{figure=figures/centralpath.eps, width=9cm, height=5cm} \end{figure}$

The equation 8.31 approximate 8.25 more and more closely as $\tau$ goes to zero. Primal-Dual algorithms take Newton's steps toward points on ${\cal C}$ for which $\tau >0$ . Since these steps are biased toward the interior of the nonnegative orthant $(x,s) \geq 0$ , it is usually possible to take longer steps along them than along pure Newton's steps for

, before violating the positivity condition. To describe the biased search direction, we introduce a duality measure $\mu$ defined by

$\displaystyle \mu=\frac{1}{n}\sum_{i=1}^n x_i s_i= \frac{x^t s}{n}$

(8.33)

which measure the average value of the pairwise products

. We also define a centering parameter $\displaystyle \sigma \in [0,1] = \frac{\tau}{\mu}$ . Applying Newton's method to the system 8.31, we obtain:

$\displaystyle \left[ \begin{array}{ccc} 0 & A^t & I \\ A & 0 & 0 \\ S & 0 & ... ...left[ \begin{array}{c} 0 \\ 0 \\ -X S e + \sigma \mu e \end{array} \right]$

(8.34)

If $\sigma=1$ , the equations 8.34 define a centering direction, a Newton step toward the point $(x_\mu,\lambda_\mu,s_\mu) \in {\cal C}$ . Centering direction are usually biased strongly toward the interior of the nonnegative orthant and make little progress in reducing the duality measure $\mu$ . However, by moving closer to $\cal C$ , they set the scene for substantial progress on the next iterations. At the other extreme, the value $\sigma=0$ gives the standard Newton's step.
The choice of centering parameter $\sigma_k$ and step length $\alpha_k$ are crucial to the performance of the method. Techniques for controlling these parameters, directly and indirectly, give rise to a wide variety of methods with varying theoretical properties.

Link between Barrier method and Interior point method

In this section we want to find the solution of:

$\displaystyle f(x^*)= \min_x f(x) \; \;$ Subject to: $\displaystyle c_i(x) \geq 0 , \; i=1,\ldots,m$

(8.35)

There exist a value $\lambda^* \in \Re^m$ of the dual variable and a value $x^* \in \Re^n$ of the primal variable which satisfy the KKT conditions:

$\displaystyle \nabla f(x^) - \sum_{i=1}^m \lambda_i^ \nabla c_i(x^*)$	$\displaystyle =$	0	(8.36)
$\displaystyle c_i(x^*)$	$\displaystyle \geq$	0	(8.37)
$\displaystyle \lambda_i^* c_i(x^*)$	$\displaystyle =$	0	(8.38)
$\displaystyle \lambda^*$	$\displaystyle \geq$	0	(8.39)

Equations 8.36-8.39 are comparable to equations 8.21-8.24. There are the base equations for the primal-dual iterations. In the previous section, we motivated the following perturbation of these equations:

$\displaystyle \nabla f(x^) + \sum_{i=1}^m \lambda_i^ \nabla c_i(x^*)$	$\displaystyle =$	0	(8.40)
$\displaystyle c_i(x)$	$\displaystyle \geq$	0	(8.41)
$\displaystyle \lambda_i^* c_i(x^*)$	$\displaystyle =$	$\displaystyle \tau$	(8.42)
$\displaystyle \lambda^*$	$\displaystyle \geq$	0	(8.43)

Let's now consider the following equivalent optimization problem (barrier problem):

$\displaystyle \varphi_t(x) := f(x) - t \sum_{i=1}^m ln (c_i(x)) \;$ and $\displaystyle t\rightarrow 0$

(8.44)

For a given

, at the optimum

of equation 8.44, we have:

$\displaystyle 0=\nabla \varphi_t(x^*)= \nabla f(x^*)- \sum_{i=1}^m \frac{t }{c_i(x^*)} \nabla c_i(x^*)$

(8.45)

If we define

$\displaystyle \lambda_i^*=\frac{t}{c_i(x^*)}$

(8.46)

We will now proof that $\lambda^*$ is dual feasible. First it's clear that $\lambda^* \geq 0$ because $c_i(x^*) \geq 0$ because

is inside the feasible domain. We now have to proof that $g(\lambda^*)=f(x^*)$ . Let's compute the value of the dual function of 8.35 at $(\lambda^*)$ :

$\displaystyle \L (x,\lambda) =$	$\displaystyle f(x)- \sum_{i=1}^m \lambda_i c_i(x)$
$\displaystyle g(\lambda^*) =$	$\displaystyle f(x^)- \sum_{i=1}^m \lambda_i^ c_i(x^*)$
Using equation 8.46:
$\displaystyle =$	$\displaystyle f(x^*)- m t$	(8.47)

When $t \rightarrow 0$ , we will have $g(\lambda^*)=f(x^*)=p^*$ which concludes the proof.

is the duality gap.
Let's define

the minimum value of the original problem 8.35, and $(x^*,\lambda^*)$ the solution of $min_x \varphi_t(x)$ for a given value of

. From equation 8.47, we have the interesting following result: $f(x^*)-p^* \leq m t$ . That is:

is no more than

suboptimal.
What's the link between equations 8.40-8.43, which are used inside the primal-dual algorithm and the barrier method? We see that if we combine equations 8.45 and 8.46, we obtain 8.40. The equations 8.42 and 8.46 are the same, except that $t=\tau$ . The "perturbation parameter" $\tau$ in primal-dual algorithm is simply the barrier parameter

in barrier algorithms. In fact, barrier method and primal-dual methods are very close. The main difference is: In primal-dual methods, we update the primal variables

AND the dual variable $\lambda$ at each iteration. In barrier methods, we only update the primal variables.

A final note on primal-dual algorithms.

The primal-dual algorithms are usually following this scheme:

Set
Solve approximatively the barrier problem 8.44 for a given value of the centering parameter $\sigma_l$ (see equation 8.34 about $\sigma$ ) using as starting point the solution of the previous iteration.
Update the barrier parameter $\sigma$ . For example: $\sigma_{l+1} := min \{\; 0.2 \; \sigma_l, \;\sigma_l^{1.5} \; \}$
Increase the iteration counter
If not finished, go to step 2.

The principal difficulties in primal-dual methods are:

The choice of centering parameter $\sigma$ which is crucial to the performance of the method. If the decrease is too slow, we will loose our time evaluating the objective function far from the real optimum. For "approach 2", where these evaluations are very expensive (in time), it's a major drawback.
In step 2 above, we have to solve approximatively an optimization problem. What are the tolerances? What's the meaning of approximatively?
The feasible space should be convex. Extension to any feasible space is possible but not straight forward.
The starting point should be feasible. Extension to any starting point is possible but not straight forward.
The "switching mechanism" from the unconstrained steps when no constrained are violated to the constrained case is difficult to implement (if we want to keep the actual mechanism for step computation when no constraints are active).

These kinds of algorithm are useful when the number of constraints are huge. In this case, identifying the set of active constraints can be very time consuming (it's a combinatorial problem which can really explode). Barrier methods and primal-dual methods completely avoids this problem. In fact, most recent algorithms for linear programming and for quadratic programming are based on primal-dual methods for this reason. The main field of application of primal-dual methods is thus optimization in very large dimensions (can be more than 10000) and with many constraints (can be more than 6000000). Since we have only a reduced amount of constraints, we can use an active set method without any problem.

SQP Methods

The material of this section is based on the following references: [NW99,Fle87].
SQP stands for ``Sequential quadratic Programming''. We want to find the solution of:

$\displaystyle f(x^*)= \min_x f(x) \; \;$ Subject to: $\displaystyle c_i(x) \geq 0 , \; i=1,\ldots,m$

(8.48)

As in the case of the Newton's method in unconstrained optimization, we will do a quadratical approximation of the function to optimize and search for the minimum of this quadratic. The function to be approximated will be the Lagrangian function $\L$ .
The quadratical approximation of $\L$ is:

$\displaystyle \L (x_k+\delta_x,\lambda_k+\delta_\lambda) \approx$ $\displaystyle \mbox{$\cal Q$}$ $\displaystyle (\delta_x,\delta_\lambda)$	$\displaystyle =$	$\displaystyle \L (x_k,\lambda_k)+ \begin{picture}(.27,.3) \put(0,0){\makebox(0,... ...bda_k) \left( \begin{array}{c}\delta_x \\ \delta_\lambda \end{array} \right) +$
		$\begin{displaymath}\frac{1}{2} \left( \begin{array}{cc}\delta_x & \delta_\lambda... ... \begin{array}{c}\delta_x \\ \delta_\lambda \end{array} \right)\end{displaymath}$

$\displaystyle \Longleftrightarrow$ $\displaystyle \mbox{$\cal Q$}$ $\displaystyle (\delta_x,\delta_\lambda)$	$\displaystyle =$	$\begin{displaymath}\L (x_k,\lambda_k)+ \left( \begin{array}{cc} g_k+A_k \lambda_... ...\begin{array}{c} \delta_x \\ \delta_\lambda \end{array}\right)+\end{displaymath}$
		$\displaystyle \frac{1}{2} \left( \begin{array}{cc} \delta_x & \delta_\lambda \e... ...\right] \left( \begin{array}{c} \delta_x \\ \delta_\lambda \end{array} \right)$	(8.49)

With

: the Jacobian matrix of the constraints evaluated at

: we have: $\displaystyle (A_k)_{i,j}= \frac{\partial c_i(x)}{\partial j} \; \;$ or $\; \; \displaystyle (A_k)_{i}= \nabla c_i(x)$
and

: the Hessian Matrix of the Langrangian function: $\nabla_x^2 \L (x_k, \lambda_k)$ = $\displaystyle H_k= \nabla_x^2 f(x_k)- \sum_i (\lambda_{k})_i \nabla^2 c_i(x_k)$

The full Hessian

of $\L$ is thus :

$\displaystyle W_k=\left[ \begin{array}{cc} H_k & -A_k \\ -A_k & 0 \end{array}... ...\nabla$}} \put(.16,.17){\circle*{.18}} \end{picture} ^2 \L (x_k,\lambda_k)] )$

(8.50)

If we are on the boundary of the $i^{th}$ constraint, we will have $(\delta_x)^t \nabla c_i(x) =0$ , thus we can write:

on the constraints boundaries: $\displaystyle A_k \delta_x \approx 0$

(8.51)

We want to find $\delta_x$ which minimize $\mbox{$\cal Q$}$ $(\delta_x,\delta_\lambda)$ , subject to the constraints $c_i(x) \geq 0, \: i=1,...,m$ .
From equation 8.49 and using equation 8.51, we obtain:

$\begin{displaymath}\begin{array}{c} \displaystyle \min_{\delta_x} \mbox{$\cal Q$... ...subject to } c_j(x_k+\delta_x) \geq 0, \: j=1,...,r \end{array}\end{displaymath}$

(8.52)

Using a first order approximation of the constraints around

, we have the

$\begin{displaymath}\begin{array}{l} \text{\em Quadratic Program QP}:\\ \disp... ...elta_x)^t \nabla c_j(x_k) \geq 0, \; \; j=1, ...,r \end{array}\end{displaymath}$

(8.53)

DEFINITION: a Quadratic Program (QP) is a function which finds the minimum of a quadratic subject to linear constraints.
Note that $H_k= \nabla_x^2 \L (x_k,\lambda_k) \neq \nabla^2 f(x_k)$ .
We can now define the SQP algorithm:

solve the QP subproblem described on equation 8.53 to determine $\delta_x$ and let $\lambda_{k+1}$ be the vector of the Lagrange multiplier of the linear constraints obtained from the QP.
set $x_{k+1}=x_k+\delta_x$
Increment . Stop if $\begin{picture}(.27,.3) \put(0,0){\makebox(0,0)[bl]{$\nabla$}} \put(.16,.17){\circle*{.18}} \end{picture} \L (x_k,\lambda_k)\approx 0$ . Otherwise, go to step 1.

This algorithm is the generalization of the ``Newton's method'' for the constrained case. It has the same properties. It has, near the optimum (and under some special conditions), quadratical convergence.
A more robust implementation of the SQP algorithm adjust the length of the steps and performs "second order correction steps" (or "SOC steps").

A small note about the H matrix in the SQP algorithm.

As already mentioned, we have $H_k= \nabla_x^2 \L (x_k,\lambda_k) = \nabla_x^2 f(x_k)- \sum_i (\lambda_{k})_i \nabla^2 c_i(x_k) \neq \nabla^2 f(x_k)$
The inclusion of the second order constraint terms in the subproblem is important in that otherwise second order convergence for nonlinear constraints would not be obtained. This is well illustrated by the following problem:

$\begin{displaymath}\begin{array}{l} \text{minimize} - x_1 - x_2 \\ \text{subject to } 1 - x_1^2-x_2^2 \geq 0 \end{array}\end{displaymath}$

(8.54)

in which the objective function is linear so that it is only the curvature of the constraint which causes a solution to exist. In this case the sequence followed by the SQP algorithm is only well-defined if the constraint curvature terms are included.
We can obtain a good approximation of this matrix using an extension of the BFGS update to the constrained case.

A final note about the SQP algorithm.

The following table compares the number of function evaluations and gradient evaluations required to solve Colville's (1969) [Col73] first three problems and gives some idea of the relative performances of the algorithms.

	Extrapolated	Multiplier	SQP method
Problem	barrier function	penalty function	(Powell, 1978a) [Pow77]
TP1	177	47	6
TP2	245	172	17
TP3	123	73	3

The excellent results of the SQP method are very attractive.

Next: The final choice of Up: A short review of Previous: Non-Linear constraints Contents

Frank Vanden Berghen 2004-04-19