Optimization

The document discusses multivariable unconstrained optimization, detailing the use of gradients and Hessians to find local minima and maxima of functions with multiple variables. It explains methods such as Newton's Method and the Steepest Descent Method for solving optimization problems, including the importance of step size and convergence criteria. Additionally, it covers the concepts of positive/negative definiteness of matrices and provides examples to illustrate the optimization process.



Mathematical Methods of Optimization

Multivariable Unconstrained Optimization
• For functions of one variable, we use the 1st and 2nd derivatives.
• For functions of multiple variables, we use the analogous information: the gradient and the Hessian.
• The gradient is the vector of first derivatives with respect to all variables, whereas the Hessian is the equivalent of the second derivative.

The Gradient
• Review of the gradient (∇): for a function f of variables x1, x2, …, xn,

$$\nabla f = \begin{bmatrix} \dfrac{\partial f}{\partial x_1} & \dfrac{\partial f}{\partial x_2} & \cdots & \dfrac{\partial f}{\partial x_n} \end{bmatrix}$$

Example: for $f = 15x_1 + 2x_2^3 - 3x_1 x_3^2$,

$$\nabla f = \begin{bmatrix} 15 - 3x_3^2 & 6x_2^2 & -6x_1 x_3 \end{bmatrix}$$

The Hessian
• The Hessian (∇²f) of f(x1, x2, …, xn) is:

$$\nabla^2 f = \begin{bmatrix}
\dfrac{\partial^2 f}{\partial x_1^2} & \dfrac{\partial^2 f}{\partial x_1 \partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_1 \partial x_n} \\
\dfrac{\partial^2 f}{\partial x_2 \partial x_1} & \dfrac{\partial^2 f}{\partial x_2^2} & \cdots & \dfrac{\partial^2 f}{\partial x_2 \partial x_n} \\
\vdots & \vdots & \ddots & \vdots \\
\dfrac{\partial^2 f}{\partial x_n \partial x_1} & \dfrac{\partial^2 f}{\partial x_n \partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_n^2}
\end{bmatrix}$$


Hessian Example
• Example: for $f = 15x_1 + 2x_2^3 - 3x_1 x_3^2$,

$$\nabla f = \begin{bmatrix} 15 - 3x_3^2 & 6x_2^2 & -6x_1 x_3 \end{bmatrix}$$

$$\nabla^2 f = \begin{bmatrix} 0 & 0 & -6x_3 \\ 0 & 12x_2 & 0 \\ -6x_3 & 0 & -6x_1 \end{bmatrix}$$

Unconstrained Optimization
The optimization procedure for multivariable functions is:
1. Solve for the gradient of the function equal to zero to obtain candidate points.
2. Obtain the Hessian of the function and evaluate it at each of the candidate points.
   • If the result is "positive definite", the point is a local minimum.
   • If the result is "negative definite", the point is a local maximum.
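To connect this procedure with the definiteness test defined next, here is a minimal MATLAB/Octave sketch (not part of the original slides; the handle names gradf, hessf and x_cand are chosen for illustration) that evaluates the Hessian of the example function at a candidate point and inspects the signs of its eigenvalues:

% Example function: f = 15*x1 + 2*x2^3 - 3*x1*x3^2
gradf = @(x) [15 - 3*x(3)^2, 6*x(2)^2, -6*x(1)*x(3)];   % gradient (row vector)
hessf = @(x) [ 0,       0,        -6*x(3);
               0,       12*x(2),   0;
              -6*x(3),  0,        -6*x(1) ];             % Hessian
x_cand = [0, 0, sqrt(5)];        % a candidate point where the gradient vanishes
disp(gradf(x_cand))              % should display zeros
lam = eig(hessf(x_cand));        % eigenvalues of the Hessian at the candidate
if all(lam > 0)
    disp('positive definite -> local minimum');
elseif all(lam < 0)
    disp('negative definite -> local maximum');
else
    disp('mixed or zero eigenvalues -> no conclusion from this test');
end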

Positive/Negative Definite
• A matrix is "positive definite" if all of the eigenvalues of the matrix are positive (> 0).
• A matrix is "negative definite" if all of the eigenvalues of the matrix are negative (< 0).

Positive/Negative Semi-definite
• A matrix is "positive semi-definite" if all of the eigenvalues are non-negative (≥ 0).
• A matrix is "negative semi-definite" if all of the eigenvalues are non-positive (≤ 0).


Example Matrix
Given the matrix A:

$$A = \begin{bmatrix} 2 & 4 & 5 \\ -5 & -7 & -1 \\ 1 & 1 & 2 \end{bmatrix}$$

The eigenvalues of A are:

$$\lambda_1 = -3.702, \quad \lambda_2 = -2, \quad \lambda_3 = 2.702$$

Since the eigenvalues do not all have the same sign, this matrix is neither positive definite nor negative definite; it is indefinite.

Unconstrained NLP Example
Consider the problem:

Minimize f(x1, x2, x3) = (x1)² + x1(1 − x2) + (x2)² − x2x3 + (x3)² + x3

First, we find the gradient with respect to xi:

$$\nabla f = \begin{bmatrix} 2x_1 + 1 - x_2 \\ -x_1 + 2x_2 - x_3 \\ -x_2 + 2x_3 + 1 \end{bmatrix}$$

Unconstrained NLP Example
Next, we set the gradient equal to zero:

$$\nabla f = \begin{bmatrix} 2x_1 + 1 - x_2 \\ -x_1 + 2x_2 - x_3 \\ -x_2 + 2x_3 + 1 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$$

So, we have a system of 3 equations and 3 unknowns. When we solve, we get:

$$x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} -1 \\ -1 \\ -1 \end{bmatrix}$$

So we have only one candidate point to check. Find the Hessian:

$$\nabla^2 f = \begin{bmatrix} 2 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 2 \end{bmatrix}$$

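Because the stationarity conditions for this example are linear, the candidate point and the eigenvalue check can be reproduced with a few lines of MATLAB/Octave (an illustrative sketch, not the slides' procedure; A and b are simply the coefficients of the gradient equations):

% Stationarity conditions written as A*x = b:
%   2*x1 -  x2        = -1
%   -x1  + 2*x2 -  x3 =  0
%        -  x2 + 2*x3 = -1
A = [ 2 -1  0;
     -1  2 -1;
      0 -1  2 ];
b = [-1; 0; -1];
x_cand = A \ b      % expected: [-1; -1; -1]
H = A;              % for this quadratic-plus-linear f, the Hessian equals A
lam = eig(H)        % expected: 0.586, 2, 3.414 (all > 0 -> local minimum)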

Unconstrained NLP Example
The eigenvalues of this matrix are:

$$\lambda_1 = 3.414, \quad \lambda_2 = 0.586, \quad \lambda_3 = 2$$

All of the eigenvalues are > 0, so the Hessian is positive definite.

So, the point x = [x1, x2, x3]ᵀ = [−1, −1, −1]ᵀ is a (local) minimum.

Unlike in Linear Programming, unless we know the shape of the function being minimized, or can determine whether it is convex, we cannot tell whether this point is the global minimum or whether there are function values smaller than it.

Method of Solution
• In the previous example, when we set the gradient equal to zero, we had a system of 3 linear equations in 3 unknowns.
• For other problems, these equations could be nonlinear.
• Thus, the problem can become one of solving a system of nonlinear equations, which can be very difficult.
• To avoid this difficulty, NLP problems are usually solved numerically.
• We will now look at examples of numerical methods used to find the optimum point for single-variable NLP problems. These and other methods may be found in any numerical methods reference.


Newton's Method
When solving the equation f′(x) = 0 to find a minimum or maximum, one can use the iteration step:

$$x^{k+1} = x^k - \frac{f'(x^k)}{f''(x^k)}$$

where k is the current iteration. Iteration is continued until |x^{k+1} − x^k| < ε, where ε is some specified tolerance.

Newton's Method Diagram
[Figure: the tangent of f′(x) at x^k crosses the x-axis at x^{k+1}, which lies between x^k and the solution x*.]
Newton's Method approximates f′(x) as a straight line at x^k and obtains a new point x^{k+1}, which is used to approximate the function at the next iteration. This is carried on until the new point is sufficiently close to x*.
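As a concrete illustration of this iteration (not taken from the slides), the sketch below minimizes the sample function f(x) = exp(x) − 2x, whose minimizer is ln 2:

% Minimize f(x) = exp(x) - 2*x by applying Newton's method to f'(x) = 0.
fp  = @(x) exp(x) - 2;     % first derivative
fpp = @(x) exp(x);         % second derivative
x   = 0;                   % initial guess
tol = 1e-8;
for k = 1:50
    x_new = x - fp(x)/fpp(x);      % Newton step
    if abs(x_new - x) < tol        % convergence test from the slide
        break
    end
    x = x_new;
end
x_new                       % converges to log(2) ~= 0.6931 (a minimum, since f'' > 0)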

Newton's Method Comments
• One must ensure that f(x^{k+1}) < f(x^k) when seeking a minimum and f(x^{k+1}) > f(x^k) when seeking a maximum.
• Disadvantages:
  – Both the first and second derivatives must be calculated.
  – The initial guess is very important: if it is not close enough to the solution, the method may not converge.

Regula-Falsi Method
This method requires two points, x^a and x^b, that bracket the solution to the equation f′(x) = 0:

$$x^c = x^b - \frac{f'(x^b)\,(x^b - x^a)}{f'(x^b) - f'(x^a)}$$

where x^c will lie between x^a and x^b. The next interval will be x^c and whichever of x^a or x^b has a derivative sign opposite to that at x^c.


Regula-Falsi Diagram
[Figure: f′(x) plotted against x, with the bracketing points x^a and x^b on either side of the root x*, and the interpolated point x^c between them.]
The Regula-Falsi method approximates the function f′(x) as a straight line and interpolates to find the root.

Regula-Falsi Comments
• This method requires initial knowledge of two points bounding the solution.
• However, it does not require the calculation of the second derivative.
• The Regula-Falsi Method typically requires slightly more iterations to converge than Newton's Method.
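A matching MATLAB/Octave sketch of the Regula-Falsi update for the same sample function f(x) = exp(x) − 2x used above (illustrative only):

% Regula-Falsi on f'(x) = 0 for f(x) = exp(x) - 2*x, bracketed by xa and xb.
fp = @(x) exp(x) - 2;
xa = 0;  xb = 2;                 % f'(xa) < 0 < f'(xb), so the root is bracketed
tol = 1e-8;
for k = 1:200
    xc = xb - fp(xb)*(xb - xa)/(fp(xb) - fp(xa));   % linear interpolation
    if abs(fp(xc)) < tol
        break
    end
    if sign(fp(xc)) == sign(fp(xa))   % keep the sub-interval that still brackets the root
        xa = xc;
    else
        xb = xc;
    end
end
xc                                % again approaches log(2) ~= 0.6931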

Slides: Boyd and Vandenberghe, Convex optimization


Multivariable Optimization
• Now we will consider unconstrained multivariable optimization.
• Nearly all multivariable optimization methods do the following:
1. Choose a search direction d^k.
2. Minimize along that direction to find a new point:

$$x^{k+1} = x^k + \alpha^k d^k$$

where k is the current iteration number and α^k is a positive scalar called the step size.

The Step Size
• The step size α^k is calculated in the following way:
• We want to minimize the function f(x^{k+1}) = f(x^k + α^k d^k), where the only variable is α^k because x^k and d^k are known.
• We set

$$\frac{d f(x^k + \alpha^k d^k)}{d\alpha^k} = 0$$

and solve for α^k using a single-variable solution method such as the ones shown previously.


Steepest Descent Method
• This method is very simple: it uses the gradient (for maximization) or the negative gradient (for minimization) as the search direction:

$$d^k = \pm \nabla f(x^k) \quad \text{for} \quad \begin{Bmatrix} \max \\ \min \end{Bmatrix}$$

So,

$$x^{k+1} = x^k \pm \alpha^k \nabla f(x^k)$$

• Because the gradient is the rate of change of the function at that point, using the gradient (or negative gradient) as the search direction helps reduce the number of iterations needed.
[Figure: contours of f(x) at values 5, 20, and 25, with ∇f(x^k) pointing uphill from x^k and −∇f(x^k) pointing toward lower contours.]

Steepest Descent Method Steps
The steps of the Steepest Descent Method are:
1. Choose an initial point x⁰.
2. Calculate the gradient ∇f(x^k), where k is the iteration number.
3. Calculate the search vector: d^k = −∇f(x^k).
4. Calculate the next x: x^{k+1} = x^k + α^k d^k. Use a single-variable optimization method to determine α^k.
5. To determine convergence, either use some given tolerance ε₁ and evaluate

$$\left| f(x^{k+1}) - f(x^k) \right| < \varepsilon_1$$

for convergence, or use another tolerance ε₂ and evaluate

$$\left\| \nabla f(x^k) \right\| < \varepsilon_2$$

for convergence.
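These five steps can be written out compactly for the earlier example f(x1, x2, x3) = (x1)² + x1(1 − x2) + (x2)² − x2x3 + (x3)² + x3. The sketch below is illustrative; in particular, the line search uses MATLAB's built-in fminbnd over a fixed bracket, which is just one possible single-variable method (Newton or Regula-Falsi from the previous slides would also work):

f     = @(x) x(1)^2 + x(1)*(1 - x(2)) + x(2)^2 - x(2)*x(3) + x(3)^2 + x(3);
gradf = @(x) [2*x(1) + 1 - x(2); -x(1) + 2*x(2) - x(3); -x(2) + 2*x(3) + 1];
x = [0; 0; 0];                     % step 1: initial point
for k = 1:100
    g = gradf(x);                  % step 2: gradient
    if norm(g) < 1e-4              % step 5: ||grad f|| < eps2
        break
    end
    d = -g;                        % step 3: search direction
    phi   = @(a) f(x + a*d);       % single-variable function of the step size
    alpha = fminbnd(phi, 0, 10);   % step 4: line search on a fixed bracket [0, 10]
    x = x + alpha*d;
end
x        % approaches the analytical minimizer [-1; -1; -1]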


Convergence
• These two criteria can be used for any of the multivariable optimization methods discussed here.
• Recall: the norm of a vector x, ‖x‖, is given by:

$$\|x\| = \sqrt{x^T x} = \sqrt{(x_1)^2 + (x_2)^2 + \cdots + (x_n)^2}$$

Steepest Descent Example
Let's solve the earlier problem with the Steepest Descent Method:

Minimize f(x1, x2, x3) = (x1)² + x1(1 − x2) + (x2)² − x2x3 + (x3)² + x3

Let's pick x⁰ = [0, 0, 0].

Steepest Descent Example
∇f(x) = [2x1 + (1 − x2), −x1 + 2x2 − x3, −x2 + 2x3 + 1]

d⁰ = −∇f(x⁰) = −[2(0) + 1 − 0, −0 + 0 − 0, −0 + 0 + 1] = −[1, 0, 1] = [−1, 0, −1]

x¹ = [0, 0, 0] + α⁰[−1, 0, −1]

Now, we need to determine α⁰:

f(x¹) = (α⁰)² + (−α⁰)(1) + 0 − 0 + (α⁰)² + (−α⁰) = 2(α⁰)² − 2(α⁰)

df(x¹)/dα⁰ = 4(α⁰) − 2

Now, set this equal to zero and solve: 4(α⁰) = 2  ⇒  α⁰ = 2/4 = 1/2


Steepest Descent Example
So,

x¹ = [0, 0, 0] + α⁰[−1, 0, −1] = [0, 0, 0] + [−1/2, 0, −1/2]

⇒ x¹ = [−1/2, 0, −1/2]

Take the negative gradient to find the next search direction:

d¹ = −∇f(x¹) = −[−1 + 1 + 0, 1/2 + 0 + 1/2, 0 − 1 + 1] = −[0, 1, 0]

⇒ d¹ = [0, −1, 0]

Steepest Descent Example
Update the iteration formula:

x² = [−1/2, 0, −1/2] + α¹[0, −1, 0] = [−1/2, −α¹, −1/2]

Insert this into the original function and take the derivative so that we can find α¹:

f(x²) = 1/4 + (−1/2)(1 + α¹) + (α¹)² − (α¹)(1/2) + 1/4 − 1/2 = (α¹)² − α¹ − 1/2

df(x²)/dα¹ = 2(α¹) − 1


Steepest Descent Example
Now we can set the derivative equal to zero and solve for α¹:

2(α¹) = 1  ⇒  α¹ = 1/2

Now, calculate x²:

x² = [−1/2, 0, −1/2] + α¹[0, −1, 0] = [−1/2, 0, −1/2] + [0, −1/2, 0]

⇒ x² = [−1/2, −1/2, −1/2]

The next search direction:

d² = −∇f(x²) = −[−1 + 1 + 1/2, 1/2 − 1 + 1/2, 1/2 − 1 + 1] = −[1/2, 0, 1/2] = [−1/2, 0, −1/2]

So,

x³ = [−1/2, −1/2, −1/2] + α²[−1/2, 0, −1/2] = [−(α² + 1)/2, −1/2, −(α² + 1)/2]

Find α²:

f(x³) = (1/2)(α² + 1)² − (3/2)(α² + 1) + 1/4

df(x³)/dα² = (α² + 1) − 3/2

Set the derivative equal to zero and solve: (α² + 1) = 3/2  ⇒  α² = 1/2


Steepest Descent Example
Calculate x³:

x³ = [−1/2, −1/2, −1/2] + α²[−1/2, 0, −1/2] = [−1/2, −1/2, −1/2] + [−1/4, 0, −1/4]

⇒ x³ = [−3/4, −1/2, −3/4]

Find the next search direction:

d³ = −∇f(x³) = −[0, 1/2, 0] = [0, −1/2, 0]

x⁴ = [−3/4, −1/2, −3/4] + α³[0, −1/2, 0] = [−3/4, −(α³ + 1)/2, −3/4]

Find α³: insert x⁴ into the original function, take the derivative with respect to α³, and set it equal to zero:

df(x⁴)/dα³ = (1/2)(α³ + 1) − 9/8 = 0  ⇒  α³ = 5/4

So, x⁴ becomes:

x⁴ = [−3/4, −1/2, −3/4] + [0, −5/8, 0]

⇒ x⁴ = [−3/4, −9/8, −3/4]


Steepest Descent Example
The next search direction:

d⁴ = −∇f(x⁴) = −[5/8, −3/4, 5/8] = [−5/8, 3/4, −5/8]

x⁵ = [−3/4, −9/8, −3/4] + α⁴[−5/8, 3/4, −5/8]
   = [−(1/4)(3 + (5/2)α⁴), −(3/4)(3/2 − α⁴), −(1/4)(3 + (5/2)α⁴)]

Find α⁴:

f(x⁵) = (73/32)(α⁴)² − (43/32)(α⁴) − 51/64

df(x⁵)/dα⁴ = (73/16)(α⁴) − 43/32 = 0

⇒ α⁴ = 43/146

Steepest Descent Example
Update x⁵:

x⁵ = [−3/4, −9/8, −3/4] + (43/146)[−5/8, 3/4, −5/8]

⇒ x⁵ = [−1091/1168, −66/73, −1091/1168]

Let's check whether the convergence criterion is satisfied. Evaluate ‖∇f(x⁵)‖:

∇f(x⁵) = [21/584, 35/584, 21/584]

‖∇f(x⁵)‖ = √((21/584)² + (35/584)² + (21/584)²) = 0.0786


Steepest Descent Example
So ‖∇f(x⁵)‖ = 0.0786, which is very small, and we can take it to be close enough to zero for our example.

Notice that the answer

x⁵ = [−1091/1168, −66/73, −1091/1168]

is very close to the value x* = [−1, −1, −1] that we obtained analytically.

Quadratic Functions
• Quadratic functions are important for the next method we will look at.
• A quadratic function can be written in the form xᵀQx, where x is the vector of variables and Q is a matrix of coefficients.

Example:

$$\begin{bmatrix} x_1 & x_2 & x_3 \end{bmatrix} \begin{bmatrix} 2 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = 2(x_1)^2 - 2x_1x_2 + 2(x_2)^2 - 2x_2x_3 + 2(x_3)^2$$
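A quick numerical check of this expansion (illustrative only; any test vector will do):

Q = [2 -1 0; -1 2 -1; 0 -1 2];
x = [1; 2; 3];                                 % arbitrary test vector
lhs = x' * Q * x;                              % quadratic form x'Qx
rhs = 2*x(1)^2 - 2*x(1)*x(2) + 2*x(2)^2 - 2*x(2)*x(3) + 2*x(3)^2;
disp([lhs, rhs])                               % both evaluate to 12 for this x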

Conjugate Gradient Method
• The Conjugate Gradient Method has the property that if f(x) is quadratic, it will take exactly n iterations to converge, where n is the number of variables in the x vector.
• Although it works especially well with quadratic functions, this method will also work with non-quadratic functions.

https://en.wikipedia.org/wiki/Conjugate_gradient_method

Conjugate Gradient Steps
1. Choose a starting point x⁰ and calculate ∇f(x⁰). Let d⁰ = −∇f(x⁰).
2. Calculate x¹ using x¹ = x⁰ + α⁰d⁰. Find α⁰ by performing a single-variable optimization on f(x⁰ + α⁰d⁰) using the methods discussed earlier. (See the illustration after the algorithm explanation.)


Conjugate Gradient Steps
3. Calculate ∇f(x¹) and f(x¹). The new search direction is calculated using the equation:

$$d^1 = -\nabla f(x^1) + d^0 \, \frac{\nabla^T f(x^1) \cdot \nabla f(x^1)}{\nabla^T f(x^0) \cdot \nabla f(x^0)}$$

This can be generalized for the kth iteration:

$$d^{k+1} = -\nabla f(x^{k+1}) + d^k \, \frac{\nabla^T f(x^{k+1}) \cdot \nabla f(x^{k+1})}{\nabla^T f(x^k) \cdot \nabla f(x^k)}$$

4. Use either of the two convergence criteria discussed earlier to determine convergence:

$$\left| f(x^{k+1}) - f(x^k) \right| < \varepsilon_1 \quad \text{or} \quad \left\| \nabla f(x^k) \right\| < \varepsilon_2$$
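A compact MATLAB/Octave sketch of these steps for the running quadratic example is shown below. It is illustrative only: the step size uses the exact quadratic formula derived on the "Step Size for Quadratic Functions" slides, and the variable names are chosen here for clarity.

% Conjugate gradient (Fletcher-Reeves form of step 3) for
% f(x) = x1^2 + x1*(1 - x2) + x2^2 - x2*x3 + x3^2 + x3.
gradf = @(x) [2*x(1) + 1 - x(2); -x(1) + 2*x(2) - x(3); -x(2) + 2*x(3) + 1];
H     = [2 -1 0; -1 2 -1; 0 -1 2];          % constant Hessian of this quadratic
x = [0; 0; 0];
g = gradf(x);
d = -g;                                      % step 1
for k = 1:3                                  % n = 3 variables
    alpha = -(g'*d) / (d'*H*d);              % exact step size for a quadratic
    x = x + alpha*d;                         % step 2
    g_new = gradf(x);
    if norm(g_new) < 1e-10                   % step 4
        break
    end
    beta = (g_new'*g_new) / (g'*g);          % step 3: ratio of gradient norms
    d = -g_new + beta*d;
    g = g_new;
end
x        % reaches [-1; -1; -1]

Because f here is quadratic in three variables, the loop reaches the minimizer in at most three iterations (up to rounding), matching the property stated above.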

Number of Iterations
• For quadratic functions, this method will converge in n iterations (k = n).
• For non-quadratic functions, after n iterations the algorithm cycles again, with d^{n+1} becoming d⁰.

Step Size for Quadratic Functions
• When optimizing the step size, we can approximate the function to be optimized in the following manner:

$$f(x^k + \alpha d^k) \approx f(x^k) + \nabla^T f(x^k) \cdot \alpha d^k + \tfrac{1}{2} (\alpha d^k)^T \cdot \nabla^2 f(x^k) \cdot (\alpha d^k)$$

• For a quadratic function, this is not an approximation; it is exact.


Step Size for Quadratic Functions
We take the derivative of that function with respect to α and set it equal to zero:

$$\frac{d f(x^k + \alpha d^k)}{d\alpha} = \nabla^T f(x^k) \cdot d^k + (d^k)^T \cdot \nabla^2 f(x^k) \cdot \alpha d^k = 0$$

The solution to this equation is:

$$\alpha^{k,\mathrm{opt}} = - \frac{\nabla^T f(x^k) \cdot d^k}{(d^k)^T \cdot \nabla^2 f(x^k) \cdot d^k}$$

• So, for the problem of optimizing a quadratic function, this is the optimum step size.
• For a non-quadratic function, it is an approximation of the optimum step size.
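Evaluated for the running example at x⁰ = 0 with the steepest-descent direction, this formula reproduces the step size α = 1/2 found earlier by the explicit line search (a small illustrative check):

gradf = @(x) [2*x(1) + 1 - x(2); -x(1) + 2*x(2) - x(3); -x(2) + 2*x(3) + 1];
H = [2 -1 0; -1 2 -1; 0 -1 2];         % Hessian of the quadratic example
x = [0; 0; 0];
d = -gradf(x);                          % steepest-descent direction at x0
alpha_opt = -(gradf(x)'*d) / (d'*H*d)   % returns 0.5, as found earlier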

Multivariable Newton's Method
We can approximate the gradient of f at a point x⁰ by:

$$\nabla f(x) \approx \nabla f(x^0) + \nabla^2 f(x^0) \cdot (x - x^0)$$

We can set the right-hand side equal to zero and rearrange to give:

$$x = x^0 - \left[ \nabla^2 f(x^0) \right]^{-1} \nabla f(x^0)$$

We can generalize this equation to give an iterative expression for Newton's Method:

$$x^{k+1} = x^k - \left[ \nabla^2 f(x^k) \right]^{-1} \nabla f(x^k)$$

where k is the iteration number.


Newton's Method Steps
1. Choose a starting point x⁰.
2. Calculate ∇f(x^k) and ∇²f(x^k).
3. Calculate the next x using the equation

$$x^{k+1} = x^k - \left[ \nabla^2 f(x^k) \right]^{-1} \nabla f(x^k)$$

4. Use either of the convergence criteria discussed earlier to determine convergence. If it hasn't converged, return to step 2.

Comments on Newton's Method
• Unlike the previous two methods, Newton's Method uses both the gradient and the Hessian.
• This usually reduces the number of iterations needed, but increases the computation needed for each iteration.
• So, for very complex functions, a simpler method is usually faster.
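A minimal MATLAB/Octave sketch of these steps for the running example (illustrative names; the backslash solve stands in for an explicit inverse):

gradf = @(x) [2*x(1) + 1 - x(2); -x(1) + 2*x(2) - x(3); -x(2) + 2*x(3) + 1];
hessf = @(x) [2 -1 0; -1 2 -1; 0 -1 2];     % constant for this quadratic f
x = [0; 0; 0];                               % step 1
for k = 1:20
    g = gradf(x);
    if norm(g) < 1e-10                       % step 4: convergence test
        break
    end
    x = x - hessf(x) \ g;                    % steps 2-3
end
x        % a quadratic converges in a single Newton step: [-1; -1; -1]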

Newton's Method Example
For an example, we will use the same problem as before:

Minimize f(x1, x2, x3) = (x1)² + x1(1 − x2) + (x2)² − x2x3 + (x3)² + x3

$$\nabla f(x) = \begin{bmatrix} 2x_1 - x_2 + 1 & -x_1 + 2x_2 - x_3 & -x_2 + 2x_3 + 1 \end{bmatrix}$$

The Hessian is:

$$\nabla^2 f(x) = \begin{bmatrix} 2 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 2 \end{bmatrix}$$

And we will need the inverse of the Hessian:

$$\left[ \nabla^2 f(x) \right]^{-1} = \begin{bmatrix} 2 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 2 \end{bmatrix}^{-1} = \begin{bmatrix} 3/4 & 1/2 & 1/4 \\ 1/2 & 1 & 1/2 \\ 1/4 & 1/2 & 3/4 \end{bmatrix}$$


Newton's Method Example
So, pick x⁰ = [0, 0, 0]ᵀ.

Calculate the gradient for the 1st iteration:

∇f(x⁰) = [0 − 0 + 1, −0 + 0 − 0, −0 + 0 + 1] = [1, 0, 1]

So, the new x is:

$$x^1 = x^0 - \left[ \nabla^2 f(x^0) \right]^{-1} \nabla f(x^0) = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} - \begin{bmatrix} 3/4 & 1/2 & 1/4 \\ 1/2 & 1 & 1/2 \\ 1/4 & 1/2 & 3/4 \end{bmatrix} \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}$$

$$\Rightarrow x^1 = \begin{bmatrix} -1 \\ -1 \\ -1 \end{bmatrix}$$

Newton's Method Example
Now calculate the new gradient:

∇f(x¹) = [−2 + 1 + 1, 1 − 2 + 1, 1 − 2 + 1] = [0, 0, 0]

Since the gradient is zero, the method has converged.

Comments on Example
• Because it uses the 2nd derivative, Newton's Method models quadratic functions exactly and can find the optimum point in one iteration.
• If the function had been of higher order, the Hessian would not have been constant, and it would have been much more work to calculate the Hessian and take its inverse at each iteration.


Constrained Nonlinear Optimization
• Previously in this chapter, we solved NLP problems that had only objective functions, with no constraints.
• Now we will look at methods for solving problems that include constraints.

Convexity & Global vs. Local Optima
➔ When minimizing a function, if we want to be sure that we can get a global solution via differentiation, we need to impose some requirements on our objective function.
➔ We will also need to impose some requirements on the feasible set S (the set of possible values the solution x* may take).

The problem:   min f(x)   subject to   h(x) = c,  g(x) ≥ b   (equivalently: min f(x) subject to x ∈ S, the feasible set)

Definition: If f(x) is a convex function, and if S is a convex set, then the above problem is a convex programming problem.
Definition: If f(x) is not a convex function, or if S is not a convex set, then the above problem is a non-convex programming problem.

Convex vs. Non-convex Programming Problems
• The desirable quality of a convex programming problem is that any locally optimal solution is also a globally optimal solution. If we have a method of finding a locally optimal solution, that method also finds for us the globally optimal solution. (We address convex programming problems in addressing linear programming.)
• The undesirable quality of a non-convex programming problem is that a method which finds a locally optimal solution does not necessarily find the globally optimal solution. (We will also, later, address a special form of non-convex programming problems called integer programs.)

A Convex Programming Problem
Three forms of equality-constrained mathematical programming problems:
• Two variables with one equality constraint:   min f(x1, x2)  s.t.  h(x1, x2) = c
• Multi-variable with one equality constraint:   min f(x)  s.t.  h(x) = c
• Multi-variable with multiple equality constraints:   min f(x)  s.t.  h(x) = c
We focus on the first one, but the conclusions we derive will also apply to the other two. The benefit of focusing on this one is that we can visualize it.


Contour Maps / Constant-Cost Contours / Level Sets
Definition: A contour map is a 2-dimensional plane, i.e., a coordinate system in 2 variables, say x1, x2, that illustrates curves (contours) of constant functional value f(x1, x2).

Example: Draw the contour map for f(x1, x2) = x1² + x2².

[X,Y] = meshgrid(-2.0:.2:2.0, -2.0:.2:2.0);
Z = X.^2 + Y.^2;
[c,h] = contour(X,Y,Z);   % contour plot
clabel(c,h);
grid;
xlabel('x1');
ylabel('x2');

Contour Maps and 3-D Illustrations
Example: Draw the 3-D surface for f(x1, x2) = x1² + x2².

[X,Y] = meshgrid(-2.0:.2:2.0, -2.0:.2:2.0);
Z = X.^2 + Y.^2;
surfc(X,Y,Z)              % surface plot with contours underneath
xlabel('x1')
ylabel('x2')
zlabel('f(x1,x2)')

The height of the surface is f(x). Each contour of fixed value f is the projection onto the x1-x2 plane of a horizontal slice made of the 3-D figure at a value f above the x1-x2 plane.


Solving a Convex Program: Graphical Analysis
Example: Solve this convex program:

min f(x1, x2) = x1² + x2²
s.t. h(x1, x2) = x1 + x2 = √6

A straight line is a convex set because a line segment between any two points on it remains on it.

x1 + x2 = √6  ⇒  x2 = −x1 + √6

Superimpose this relation on top of the contour plot for f(x1, x2).
1. f(x1, x2) must be minimized, and so we would like the solution to be as close to the origin as possible;
2. The solution must be on the thick line in the right-hand corner of the plot, since this line represents the equality constraint.

Solution:

x* = (x1, x2)* ≈ (1.25, 1.25),   f(x1, x2)* = 3

• Any contour f < 3 does not intersect the equality constraint.
• Any contour f > 3 intersects the equality constraint at two points.
➔ The contour f = 3 and the equality constraint just touch each other at the point x*.

"Just touch": the two curves are tangent to one another at the solution point.


Solving a Convex Program: Graphical Analysis
The two curves are tangent to one another at the solution point.
➔ The normal (gradient) vectors of the two curves, at the solution (tangent) point, are parallel. This means the following two vectors are parallel:

$$\nabla f(x_1, x_2)^* = \nabla \left( x_1^2 + x_2^2 \right)^* = \begin{bmatrix} 2x_1 \\ 2x_2 \end{bmatrix}^*$$

$$\nabla \left( h(x_1,x_2)^* - \sqrt{6} \right) = \nabla \left( x_1 + x_2 - \sqrt{6} \right)^* = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$$

"Parallel" means that the two vectors have the same direction. We do not know that they have the same magnitude. To account for this, we equate them with a "multiplier" λ:

$$\nabla f(x_1, x_2)^* = \lambda \nabla \left( h(x_1,x_2)^* - c \right)$$

Moving everything to the left:

$$\nabla f(x_1, x_2)^* - \lambda \nabla \left( h(x_1,x_2)^* - c \right) = 0$$

Alternately:

$$\nabla f(x_1, x_2)^* + \lambda \nabla \left( c - h(x_1,x_2)^* \right) = 0$$

Performing the gradient operation (taking derivatives with respect to x1 and x2):

$$\begin{bmatrix} \dfrac{\partial}{\partial x_1} \left( f(x_1,x_2) - \lambda \left( h(x_1,x_2) - c \right) \right) \\ \dfrac{\partial}{\partial x_2} \left( f(x_1,x_2) - \lambda \left( h(x_1,x_2) - c \right) \right) \end{bmatrix}^* = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$

In this problem, we already know the solution, but what if we did not? Then could we use the above equations to find the solution?

Solving a Convex Program: Analytical Analysis
NO! Because we only have 2 equations, yet 3 unknowns: x1, x2, λ. So we need another equation. Where do we get that equation?

Recall our equality constraint: h(x1, x2) − c = 0. This must be satisfied! Therefore:

$$\begin{bmatrix} \dfrac{\partial}{\partial x_1} \left( f(x_1,x_2) - \lambda \left( h(x_1,x_2) - c \right) \right) \\ \dfrac{\partial}{\partial x_2} \left( f(x_1,x_2) - \lambda \left( h(x_1,x_2) - c \right) \right) \\ h(x_1, x_2) - c \end{bmatrix}^* = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$$

Three equations, three unknowns; we can solve.

Observation: the three equations are simply the partial derivatives of the function f(x1, x2) − λ(h(x1, x2) − c) with respect to x1, x2, and λ, each set equal to zero. This is obviously true for the first two equations, but it is not so obviously true for the last one. To see it, observe:

$$\frac{\partial}{\partial \lambda} \left( f(x_1,x_2) - \lambda \left( h(x_1,x_2) - c \right) \right) = 0 \;\;\Rightarrow\;\; -h(x_1,x_2) + c = 0 \;\;\Rightarrow\;\; h(x_1,x_2) = c$$


Formal Approach to Solving Our Problem
Define the Lagrangian function:

$$L(x_1, x_2, \lambda) = f(x_1, x_2) - \lambda \left( h(x_1,x_2) - c \right)$$

In a convex programming problem, the "first-order conditions" for finding the solution are given by

$$\frac{\partial L}{\partial x_1} = 0, \quad \frac{\partial L}{\partial x_2} = 0, \quad \frac{\partial L}{\partial \lambda} = 0$$

or, more compactly,

$$\nabla_x L(x, \lambda) = 0, \quad \nabla_\lambda L(x, \lambda) = 0$$

where we have used x = (x1, x2).

Applying to Our Example
Define the Lagrangian function:

$$L(x_1, x_2, \lambda) = f(x_1, x_2) - \lambda \left( h(x_1,x_2) - c \right) = x_1^2 + x_2^2 - \lambda \left( x_1 + x_2 - \sqrt{6} \right)$$

$$\frac{\partial L}{\partial x_1} = 2x_1 - \lambda = 0$$

$$\frac{\partial L}{\partial x_2} = 2x_2 - \lambda = 0$$

$$\frac{\partial L}{\partial \lambda} = -\left( x_1 + x_2 - \sqrt{6} \right) = 0$$

This is a set of 3 linear equations in 3 unknowns; we can write it in the form Ax = b.

Applying to Our Example

$$\begin{bmatrix} 2 & 0 & -1 \\ 0 & 2 & -1 \\ 1 & 1 & 0 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \lambda \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ \sqrt{6} \end{bmatrix}$$

$$\begin{bmatrix} x_1 \\ x_2 \\ \lambda \end{bmatrix} = \begin{bmatrix} 2 & 0 & -1 \\ 0 & 2 & -1 \\ 1 & 1 & 0 \end{bmatrix}^{-1} \begin{bmatrix} 0 \\ 0 \\ \sqrt{6} \end{bmatrix} = \begin{bmatrix} 1.2247 \\ 1.2247 \\ 2.4495 \end{bmatrix}$$

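This small system can also be checked numerically; the sketch below assumes the right-hand side √6 reconstructed above:

A = [2 0 -1; 0 2 -1; 1 1 0];
b = [0; 0; sqrt(6)];
sol = A \ b        % [x1; x2; lambda] = [1.2247; 1.2247; 2.4495]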


Now, let's go back to our example, but with a nonlinear equality constraint.

Example with Nonlinear Equality

min f(x1, x2) = x1² + x2²
s.t. h(x1, x2) = 2x1x2 = 3

This problem is non-convex because a line segment connecting two points in the feasible set does not remain in the set.

2x1x2 = 3  ⇒  x2 = 3/(2x1)

Superimpose this relation on top of the contour plot for f(x1, x2).
1. f(x1, x2) must be minimized, and so we would like the solution to be as close to the origin as possible;
2. The solution must be on the thick line in the right-hand corner of the plot, since this line represents the equality constraint.

Solution:

x* = (x1, x2)* ≈ (1.25, 1.25),   f(x1, x2)* = 3

• Any contour f < 3 does not intersect the equality constraint.
• Any contour f > 3 intersects the equality constraint at two points.
➔ The contour f = 3 and the equality constraint just touch each other at the point x*.

"Just touch": the two curves are tangent to one another at the solution point.

Example with Nonlinear Equality
The two curves are tangent to one another at the solution point.
➔ The normal (gradient) vectors of the two curves, at the solution (tangent) point, are parallel. This means the following two vectors are parallel:

$$\nabla f(x_1, x_2)^* = \nabla \left( x_1^2 + x_2^2 \right)^* = \begin{bmatrix} 2x_1 \\ 2x_2 \end{bmatrix}^*$$

$$\nabla \left( h(x_1,x_2)^* - 3 \right) = \nabla \left( 2x_1x_2 - 3 \right)^* = \begin{bmatrix} 2x_2 \\ 2x_1 \end{bmatrix}^*$$

"Parallel" means that the two vectors have the same direction. We do not know that they have the same magnitude. To account for this, we equate them with a "multiplier" λ:

$$\nabla f(x_1, x_2)^* = \lambda \nabla \left( h(x_1,x_2)^* - c \right)$$

This gives us the following two equations:

$$\begin{bmatrix} \dfrac{\partial}{\partial x_1} \left( f(x_1,x_2) - \lambda \left( h(x_1,x_2) - c \right) \right) \\ \dfrac{\partial}{\partial x_2} \left( f(x_1,x_2) - \lambda \left( h(x_1,x_2) - c \right) \right) \end{bmatrix}^* = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$

And we add the equality constraint to give 3 equations in 3 unknowns:

$$\begin{bmatrix} \dfrac{\partial}{\partial x_1} \left( f(x_1,x_2) - \lambda \left( h(x_1,x_2) - c \right) \right) \\ \dfrac{\partial}{\partial x_2} \left( f(x_1,x_2) - \lambda \left( h(x_1,x_2) - c \right) \right) \\ h(x_1, x_2) - c \end{bmatrix}^* = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$$

Three equations, three unknowns; we can solve.

Example with Nonlinear Equality
Define the Lagrangian function:

$$L(x_1, x_2, \lambda) = f(x_1, x_2) - \lambda \left( h(x_1,x_2) - c \right) = x_1^2 + x_2^2 - \lambda \left( 2x_1x_2 - 3 \right)$$

The first-order conditions are:

$$\frac{\partial L}{\partial x_1} = 2x_1 - 2\lambda x_2 = 0$$

$$\frac{\partial L}{\partial x_2} = 2x_2 - 2\lambda x_1 = 0$$

$$\frac{\partial L}{\partial \lambda} = -\left( 2x_1x_2 - 3 \right) = 0$$

You can solve this system algebraically to obtain

$$x_1 = x_2 = \sqrt{\tfrac{3}{2}} = 1.2247 \quad \text{or} \quad x_1 = x_2 = -\sqrt{\tfrac{3}{2}} = -1.2247$$

with f = 3 in both cases.

Our approach worked in this case, i.e., we found a locally optimal point that was also a globally optimal point, but because this was not a convex programming problem, we had no guarantee that this would happen.

The conditions we established above are called first-order conditions.
• For convex programming problems, they are first-order sufficient conditions for the global optimal point.
• For non-convex programming problems, they are first-order necessary conditions for the global optimal point.
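These nonlinear first-order conditions can also be solved numerically. The sketch below is illustrative only: it uses fsolve, which requires MATLAB's Optimization Toolbox (in GNU Octave it is built in), and with the starting guess shown it should converge to the positive solution.

% First-order conditions of L(x1, x2, lambda) as a nonlinear system.
foc = @(z) [ 2*z(1) - 2*z(3)*z(2);        % dL/dx1
             2*z(2) - 2*z(3)*z(1);        % dL/dx2
            -(2*z(1)*z(2) - 3) ];         % dL/dlambda
z0  = [1; 1; 1];                          % starting guess [x1; x2; lambda]
sol = fsolve(foc, z0)                     % approaches [1.2247; 1.2247; 1]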
