
TMA947 / MMG621 – Nonlinear optimization Lecture 5

TMA947 / MMG621 — Nonlinear optimization

Lecture 5 – Unconstrained optimization algorithms

Emil Gustavsson, Zuzana Nedělková

November 6, 2017

[Minor revision: Axel Ringh - September, 2024]

Consider the unconstrained optimization problem to

\[ \operatorname*{minimize}_{x \in \mathbb{R}^n} \; f(x), \qquad (1) \]

where f ∈ C⁰ on Rⁿ (f is continuous). Mostly, we assume that f ∈ C¹ holds (f is continuously differentiable), sometimes even C². The choice of algorithm depends on the size of the problem, the availability of ∇f(x) and ∇²f(x), the convexity of f, and whether the goal is to find a local or the global minimum.

Most algorithms for unconstrained optimization problems are what we call line search type algorithms.

Definition. Line search type algorithm

Step 0: Starting point x0 ∈ Rn . Let k := 0.

Step 1: Find search direction pk ∈ Rn
Step 2: Perform line search, i.e., find αk > 0 such that f (xk + αk pk ) < f (xk )
Step 3: Let xk+1 := xk + αk pk .
Step 4: If a termination criterion is fulfilled, then stop! Otherwise, let k := k + 1 and go to Step 1.
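The steps above can be sketched in a few lines of Python. This is only an illustrative skeleton, not the course's reference implementation: the function names (line_search_method, steepest_descent, halving), the gradient-norm stopping test, and the test function are assumptions made for the example. The step rule simply halves α until f decreases, which matches Step 2 as stated but is weaker than the sufficient-decrease rules discussed later.

```python
import math

def line_search_method(f, grad, x0, direction, step_length,
                       tol=1e-8, max_iter=1000):
    """Generic line search type algorithm (Steps 0-4)."""
    x = list(x0)                                   # Step 0: starting point
    for k in range(max_iter):
        g = grad(x)
        # Step 4 (assumed criterion): stop when the gradient norm is tiny
        if math.sqrt(sum(gi * gi for gi in g)) <= tol:
            return x, k
        p = direction(x, g)                        # Step 1: search direction
        alpha = step_length(f, x, p)               # Step 2: line search
        x = [xi + alpha * pi for xi, pi in zip(x, p)]  # Step 3: update
    return x, max_iter

def steepest_descent(x, g):
    """Search direction p = -grad f(x)."""
    return [-gi for gi in g]

def halving(f, x, p, alpha=1.0):
    """Halve alpha until f(x + alpha p) < f(x) (plain decrease only)."""
    fx = f(x)
    while f([xi + alpha * pi for xi, pi in zip(x, p)]) >= fx and alpha > 1e-16:
        alpha *= 0.5
    return alpha

# Hypothetical test problem: a convex quadratic with minimizer (1, -0.5).
f = lambda x: (x[0] - 1.0) ** 2 + 2.0 * (x[1] + 0.5) ** 2
grad = lambda x: [2.0 * (x[0] - 1.0), 4.0 * (x[1] + 0.5)]

x_star, iters = line_search_method(f, grad, [0.0, 0.0],
                                   steepest_descent, halving)
```

Passing the direction and step rule as functions keeps Step 1 and Step 2 interchangeable, which mirrors how the rest of the lecture treats them separately.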


Most algorithms we consider are inherently local, meaning that the search direction pk is only
based on the information at the current point xk , that is, f (xk ), ∇f (xk ), and ∇2 f (xk ).

Think of a near-sighted mountain climber. The climber is in deep fog and can only check a barometer for the height and feel the steepness of the slope underfoot.


Step 1: Search directions

Vector pk is a descent direction at xk if f (xk + αpk ) < f (xk ) for all α ∈ (0, δ] for some δ > 0.

Let f ∈ C¹ in some neighborhood of xk ∈ Rⁿ. If ∇f(xk) ≠ 0, then pk = −∇f(xk) is a descent direction for f at xk (this follows from the optimality conditions). This search direction is called the steepest descent direction because, after normalization to unit length, it solves the problem to

\[ \operatorname*{minimize}_{p \in \mathbb{R}^n :\ \|p\| = 1} \; \nabla f(x_k)^T p. \]

Let Q ∈ R^(n×n) be an arbitrary symmetric, positive definite matrix. Then pk = −Q∇f(xk) is a descent direction for f at xk whenever ∇f(xk) ≠ 0, because

∇f (xk )T pk = −∇f (xk )T Q∇f (xk ) < 0,

due to the positive definiteness of Q.

Examples:

– Steepest descent: Q = I,
– Newton's method: Q = [∇²f(xk)]⁻¹.

We will now derive Newton's method. To do so, we need to assume that f ∈ C². We also first assume that ∇²f(x) is positive definite. A second-order Taylor approximation is then:

\[ f(x_k + p) - f(x_k) \approx \nabla f(x_k)^T p + \tfrac{1}{2}\, p^T \nabla^2 f(x_k)\, p =: \varphi_{x_k}(p) \]
We now try to minimize this approximation by setting the gradient of φxk (p) to zero:

\[ \nabla_p \varphi_{x_k}(p) = \nabla f(x_k) + \nabla^2 f(x_k)\, p = 0 \iff \nabla^2 f(x_k)\, p = -\nabla f(x_k) \]

Choosing the vector that fulfills this equation, we obtain pk = −[∇²f(xk)]⁻¹∇f(xk) as the search direction. When n = 1, this gives pk = −f′(xk)/f″(xk).
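As a small illustration of the Newton system ∇²f(xk)p = −∇f(xk), the sketch below solves it for n = 2 with Cramer's rule. The helper name newton_direction_2d and the quadratic test function are assumptions chosen for the example. For a strictly convex quadratic, a single full Newton step lands on the minimizer.

```python
def newton_direction_2d(g, H):
    """Solve H p = -g for a 2x2 Hessian H using Cramer's rule."""
    (a, b), (c, d) = H
    det = a * d - b * c
    if det == 0.0:
        raise ValueError("Hessian is singular; Newton direction undefined")
    r0, r1 = -g[0], -g[1]                      # right-hand side -g
    return [(r0 * d - b * r1) / det, (a * r1 - c * r0) / det]

# Hypothetical example: f(x) = x0^2 + x0*x1 + 2*x1^2 - 3*x0
def grad(x):
    return [2.0 * x[0] + x[1] - 3.0, x[0] + 4.0 * x[1]]

def hess(x):
    return [[2.0, 1.0], [1.0, 4.0]]            # constant, positive definite

x = [0.0, 0.0]
g = grad(x)
p = newton_direction_2d(g, hess(x))
slope = g[0] * p[0] + g[1] * p[1]              # grad f(x)^T p, negative here
x_new = [x[0] + p[0], x[1] + p[1]]             # full Newton step
```

In practice one solves the linear system rather than forming the inverse Hessian explicitly; the closed-form 2×2 solve here only keeps the example free of dependencies.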

When the Hessian ∇²f(xk) is positive definite, this search direction is a descent direction. But when ∇²f(xk) is negative definite, the search direction is an ascent direction, and the Hessian may also be indefinite or singular. In other words, Newton's method does not differentiate between minimization and maximization problems. The remedy is to modify ∇²f(xk) by adding a diagonal matrix γI such that ∇²f(xk) + γI is positive definite (this can always be done, why?). This method is called the Levenberg-Marquardt modification. We thus take as search direction

\[ p_k = -\left[ \nabla^2 f(x_k) + \gamma I \right]^{-1} \nabla f(x_k). \]

Note that

– Steepest descent: γ → ∞ (in the limit, the direction of pk approaches that of −∇f(xk)),
– Newton's method: γ = 0.
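A minimal sketch of the Levenberg-Marquardt modification for n = 2 follows. It relies on the fact that adding γI shifts every eigenvalue of the Hessian by γ, so choosing γ larger than minus the smallest eigenvalue makes the shifted matrix positive definite. The function names and the margin parameter (the smallest eigenvalue we want after the shift) are assumptions for illustration; the notes do not prescribe how γ is chosen.

```python
import math

def min_eigenvalue_2x2(H):
    """Smallest eigenvalue of a symmetric 2x2 matrix [[a, b], [b, d]]."""
    (a, b), (_, d) = H
    return 0.5 * (a + d) - math.sqrt((0.5 * (a - d)) ** 2 + b * b)

def lm_direction_2d(g, H, margin=1.0):
    """Solve (H + gamma I) p = -g with gamma chosen so that
    the smallest eigenvalue of H + gamma I is at least `margin`."""
    gamma = max(0.0, margin - min_eigenvalue_2x2(H))
    a, b, d = H[0][0] + gamma, H[0][1], H[1][1] + gamma
    det = a * d - b * b            # positive: shifted matrix is pos. def.
    p = [(-g[0] * d + b * g[1]) / det, (-a * g[1] + b * g[0]) / det]
    return p, gamma

# Hypothetical data: a negative definite Hessian, where pure Newton ascends.
H = [[-2.0, 0.0], [0.0, -1.0]]
g = [1.0, 1.0]
newton_p = [-g[0] / H[0][0], -g[1] / H[1][1]]          # = (0.5, 1.0)
newton_slope = g[0] * newton_p[0] + g[1] * newton_p[1]  # positive: ascent
p, gamma = lm_direction_2d(g, H)
lm_slope = g[0] * p[0] + g[1] * p[1]                    # negative: descent
```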


What happens when we cannot compute ∇²f(xk)? We then approximate the Hessian by some matrix Bk. From a Taylor expansion of ∇f around xk we have that

\[ \nabla^2 f(x_k)(x_k - x_{k-1}) \approx \nabla f(x_k) - \nabla f(x_{k-1}), \]

so the approximating matrix Bk is required to fulfill

\[ B_k (x_k - x_{k-1}) = \nabla f(x_k) - \nabla f(x_{k-1}). \]

Many different choices of Bk exist, and they lead to what are called quasi-Newton methods.
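For n = 1 the condition above determines Bk uniquely as a difference quotient of f′, and the resulting quasi-Newton iteration is the classical secant method applied to f′(x) = 0. The sketch below illustrates only this one-dimensional special case; the function name, tolerances, and test function are assumptions, and no particular multidimensional update formula is implied.

```python
def secant_minimize(df, x_prev, x_curr, tol=1e-10, max_iter=50):
    """1-D quasi-Newton iteration: B_k is fixed by the secant condition
    B_k (x_k - x_{k-1}) = f'(x_k) - f'(x_{k-1})."""
    g_prev, g_curr = df(x_prev), df(x_curr)
    for _ in range(max_iter):
        if abs(g_curr) <= tol or x_curr == x_prev:
            break
        B = (g_curr - g_prev) / (x_curr - x_prev)   # Hessian approximation
        if B == 0.0:
            break                                   # step undefined
        # Step p_k = -f'(x_k) / B_k, taken with unit step length
        x_prev, x_curr = x_curr, x_curr - g_curr / B
        g_prev, g_curr = g_curr, df(x_curr)
    return x_curr

# Hypothetical example: f(x) = x^4/4 - x, so f'(x) = x^3 - 1, minimizer x = 1.
x_min = secant_minimize(lambda x: x ** 3 - 1.0, 0.5, 2.0)
```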

To summarize:

Steepest descent: pk = −∇f(xk)
Newton's method: ∇²f(xk) pk = −∇f(xk)
Levenberg-Marquardt: (∇²f(xk) + γI) pk = −∇f(xk)
Quasi-Newton: Bk pk = −∇f(xk).

Step 2: Line search

In each iteration one would like to solve

\[ \operatorname*{minimize}_{\alpha \ge 0} \; \varphi(\alpha) := f(x_k + \alpha p_k). \]

The optimality conditions for the problem are

φ′ (α∗ ) ≥ 0,
α∗ φ′ (α∗ ) = 0,
α∗ ≥ 0.

These conditions state that if α∗ > 0, then φ′ (α∗ ) = 0, which implies that

∇f (xk + α∗ pk )T pk = 0,

meaning that the search direction pk is orthogonal to the gradient of f at xk + α∗ pk .
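As a worked illustration (an added example, not taken from the course material), consider the quadratic f(x) = ½ xᵀAx − bᵀx, where the symmetric positive definite matrix A and the vector b are assumed for the example. Then ∇f(x) = Ax − b, and the exact step length along a descent direction pk can be computed in closed form:

```latex
\begin{align*}
\varphi'(\alpha) &= \nabla f(x_k + \alpha p_k)^T p_k
  = \bigl(A(x_k + \alpha p_k) - b\bigr)^T p_k
  = \nabla f(x_k)^T p_k + \alpha\, p_k^T A\, p_k, \\
\varphi'(\alpha^*) = 0 \;&\Longrightarrow\;
  \alpha^* = -\frac{\nabla f(x_k)^T p_k}{p_k^T A\, p_k} > 0 .
\end{align*}
```

The step length is positive because ∇f(xk)ᵀpk < 0 for a descent direction and pkᵀApk > 0 by positive definiteness, and at α* the new gradient is orthogonal to pk, as stated above.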



However, solving the line search problem to optimality is unnecessary, since the optimal solution to the original problem generally lies elsewhere anyway. Examples of methods for choosing the step length αk are:

– Interpolation: Use f(xk), ∇f(xk), and ∇f(xk)ᵀpk to approximate φ(α) = f(xk + αpk) quadratically. Then minimize this approximation of φ analytically.
– Newton's method: Repeat improvements from a quadratic approximation: α := α − φ′(α)/φ″(α).
– Golden section: Derivative-free method that shrinks an interval in which a solution to φ′(α) = 0 lies.
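Of the methods listed, golden section search is perhaps the easiest to sketch. The version below assumes that φ is unimodal on the starting interval [a, b]; the function name, tolerance, and test function are illustrative assumptions. Each iteration reuses one of the two interior function values, so only one new evaluation of φ is needed per shrink.

```python
import math

def golden_section(phi, a, b, tol=1e-8):
    """Shrink [a, b] around a minimizer of a unimodal phi, derivative-free."""
    ratio = (math.sqrt(5.0) - 1.0) / 2.0       # inverse golden ratio, ~0.618
    c = b - ratio * (b - a)                    # interior points a < c < d < b
    d = a + ratio * (b - a)
    fc, fd = phi(c), phi(d)
    while b - a > tol:
        if fc < fd:                            # minimizer lies in [a, d]
            b, d, fd = d, c, fc
            c = b - ratio * (b - a)
            fc = phi(c)
        else:                                  # minimizer lies in [c, b]
            a, c, fc = c, d, fd
            d = a + ratio * (b - a)
            fd = phi(d)
    return 0.5 * (a + b)

# Hypothetical one-dimensional function with minimizer alpha = 0.3.
alpha = golden_section(lambda t: (t - 0.3) ** 2, 0.0, 2.0)
```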

We will often use what is denoted as the Armijo rule. The idea is to choose a step length α which
provides sufficient decrease in f . We have that

f (xk + αpk ) ≈ f (xk ) + α∇f (xk )T pk ,

for very small values of α > 0, meaning that we predict that the objective function will change by α∇f(xk)ᵀpk (a decrease, since pk is a descent direction) if we move a step length α in the direction of pk. This prediction might be too optimistic, and we will therefore accept the step length if the actual decrease is at least a fraction µ (µ is small, typically µ ∈ [0.001, 0.01]) of the predicted decrease, i.e., we will accept α if

f (xk + αpk ) − f (xk ) ≤ µα∇f (xk )T pk ,

or equivalently, if
φ(α) − φ(0) ≤ µαφ′ (0).
We usually start with α = 1. If the condition is not fulfilled, we set α := α/2 and test again, repeating until it is fulfilled.


Figure 1: The interval (R) accepted by the Armijo step length rule
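The Armijo rule with halving can be sketched as follows. The function name armijo_step, the cap on the number of halvings, and the test function are assumptions made for the example; µ = 0.01 is taken from the typical range quoted above.

```python
def armijo_step(f, x, p, slope, mu=0.01, alpha=1.0, max_halvings=60):
    """Return alpha with f(x + alpha p) - f(x) <= mu * alpha * slope.

    `slope` is grad f(x)^T p and must be negative (descent direction).
    """
    if slope >= 0.0:
        raise ValueError("p is not a descent direction")
    fx = f(x)
    for _ in range(max_halvings):
        trial = [xi + alpha * pi for xi, pi in zip(x, p)]
        if f(trial) - fx <= mu * alpha * slope:   # sufficient decrease
            return alpha
        alpha *= 0.5                              # alpha := alpha / 2
    raise RuntimeError("no acceptable step length found")

# Hypothetical example: a badly scaled quadratic, steepest descent direction.
f = lambda x: x[0] ** 2 + 10.0 * x[1] ** 2
x = [1.0, 1.0]
g = [2.0 * x[0], 20.0 * x[1]]
p = [-gi for gi in g]
slope = sum(gi * pi for gi, pi in zip(g, p))      # = -404
alpha = armijo_step(f, x, p, slope)               # four halvings: 0.0625
```

Starting from α = 1 means that a full Newton or quasi-Newton step is tried first, which matters for fast local convergence of those methods.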


Convergence

In order to state a convergence result for the algorithm, we make an additional assumption for
the search directions. We need the directions pk to fulfill

\[ -\frac{\nabla f(x_k)^T p_k}{\|\nabla f(x_k)\| \cdot \|p_k\|} \ge s_1, \qquad \|p_k\| \ge s_2 \|\nabla f(x_k)\|, \qquad \text{and} \qquad \|p_k\| \le M \qquad (2) \]

for some s1 , s2 , M > 0, where the first inequality makes the angle between pk and ∇f (xk ) stay
between 0 and π/2, but not too close to π/2. The second inequality makes sure that the only
case when pk can be zero is when the gradient is zero. These two conditions guarantee a certain
descent quality.
Theorem (convergence of unconstrained algorithm). Suppose f ∈ C¹ and that for the starting point x0 the level set {x ∈ Rⁿ | f(x) ≤ f(x0)} is bounded. Consider the iterative algorithm described above. Suppose that for all k, pk fulfills (2) and αk is chosen according to the Armijo rule. Then

a) the sequence {xk } is bounded,

b) the sequence {f(xk)} is descending and bounded from below, and

c) every limit point of {xk } is a stationary point.

Proof. See Theorem 11.4 in the book.

If we add the assumption that f is a convex function, then we can show that

optimum exists ⇐⇒ {xk } converges to an optimal solution.

Step 4: Termination criteria

We cannot wait until ∇f(xk) = 0 holds before terminating the algorithm, since this rarely happens exactly. We need to have some tolerance level. Three examples are

a) ∥∇f (xk )∥ ≤ ε1 (1 + |f (xk )|), where ε1 > 0 is small.

b) f (xk−1 ) − f (xk ) ≤ ε2 (1 + |f (xk )|), where ε2 > 0 is small.
c) ∥xk − xk−1 ∥ ≤ ε3 (1 + ∥xk ∥), where ε3 > 0 is small.

One can also use the max-norm ∥·∥∞ instead of the Euclidean norm.
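The three tests a) to c) can be evaluated together, as in the sketch below. The function names and the default tolerance values are assumptions for illustration; how the three flags are combined (any of them, or all of them) is left to the caller, since the notes do not fix this.

```python
import math

def norm2(v):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(vi * vi for vi in v))

def termination_flags(f_prev, f_curr, x_prev, x_curr, g_curr,
                      eps1=1e-6, eps2=1e-10, eps3=1e-8):
    """Evaluate criteria a), b), c) at the current iterate."""
    small_gradient = norm2(g_curr) <= eps1 * (1.0 + abs(f_curr))       # a)
    small_decrease = (f_prev - f_curr) <= eps2 * (1.0 + abs(f_curr))   # b)
    step = [a - b for a, b in zip(x_curr, x_prev)]
    small_step = norm2(step) <= eps3 * (1.0 + norm2(x_curr))           # c)
    return small_gradient, small_decrease, small_step

# Hypothetical iterate: zero gradient, but a large step and decrease.
flags = termination_flags(1.0, 0.5, [0.0, 0.0], [1.0, 1.0], [0.0, 0.0])
```

The factors (1 + |f(xk)|) and (1 + ∥xk∥) make the tests behave like relative tolerances for large values and like absolute tolerances near zero.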


A note on trust region methods

Trust region methods use a quadratic approximation of the function around the current iterate xk. They avoid a line search and instead bound the length of the step. Let

\[ \varphi_{x_k}(p) := f(x_k) + \nabla f(x_k)^T p + \tfrac{1}{2}\, p^T \nabla^2 f(x_k)\, p. \]
Since this is a local approximation, we restrict our approximation to a trust region in the neighborhood of xk, i.e., we trust the model in the region where ∥p∥ ≤ ∆k. We then solve the problem to

\[ \operatorname*{minimize} \; \varphi_{x_k}(p), \quad \text{subject to} \quad \|p\| \le \Delta_k, \]

and let the solution be pk. Then we update our iterate as xk+1 = xk + pk. We also update the trust region parameter ∆k depending on the progress so far, measured by the ratio of actual reduction to predicted reduction. The method is robust and possesses strong convergence properties. More detailed information about trust region methods can be found in the book on pages 301–302.
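A one-dimensional sketch may help fix the ideas. Everything specific in it is an assumption for illustration and is not taken from the notes: the thresholds 0.25 and 0.75, the halving and doubling of ∆k, and the choice to reject a step (keep xk) when the actual reduction is not positive. For n = 1 the model minimizer over |p| ≤ ∆ is either an endpoint or the interior Newton step, so it can be found by comparing at most three candidates.

```python
def trust_region_step_1d(g, h, delta):
    """Minimize m(p) = g*p + 0.5*h*p^2 subject to |p| <= delta."""
    model = lambda p: g * p + 0.5 * h * p * p
    candidates = [-delta, delta]
    if h > 0.0 and abs(g / h) <= delta:
        candidates.append(-g / h)              # interior Newton step
    return min(candidates, key=model)

def update_radius(rho, delta, at_boundary):
    """Assumed update rule based on rho = actual / predicted reduction."""
    if rho < 0.25:
        return 0.5 * delta                     # poor model: shrink
    if rho > 0.75 and at_boundary:
        return 2.0 * delta                     # good model, step limited: grow
    return delta

def trust_region_1d(f, df, d2f, x, delta=1.0, tol=1e-8, max_iter=100):
    for _ in range(max_iter):
        g, h = df(x), d2f(x)
        if abs(g) <= tol:
            break
        p = trust_region_step_1d(g, h, delta)
        predicted = -(g * p + 0.5 * h * p * p)  # positive whenever g != 0
        actual = f(x) - f(x + p)
        rho = actual / predicted
        if rho > 0.0:                           # assumed acceptance test
            x = x + p
        delta = update_radius(rho, delta, abs(p) >= delta)
    return x

# Hypothetical double-well function; at x = 0.1 the curvature is negative.
f = lambda x: 0.25 * x ** 4 - 0.5 * x ** 2
df = lambda x: x ** 3 - x
d2f = lambda x: 3.0 * x ** 2 - 1.0
x_min = trust_region_1d(f, df, d2f, 0.1)
```

Starting where the second derivative is negative shows the point of the bound on the step: the model has no minimizer without it, yet the constrained step still moves downhill toward the minimizer at x = 1.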

A note on black-box functions

In some cases the value of the objective function f (x) is given through some unknown simulation
procedure. This implies that we do not have a clear representation of the gradient of the objective
function. In some cases, we can perform numerical differentiation and approximate the partial
derivatives as, e.g.,
\[ \frac{\partial f(x)}{\partial x_i} \approx \frac{f(x + \alpha e_i) - f(x)}{\alpha}, \]
where ei = (0, . . . , 0, 1, 0, . . . , 0)ᵀ is the i-th unit vector in Rⁿ and α > 0 is a small step.
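The forward difference formula translates directly into code. In the sketch below the function name and the default step α = 1e-6 are assumptions; a good step size depends on the scaling of f and on the noise level of the simulation. Note that the gradient costs n extra function evaluations.

```python
def forward_difference_gradient(f, x, alpha=1e-6):
    """Approximate grad f(x) by (f(x + alpha*e_i) - f(x)) / alpha."""
    fx = f(x)
    grad = []
    for i in range(len(x)):
        x_shift = list(x)
        x_shift[i] += alpha                    # x + alpha * e_i
        grad.append((f(x_shift) - fx) / alpha)
    return grad

# Hypothetical black box; the exact gradient at (1, 2) is (8, 3).
f = lambda x: x[0] ** 2 + 3.0 * x[0] * x[1]
g = forward_difference_gradient(f, [1.0, 2.0])
```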

If the simulation is not accurate, we get bad derivative information. In that case we can use derivative-free methods instead. These build a model f̂ of the objective function f by evaluating f at some specific test points, and then optimize the model f̂ instead of the function f.
