• The method of steepest descent uses only first derivatives (gradients) in selecting a suitable search direction.
○ This strategy is not always the most effective.
○ If higher derivatives are used, the resulting iterative algorithm may perform better than the steepest descent method.
• Newton’s method (sometimes called the Newton-Raphson method) uses first and second derivatives and indeed does perform better than
the steepest descent method if the initial point is close to the minimizer.
• The idea behind this method is as follows.
• Given a starting point, we construct a quadratic approximation to the objective function that matches the first and second derivative
values at that point.
• We then minimize the approximate (quadratic) function instead of the original objective function.
• We use the minimizer of the approximate function as the starting point in the next step and repeat the procedure iteratively.
• If the objective function is quadratic, then the approximation is exact, and the method yields the true minimizer in one step.
• If, on the other hand, the objective function is not quadratic, then the approximation will provide only an estimate of the position of
the true minimizer.
• Here is an illustration of Newton's method:
• We can obtain a quadratic approximation to the twice continuously differentiable objective function f: ℝn → ℝ using the Taylor series expansion of f about the current point x(k), neglecting terms of order three and higher. We obtain
q(x) = f(x(k)) + (x – x(k))⊤g(k) + (1/2)(x – x(k))⊤F(x(k))(x – x(k)),
where g(k) = ∇f(x(k)) and F(x(k)) is the Hessian of f at x(k).
• Applying the FONC to q yields
∇q(x) = g(k) + F(x(k))(x – x(k)) = 0.
• If F(x(k)) > 0, then q achieves a minimum at
x(k+1) = x(k) – F(x(k))–1g(k).
This recursive formula is Newton’s method.
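• As a small illustration (not from the original notes; the objective below is a hypothetical stand-in), the following sketch builds the quadratic model q about x(k) and takes the resulting Newton step:

```python
import numpy as np

# Hypothetical non-quadratic objective f(x) = (x1 - 1)^4 + x2^2, for illustration only.
def f(x):
    return (x[0] - 1.0)**4 + x[1]**2

def grad(x):
    return np.array([4.0 * (x[0] - 1.0)**3, 2.0 * x[1]])

def hessian(x):
    return np.array([[12.0 * (x[0] - 1.0)**2, 0.0],
                     [0.0,                    2.0]])

x_k = np.array([0.0, 1.0])
g, F = grad(x_k), hessian(x_k)

# Quadratic model about x_k: q(x) = f(x_k) + (x - x_k)^T g + 0.5 (x - x_k)^T F (x - x_k)
def q(x):
    d = x - x_k
    return f(x_k) + d @ g + 0.5 * d @ F @ d

# Minimizing q gives the next Newton iterate x(k+1) = x(k) - F(x(k))^{-1} g(k).
x_next = x_k - np.linalg.solve(F, g)
```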
Example 1) Using the starting point x(0) = [3, –1, 0, 1]⊤, apply Newton’s method to minimize the Powell function
f(x1, x2, x3, x4) = (x1 + 10x2)² + 5(x3 – x4)² + (x2 – 2x3)⁴ + 10(x1 – x4)⁴.
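• A minimal numerical sketch of this example (the gradient and Hessian below are hand-derived from the formula above, and the iteration count is an arbitrary choice):

```python
import numpy as np

def f(x):
    # Powell's function as stated above
    return ((x[0] + 10*x[1])**2 + 5*(x[2] - x[3])**2
            + (x[1] - 2*x[2])**4 + 10*(x[0] - x[3])**4)

def grad(x):
    return np.array([
        2*(x[0] + 10*x[1]) + 40*(x[0] - x[3])**3,
        20*(x[0] + 10*x[1]) + 4*(x[1] - 2*x[2])**3,
        10*(x[2] - x[3]) - 8*(x[1] - 2*x[2])**3,
        -10*(x[2] - x[3]) - 40*(x[0] - x[3])**3,
    ])

def hessian(x):
    a, b = x[0] - x[3], x[1] - 2*x[2]
    return np.array([
        [2 + 120*a**2,  20.0,           0.0,           -120*a**2],
        [20.0,          200 + 12*b**2, -24*b**2,        0.0],
        [0.0,          -24*b**2,        10 + 48*b**2,  -10.0],
        [-120*a**2,     0.0,           -10.0,           10 + 120*a**2],
    ])

x = np.array([3.0, -1.0, 0.0, 1.0])      # given starting point x(0)
for k in range(10):                      # number of iterations chosen arbitrarily
    g = grad(x)
    d = np.linalg.solve(hessian(x), -g)  # Step 1: solve F(x(k)) d(k) = -g(k)
    x = x + d                            # Step 2: x(k+1) = x(k) + d(k)
    print(k + 1, x, f(x))
```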
• Observe that the kth iteration of Newton’s method can be written in two steps as:
○ Step 1: Solve F(x(k))d(k) = –g(k) for d(k).
○ Step 2: Set x(k+1) = x(k) + d(k).
○ Step 1 requires the solution of an n × n system of linear equations.
• Thus, an efficient method for solving systems of linear equations is essential when using Newton’s method.
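• In NumPy, for instance, Step 1 is better carried out with a factorization-based solve than by forming F(x(k))–1 explicitly; the arrays below are dummy data standing in for the current Hessian, gradient, and iterate:

```python
import numpy as np

# Dummy data standing in for F(x(k)), g(k), and x(k).
F_k = np.array([[3.0, 1.0],
                [1.0, 2.0]])
g_k = np.array([1.0, -1.0])
x_k = np.array([0.0, 0.0])

# Prefer a linear solve (LU factorization) over np.linalg.inv(F_k) @ g_k,
# which is slower and less accurate for large n.
d_k = np.linalg.solve(F_k, -g_k)   # Step 1: solve F(x(k)) d(k) = -g(k)
x_next = x_k + d_k                 # Step 2: x(k+1) = x(k) + d(k)
```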
• As in the one-variable case, Newton’s method can also be viewed as a technique for iteratively solving the equation
g(x) = 0,
where g: ℝn → ℝn is given by g(x) = ∇f(x).
• In this case F(x) is the Jacobian matrix of g at x; that is, F(x) is the n × n matrix whose (i,j) entry is (∂gi/∂xj)(x), i,j = 1,2,…, n.
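• Viewed this way, the same iteration solves a general system of nonlinear equations g(x) = 0; here is a small sketch with a hypothetical two-equation system (a circle intersected with a line):

```python
import numpy as np

# Hypothetical system g(x) = 0: the unit circle and the line x1 = x2.
def g(x):
    return np.array([x[0]**2 + x[1]**2 - 1.0,
                     x[0] - x[1]])

def jacobian(x):
    # (i, j) entry is the partial derivative of g_i with respect to x_j
    return np.array([[2*x[0], 2*x[1]],
                     [1.0,   -1.0]])

x = np.array([1.0, 0.0])
for _ in range(8):
    # x(k+1) = x(k) - F(x(k))^{-1} g(x(k)), with F the Jacobian of g
    x = x - np.linalg.solve(jacobian(x), g(x))
# x approaches (1/sqrt(2), 1/sqrt(2))
```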
Analysis of Newton’s Method:
• As in the one-variable case, there is no guarantee that Newton’s algorithm heads in the direction of decreasing values of the objective function
if F(x(k)) is not positive definite.
• Moreover, even if F(x(k)) > 0, Newton’s method may not be a descent method; that is, it is possible that f(x(k+1)) ≥ f(x(k)).
○ For example, this may occur if our starting point x(0) is far away from the solution.
• Despite these drawbacks, Newton’s method has superior convergence properties when the starting point is near the solution.
○ The convergence analysis of Newton’s method when f is a quadratic function is straightforward.
• In fact, Newton’s method reaches the point x* such that ∇f(x*) = 0 in just one step starting from any initial point x(0).
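• For example, for f(x) = (1/2)x⊤Qx – b⊤x with Q = Q⊤ > 0, the Hessian is Q everywhere, and one Newton step from any x(0) lands exactly on x* = Q–1b; a quick check (the Q and b below are arbitrary choices):

```python
import numpy as np

Q = np.array([[4.0, 1.0],
              [1.0, 3.0]])          # symmetric positive definite (arbitrary choice)
b = np.array([1.0, 2.0])

grad = lambda x: Q @ x - b          # gradient of f(x) = 0.5 x^T Q x - b^T x
x0 = np.array([10.0, -7.0])         # any starting point

x1 = x0 - np.linalg.solve(Q, grad(x0))   # one Newton step
x_star = np.linalg.solve(Q, b)           # true minimizer Q^{-1} b
assert np.allclose(x1, x_star)           # the step is exact for a quadratic objective
```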
• To analyze the convergence of Newton’s method in the general case, let {x(k)} be the sequence generated by Newton’s method for minimizing a function f:
ℝn → ℝ. We show that {x(k)} converges to the minimizer x* with order of convergence at least 2.
Theorem 1)
Suppose that f ∈ C3 and x* ∈ ℝn is a point such that ∇f(x*) = 0 and F(x*) is invertible. Then, for all x(0) sufficiently close to x*, Newton’s
method is well-defined for all k and converges to x* with an order of convergence at least 2.
Note: In Theorem 1, we did not state that x* is a local minimizer. For example, if x* is a local maximizer, then provided
that f ∈ C3 and F(x*) is invertible, Newton’s method would still converge to x* if we start close enough to it.
• As stated in Theorem 1, Newton’s method has superior convergence properties if the starting point is near the solution.
○ However, the method is not guaranteed to converge to the solution if we start far away from it.
• In fact, it may not even be well-defined because the Hessian may be singular.
• In particular, the method may not be a descent method; that is, it is possible that f(x(k+1)) ≥ f(x(k)).
• Fortunately, it is possible to modify the algorithm such that the descent property holds.
Theorem 2)
Let {x(k)} be the sequence generated by Newton’s method for minimizing a given objective function f(x). If the Hessian F(x(k)) > 0 and
g(k) = ∇f(x(k)) ≠ 0, then the search direction
d(k) = –F(x(k))–1g(k)
from x(k) to x(k+1) is a descent direction for f, in the sense that there exists an ᾱ > 0 such that for all α ∈ (0, ᾱ),
f(x(k) + αd(k)) < f(x(k)).
• Theorem 2 motivates the following modification of Newton’s method:
x(k+1) = x(k) – αkF(x(k))–1g(k),
where αk is chosen to minimize f(x(k) – αF(x(k))–1g(k)) over α ≥ 0;
○ that is, at each iteration, we perform a line search in the direction –F(x(k))–1g(k). By Theorem 2, we conclude that the modified Newton’s
method has the descent property; that is,
f(x(k+1)) < f(x(k)) whenever g(k) ≠ 0.
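• A minimal sketch of this modified (damped) Newton iteration, with a simple backtracking search standing in for the exact minimization over α (the tolerance, iteration cap, and backtracking factor are arbitrary choices, not from the notes):

```python
import numpy as np

def modified_newton(x, f, grad, hessian, iters=20):
    """Newton's method with a backtracking line search along d = -F(x)^{-1} g(x)."""
    for _ in range(iters):
        g = grad(x)
        if np.linalg.norm(g) < 1e-10:
            break
        d = np.linalg.solve(hessian(x), -g)   # Newton direction (assumes F(x) > 0)
        alpha, fx = 1.0, f(x)
        # Backtrack until the step actually decreases f (descent property).
        while f(x + alpha * d) >= fx and alpha > 1e-12:
            alpha *= 0.5
        x = x + alpha * d
    return x
```

For instance, modified_newton(np.array([3.0, -1.0, 0.0, 1.0]), f, grad, hessian) could be applied with the Powell function, gradient, and Hessian defined in the earlier sketch.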
• A drawback of Newton’s method is that evaluation of F(x(k)) for large n can be computationally expensive.
○ Furthermore, we have to solve the set of n linear equations F(x(k))d(k) = –g(k).
○ Another source of potential problems in Newton’s method arises from the Hessian matrix not being positive definite.
• You might also see errors caused by this when using solvers such as CPLEX/Gurobi on such problems!
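• One common safeguard (an assumption here, not something covered above) is to test positive definiteness with a Cholesky factorization and, if it fails, add a multiple of the identity to the Hessian before solving for the search direction:

```python
import numpy as np

def safeguarded_direction(F, g, mu0=1e-3):
    """Return a search direction, regularizing F if it is not positive definite."""
    mu = 0.0
    while True:
        try:
            # Cholesky succeeds only if F + mu*I is positive definite.
            L = np.linalg.cholesky(F + mu * np.eye(F.shape[0]))
            break
        except np.linalg.LinAlgError:
            mu = mu0 if mu == 0.0 else 10.0 * mu   # increase the regularization
    # Solve (F + mu*I) d = -g using the Cholesky factor L (F + mu*I = L L^T).
    y = np.linalg.solve(L, -g)
    return np.linalg.solve(L.T, y)
```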