Newton's method
From Wikipedia, the free encyclopedia
In numerical analysis, Newton's method (also known as the Newton–Raphson method), named after Isaac Newton and Joseph
Raphson, is perhaps the best known method for finding successively better approximations to the zeroes (or roots) of a real-
valued function. Newton's method can often converge remarkably quickly, especially if the iteration begins "sufficiently near" the desired
root. Just how near "sufficiently near" needs to be, and just how quickly "remarkably quickly" can be, depends on the problem. This is
discussed in detail below. Unfortunately, when iteration begins far from the desired root, Newton's method can easily lead an unwary user
astray with little warning. Thus, good implementations of the method embed it in a routine that also detects and perhaps overcomes
possible convergence failures.
Given a function ƒ(x) and its derivative ƒ′(x), we begin with a first guess x0. Provided the function is reasonably well-behaved, a better
approximation x1 is
x1 = x0 − ƒ(x0)/ƒ′(x0).
The process is repeated until a sufficiently accurate value is reached:
xn+1 = xn − ƒ(xn)/ƒ′(xn).
An important and somewhat surprising application is Newton–Raphson division, which can be used to quickly find
the reciprocal of a number using only multiplication and subtraction.
The algorithm is first in the class of Householder's methods, succeeded by Halley's method.
Contents
• 1 Description of the method
• 2 Application to minimization and maximization problems
• 3 History
• 4 Practical considerations
• 5 Analysis
• 6 Examples
o 6.1 Square root of a number
o 6.2 Solution of a non-polynomial equation
• 7 Counterexamples
o 7.1 Bad starting points
7.1.1 Iteration point is stationary
7.1.2 Starting point enters a cycle
o 7.2 Derivative issues
7.2.1 Derivative does not exist at root
7.2.2 Discontinuous derivative
o 7.3 Non-quadratic convergence
7.3.1 Zero derivative
7.3.2 No second derivative
• 8 Generalizations
o 8.1 Complex functions
o 8.2 Nonlinear systems of equations
o 8.3 Nonlinear equations in a Banach space
• 9 See also
• 10 References
• 11 External links
Description of the method
An illustration of one iteration of Newton's method (the function ƒ is shown in blue and the tangent line is in red). We see that xn+1 is a better
approximation than xn for the root x of the function ƒ.
The idea of the method is as follows: one starts with an initial guess which is reasonably close to the true root, then the
function is approximated by its tangent line (which can be computed using the tools of calculus), and one computes the x-
intercept of this tangent line (which is easily done with elementary algebra). This x-intercept will typically be a better
approximation to the function's root than the original guess, and the method can be iterated.
Suppose ƒ : [a, b] → R is a differentiable function defined on the interval [a, b] with values in the real numbers R. The
formula for converging on the root can be easily derived. Suppose we have some current approximation xn. Then we can
derive the formula for a better approximation, xn+1 by referring to the diagram on the right. We know from the definition of the
derivative at a given point that it is the slope of a tangent at that point.
That is
ƒ′(xn) = Δy/Δx = (ƒ(xn) − 0) / (xn − xn+1).
Here, ƒ′ denotes the derivative of the function ƒ. Then by simple algebra we can derive
xn+1 = xn − ƒ(xn)/ƒ′(xn).
We start the process off with some arbitrary initial value x0. (The closer to the zero, the better. But, in the
absence of any intuition about where the zero might lie, a "guess and check" method might narrow the
possibilities to a reasonably small interval by appealing to the intermediate value theorem.) The method will
usually converge, provided this initial guess is close enough to the unknown zero, and that ƒ'(x0) ≠ 0.
Furthermore, for a zero of multiplicity 1, the convergence is at least quadratic (see rate of convergence) in
a neighbourhood of the zero, which intuitively means that the number of correct digits roughly at least
doubles in every step. More details can be found in the analysis section below.
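To make the update rule above concrete, here is a minimal sketch of the iteration in Python; the function names, tolerance, and iteration cap are illustrative choices rather than part of the method itself.

```python
def newton(f, f_prime, x0, tol=1e-10, max_iter=50):
    """Approximate a root of f by Newton's method, starting from x0.

    f, f_prime: the function and its derivative.
    tol, max_iter: illustrative stopping criteria.
    """
    x = x0
    for _ in range(max_iter):
        dfx = f_prime(x)
        if dfx == 0:
            raise ZeroDivisionError("derivative is zero; Newton step is undefined")
        x_next = x - f(x) / dfx        # x_{n+1} = x_n - f(x_n)/f'(x_n)
        if abs(x_next - x) < tol:      # stop once successive iterates agree
            return x_next
        x = x_next
    raise RuntimeError("no convergence within max_iter iterations")
```

For example, newton(lambda x: x**2 - 2, lambda x: 2*x, 1.0) converges to √2 ≈ 1.4142135624 in a handful of steps.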
Application to minimization and maximization problems
Main article: Newton's method in optimization
Newton's method can also be used to find a minimum or maximum of a function. The derivative is zero at a
minimum or maximum, so minima and maxima can be found by applying Newton's method to the derivative.
The iteration becomes:
xn+1 = xn − ƒ′(xn)/ƒ′′(xn).
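A minimal sketch of this idea, assuming ƒ is twice differentiable; it simply applies the Newton update to ƒ′ (whether the stationary point found is a minimum or a maximum still has to be checked separately, and the names and tolerances below are illustrative):

```python
def newton_stationary_point(f_prime, f_double_prime, x0, tol=1e-10, max_iter=50):
    """Find a stationary point of f by applying Newton's method to f'."""
    x = x0
    for _ in range(max_iter):
        step = f_prime(x) / f_double_prime(x)   # x_{n+1} = x_n - f'(x_n)/f''(x_n)
        x -= step
        if abs(step) < tol:
            return x
    raise RuntimeError("no convergence within max_iter iterations")

# Example: for f(x) = (x - 3)**2, f'(x) = 2(x - 3) and f''(x) = 2, so
# newton_stationary_point(lambda x: 2*(x - 3), lambda x: 2.0, x0=0.0) returns 3.0.
```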
History
Newton's method was described by Isaac Newton in De analysi per aequationes numero
terminorum infinitas (written in 1669, published in 1711 by William Jones) and in De metodis
fluxionum et serierum infinitarum (written in 1671, translated and published as Method of Fluxions in
1736 by John Colson). However, his description differs substantially from the modern description
given above: Newton applies the method only to polynomials. He does not compute the successive
approximations xn, but computes a sequence of polynomials and only at the end, he arrives at an
approximation for the root x. Finally, Newton views the method as purely algebraic and fails to notice
the connection with calculus. Isaac Newton probably derived his method from a similar but less
precise method by Vieta. The essence of Vieta's method can be found in the work of the Persian
mathematician, Sharaf al-Din al-Tusi, while his successor Jamshīd al-Kāshī used a form of Newton's
method to solve x^P − N = 0 to find roots of N (Ypma 1995). A special case of Newton's method for
calculating square roots was known much earlier and is often called the Babylonian method.
Newton's method was used by 17th century Japanese mathematician Seki Kōwa to solve single-
variable equations, though the connection with calculus was missing.
Newton's method was first published in 1685 in A Treatise of Algebra both Historical and
Practical by John Wallis. In 1690, Joseph Raphson published a simplified description in Analysis
aequationum universalis. Raphson again viewed Newton's method purely as an algebraic method
and restricted its use to polynomials, but he describes the method in terms of the successive
approximations xn instead of the more complicated sequence of polynomials used by Newton.
Finally, in 1740, Thomas Simpson described Newton's method as an iterative method for solving
general nonlinear equations using fluxional calculus, essentially giving the description above. In the
same publication, Simpson also gives the generalization to systems of two equations and notes that
Newton's method can be used for solving optimization problems by setting the gradient to zero.
Arthur Cayley in 1879 in The Newton–Fourier imaginary problem was the first to notice the
difficulties in generalizing Newton's method to complex roots of polynomials with degree greater
than 2 and complex initial values. This opened the way to the study of the theory of iterations of
rational functions.
Practical considerations
Newton's method is an extremely powerful technique—in general the convergence is quadratic: the
error is essentially squared at each step (which means that the number of accurate digits roughly
doubles in each step). However, there are some difficulties with the method.
1. Newton's method requires that the derivative be calculated directly. In most
practical problems, the function in question may be given by a long and complicated
formula, and hence an analytical expression for the derivative may not be easily
obtainable. In these situations, it may be appropriate to approximate the derivative by
using the slope of a line through two points on the function. In this case, the secant
method results. This has slightly slower convergence than Newton's method but does
not require the existence of derivatives.
2. If the initial value is too far from the true zero, Newton's method may fail to
converge. For this reason, Newton's method is often referred to as a local technique.
Most practical implementations of Newton's method put an upper limit on the number of
iterations and perhaps on the size of the iterates.
3. If the derivative of the function is not continuous the method may fail to converge.
4. It is clear from the formula for Newton's method that it will fail in cases where the
derivative is zero. Similarly, when the derivative is close to zero, the tangent line is
nearly horizontal and hence may "shoot" wildly past the desired root.
5. If the root being sought has multiplicity greater than one, the convergence rate is
merely linear (errors reduced by a constant factor at each step) unless special steps are
taken. When there are two or more roots that are close together then it may take many
iterations before the iterates get close enough to one of them for the quadratic
convergence to be apparent.
6. Newton's method works best for functions with low curvature.
Since the most serious of the problems above is the possibility of a failure of convergence, Press et
al. (1992) present a version of Newton's method that starts at the midpoint of an interval in which the
root is known to lie and stops the iteration if an iterate is generated that lies outside the interval.
Developers of large scale computer systems involving root finding tend to prefer the secant
method over Newton's method because the use of a difference quotient in place of the derivative in
Newton's method implies that the additional code to compute the derivative need not be maintained.
In practice, the advantages of maintaining a smaller code base usually outweigh the superior
convergence characteristics of Newton's method.
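To make the trade-off concrete, here is a minimal sketch of the secant method referred to above, in which the derivative of the Newton step is replaced by a difference quotient through the two most recent iterates (the function name and stopping rule are illustrative):

```python
def secant(f, x0, x1, tol=1e-10, max_iter=100):
    """Approximate a root of f using the secant method.

    The derivative of the Newton step is replaced by the difference
    quotient (f(x1) - f(x0)) / (x1 - x0) through the last two iterates.
    """
    f0, f1 = f(x0), f(x1)
    for _ in range(max_iter):
        denom = f1 - f0
        if denom == 0:
            raise ZeroDivisionError("flat secant; cannot take a step")
        x2 = x1 - f1 * (x1 - x0) / denom   # secant update
        if abs(x2 - x1) < tol:
            return x2
        x0, f0 = x1, f1
        x1, f1 = x2, f(x2)
    raise RuntimeError("no convergence within max_iter iterations")
```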
Analysis
Suppose that the function ƒ has a zero at α, i.e., ƒ(α) = 0.
If f is continuously differentiable and its derivative is nonzero at α, then there exists
a neighborhood of α such that for all starting values x0 in that neighborhood, the sequence {xn}
will converge to α.
If the function is continuously differentiable and its derivative is not 0 at α and it has a second
derivative at α then the convergence is quadratic or faster. If the second derivative is not 0 at α then
the convergence is merely quadratic.
If the derivative is 0 at α, then the convergence is usually only linear. Specifically, if ƒ is twice
continuously differentiable, ƒ′(α) = 0 and ƒ′′(α) ≠ 0, then there exists a neighborhood of α such that
for all starting values x0 in that neighborhood, the sequence of iterates converges linearly,
with rate log10 2 (Süli & Mayers, Exercise 1.6). Alternatively, if ƒ′(α) = 0 and ƒ′(x) ≠ 0 for x ≠ α, x in
a neighborhood U of α, α being a zero of multiplicity r, and if ƒ ∈ Cr(U), then there exists a
neighborhood of α such that for all starting values x0 in that neighborhood, the sequence of iterates
converges linearly.
However, even linear convergence is not guaranteed in pathological situations.
In practice these results are local and the neighborhood of convergence is not known a priori, but
there are also some results on global convergence: for instance, given a right neighborhood U+ of α,
if ƒ is twice differentiable in U+ and if ƒ′ ≠ 0 and ƒ·ƒ′′ > 0 in U+, then, for
each x0 in U+ the sequence xk is monotonically decreasing to α.
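A quick numerical check of the contrast between the quadratic and linear regimes described above; the functions chosen here, x² − 2 with a simple root and (x − 1)² with a double root, are illustrative examples rather than ones taken from the text:

```python
import math

def newton_errors(f, df, x0, root, steps=6):
    """Return the absolute errors of successive Newton iterates."""
    x, errs = x0, []
    for _ in range(steps):
        x = x - f(x) / df(x)
        errs.append(abs(x - root))
    return errs

# Simple root: errors shrink roughly quadratically (squared each step).
print(newton_errors(lambda x: x * x - 2, lambda x: 2 * x, 2.0, math.sqrt(2)))
# Double root of (x - 1)**2: errors only halve each step (linear convergence).
print(newton_errors(lambda x: (x - 1) ** 2, lambda x: 2 * (x - 1), 2.0, 1.0))
```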
Examples
Square root of a number
Consider the problem of finding the square root of a number. There are many methods of computing
square roots, and Newton's method is one.
For example, if one wishes to find the square root of 612, this is equivalent to finding the solution to
x² = 612.
The function to use in Newton's method is then
ƒ(x) = x² − 612,
with derivative
ƒ′(x) = 2x.
With an initial guess of 10, the sequence given by Newton's method is
x1 = 35.6, x2 ≈ 26.395506, x3 ≈ 24.790635, x4 ≈ 24.738688, x5 ≈ 24.738634.
With only a few iterations one can obtain a solution accurate to many decimal places.
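The same computation can be reproduced in a few lines; a small sketch using the setup ƒ(x) = x² − 612 above (names are illustrative):

```python
def sqrt_newton(a, x0, steps=5):
    """Apply Newton's method to f(x) = x**2 - a, starting from x0."""
    x = x0
    for n in range(1, steps + 1):
        x = x - (x * x - a) / (2 * x)   # equivalently, x = (x + a/x) / 2
        print(n, x)
    return x

sqrt_newton(612, 10.0)
# The iterates 35.6, 26.3955..., 24.7906..., 24.73869..., 24.7386338... approach
# the true value 612 ** 0.5 = 24.7386337537...
```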
Solution of a non-polynomial equation
Consider the problem of finding the positive number x with cos(x) = x³. We can rephrase that as finding the zero of ƒ(x) = cos(x) − x³. We have ƒ′(x) = −sin(x) − 3x². Since cos(x) ≤ 1 for all x and x³ > 1 for x > 1, we know that our zero lies between 0 and 1. We try a starting value of x0 = 0.5. (Note that a starting value of 0 will lead to an undefined result, showing the importance of using a starting point that is close to the zero.)
The correct digits are underlined in the above example. In particular, x6 is correct to the number of decimal places given. We see that the number of correct digits after the decimal point increases from 2 (for x3) to 5 and 10, illustrating the quadratic convergence.
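A short sketch reproducing the computation described above, starting from x0 = 0.5 (the variable names are illustrative):

```python
import math

def f(x):
    return math.cos(x) - x**3

def f_prime(x):
    return -math.sin(x) - 3 * x**2

x = 0.5
for n in range(1, 7):
    x = x - f(x) / f_prime(x)   # Newton update
    print(n, x)
# The iterates x1, ..., x6 converge rapidly to the root near 0.865474.
```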
Counterexamples
Newton's method is only guaranteed to converge if certain
conditions are satisfied, so depending on the shape of the
function and the starting point it may or may not converge.
Bad starting points
In some cases the conditions on the function necessary for
convergence are satisfied, but the point chosen as the initial
point is not in the interval where the method converges. In
such cases a different method, such as bisection, should be
used to obtain a better estimate for the zero to use as an
initial point.
Iteration point is stationary
Consider the function
ƒ(x) = 1 − x².
It has a maximum at x = 0 and solutions of ƒ(x) = 0 at x = ±1. If we start iterating from the stationary point x0 = 0 (where the derivative is zero), x1 will be undefined, since the tangent there is horizontal and never crosses the x-axis:
x1 = x0 − ƒ(x0)/ƒ′(x0) = 0 − 1/0.
The same issue occurs if, instead of the starting point, any iteration point is stationary. Even if the derivative is not zero but is small, the next iteration will be far away from the desired zero.
Starting point enters a cycle
The tangent lines of x³ − 2x + 2 at 0 and 1 intersect the x-axis at 1 and 0 respectively, illustrating why Newton's method oscillates between these values for some starting points.
For some functions, some starting points may enter an infinite cycle, preventing convergence. Let
ƒ(x) = x³ − 2x + 2
and take 0 as the starting point. The first iteration produces 1 and the second iteration returns to 0, so the sequence will oscillate between the two without converging to a root. In general, the behavior of the sequence can be very complex. (See Newton fractal.)
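The two-cycle described above is easy to check directly; a short sketch for ƒ(x) = x³ − 2x + 2 starting at 0:

```python
def f(x):
    return x**3 - 2*x + 2

def f_prime(x):
    return 3*x**2 - 2

x = 0.0
for n in range(1, 7):
    x = x - f(x) / f_prime(x)   # Newton update
    print(n, x)
# The iterates alternate 1.0, 0.0, 1.0, 0.0, ... and never approach
# the real root of the cubic, which lies near -1.77.
```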
Derivative issues
If the function is not continuously differentiable in a neighborhood of the root, then it is possible that Newton's method will always diverge and fail, unless the solution is guessed on the first try.
Derivative does not exist at root
A simple example of a function where Newton's method diverges is the cube root, which is continuous everywhere and infinitely differentiable, except at x = 0, where its derivative is undefined (this, however, does not affect the algorithm, since it will never require the derivative if the solution is already found):
ƒ(x) = x^(1/3).
For any iteration point xn, the next iteration point will be
xn+1 = xn − ƒ(xn)/ƒ′(xn) = xn − xn^(1/3) / ((1/3) xn^(−2/3)) = xn − 3xn = −2xn.
The algorithm overshoots the solution and lands on the other side of the y-axis, farther away than it initially was; applying Newton's method actually doubles the distance from the solution at each iteration.
In fact, the iterations diverge to infinity for every ƒ(x) = |x|^α with 0 < α < 1/2. In the limiting case of α = 1/2 (square root), the iterations will oscillate indefinitely between the points x0 and −x0, so they do not converge in this case either.
Discontinuous derivative
If the derivative is not continuous at the root, then convergence may fail to occur in any neighborhood of the root. Consider the function
ƒ(x) = 0 if x = 0, and ƒ(x) = x + x² sin(2/x) otherwise.
Its derivative is
ƒ′(0) = 1, and ƒ′(x) = 1 + 2x sin(2/x) − 2 cos(2/x) for x ≠ 0.
Within any neighborhood of the root, this derivative keeps changing sign as x approaches 0 from the right (or from the left), while ƒ(x) ≥ x − x² > 0 for 0 < x < 1.
So ƒ(x)/ƒ′(x) is unbounded near the root, and Newton's method will diverge almost everywhere in any neighborhood of it, even though:
• the function is differentiable (and thus continuous) everywhere;
• the derivative at the root is nonzero;
• ƒ is infinitely differentiable except at the root; and
• the derivative is bounded in a neighborhood of the root (unlike ƒ(x)/ƒ′(x)).
Non-quadratic convergence
In some cases the iterates converge but do not converge as quickly as promised. In these cases simpler methods converge just as quickly as Newton's method.
Zero derivative
If the first derivative is zero at the root, then convergence will not be quadratic. Indeed, let
ƒ(x) = x²;
then ƒ′(x) = 2x and xn − ƒ(xn)/ƒ′(xn) = xn/2, so the error is merely halved at each step and the convergence is only linear.
Similar problems occur even when the root is only "nearly" double. For example, let
ƒ(x) = x²(x − 1000) + 1.
Then the first few iterates starting at x0 = 1 are 1, 0.500250376, 0.251062828, 0.127507934, 0.067671976, 0.041224176, 0.032741218, 0.031642362; it takes six iterations to reach a point where the convergence appears to be quadratic.
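The slow start described above can be reproduced directly; a short sketch iterating ƒ(x) = x²(x − 1000) + 1 from x0 = 1:

```python
def f(x):
    return x**2 * (x - 1000) + 1

def f_prime(x):
    return 3*x**2 - 2000*x

x = 1.0
for n in range(1, 9):
    x = x - f(x) / f_prime(x)   # Newton update
    print(n, x)
# The early iterates (0.50025..., 0.25106..., 0.12750..., ...) shrink only by roughly half
# per step; behaviour consistent with quadratic convergence appears only once x is near
# the root at about 0.0316.
```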
No second derivative
If there is no second derivative at the root, then convergence may fail to be quadratic. Indeed, let
ƒ(x) = x + x^(4/3).
Then
ƒ′(x) = 1 + (4/3)x^(1/3)
and
ƒ′′(x) = (4/9)x^(−2/3),
except at x = 0, where the second derivative is undefined. Given xn,
xn+1 = xn − ƒ(xn)/ƒ′(xn) = xn^(4/3) / (3 + 4 xn^(1/3)),
which has approximately 4/3 times as many bits of precision as xn has. This is less than the 2 times as many which would be required for quadratic convergence. So the convergence of Newton's method (in this case) is not quadratic, even though: the function is continuously differentiable everywhere; the derivative is not zero at the root; and ƒ is infinitely differentiable except at the desired root.