KEMBAR78
Errors and Computer Arithmetic | PDF | Numbers | Arithmetic
0% found this document useful (0 votes)
13 views7 pages

Errors and Computer Arithmetic

Year 1 computing! Basic basic stuff.

Uploaded by

epsilon245779
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
13 views7 pages

Errors and Computer Arithmetic

Year 1 computing! Basic basic stuff.

Uploaded by

epsilon245779
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 7

MATH243: Errors and Computer Arithmetic

Week 2 Lectures - PS
Recall from our last lecture the possible sources of errors in a numerical computation, namely

1. Modeling errors

2. Human blunders and mistakes


3. Physical measurement errors
4. Machine representation and arithmetic errors
5. Mathematical approximation errors

and the consequences of errors: Errors, such as round-off errors get propagated in numerical computations
resulting in:

1. Loss-of-significance errors
2. Noise in function evaluation (uncertainty)
3. • underflow errors (numbers are too small). The number is taken as zero by default), and
• overflow errors (number too large). For overflow errors, the number has a value ±∞ or N aN .

Our focus here is on machine representation and arithmetic errors.

1 Machine representation and arithmetic errors


We count using the decimal number system (i.e. base 10), but computers use binary arithmetic (base 2).
Other bases, e.g. the octal (base 8) hexadecimal(base 16) number systems are also used.
Binary arithmetic - each number is represented as a binary number (that is, a finite sum of integer powers
of 2).

1. Some numbers can be represented exactly, such as:


• 1563 = 1 × 210 + 1 × 29 + 0 × 28 + 0 × 27 + 0 × 26 + 0 × 25 + 1 × 24 + 1 × 23 + 0 × 22 + 1 × 21 + 1 × 20 =
(11000011011)2
• 2.125 = 1 × 22 + 0 × 2−1 + 0 × 2−2 + 1 × 2−3
2. Other numbers have no exact binary representation (will lead to chopping and or round-off errors),
e.g.
 
1
• = (0.00011001100110011001100 . . .)2
10 10

1
 
1
• =
100 10
 
1
• =
1000 10
• 3.1 = 1 × 21 + 1 × 20 + 1 × 2−4 + 1 × 2−5 + . . ..
• Transcendental numbers, e.g. π have no finite representation, in both decimal or binary systems

Question: How do we convert from decimal to binary numbers? Please see the lecture notes Math243Lnotes2020a
for details. However here is a very brief tour.

In decimal representation of integers we identify an integer N by

dn dn−1 · · · d1 d0 (1)

with
N = dn × 10n + dn−1 × 10n−1 + · · · + d1 × 101 + d0 × 100 (2)
Similarly, an infinite decimal of the form

dn dn−1 · · · d1 d0 .d−1 d−2 d−3 . . . (3)

can be represented as

dn × 10n + dn−1 × 10n−1 + · · · + d1 × 101 + d0 × 100 + d−1 × 10−1 + d−2 × 10−2 + . . . (4)

where dk is an integer from 0 to 9, with dn 6= 0.

Example 1 Base 2 representation of numbers:

23456 = 2 × 104 + 3 × 103 + 4 × 102 + 5 × 101 + 6 × 100


234.56 = 2 × 102 + 3 × 101 + 4 × 100 + 5 × 10−1 + 6 × 10−2

In general, an (n + 1) digit integer I can be expressed in base b as

I = (an an−1 an−2 . . . . . . a2 a1 a0 )b (5)

If b = 10 the digits aj can take values between 0 and 9 (an 6= 0)


If b = 2 the digits aj can either be 0 or 1.
The decimal value of the number (with base b) given in equation (5) is

I = bn × an + bn−1 × an−1 + · · · + b1 × a1 + b0 × a0 . (6)

In a binary system, the mathematical expression of an integer is

I = (an an−1 an−2 . . . . . . a2 a1 a0 )2 (7)


where ai is a binary bit, that is 0 or 1. The decimal value of a binary number is
n
X
I = an × 2n + an−1 × 2n−1 + · · · + a1 × 21 + a0 × 20 = 2k ak . (8)
k=0

2
If the binary number contains a fractional part, i.e if it is of the form

I = (an an−1 an−2 . . . . . . a2 a1 a0 .a−1 a−2 a−3 . . .)2 (9)

it can be represented as

I = 2n an + 2n−1 an−1 + · · · + 21 a1 + 20 a0 + 2−1 a−1 + 2−2 a−2 + 2−3 a−3 + . . . (10)

Example 2 Base 2 to decimal:

(11001)2 = 1 × 24 + 1 × 23 + 0 × 22 + 0 × 21 + 1 × 20 = 16 + 8 + 0 + 0 + 1 = (25)10
(0.0111)2 = 0 × 2−1 + 1 × 2−2 + 1 × 2−3 + 1 × 2−4 = 0 + 0.25 + 0.125 + 0.0625 = (0.4375)10

1.1 Converting the integer part


A decimal number I10 can be represented in binary form as

I10 = an × 2n + an−1 × 2n−1 + . . . + a1 × 21 + a0 × 20 = (an an−1 an−2 . . . . . . a2 a1 a0 )2 . (11)

The following procedure is a more general approach that can be used to convert decimal integers to binary
form.
Given an integer I10 , determine the quotients Qi and the remainders Ri such that

I10 = 2Q0 + R0
Q0 = 2Q1 + R1
Q1 = 2Q2 + R2
.. ..
. .
Qn−1 = 2Qn + Rn

The remainders are 0s and 1s and the binary form for the number I10 is

Rn Rn−1 Rn−2 . . . R1 R0 (12)

It should be noted that


Rn Rn−1 Rn−2 . . . R1 R0 = an an−1 an−2 . . . a1 a0 (13)

Example 3 Convert the following decimal numbers into binary form

(a) (53)10
(b) (315)10

Answer:

(a)
Quotients Remainders
2 53
2 26 1 = a0
2 13 0 = a1
2 6 1 = a2
2 3 0 = a3
2 1 1 = a4
2 0 1 = a5
Thus (53)10 = (110101)2

3
(b)
Quotients Remainders
2 315
2 157 1 = a0
2 78 1 = a1
2 39 0 = a2
2 19 1 = a3
2 9 1 = a4
2 4 1 = a5
2 2 0 = a6
2 1 0 = a7
2 0 1 = a8
Thus (315)10 = (100111011)2

For more details, including on binary shifting please check the notes Math243Lnotes2020a.

2 Computers use two formats for numbers


Definition 1 (Fixed-point numbers) Fixed-point numbers are used to store integers.

Here each number is stored in a computer word of 32 binary digits.

• The answer is always an integer


• The stored numbers are equally spaced

• Fixed-point numbers give a limited range of numbers

Definition 2 (Floating point numbers) Floating point numbers approximate real numbers.

• stored numbers are not equally spaced

• gives a wide range of numbers that can be represented exactly.

3 Algorithms and Convergence


Algorithm, is a word that crops up so often in numerical analysis it important to have an understanding of
what it means.

Definition 3 (Algorithm) An algorithm is a procedure that describes, in an unambiguous manner, a finite


sequence of steps to be performed in a specific order to achieve a certain outcome.
Alternatively:
A numerical algorithm is simply a procedure that:

1. Takes input variables,


2. Through a sequence of steps manipulates the input variables along with additional temporary variables
to give,

3. Output variables.

4
In numerical methods the objective of an algorithm is to implement a procedure to solve a problem or
approximate a solution to a problem.

Definition 4 (Stability of an algorithm) An algorithm is stable if any small change in the initial data
only results in a small change in the final data. Otherwise it is unstable. The algorithm is conditionally
stable if it is only stable for a certain choice of initial data.

Key concepts - error growth


Let f (x) = 0 be an equation to be solved using a numerical algorithm.
Suppose x is the exact solution and x0 , the initial approximation, x1 the second approximation, · · · xn the
nth approximation. Let:

• E0 = |x − x0 | > 0 denote the initial error,


• E1 = |x − x1 | > 0 the error after the first approximation,
• E2 = |x − x2 | > 0 the error after the second approximation,
.
• ..

• En = |x − xn | is the error after n operations.

Definition 5 (Linear growth) If En ≈ CE0 · n (for a constant C, which is independent of n), then the
error growth is linear.

Linear error growth is usually unavoidable, and in the case where C and E0 are small the results are generally
acceptable. The algorithm is stable.

Definition 6 (Exponential error growth) If En ≈ C n E0 for some C > 1, then the error growth is
exponential.

In this case the error will dominate very fast(the result is unreliable).
NB. Exponential error growth is unacceptable. Regardless of the size of E0 the error grows rapidly. The
algorithm is unstable.

Example 4 Decide if the recursive equation


10
pn = pn−1 − pn−3 for n = 2, 3, . . . ,
3
gives linear or exponential error growth if 5-digit rounding is used.

Solution: Please follow the recorded lecture.

4 Rate of Convergence: Sequences


The rate of convergence of a sequence can be linear (slow), quadratic (fast), and so on. For a numerical
method, quadratic convergence is more desirable than linear convergence, higher order convergence allows
us to obtain a solution with fewer steps (iterations).

5

Definition 7 (Rate of Convergence) . Suppose we have two sequences: {βn }n=1 which converges to

zero, and {αn }n=1 which converges to a number α.
If there exists K > 0 with
|αn − α| ≤ K|βn |,

for n large enough, then we say that {αn }n=1 converges to α with a rate of Convergence O(βn ) (read “Big
Oh of βn ”).
We write
αn = α + O(βn ).

Note: The sequence {βn }n=1 is usually chosen to be

1
βn = ,
np
for some positive value of p.
Note: We are generally interested in the largest value of p with
 
1
αn = α + O .
np

Big-Oh defined

• Big-Oh is about concerned with finding an asymptotic upper bound


• Formal definition: f (N ) = O(g(N )) if there exists constants c and N0 such that

f (N ) ≤ c · g(N ) for all N ≥ N0 .

(read: “f (N ) grows no faster than g(N )”)


• Note: We are interested in how f grows when N is very large.

Example 5 Find the rate of convergence of the sequence


1
αn = α + √ as n → ∞.
n

Solution: see also the recorded lecture.


Strategy: can we find a K > 0 such that |αn − α| ≤ K|βn |?
Take  > 0 and note that
1 1
|αn − α| = √ ≤ (1 + ) √ .
n | {z } n
K |{z}
βn
 
1
It follows that αn = α + O √ .
n
 
∞ 1
Conclusion: The rate of convergence of the sequence {αn }n=1 to α is O √ .
n

6
Example 6 Determine the rate of convergence of the sequence
 
1 1
αn = sin − .
n n

Solution:
Start with the Maclaurin series of sin(x) (this is the Taylor series expansion of sin(x) at x = 0):

x3 x5
sin(x) = x − + − ··· .
3! 5!
 
1 1 1 1
Hence sin = − 3+ − ··· .
n n 6n 120n5
Then      
1 1 1 1 1 1
αn = sin − ≈− 3 +O ⇒ |αn − 0| = +O .
n n 6n n5 6n3 n5
 
1 1 1
Hence αn = 0 + O noting that  3 as n → ∞.
n3 n 5 n

5 Rate of Convergence: Function Limits


Definition 8 (Rate of Convergence) Consider two functions G(x) and F (x) such that

lim G(h) = 0 and lim F (h) = L.


h→0 h→0

If there exists K > 0 such that

|F (h) − L| ≤ K|G(h)| for all h < H,

where H > 0, then


F (h) = L + O (G(h)) .
We say F (h) converges to L with a rate of convergence O (G(h)).

Example 7 Consider the function


α(h) = sin(h) − h.
What is the rate of convergence of α(h) as h → 0?

Solution: This example is similar to the previous one, and so we use the Maclaurin series:

h3
+ O h5 .

sin(h) ≈ h −
6
Then
h3
+ O h5 ⇒ lim α(h) = 0 + O h3 .
 
|α(h)| =
6 h→0

Conclusion: The rate of convergence is O h3




——- THE END ——-

You might also like