These slides are being provided with permission from the copyright holder for CS2208
use only. The slides must not be reproduced or provided to anyone outside of
the class.
All downloaded copies of the slides and/or lecture recordings are for personal use
only. Students must destroy these copies within 30 days after receipt of final
course evaluations.
Tutorial 04: Rounding and Normalization
Computer Science Department
CS2208: Introduction to Computer Organization and Architecture
Fall 2023-2024
Instructor: Mahmoud R. El-Sakka
Office: MC-419
Email: elsakka@csd.uwo.ca
Phone: 519-661-2111 x86996
Rounding
The rounding mechanisms include:
o Truncation (rounding towards zero): the unwanted bits are simply dropped;
a.k.a. rounding down.
o Rounding towards positive or negative infinity: the nearest valid
floating-point number in the direction of positive infinity (for positive
values) or negative infinity (for negative values) is chosen;
a.k.a. rounding up.
o Rounding to nearest: the valid floating-point number closest to the
actual value is chosen.
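The three mechanisms above can be sketched for an unsigned significand held in an integer. This is only an illustration: the function name `round_unsigned` and the mode labels are hypothetical, and two choices are assumptions made for the sketch, namely that "rounding up" leaves an already exact value unchanged and that a midway value goes to the even significand.

```python
def round_unsigned(sig: int, drop: int, mode: str) -> int:
    """Drop the `drop` low-order bits of the unsigned significand `sig`."""
    kept = sig >> drop                      # bits that survive
    rest = sig & ((1 << drop) - 1)          # bits being discarded
    half = 1 << (drop - 1) if drop > 0 else 0
    if mode == "down":                      # truncation: just drop the bits
        return kept
    if mode == "up":                        # assumption: exact values are left unchanged
        return kept + (1 if rest else 0)
    if mode == "nearest":                   # assumption: ties go to the even significand
        if rest > half or (drop > 0 and rest == half and kept & 1):
            return kept + 1
        return kept
    raise ValueError(f"unknown mode: {mode}")


print(format(round_unsigned(0b1011, 2, "down"), "b"))     # 10
print(format(round_unsigned(0b1011, 2, "up"), "b"))       # 11
print(format(round_unsigned(0b1011, 2, "nearest"), "b"))  # 11
```

Working on integers keeps the comparison against the midway pattern exact, which a floating-point comparison might not.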
© Mahmoud R. El-Sakka 2 CS 2208: Introduction to Computer Organization and Architecture
Rounding
Example 1: Round the following numbers to the nearest value with 8 digits
after the binary point. In each case, compare the 7 discarded bits with the
midway pattern 1000000.

Case 1: discarded bits > 1000000, so round up.
o 0.110101011001000 ==> 0.11010101 + 0.00000001 = 0.11010110 (discarded 1001000)
o 0.110101001001000 ==> 0.11010100 + 0.00000001 = 0.11010101 (discarded 1001000)

Case 2: discarded bits == 1000000 (midway), so round to the even significand.
o 0.110101011000000 ==> 0.11010101 + 0.00000001 = 0.11010110 (last kept bit = 1, you round up)
o 0.110101001000000 ==> 0.11010100 + 0.00000000 = 0.11010100 (last kept bit = 0, you round down)

Case 3: discarded bits < 1000000 (i.e., 0xxxxxx), so round down.
o 0.110101010xxxxxx ==> 0.11010101 + 0.00000000 = 0.11010101
o 0.110101000xxxxxx ==> 0.11010100 + 0.00000000 = 0.11010100
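The comparison against the midway pattern in Example 1 can be sketched on binary strings. The helper name `round_to_nearest` is hypothetical, and the sketch assumes an unsigned input with a binary point and at least one kept fraction bit.

```python
def round_to_nearest(binary: str, places: int) -> str:
    """Round an unsigned binary string such as '0.1101' to `places` fraction bits."""
    whole, frac = binary.split(".")
    frac = frac.ljust(places, "0")               # pad if there are fewer bits than requested
    kept, rest = frac[:places], frac[places:]
    value = int(whole + kept, 2)
    if rest:
        half = "1" + "0" * (len(rest) - 1)       # midway pattern, e.g. 1000000
        # Equal-length bit strings compare the same way as the numbers they encode.
        if rest > half or (rest == half and kept[-1] == "1"):
            value += 1                           # above midway, or midway with an odd significand
    digits = format(value, "b").zfill(len(whole) + places)
    return digits[:-places] + "." + digits[-places:]


for number in ("0.110101011001000", "0.110101011000000", "0.110101001000000"):
    print(number, "==>", round_to_nearest(number, 8))
```

The three printed lines cover the above-midway case and both midway cases (odd and even last kept bit).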
Normalization
Example 2: Convert the unsigned value AB.BA₁₆ to binary. Normalize your answer.

Hexadecimal-to-binary lookup:
0 = 0000   1 = 0001   2 = 0010   3 = 0011
4 = 0100   5 = 0101   6 = 0110   7 = 0111
8 = 1000   9 = 1001   A = 1010   B = 1011
C = 1100   D = 1101   E = 1110   F = 1111

AB.BA₁₆ = 10101011.10111010₂

After normalization,
1.010101110111010₂ × 2⁺⁷

In base b, a normalized number will have the form
± b₀.b₁b₂b₃... × bⁿ
where b₀ ≠ 0, and b₀, b₁, b₂, b₃, ... are integers between 0 and b − 1.
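The two steps of Example 2 (digit-by-digit conversion, then moving the binary point) can be sketched as follows. The helper names `hex_to_binary` and `normalize` are hypothetical, and the sketch assumes an unsigned, non-zero input.

```python
from typing import Tuple


def hex_to_binary(hex_str: str) -> str:
    """Expand each hexadecimal digit into its 4-bit pattern."""
    whole, _, frac = hex_str.partition(".")

    def to_bits(digits: str) -> str:
        return "".join(format(int(d, 16), "04b") for d in digits)

    return to_bits(whole) + "." + to_bits(frac)


def normalize(binary: str) -> Tuple[str, int]:
    """Return (significand, exponent) with exactly one non-zero bit before the point."""
    whole, _, frac = binary.partition(".")
    bits = whole + frac
    first_one = bits.index("1")                 # raises ValueError if the value is zero
    exponent = len(whole) - 1 - first_one       # how far the binary point moved
    digits = bits[first_one:]
    return digits[0] + "." + digits[1:], exponent


binary = hex_to_binary("AB.BA")
print(binary)              # 10101011.10111010
print(normalize(binary))   # ('1.010101110111010', 7)
```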
Normalization and Rounding
Example 3: Consider the unsigned normalized binary value
1.010101110111010₂ × 2⁺⁷
o Limit it (using truncation / rounding down) to 6 bits (1 + 5 bits) in total.
o Limit it (using rounding up) to 6 bits (1 + 5 bits) in total.
o Limit it (using rounding to the nearest) to 6 bits (1 + 5 bits) in total.
o Limit it (using truncation / rounding down) to 9 bits (1 + 8 bits) in total.
o Limit it (using rounding up) to 9 bits (1 + 8 bits) in total.
o Limit it (using rounding to the nearest) to 9 bits (1 + 8 bits) in total.
o Limit it (using truncation / rounding down) to 14 bits (1 + 13 bits) in total.
o Limit it (using rounding up) to 14 bits (1 + 13 bits) in total.
o Limit it (using rounding to the nearest) to 14 bits (1 + 13 bits) in total.
Calculate the rounding error in each case.
Note that the binary value 1.010101110111010₂ × 2⁺⁷
= 10101011.10111010₂ = AB.BA₁₆
Normalization and Rounding
Limiting the answer to 6 bits (1 + 5) in total,
1.010101110111010₂ × 2⁺⁷

o Using truncation / rounding down:
1.01010₂ × 2⁺⁷ = 10101000₂ = A8₁₆
Truncation error = AB.BA₁₆ − A8₁₆ = 3.BA₁₆

o Using rounding up:
1.01011₂ × 2⁺⁷ = 10101100₂ = AC₁₆
Rounding up error = AB.BA₁₆ − AC₁₆ = −0.46₁₆

o Using rounding to the nearest:
As 1110111010₂ > 1000000000₂,
1.01011₂ × 2⁺⁷ = 10101100₂ = AC₁₆
Rounding to the nearest error = AB.BA₁₆ − AC₁₆ = −0.46₁₆
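The hexadecimal subtractions used for the rounding errors can be checked with exact fractions. This is a small sketch: the helpers `parse_hex` and `format_hex` are hypothetical names, and the formatter assumes the result fits in the requested number of hexadecimal fraction digits.

```python
from fractions import Fraction


def parse_hex(text: str) -> Fraction:
    """Read an unsigned hexadecimal fixed-point string such as 'AB.BA'."""
    whole, _, frac = text.partition(".")
    return Fraction(int(whole + frac, 16), 16 ** len(frac))


def format_hex(value: Fraction, frac_digits: int = 2) -> str:
    """Write a fraction back as signed hexadecimal with `frac_digits` fraction digits."""
    sign = "-" if value < 0 else ""
    scaled = abs(value) * 16 ** frac_digits
    if scaled.denominator != 1:
        raise ValueError("value needs more hexadecimal fraction digits")
    digits = format(scaled.numerator, "X").zfill(frac_digits + 1)
    return f"{sign}{digits[:-frac_digits]}.{digits[-frac_digits:]}"


exact = parse_hex("AB.BA")
print(format_hex(exact - parse_hex("A8")))   # 3.BA
print(format_hex(exact - parse_hex("AC")))   # -0.46
```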
Normalization and Rounding
Limiting the answer to 9 bits (1 + 8) in total,
1.010101110111010₂ × 2⁺⁷

o Using truncation / rounding down:
1.01010111₂ × 2⁺⁷ = 10101011.1₂ = AB.8₁₆
Truncation error = AB.BA₁₆ − AB.8₁₆ = 0.3A₁₆

o Using rounding up:
1.01011000₂ × 2⁺⁷ = 10101100.0₂ = AC₁₆
Rounding up error = AB.BA₁₆ − AC₁₆ = −0.46₁₆

o Using rounding to the nearest:
As 0111010₂ < 1000000₂,
1.01010111₂ × 2⁺⁷ = 10101011.1₂ = AB.8₁₆
Rounding to the nearest error = AB.BA₁₆ − AB.8₁₆ = 0.3A₁₆
Normalization and Rounding
Limiting the answer to 14 bits (1 + 13) in total,
1.010101110111010₂ × 2⁺⁷

o Using truncation / rounding down:
1.0101011101110₂ × 2⁺⁷ = 10101011.101110₂ = AB.B8₁₆
Truncation error = AB.BA₁₆ − AB.B8₁₆ = 0.02₁₆

o Using rounding up:
1.0101011101111₂ × 2⁺⁷ = 10101011.101111₂ = AB.BC₁₆
Rounding up error = AB.BA₁₆ − AB.BC₁₆ = −0.02₁₆

o Using rounding to the nearest:
As 10₂ == 10₂ (midway) and the last kept bit is 0, round to the even significand:
1.0101011101110₂ × 2⁺⁷ = 10101011.101110₂ = AB.B8₁₆
Rounding to the nearest error = AB.BA₁₆ − AB.B8₁₆ = 0.02₁₆

Which rounding mechanism produces less error?
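The nine cases of Example 3 can be tabulated in one pass to compare the errors. This is only a sketch: the names `SIG`, `FRAC_BITS`, `EXP`, and `limit` are hypothetical, and midway values are assumed to go to the even significand as in Example 1.

```python
from fractions import Fraction

SIG = 0b1010101110111010    # significand bits of 1.010101110111010 (1 + 15 bits)
FRAC_BITS = 15              # bits after the binary point
EXP = 7                     # exponent of the normalized value


def limit(total_bits: int, mode: str) -> Fraction:
    """Limit the significand to `total_bits` bits and return the resulting value."""
    drop = (FRAC_BITS + 1) - total_bits
    kept, rest = SIG >> drop, SIG & ((1 << drop) - 1)
    half = 1 << (drop - 1)                       # midway pattern for the discarded bits
    if mode == "up" and rest:
        kept += 1
    elif mode == "nearest" and (rest > half or (rest == half and kept & 1)):
        kept += 1
    # `kept` now carries (total_bits - 1) fraction bits
    return Fraction(kept, 1 << (total_bits - 1)) * 2 ** EXP


exact = Fraction(SIG, 1 << FRAC_BITS) * 2 ** EXP   # AB.BA in hexadecimal
for bits in (6, 9, 14):
    for mode in ("down", "up", "nearest"):
        error = exact - limit(bits, mode)
        print(f"{bits:2d} bits, {mode:7s}: error = {float(error):+.7f}")
```

In these nine cases the magnitude of the rounding-to-the-nearest error is never larger than that of the other two mechanisms; at 14 bits all three magnitudes are equal because the discarded bits are exactly midway.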