Review of Numbers
• Computers are made to deal with numbers
• What can we represent in N bits?
• Unsigned integers:
0 to 2^N - 1
• Signed integers (two’s complement):
-2^(N-1) to 2^(N-1) - 1
• Signed integers (sign-magnitude):
-(2^(N-1) - 1) to 2^(N-1) - 1
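The unsigned and two's complement ranges above can be checked with a minimal Python sketch. The function names are illustrative and do not come from the slides.

```python
def unsigned_range(n_bits):
    """Smallest and largest unsigned integers representable in n_bits."""
    return 0, 2**n_bits - 1


def twos_complement_range(n_bits):
    """Smallest and largest two's complement integers representable in n_bits."""
    return -(2**(n_bits - 1)), 2**(n_bits - 1) - 1


print(unsigned_range(8))          # (0, 255)
print(twos_complement_range(8))   # (-128, 127)
```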
Other Numbers
• What about other numbers?
• Very large numbers? (seconds/century)
3,155,760,000_10 (3.15576_10 x 10^9)
• Very small numbers? (atomic diameter)
0.00000001_10 (1.0_10 x 10^-8)
• Rationals (repeating pattern)
2/3 (0.666666666...)
• Irrationals
2^(1/2) (1.414213562373...)
• Transcendentals
e (2.718...), π (3.141...)
• All represented in scientific notation
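Fractional Binary Numbers
• Representation
• Bit string: b_i b_(i-1) ... b_2 b_1 b_0 . b_-1 b_-2 b_-3 ... b_-j
• Bit weights, left to right: 2^i, 2^(i-1), ..., 4, 2, 1, 1/2, 1/4, 1/8, ..., 2^-j
• Bits to the right of the “binary point” represent fractional powers of 2
• Represents the rational number: sum over k = -j to i of b_k x 2^k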
Fractional Binary Numbers: Examples
Value               Representation
5 3/4 = 23/4        101.11_2 = 4 + 1 + 1/2 + 1/4
2 7/8 = 23/8        010.111_2 = 2 + 1/2 + 1/4 + 1/8
1 7/16 = 23/16      001.0111_2 = 1 + 1/4 + 1/8 + 1/16
Observations
• Divide by 2 by shifting right (unsigned)
• Multiply by 2 by shifting left
• Numbers of the form 0.111111…_2 are just below 1.0
• 1/2 + 1/4 + 1/8 + … + 1/2^i + … → 1.0
• Use the notation 1.0 – ε
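The examples and observations above can be reproduced with a short Python sketch. The helper name parse_binary is hypothetical; exact fractions are used so that no rounding hides the result.

```python
from fractions import Fraction


def parse_binary(text):
    """Return the exact value of a binary string such as '101.11'."""
    whole, _, frac = text.partition(".")
    value = Fraction(int(whole, 2)) if whole else Fraction(0)
    # Bit number `position` after the binary point has weight 1/2^position.
    for position, bit in enumerate(frac, start=1):
        if bit == "1":
            value += Fraction(1, 2**position)
    return value


for bits in ("101.11", "010.111", "001.0111"):
    print(bits, "=", parse_binary(bits))   # 23/4, 23/8, 23/16

# Moving the binary point one place changes the value by a factor of 2.
print(parse_binary("10.111") * 2 == parse_binary("101.11"))   # True

# 0.11111111_2 falls short of 1.0 by exactly the weight of its last bit.
print(1 - parse_binary("0.11111111"))   # 1/256
```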
Representable Numbers
• Limitation #1
• Can only exactly represent numbers of the form x/2^k
• Other rational numbers have repeating bit representations
Value    Representation
1/3      0.0101010101[01]…_2
1/5      0.001100110011[0011]…_2
1/10     0.0001100110011[0011]…_2
• Limitation #2
• Just one setting of the binary point within the w bits
• Limited range of numbers (very small values? very large?)
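Limitation #1 can be explored with a small Python sketch that generates binary digits by repeated doubling. The function name binary_expansion is an assumption for illustration.

```python
from fractions import Fraction


def binary_expansion(value, digits):
    """First `digits` bits after the binary point of a fraction in [0, 1)."""
    bits = []
    for _ in range(digits):
        value *= 2                 # shift the next bit in front of the point
        bit = int(value >= 1)
        bits.append(str(bit))
        value -= bit               # keep only the fractional part
    return "0." + "".join(bits)


print(binary_expansion(Fraction(1, 3), 12))    # 0.010101010101
print(binary_expansion(Fraction(1, 5), 12))    # 0.001100110011
print(binary_expansion(Fraction(1, 10), 13))   # 0.0001100110011
print(binary_expansion(Fraction(3, 8), 6))     # 0.011000 (form x/2^k, so it terminates)

# The double nearest to 0.1 is therefore not exactly 1/10.
print(Fraction(0.1) == Fraction(1, 10))        # False
```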
Objective
• To understand the fundamentals of floating-point representation
• To know the IEEE-754 Floating Point Standard
Patriot Missile
• Gulf War I
• Failed to intercept an incoming Iraqi Scud missile (Feb 25, 1991)
• 28 American soldiers killed
GAO Report: GAO/IMTEC-92-26 Patriot Missile Software Problem
http://www.fas.org/spp/starwars/gao/im92026.htm
Patriot Design
• Intended to operate for only a few hours
• Defend Europe from Soviet aircraft and missiles
• Four 24-bit registers (1970s design!)
• Kept time with an integer counter, incremented every 1/10 second
• Calculates the speed of the incoming missile to predict future positions:
velocity = (loc1 – loc0) / ((count1 – count0) * 0.1)
• But 0.1 cannot be represented exactly in binary!
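To get a feel for how an inexact 0.1 turns into clock drift, here is a rough Python sketch. It assumes, purely for illustration, that 0.1 is chopped to 23 bits after the binary point; the slides state only that the registers were 24 bits wide, so FRACTION_BITS and the helper name are assumptions rather than details of the actual system.

```python
from fractions import Fraction

FRACTION_BITS = 23        # assumption for illustration, not stated in the slides
TICKS_PER_SECOND = 10     # the counter is incremented every 1/10 second

exact_tick = Fraction(1, 10)
# Chop (truncate) 0.1 to FRACTION_BITS bits after the binary point.
stored_tick = Fraction(int(exact_tick * 2**FRACTION_BITS), 2**FRACTION_BITS)
error_per_tick = exact_tick - stored_tick


def clock_drift_seconds(hours):
    """Accumulated timing error after running continuously for `hours`."""
    ticks = hours * 3600 * TICKS_PER_SECOND
    return float(ticks * error_per_tick)


print(float(error_per_tick))     # roughly 9.5e-08 seconds lost per tick
print(clock_drift_seconds(8))    # roughly 0.027 seconds after 8 hours
```

Under these assumptions the per-tick error is tiny, but it grows linearly with uptime, which is why a system meant to run for a few hours degrades when left running much longer.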
Two weeks before the incident, Army officials received Israeli data
indicating some loss in accuracy after the system had been running
for 8 consecutive hours. Consequently, Army officials modified the
software to improve the system's accuracy. However, the modified
software did not reach Dhahran until February 26, 1991--the day
after the Scud incident.
GAO Report
http://fas.org/spp/starwars/gao/im92026.htm
Floating Point Representation
• Numerical form:
(–1)^s x M x 2^E
• Sign bit s determines whether the number is negative or positive
• Significand M is normally a fractional value in the range [1.0, 2.0)
• Exponent E weights the value by a power of two
• Encoding: s | exp | frac
• MSB s is the sign bit s
• exp field encodes E (but is not equal to E)
• frac field encodes M (but is not equal to M)
Example:
15213_10 = (-1)^0 x 1.1101101101101_2 x 2^13
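The three fields can be pulled out of a real single-precision value with Python's struct module. This is a minimal sketch: the function name float32_fields is hypothetical, and the bias of 127 used to recover E is the IEEE 754 single-precision bias.

```python
import struct


def float32_fields(x):
    """Split x (rounded to single precision) into its s, exp, and frac fields."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    s = bits >> 31               # 1 sign bit
    exp = (bits >> 23) & 0xFF    # 8 exponent bits
    frac = bits & 0x7FFFFF       # 23 fraction bits
    return s, exp, frac


s, exp, frac = float32_fields(15213.0)
print(s, exp - 127, format(frac, "023b"))
# 0 13 11011011011010000000000
```

The printed fraction holds the bits after the leading "1." of 1.1101101101101_2, padded with zeros on the right, which is why frac encodes M without being equal to it.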
Exponential Notation
• The following are equivalent representations of 1,234:
123,400.0 x 10^-2
12,340.0 x 10^-1
1,234.0 x 10^0
123.4 x 10^1
12.34 x 10^2
1.234 x 10^3
0.1234 x 10^4
• The representations differ in that the decimal place (the “point”) “floats” to the left or right, with the appropriate adjustment in the exponent.
Parts of a Floating Point Number
-0.9876 x 10^-3
• Sign of mantissa: the leading “-”
• Location of decimal point: the “.” in 0.9876
• Mantissa: 0.9876
• Base: 10
• Sign of exponent: the “-” in -3
• Exponent: 3
IEEE 754 Standard
• Most common standard for representing floating point numbers
• Single precision: 32 bits, consisting of...
• Sign bit (1 bit)
• Exponent (8 bits)
• Mantissa (23 bits)
• Double precision: 64 bits, consisting of…
• Sign bit (1 bit)
• Exponent (11 bits)
• Mantissa (52 bits)
Prof. William Kahan
Normalization
• The mantissa is normalized
• Has an implied binary point on the left
• Has an implied “1” to the left of the binary point
• E.g.,
• Mantissa: 10100000000000000000000
• Represents: 1.101_2 = 1.625_10
• Normalized form: no leading 0s (exactly one nonzero digit to the left of the point)
• Normalized: 1.0 x 10^-9
• Not normalized: 0.1 x 10^-8, 10.0 x 10^-10
Excess Notation
• To include both positive and negative exponents, “excess” notation is used
• Single precision: excess 127
• Double precision: excess 1023
• The stored exponent value is larger than the actual exponent by the excess
• E.g., excess 127,
• Exponent: 10000111
• Represents: 135 – 127 = 8
Example
• Single precision
0 10000010 11000000000000000000000
• Sign: 0 = positive mantissa
• Exponent: 10000010_2 = 130, and 130 – 127 = 3
• Mantissa: 1.11_2
+1.11_2 x 2^3 = 1110.0_2 = 14.0_10
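The decoding in this example (sign, excess-127 exponent, implied leading 1) can be written out as a short Python sketch. The function decode_single is a hypothetical helper that handles normalized numbers only.

```python
def decode_single(sign_bit, exponent_bits, mantissa_bits):
    """Value of a normalized single-precision number given its fields as bit strings."""
    sign = -1 if sign_bit == "1" else 1
    exponent = int(exponent_bits, 2) - 127     # remove the excess of 127
    # The implied "1." sits to the left of the stored mantissa bits.
    significand = 1 + int(mantissa_bits, 2) / 2**len(mantissa_bits)
    return sign * significand * 2**exponent


print(decode_single("0", "10000010", "11000000000000000000000"))   # 14.0
print(decode_single("0", "01111111", "10100000000000000000000"))   # 1.625
print(int("10000111", 2) - 127)                                     # 8
```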
Hexadecimal
• It is convenient and common to represent the original floating point number in hexadecimal
• The preceding example…
0 10000010 11000000000000000000000
= 0100 0001 0110 0000 0000 0000 0000 0000
= 4 1 6 0 0 0 0 0
= 41600000_16
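The regrouping into hexadecimal digits can be confirmed in Python. This sketch uses only the standard struct module; the variable names are illustrative.

```python
import struct

# Sign, exponent, and mantissa fields written out as one 32-bit pattern.
bit_pattern = "0 10000010 11000000000000000000000".replace(" ", "")
word = int(bit_pattern, 2)

print(f"{word:08X}")    # 41600000

# Reinterpret the same 32 bits as a single-precision float.
(value,) = struct.unpack(">f", word.to_bytes(4, "big"))
print(value)            # 14.0
```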
Converting from Floating Point
• E.g., What decimal value is represented by the following 32-bit floating point number?
C17B0000_16
• Step 1
• Express in binary and find S, E, and M
C17B0000_16 =
1 10000010 11110110000000000000000_2
S = 1, E = 10000010, M = 11110110000000000000000
• S: 1 = negative, 0 = positive
• Step 2
• Find the “real” exponent, n
• n = E – 127
= 10000010_2 – 127
= 130 – 127
= 3
• Step 3
• Put S, M, and n together to form the binary result
• (Don’t forget the implied “1.” on the left of the mantissa.)
-1.1111011_2 x 2^n =
-1.1111011_2 x 2^3 =
-1111.1011_2
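The steps so far can be traced in Python, which also gives the decimal value of the result (-1111.1011_2 = -15.6875). This is a sketch with illustrative variable names; the last line cross-checks the manual decoding against struct.

```python
import struct

word = 0xC17B0000

s = word >> 31              # Step 1: sign bit
e = (word >> 23) & 0xFF     # Step 1: 8-bit exponent field
m = word & 0x7FFFFF         # Step 1: 23-bit mantissa field

n = e - 127                 # Step 2: "real" exponent

# Step 3: restore the implied "1." and apply sign and exponent.
value = (-1)**s * (1 + m / 2**23) * 2**n
print(s, e, n, value)       # 1 130 3 -15.6875

# The same bits reinterpreted directly as a single-precision float.
assert value == struct.unpack(">f", word.to_bytes(4, "big"))[0]
```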
Converting to Floating Point
• E.g., Express 6.5_10 as a 32-bit floating point number (in hexadecimal)
• E.g., Express 0.1_10 as a 32-bit floating point number (in hexadecimal)
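Both conversions can be checked with a few lines of Python. The helper name float32_hex is hypothetical; struct.pack with the ">f" format rounds the value to the nearest single-precision number.

```python
import struct


def float32_hex(x):
    """Hexadecimal form of x after rounding to IEEE 754 single precision."""
    return struct.pack(">f", x).hex().upper()


print(float32_hex(6.5))   # 40D00000  (6.5 = 1.101_2 x 2^2, exponent field 129)
print(float32_hex(0.1))   # 3DCCCCCD  (repeating pattern 1100..., rounded up in the last bit)

# 6.5 survives the round trip exactly; 0.1 does not.
stored = struct.unpack(">f", struct.pack(">f", 0.1))[0]
print(stored == 0.1)      # False
```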
Zero, Infinity, and NaN
• Zero
– Exponent field E = 0 and fraction F = 0
– +0 and –0 are possible according to sign bit S
• Infinity
– Infinity is a special value represented with maximum E and F = 0
• For single precision with 8-bit exponent: maximum E = 255
• For double precision with 11-bit exponent: maximum E = 2047
– Infinity can result from overflow or division by zero
– +∞ and –∞ are possible according to sign bit S
• NaN (Not a Number)
– NaN is a special value represented with maximum E and F ≠ 0
– Results from exceptional situations, such as 0/0 or sqrt(negative)
– An operation on a NaN results in NaN: Op(X, NaN) = NaN
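These special encodings can be built directly from their fields in Python. A minimal sketch, with float32_from_fields as a hypothetical helper; note that Python itself raises ZeroDivisionError for 1.0/0.0, so the values are constructed from bit patterns instead.

```python
import math
import struct


def float32_from_fields(s, e, f):
    """Assemble sign, exponent, and fraction fields into a single-precision value."""
    word = (s << 31) | (e << 23) | f
    return struct.unpack(">f", word.to_bytes(4, "big"))[0]


pos_zero = float32_from_fields(0, 0, 0)        # E = 0, F = 0
neg_zero = float32_from_fields(1, 0, 0)        # same, with the sign bit set
pos_inf = float32_from_fields(0, 255, 0)       # maximum E, F = 0
nan = float32_from_fields(0, 255, 1 << 22)     # maximum E, F != 0

print(pos_zero == neg_zero, math.copysign(1, neg_zero))   # True -1.0
print(pos_inf, -pos_inf)                                  # inf -inf
print(math.isnan(nan), math.isnan(nan + 1.0))             # True True
print(math.isnan(pos_inf - pos_inf))                      # True
```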
Simple 6-bit Floating Point Example
• 6-bit floating point representation
– Sign bit is the most significant bit
– Next 3 bits are the exponent with a bias of 3
– Last 2 bits are the fraction
– Field layout: S (1 bit) | Exponent (3 bits) | Fraction (2 bits)
• Same general form as IEEE
– Normalized, denormalized
– Representation of 0, infinity and NaN
• Value of normalized numbers: (–1)^S × (1.F)_2 × 2^(E – 3)
• Value of denormalized numbers: (–1)^S × (0.F)_2 × 2^(–2)
Values Related to Exponent
Exp.   exp   E     2^E
0      000   -2    ¼     (denormalized)
1      001   -2    ¼     (normalized)
2      010   -1    ½     (normalized)
3      011   0     1     (normalized)
4      100   1     2     (normalized)
5      101   2     4     (normalized)
6      110   3     8     (normalized)
7      111   n/a         (Inf or NaN)
Dynamic Range of Values
s   exp   frac   E     value
0   110   00     3     4/4 * 32/4 = 128/16 = 8
0   110   01     3     5/4 * 32/4 = 160/16 = 10
0   110   10     3     6/4 * 32/4 = 192/16 = 12
0   110   11     3     7/4 * 32/4 = 224/16 = 14   (largest normalized)
0   111   00     n/a   +∞
0   111   01     n/a   NaN
0   111   10     n/a   NaN
0   111   11     n/a   NaN
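The rows above can be regenerated from the 6-bit format's rules (1 sign bit, 3 exponent bits with a bias of 3, 2 fraction bits). The following Python sketch uses a hypothetical function name, decode6.

```python
import math


def decode6(bits):
    """Decode a 6-bit pattern: 1 sign bit, 3 exponent bits (bias 3), 2 fraction bits."""
    s = (bits >> 5) & 1
    exp = (bits >> 2) & 0b111
    frac = bits & 0b11
    sign = -1.0 if s else 1.0
    if exp == 0b111:                               # all ones: infinity or NaN
        return sign * math.inf if frac == 0 else math.nan
    if exp == 0:                                   # denormalized: (0.F)_2 x 2^-2
        return sign * (frac / 4) * 2.0**-2
    return sign * (1 + frac / 4) * 2.0**(exp - 3)  # normalized: (1.F)_2 x 2^(E-3)


for bits in range(0b011000, 0b011101):
    print(format(bits, "06b"), decode6(bits))
# 011000 8.0
# 011001 10.0
# 011010 12.0
# 011011 14.0
# 011100 inf
```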
Floating Point Addition Example
• Consider adding: (1.111)_2 × 2^–1 + (1.011)_2 × 2^–3
– For simplicity, we assume 4 bits of precision (or 3 bits of fraction)
• Cannot add significands … Why?
– Because exponents are not equal
• How to make exponents equal?
– Shift the significand of the number with the lesser exponent right until its exponent matches that of the larger number
• (1.011)_2 × 2^–3 = (0.1011)_2 × 2^–2 = (0.01011)_2 × 2^–1
– Difference between the two exponents = –1 – (–3) = 2
– So, shift right by 2 bits
• Now, add the significands (the sum produces a carry):
   1.111
+  0.01011
= 10.00111
Addition Example
• So, (1.111)_2 × 2^–1 + (1.011)_2 × 2^–3 = (10.00111)_2 × 2^–1
• However, the result (10.00111)_2 × 2^–1 is NOT normalized
• Normalize the result: (10.00111)_2 × 2^–1 = (1.000111)_2 × 2^0
– In this example, we have a carry
– So, shift right by 1 bit and increment the exponent
• Round the significand to fit in the appropriate number of bits
– We assumed 4 bits of precision or 3 bits of fraction
• Round to nearest: (1.000111)_2 ≈ (1.001)_2
– The discarded bits 111 are more than half of the last kept bit, so add 1 there: 1.000 + 0.001 = 1.001
– Renormalize if rounding generates a carry
• Detect overflow / underflow
– If the exponent becomes too large (overflow) or too small (underflow)
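The align, add, normalize, and round steps can be followed in a small Python sketch. It uses exact fractions for the significands and handles positive operands only; the function name and its parameters are assumptions for illustration, not a model of real hardware.

```python
from fractions import Fraction


def add_with_precision(sig_a, exp_a, sig_b, exp_b, frac_bits=3):
    """Add two positive values sig x 2^exp and round to frac_bits fraction bits."""
    # Align: express both operands with the larger exponent.
    exp = max(exp_a, exp_b)
    total = sig_a * Fraction(2)**(exp_a - exp) + sig_b * Fraction(2)**(exp_b - exp)
    # Normalize: bring the significand into [1, 2).
    while total >= 2:
        total /= 2
        exp += 1
    while 0 < total < 1:
        total *= 2
        exp -= 1
    # Round to nearest at frac_bits bits (Python's round breaks ties to even).
    total = Fraction(round(total * 2**frac_bits), 2**frac_bits)
    # Renormalize if rounding generated a carry.
    if total >= 2:
        total /= 2
        exp += 1
    return total, exp


a = Fraction(0b1111, 8)   # (1.111)_2
b = Fraction(0b1011, 8)   # (1.011)_2
sig, exp = add_with_precision(a, -1, b, -3)
print(sig, exp)           # 9/8 0, that is (1.001)_2 x 2^0
```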
Denormalized Values
• Condition
• exp = 000…0
• Value
• Exponent value E = –Bias + 1
• Significand value M = 0.xxx…x_2
• xxx…x: bits of frac
• Cases
• exp = 000…0, frac = 000…0
• Represents value 0
• Note that there are distinct values +0 and –0
• exp = 000…0, frac ≠ 000…0
• Numbers very close to 0.0
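For single precision (Bias = 127, so E = –126), the denormalized rules can be checked against real bit patterns. A minimal Python sketch; float32_from_word is a hypothetical helper.

```python
import struct


def float32_from_word(word):
    """Reinterpret a 32-bit integer as an IEEE 754 single-precision value."""
    return struct.unpack(">f", word.to_bytes(4, "big"))[0]


smallest_denorm = float32_from_word(0x00000001)   # exp = 0, frac = 00...01
largest_denorm = float32_from_word(0x007FFFFF)    # exp = 0, frac = 11...11
smallest_norm = float32_from_word(0x00800000)     # exp = 1, frac = 00...00

# Value = 0.frac x 2^(1 - 127) for denormalized numbers.
assert smallest_denorm == 2.0**-23 * 2.0**-126
assert largest_denorm == (1 - 2.0**-23) * 2.0**-126

# Because E = -Bias + 1 matches the smallest normalized exponent, the spacing
# stays uniform across the denormalized/normalized boundary.
assert smallest_norm - largest_denorm == smallest_denorm
print(smallest_denorm, largest_denorm, smallest_norm)
```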
Special Values
• Condition
• exp = 111…1
• Cases
• exp = 111…1, frac = 000…0
• Represents value ∞ (infinity)
• Operation that overflows
• Both positive and negative
• E.g., 1.0/0.0 = –1.0/–0.0 = +∞, 1.0/–0.0 = –∞
• exp = 111…1, frac ≠ 000…0
• Not-a-Number (NaN)
• Represents case when no numeric value can be determined
• E.g., sqrt(–1), ∞ – ∞, ∞ × 0
Interesting Numbers
Description                  exp      frac     Numeric Value
Zero                         00…00    00…00    0.0
Smallest Pos. Denormalized   00…00    00…01    2^–{23,52} × 2^–{126,1022}
• Single ≈ 1.4 × 10^–45
• Double ≈ 4.9 × 10^–324
Largest Denormalized         00…00    11…11    (1.0 – ε) × 2^–{126,1022}
• Single ≈ 1.18 × 10^–38
• Double ≈ 2.2 × 10^–308
Smallest Pos. Normalized     00…01    00…00    1.0 × 2^–{126,1022}
• Just larger than largest denormalized
One                          01…11    00…00    1.0
Largest Normalized           11…10    11…11    (2.0 – ε) × 2^{127,1023}
• Single ≈ 3.4 × 10^38
• Double ≈ 1.8 × 10^308
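The double-precision column can be verified in Python, whose float type is an IEEE 754 double on common platforms (an assumption of this sketch). The helper name double_from_word is illustrative; each bit pattern follows the exp and frac columns above.

```python
import struct
import sys


def double_from_word(word):
    """Reinterpret a 64-bit integer as an IEEE 754 double."""
    return struct.unpack(">d", word.to_bytes(8, "big"))[0]


eps = sys.float_info.epsilon    # 2^-52 for doubles

smallest_denorm = double_from_word(0x0000000000000001)   # exp 00...00, frac 00...01
largest_denorm = double_from_word(0x000FFFFFFFFFFFFF)    # exp 00...00, frac 11...11
smallest_norm = double_from_word(0x0010000000000000)     # exp 00...01, frac 00...00
one = double_from_word(0x3FF0000000000000)               # exp 01...11, frac 00...00
largest_norm = double_from_word(0x7FEFFFFFFFFFFFFF)      # exp 11...10, frac 11...11

assert smallest_denorm == eps * 2.0**-1022
assert largest_denorm == (1.0 - eps) * 2.0**-1022
assert smallest_norm == 2.0**-1022 == sys.float_info.min
assert one == 1.0
assert largest_norm == (2.0 - eps) * 2.0**1023 == sys.float_info.max
print(smallest_denorm, smallest_norm, largest_norm)
```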
Tiny Floating Point Example
• 8-bit Floating Point Representation
• the sign bit is in the most significant bit
• the next four bits are the exp, with a bias of 7
• the last three bits are the frac
• Same general form as IEEE Format
• normalized, denormalized
• representation of 0, NaN, infinity
• Field layout: s (1 bit) | exp (4 bits) | frac (3 bits)
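A decoder for this 8-bit format fits in a few lines of Python. The sketch below is parameterized by field widths and assumes the usual IEEE-style bias of 2^(exp_bits - 1) - 1, which gives the bias of 7 stated above; the function name decode_tiny is hypothetical.

```python
import math


def decode_tiny(bits, exp_bits=4, frac_bits=3):
    """Decode an IEEE-style pattern with 1 sign bit, exp_bits, and frac_bits."""
    bias = 2**(exp_bits - 1) - 1                  # 7 when exp_bits is 4
    s = bits >> (exp_bits + frac_bits)
    exp = (bits >> frac_bits) & (2**exp_bits - 1)
    frac = bits & (2**frac_bits - 1)
    sign = -1.0 if s else 1.0
    if exp == 2**exp_bits - 1:                    # all ones: infinity or NaN
        return sign * math.inf if frac == 0 else math.nan
    if exp == 0:                                  # denormalized: E = 1 - bias
        return sign * (frac / 2**frac_bits) * 2.0**(1 - bias)
    return sign * (1 + frac / 2**frac_bits) * 2.0**(exp - bias)


print(decode_tiny(0b0_0000_001))   # smallest positive denormalized: 0.001953125
print(decode_tiny(0b0_0111_000))   # 1.0
print(decode_tiny(0b0_1110_111))   # largest normalized: 240.0
```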