C language slides for c programming book by ANSI

Review of Numbers
• Computers are made to deal with numbers
• What can we represent in N bits?
• Unsigned integers:
$0$ to $2^N - 1$
• Signed integers (two's complement):
$-2^{N-1}$ to $2^{N-1} - 1$
• Signed integers (sign-magnitude or ones' complement):
$-(2^{N-1} - 1)$ to $2^{N-1} - 1$
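The ranges above can be checked with a short C sketch. The function names are hypothetical (they are not part of the slides), and the helpers assume 1 <= n <= 63 so that the shifts stay within 64 bits.

```c
#include <stdint.h>

/* Largest unsigned value that fits in n bits: 2^n - 1.
   Assumes 1 <= n <= 63. */
uint64_t unsigned_max(int n) {
    return (UINT64_C(1) << n) - 1;
}

/* Smallest two's complement value in n bits: -2^(n-1).
   Assumes 1 <= n <= 63. */
int64_t twos_min(int n) {
    return -(INT64_C(1) << (n - 1));
}

/* Largest two's complement value in n bits: 2^(n-1) - 1.
   Assumes 1 <= n <= 63. */
int64_t twos_max(int n) {
    return (INT64_C(1) << (n - 1)) - 1;
}
```

For example, with n = 8 these give 255, -128, and 127.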

Other Numbers
• What about other numbers?
• Very large numbers? (seconds/century)
$3{,}155{,}760{,}000_{10}$ ($3.15576_{10} \times 10^{9}$)
• Very small numbers? (atomic diameter)
$0.00000001_{10}$ ($1.0_{10} \times 10^{-8}$)
• Rationals (repeating pattern)
• 2/3 (0.666666666...)
• Irrationals
$2^{1/2}$ (1.414213562373...)
• Transcendentals
• e (2.718...), π (3.141...)
• All represented in scientific notation

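Fractional Binary Numbers
• Representation
• Bit string: $b_i\, b_{i-1} \cdots b_2\, b_1\, b_0 \,.\, b_{-1}\, b_{-2}\, b_{-3} \cdots b_{-j}$
• Weights left of the binary point: $2^i, 2^{i-1}, \ldots, 4, 2, 1$
• Weights right of the binary point: $1/2, 1/4, 1/8, \ldots, 2^{-j}$
• Bits to right of "binary point" represent fractional powers of 2
• Represents rational number: $\sum_{k=-j}^{i} b_k \times 2^k$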
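As a rough illustration of the positional weights, the following C sketch evaluates a binary string with an optional binary point. The function name is hypothetical, and the input is assumed to contain only '0', '1', and at most one '.'.

```c
/* Evaluate a binary string such as "101.11" as the sum of b_k * 2^k.
   Assumes the string holds only '0', '1' and at most one '.'. */
double binary_to_double(const char *s) {
    double value = 0.0;
    double weight = 0.5;   /* weight of the next fractional bit: 1/2, 1/4, ... */
    int seen_point = 0;

    for (; *s != '\0'; s++) {
        if (*s == '.') {
            seen_point = 1;
        } else if (!seen_point) {
            /* integer part: shift left and add the new bit */
            value = value * 2.0 + (*s - '0');
        } else {
            /* fractional part: add bit times its weight */
            value += (*s - '0') * weight;
            weight /= 2.0;
        }
    }
    return value;
}
```

Calling `binary_to_double("101.11")` yields 5.75, that is 4 + 1 + 1/2 + 1/4.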

Fractional Binary Numbers: Examples
• Value Representation
5 3/4 = 23/4 $101.11_2$ = 4 + 1 + 1/2 + 1/4
2 7/8 = 23/8 $010.111_2$ = 2 + 1/2 + 1/4 + 1/8
1 7/16 = 23/16 $001.0111_2$ = 1 + 1/4 + 1/8 + 1/16
• Observations
• Divide by 2 by shifting right (unsigned)
• Multiply by 2 by shifting left
• Numbers of form $0.111111\ldots_2$ are just below 1.0
• $1/2 + 1/4 + 1/8 + \cdots + 1/2^i + \cdots \to 1.0$
• Use notation 1.0 – ε

Representable Numbers
• Limitation #1
• Can only exactly represent numbers of the form $x/2^k$
• Other rational numbers have repeating bit representations
• Value Representation
• 1/3 $0.0101010101[01]\ldots_2$
• 1/5 $0.001100110011[0011]\ldots_2$
• 1/10 $0.0001100110011[0011]\ldots_2$
• Limitation #2
• Just one setting of binary point within the w bits
• Limited range of numbers (very small values? very large?)
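The repeating patterns listed under Limitation #1 can be reproduced by repeated doubling. The sketch below is only illustrative; `fraction_bits` is a hypothetical helper, and it assumes 0 <= num < den with den small enough that `num * 2` does not overflow.

```c
/* Write the first n fractional bits of num/den into out.
   Assumes 0 <= num < den and that out has room for n + 1 chars. */
void fraction_bits(unsigned num, unsigned den, int n, char *out) {
    for (int i = 0; i < n; i++) {
        num *= 2;                 /* shift the fraction left by one bit */
        if (num >= den) {         /* integer part became 1 */
            out[i] = '1';
            num -= den;
        } else {
            out[i] = '0';
        }
    }
    out[n] = '\0';
}
```

With `num = 1`, `den = 10`, and `n = 13`, the buffer holds the same 13 bits shown for 1/10 in the table, and the remainder never reaches zero, which is why the expansion repeats.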

Objective
• To understand the fundamentals of floating-point representation
• To know the IEEE-754 Floating Point Standard

Patriot Missile
• Gulf War I
• Failed to intercept an incoming Iraqi Scud missile (Feb 25, 1991)
• 28 American soldiers
killed
GAO Report: GAO/IMTEC-92-26 Patriot Missile Software
Problem
http://www.fas.org/spp/starwars/gao/im92026.htm

Patriot Design
• Intended to operate only for a few hours
• Defend Europe from Soviet aircraft and missiles
• Four 24-bit registers (1970s design!)
• Kept time with integer counter: incremented every
1/10 second
• Calculate speed of incoming missile to predict future positions:
velocity = (loc1 – loc0) / ((count1 – count0) * 0.1)
• But, cannot represent 0.1 exactly!

Floating Imprecision
• 24 bits:
$0.1 \approx 1/2^4 + 1/2^5 + 1/2^8 + 1/2^9 + 1/2^{12} + 1/2^{13} + 1/2^{16} + 1/2^{17} + 1/2^{20} + 1/2^{21}$
$= 209715 / 2097152$
Error per tick is $0.2/2097152 = 1/10485760$
One hour = 3600 seconds = 36,000 ticks
$3600 \times 10 \times 1/10485760 \approx 0.0034$ s
20 hours $\approx 0.0687$ s
Miss target! (137 meters)
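The arithmetic on this slide can be replayed in C. This is a sketch of the slide's calculation only, not of the actual Patriot software; the macro and function names are hypothetical.

```c
/* The chopped value of 0.1 used on the slide: 209715 / 2097152. */
#define CHOPPED_TENTH (209715.0 / 2097152.0)

/* Accumulated clock error, in seconds, after the system has run for
   the given number of hours with one tick every 1/10 second. */
double patriot_drift_seconds(double hours) {
    double ticks = hours * 3600.0 * 10.0;         /* ten ticks per second */
    double error_per_tick = 0.1 - CHOPPED_TENTH;  /* about 1/10485760 */
    return ticks * error_per_tick;
}
```

`patriot_drift_seconds(1.0)` comes out near 0.0034 and `patriot_drift_seconds(20.0)` near 0.0687, matching the figures above.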

Two weeks before the incident, Army officials received Israeli data
indicating some loss in accuracy after the system had been running
for 8 consecutive hours. Consequently, Army officials modified the
software to improve the system's accuracy. However, the modified
software did not reach Dhahran until February 26, 1991--the day
after the Scud incident.
GAO Report
http://fas.org/spp/starwars/gao/im92026.htm

Floating Point Representation
• Numerical Form:
$(-1)^s \times M \times 2^E$
• Sign bit s determines whether number is negative or positive
• Significand M normally a fractional value in range [1.0, 2.0)
• Exponent E weights value by power of two
• Encoding
s | exp | frac
• MSB s is sign bit s
• exp field encodes E (but is not equal to E)
• frac field encodes M (but is not equal to M)
Example:
$15213_{10} = (-1)^0 \times 1.1101101101101_2 \times 2^{13}$
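To see the three fields of a real `float`, one option is to copy its bytes into a 32-bit integer and mask. The sketch assumes `float` is a 32-bit IEEE 754 single-precision value (true on most current platforms, but not guaranteed by the C standard); the struct and function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* The three fields of a 32-bit float (assumes IEEE 754 binary32). */
typedef struct {
    unsigned sign;   /* 1 bit   */
    unsigned exp;    /* 8 bits, stored (biased) exponent */
    uint32_t frac;   /* 23 bits */
} float_fields;

float_fields split_float(float x) {
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);   /* reinterpret the bytes safely */

    float_fields f;
    f.sign = bits >> 31;
    f.exp  = (bits >> 23) & 0xFF;
    f.frac = bits & 0x7FFFFF;
    return f;
}
```

For 15213.0f this gives sign 0, exp 140 (that is 13 + 127), and frac 0x6DB400, the bits 1101101101101 followed by ten zeros.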

Exponential Notation
• The following are equivalent representations of 1,234
$123{,}400.0 \times 10^{-2}$
$12{,}340.0 \times 10^{-1}$
$1{,}234.0 \times 10^{0}$
$123.4 \times 10^{1}$
$12.34 \times 10^{2}$
$1.234 \times 10^{3}$
$0.1234 \times 10^{4}$
• The representations differ in that the decimal place (the "point") "floats" to the left or right, with the appropriate adjustment in the exponent.

Parts of a Floating Point Number
$-0.9876 \times 10^{-3}$
• Sign of mantissa: −
• Location of decimal point: immediately left of the digits 9876
• Mantissa: 0.9876
• Base: 10
• Sign of exponent: −
• Exponent: 3

IEEE 754 Standard
• Most common standard for representing floating
point numbers
• Single precision: 32 bits, consisting of...
• Sign bit (1 bit)
• Exponent (8 bits)
• Mantissa (23 bits)
• Double precision: 64 bits, consisting of…
• Sign bit (1 bit)
• Exponent (11 bits)
• Mantissa (52 bits)
Prof. William Kahan

Single Precision Format
32 bits
Sign of mantissa (1 bit) | Exponent (8 bits) | Mantissa (23 bits)

Normalization
• The mantissa is normalized
• Has an implied binary point on the left
• Has an implied "1" to the left of the binary point
• E.g.,
• Mantissa field: 10100000000000000000000
• Represents: $1.101_2 = 1.625_{10}$
• Normalized form: no leading 0s (exactly one nonzero digit to the left of the point)
• Normalized: $1.0 \times 10^{-9}$
• Not normalized: $0.1 \times 10^{-8}$, $10.0 \times 10^{-10}$

Excess Notation
• To include positive and negative exponents, "excess" notation is used
• Single precision: excess 127
• Double precision: excess 1023
• The stored exponent value is larger than the actual exponent by the excess
• E.g., excess 127,
• Exponent field: 10000111
• Represents: 135 – 127 = 8

Example
• Single precision
0 10000010 11000000000000000000000
• Sign: 0 = positive mantissa
• Exponent: 130 – 127 = 3
• Mantissa: $1.11_2$
$+1.11_2 \times 2^3 = 1110.0_2 = 14.0_{10}$

Hexadecimal
• It is convenient and common to represent
the original floating point number in
hexadecimal
• The preceding example…
0 10000010 11000000000000000000000
= 0100 0001 0110 0000 0000 0000 0000 0000
= 4 1 6 0 0 0 0 0 (hexadecimal 41600000)

Converting from Floating Point
• E.g., What decimal value is represented by
the following 32-bit floating point number?
$\mathrm{C17B0000}_{16}$

• Step 1
• Express in binary and find S, E, and M
$\mathrm{C17B0000}_{16}$ =
$1\ 10000010\ 11110110000000000000000_2$
S E M
• S: 1 = negative, 0 = positive

• Step 2
• Find "real" exponent, n
• n = E – 127
= $10000010_2$ – 127
= 130 – 127
= 3

• Step 3
• Put S, M, and n together to form binary result
• (Don’t forget the implied “1.” on the left of the
mantissa.)
$-1.1111011_2 \times 2^n$ =
$-1.1111011_2 \times 2^3$ =
$-1111.1011_2$

• Step 4
• Express result in decimal
$-1111.1011_2$
• Integer part: $-1111_2$ = -15
• Fraction part: $2^{-1}$ = 0.5, $2^{-3}$ = 0.125, $2^{-4}$ = 0.0625; sum = 0.6875
Answer: -15.6875
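Steps 1 to 4 can be mirrored in C. The sketch below is a hand decoder for normalized single-precision patterns only (zero, denormalized, infinity, and NaN patterns are deliberately not handled); the function name is hypothetical.

```c
#include <stdint.h>

/* Decode a normalized 32-bit pattern by hand:
   Step 1: split into S, E, M; Step 2: n = E - 127;
   Steps 3-4: value = (-1)^S * 1.M * 2^n.
   Assumes the exponent field is neither all 0s nor all 1s. */
double decode_normalized(uint32_t bits) {
    int s = (int)(bits >> 31);
    int n = (int)((bits >> 23) & 0xFF) - 127;
    /* implied "1." plus the 23 fraction bits scaled by 2^-23 */
    double m = 1.0 + (double)(bits & 0x7FFFFF) / 8388608.0;

    double v = m;
    for (int i = 0; i < n; i++) v *= 2.0;   /* positive exponent */
    for (int i = 0; i > n; i--) v /= 2.0;   /* negative exponent */
    return s ? -v : v;
}
```

`decode_normalized(0xC17B0000u)` returns -15.6875, agreeing with the worked answer, and `decode_normalized(0x41600000u)` returns 14.0.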

Converting from Floating Point
• E.g., What decimal value is represented by
the following 32-bit floating point number?
$\mathrm{42808000}_{16}$

Converting to Floating Point
• E.g., Express $36.5625_{10}$ as a 32-bit floating point number (in hexadecimal)

• Step 1
• Express original value in binary
$36.5625_{10}$ =
$100100.1001_2$

• Step 2
• Normalize
$100100.1001_2$ =
$1.001001001_2 \times 2^5$

• Step 3
• Determine S, E, and M
$+1.001001001_2 \times 2^5$
• S = 0 (because the value is positive)
• n = 5, so E = n + 127
= 5 + 127
= 132
= $10000100_2$
• M = 001001001 (the bits after the implied "1.", padded with 0s to 23 bits)

• Step 4
• Put S, E, and M together to form 32-bit binary
result
$0\ 10000100\ 00100100100000000000000_2$
S E M

• Step 5
• Express in hexadecimal
$0\ 10000100\ 00100100100000000000000_2$ =
$0100\ 0010\ 0001\ 0010\ 0100\ 0000\ 0000\ 0000_2$ =
$4\ 2\ 1\ 2\ 4\ 0\ 0\ 0_{16}$
Answer: $\mathrm{42124000}_{16}$
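The five steps can be sketched as a hand encoder in C. This is an illustration under stated assumptions, not how hardware does it: the function name is hypothetical, the input must be nonzero, finite, and within the normalized single-precision range, and extra fraction bits are truncated rather than rounded, so the result matches the real encoding only for values that fit exactly in 23 fraction bits.

```c
#include <stdint.h>

/* Encode x as a 32-bit pattern by normalizing, then packing S, E, M.
   Assumes x is nonzero, finite, and in the normalized range.
   Extra fraction bits are truncated, not rounded. */
uint32_t encode_normalized(double x) {
    uint32_t s = 0;
    if (x < 0) { s = 1; x = -x; }            /* S */

    int n = 0;                                /* normalize to 1.xxx * 2^n */
    while (x >= 2.0) { x /= 2.0; n++; }
    while (x < 1.0)  { x *= 2.0; n--; }

    uint32_t e = (uint32_t)(n + 127);         /* E = n + 127 */
    uint32_t m = (uint32_t)((x - 1.0) * 8388608.0);  /* 23 fraction bits */

    return (s << 31) | (e << 23) | m;
}
```

`encode_normalized(36.5625)` produces 0x42124000, the answer obtained in Step 5.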

• E.g., Express $6.5_{10}$ as a 32-bit floating point number (in hexadecimal)

• E.g., Express 0.1 as a 32-bit floating point
number (in hexadecimal)

Zero, Infinity, and NaN
• Zero
– Exponent field E = 0 and fraction F = 0
– +0 and –0 are possible according to sign bit S
• Infinity
– Infinity is a special value represented with maximum E and F = 0
• For single precision with 8-bit exponent: maximum E = 255
• For double precision with 11-bit exponent: maximum E = 2047
– Infinity can result from overflow or division by zero
– +∞ and –∞ are possible according to sign bit S
• NaN (Not a Number)
– NaN is a special value represented with maximum E and F ≠ 0
– Result from exceptional situations, such as 0/0 or sqrt(negative)
– An operation on a NaN results in NaN: Op(X, NaN) = NaN
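A small C sketch of these special values follows. It assumes the platform's `double` follows IEEE 754 (so dividing by zero yields an infinity or NaN instead of trapping); `classify_special` and `divide` are hypothetical helper names.

```c
#include <math.h>   /* isnan, isinf */

/* Return 2 for NaN, 1 for +infinity, -1 for -infinity,
   and 0 for any ordinary number (including +0 and -0). */
int classify_special(double x) {
    if (isnan(x)) return 2;
    if (isinf(x)) return x > 0 ? 1 : -1;
    return 0;
}

/* Division done at run time, so the IEEE 754 result is produced
   (assumes IEEE 754 arithmetic without floating-point traps). */
double divide(double a, double b) {
    return a / b;
}
```

Under that assumption, `divide(1.0, 0.0)` classifies as +infinity, `divide(-1.0, 0.0)` as -infinity, and `divide(0.0, 0.0)` as NaN. A NaN also compares unequal to itself, which is a common way to detect it.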

Simple 6-bit Floating Point Example
• 6-bit floating point representation
– Sign bit is the most significant bit
– Next 3 bits are the exponent with a bias of 3
– Last 2 bits are the fraction
• Same general form as IEEE
– Normalized, denormalized
– Representation of 0, infinity and NaN
• Value of normalized numbers: $(-1)^S \times (1.F)_2 \times 2^{E-3}$
• Value of denormalized numbers: $(-1)^S \times (0.F)_2 \times 2^{-2}$
S (1 bit) | Exponent (3 bits) | Fraction (2 bits)

Values Related to Exponent
Exp. exp E $2^E$
0 000 -2 1/4 (denormalized)
1 001 -2 1/4 (normalized)
2 010 -1 1/2 (normalized)
3 011 0 1 (normalized)
4 100 1 2 (normalized)
5 101 2 4 (normalized)
6 110 3 8 (normalized)
7 111 n/a n/a (Inf or NaN)

Dynamic Range of Values
s exp frac E value
0 000 00 -2 0
0 000 01 -2 1/4*1/4=1/16 (smallest denormalized)
0 000 10 -2 2/4*1/4=2/16
0 000 11 -2 3/4*1/4=3/16 (largest denormalized)
0 001 00 -2 4/4*1/4=4/16=1/4=0.25 (smallest normalized)
0 001 01 -2 5/4*1/4=5/16
0 001 10 -2 6/4*1/4=6/16
0 001 11 -2 7/4*1/4=7/16
0 010 00 -1 4/4*2/4=8/16=1/2=0.5
0 010 01 -1 5/4*2/4=10/16
0 010 10 -1 6/4*2/4=12/16=0.75
0 010 11 -1 7/4*2/4=14/16

s exp frac E value
0 011 00 0 4/4*4/4=16/16=1
0 011 01 0 5/4*4/4=20/16=1.25
0 011 10 0 6/4*4/4=24/16=1.5
0 011 11 0 7/4*4/4=28/16=1.75
0 100 00 1 4/4*8/4=32/16=2
0 100 01 1 5/4*8/4=40/16=2.5
0 100 10 1 6/4*8/4=48/16=3
0 100 11 1 7/4*8/4=56/16=3.5
0 101 00 2 4/4*16/4=64/16=4
0 101 01 2 5/4*16/4=80/16=5
0 101 10 2 6/4*16/4=96/16=6
0 101 11 2 7/4*16/4=112/16=7

s exp frac E value
0 110 00 3 4/4*32/4=128/16=8
0 110 01 3 5/4*32/4=160/16=10
0 110 10 3 6/4*32/4=192/16=12
0 110 11 3 7/4*32/4=224/16=14 (largest normalized)
0 111 00 n/a ∞
0 111 01 n/a NaN
0 111 10 n/a NaN
0 111 11 n/a NaN

Floating Point Addition Example
• Consider adding: $(1.111)_2 \times 2^{-1} + (1.011)_2 \times 2^{-3}$
– For simplicity, we assume 4 bits of precision (or 3 bits of fraction)
• Cannot add significands … Why?
– Because exponents are not equal
• How to make exponents equal?
– Shift the significand of the lesser exponent right until its exponent matches the larger number
• $(1.011)_2 \times 2^{-3} = (0.1011)_2 \times 2^{-2} = (0.01011)_2 \times 2^{-1}$
– Difference between the two exponents = –1 – (–3) = 2
– So, shift right by 2 bits
• Now, add the significands (note the carry):
  1.111
+ 0.01011
= 10.00111

Addition Example
• So, $(1.111)_2 \times 2^{-1} + (1.011)_2 \times 2^{-3} = (10.00111)_2 \times 2^{-1}$
• However, result $(10.00111)_2 \times 2^{-1}$ is NOT normalized
• Normalize result: $(10.00111)_2 \times 2^{-1} = (1.000111)_2 \times 2^{0}$
– In this example, we have a carry
– So, shift right by 1 bit and increment the exponent
• Round the significand to fit in appropriate number of bits
– We assumed 4 bits of precision or 3 bits of fraction
• Round to nearest: $(1.000111)_2 \approx (1.001)_2$
  1.000 | 111
+ 0.001
= 1.001
– Renormalize if rounding generates a carry
• Detect overflow / underflow
– If exponent becomes too large (overflow) or too small (underflow)
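The align, add, normalize, and round sequence can be sketched in C for the 4-bit significand used in this example. Everything here is a simplified illustration: `toy_fp` and `toy_add` are hypothetical names, only positive operands are handled, ties are rounded up (real IEEE 754 round-to-nearest breaks ties to even), and overflow or underflow of the exponent is not checked.

```c
#include <stdint.h>

/* Toy value: (sig / 8) * 2^exp, where sig in [8, 15] encodes 1.fff. */
typedef struct {
    unsigned sig;
    int exp;
} toy_fp;

#define GUARD 16   /* extra low-order bits kept while aligning */

/* Add two positive toy values. Assumes the exponent difference is
   at most GUARD so the shifted-out bits are still represented. */
toy_fp toy_add(toy_fp a, toy_fp b) {
    if (a.exp < b.exp) { toy_fp t = a; a = b; b = t; }  /* a: larger exponent */

    int shift = a.exp - b.exp;                 /* align the smaller operand */
    uint32_t sum = ((uint32_t)a.sig << GUARD)
                 + (((uint32_t)b.sig << GUARD) >> shift);
    int exp = a.exp;

    if (sum >= ((uint32_t)16 << GUARD)) {      /* carry: result is 1x.xxx */
        sum >>= 1;                             /* shift right by 1 bit */
        exp++;                                 /* and increment exponent */
    }

    /* round to 3 fraction bits by adding half a unit in the last place */
    uint32_t sig = (sum + ((uint32_t)1 << (GUARD - 1))) >> GUARD;
    if (sig == 16) { sig = 8; exp++; }         /* rounding produced a carry */

    toy_fp r = { (unsigned)sig, exp };
    return r;
}
```

With `a = {15, -1}` (1.111 × 2^-1) and `b = {11, -3}` (1.011 × 2^-3), the result is `{9, 0}`, that is 1.001 × 2^0, the same value reached on the slide.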

Summary: IEEE Floating Point
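Single Precision (32 bits)
Bit 31: Sign (1 bit) | Bits 30-23: Exponent (8 bits) | Bits 22-0: Fraction (23 bits)
Exponent values:
0: zeroes (and denormalized values)
1-254: actual exponent + 127
255: infinities, NaN
Value = $(1 - 2 \cdot \mathrm{Sign}) \times (1 + \mathrm{Fraction}) \times 2^{\mathrm{Exponent} - 127}$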

Denormalized Values
• Condition
• exp = 000…0
• Value
• Exponent value E = –Bias + 1
• Significand value $M = 0.xxx\ldots x_2$
• xxx…x: bits of frac
• Cases
• exp = 000…0, frac = 000…0
• Represents value 0
• Note that there are distinct values +0 and –0
• exp = 000…0, frac ≠ 000…0
• Numbers very close to 0.0

Special Values
• Condition
• exp = 111…1
• Cases
• exp = 111…1, frac = 000…0
• Represents value ∞ (infinity)
• Operation that overflows
• Both positive and negative
• E.g., 1.0/0.0 = −1.0/−0.0 = +∞, 1.0/−0.0 = −∞
• exp = 111…1, frac ≠ 000…0
• Not-a-Number (NaN)
• Represents case when no numeric value can be determined
• E.g., sqrt(–1), ∞ − ∞, ∞ × 0

Interesting Numbers
• Description exp frac Numeric Value
• Zero 00…00 00…00 0.0
• Smallest Pos. Denorm. 00…00 00…01 $2^{-\{23,52\}} \times 2^{-\{126,1022\}}$
• Single ≈ $1.4 \times 10^{-45}$
• Double ≈ $4.9 \times 10^{-324}$
• Largest Denormalized 00…00 11…11 $(1.0 - \varepsilon) \times 2^{-\{126,1022\}}$
• Single ≈ $1.18 \times 10^{-38}$
• Double ≈ $2.2 \times 10^{-308}$
• Smallest Pos. Normalized 00…01 00…00 $1.0 \times 2^{-\{126,1022\}}$
• Just larger than largest denormalized
• One 01…11 00…00 1.0
• Largest Normalized 11…10 11…11 $(2.0 - \varepsilon) \times 2^{\{127,1023\}}$
• Single ≈ $3.4 \times 10^{38}$
• Double ≈ $1.8 \times 10^{308}$
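The single-precision rows of this table can be reproduced by building each bit pattern directly. The sketch assumes `float` is IEEE 754 binary32 and that denormalized values are not flushed to zero; `bits_to_float` and `is_denormal` are hypothetical helper names.

```c
#include <float.h>    /* FLT_MIN: smallest positive normalized float */
#include <stdint.h>
#include <string.h>

/* Reinterpret a 32-bit pattern as a float (assumes IEEE 754 binary32). */
float bits_to_float(uint32_t bits) {
    float x;
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* 1 if x is a nonzero value below the normalized range, else 0. */
int is_denormal(float x) {
    return x != 0.0f && x > -FLT_MIN && x < FLT_MIN;
}

/* Patterns from the table (single precision):
   0x00000001  smallest positive denormalized
   0x007FFFFF  largest denormalized
   0x00800000  smallest positive normalized (FLT_MIN)
   0x3F800000  one
   0x7F7FFFFF  largest normalized (FLT_MAX) */
```

Printing `bits_to_float(0x00000001u)` with `%g` shows a value on the order of 1.4e-45, in line with the table.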

Visualization: Floating Point Encodings
[Figure: number line running from −∞ through −Normalized, −Denorm, −0, +0, +Denorm, and +Normalized to +∞, with NaN beyond both ends]

Tiny Floating Point Example
• 8-bit Floating Point Representation
• the sign bit is in the most significant bit
• the next four bits are the exp, with a bias of 7
• the last three bits are the frac
• Same general form as IEEE Format
• normalized, denormalized
• representation of 0, NaN, infinity
s (1 bit) | exp (4 bits) | frac (3 bits)

Dynamic Range (s=0 only)
$v = (-1)^s \times M \times 2^E$
norm: E = exp – Bias
denorm: E = 1 – Bias
s exp frac E Value
Denormalized numbers
0 0000 000 -6 0
0 0000 001 -6 1/8*1/64 = 1/512 (closest to zero)
0 0000 010 -6 2/8*1/64 = 2/512, i.e. $(-1)^0 (0 + 1/4) \times 2^{-6}$
…
0 0000 110 -6 6/8*1/64 = 6/512
0 0000 111 -6 7/8*1/64 = 7/512 (largest denorm)
Normalized numbers
0 0001 000 -6 8/8*1/64 = 8/512 (smallest norm)
0 0001 001 -6 9/8*1/64 = 9/512, i.e. $(-1)^0 (1 + 1/8) \times 2^{-6}$
…
0 0110 110 -1 14/8*1/2 = 14/16
0 0110 111 -1 15/8*1/2 = 15/16 (closest to 1 below)
0 0111 000 0 8/8*1 = 1
0 0111 001 0 9/8*1 = 9/8 (closest to 1 above)
0 0111 010 0 10/8*1 = 10/8
…
0 1110 110 7 14/8*128 = 224
0 1110 111 7 15/8*128 = 240 (largest norm)
0 1111 000 n/a inf
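The rows of this table follow mechanically from the norm and denorm rules, which the C sketch below applies to any small IEEE-like format. The function name and parameter names are hypothetical; the bias is taken as 2^(e_bits - 1) - 1, which gives 7 for the 8-bit format here and 3 for the earlier 6-bit format.

```c
#include <math.h>   /* INFINITY, NAN */

/* Value of a small IEEE-like format: 1 sign bit, e_bits exponent bits,
   f_bits fraction bits, bias = 2^(e_bits - 1) - 1.
   Assumes 1 + e_bits + f_bits fits comfortably in an unsigned int. */
double minifloat_value(unsigned bits, int e_bits, int f_bits) {
    unsigned frac = bits & ((1u << f_bits) - 1);
    unsigned exp  = (bits >> f_bits) & ((1u << e_bits) - 1);
    unsigned sign = (bits >> (e_bits + f_bits)) & 1u;
    int bias = (1 << (e_bits - 1)) - 1;

    double f = (double)frac / (double)(1u << f_bits);  /* 0.frac */
    double v;
    int E;

    if (exp == (1u << e_bits) - 1) {        /* all 1s: infinity or NaN */
        v = (frac == 0) ? INFINITY : NAN;
        return sign ? -v : v;
    }
    if (exp == 0) {                         /* denorm: E = 1 - Bias, M = 0.frac */
        E = 1 - bias;
        v = f;
    } else {                                /* norm: E = exp - Bias, M = 1.frac */
        E = (int)exp - bias;
        v = 1.0 + f;
    }
    for (int i = 0; i < E; i++) v *= 2.0;
    for (int i = 0; i > E; i--) v /= 2.0;
    return sign ? -v : v;
}
```

For the 8-bit format, `minifloat_value(0x77, 4, 3)` (bits 0 1110 111) gives 240, the largest normalized value, and `minifloat_value(0x01, 4, 3)` gives 1/512, the value closest to zero.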

Distribution of Values
• 6-bit IEEE-like format
• e = 3 exponent bits
• f = 2 fraction bits
• Bias is $2^{3-1} - 1 = 3$
s (1 bit) | exp (3 bits) | frac (2 bits)
• Notice how the distribution gets denser toward zero.
[Figure: number line from -15 to 15 marking denormalized, normalized, and infinity values, with one group annotated "8 values"]

Floats are not Reals
• Need to understand details of underlying implementations
• Ints:
E.g., 40000 * 40000 --> 1600000000
600000 * 600000 --> ?
(largest 32-bit signed int: $2^{31} - 1$ = 2,147,483,647)
• Floats:
E.g., is (x + y) + z = x + (y + z)?
(1e20 + -1e20) + 3.14 --> 3.14
1e20 + (-1e20 + 3.14) --> ??
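Both questions can be explored with a short C sketch. The helper names are hypothetical. The integer check computes the product in 64 bits to avoid signed overflow, which is undefined behavior for `int` in C; the floating-point helpers assume IEEE 754 `double` arithmetic and no reassociating compiler options such as `-ffast-math`.

```c
#include <stdint.h>

/* 1 if a * b fits in a 32-bit signed int, else 0.
   The product is formed in 64 bits so it cannot overflow. */
int product_fits_int32(int32_t a, int32_t b) {
    int64_t p = (int64_t)a * (int64_t)b;
    return p >= INT32_MIN && p <= INT32_MAX;
}

/* The two groupings of a three-term sum. */
double sum_left(double x, double y, double z)  { return (x + y) + z; }
double sum_right(double x, double y, double z) { return x + (y + z); }
```

40000 * 40000 fits, while 600000 * 600000 = 360,000,000,000 does not. For the sums, `sum_left(1e20, -1e20, 3.14)` gives 3.14, but `sum_right(1e20, -1e20, 3.14)` gives 0.0 because 3.14 is lost when it is added to -1e20 first, so floating-point addition is not associative.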

IEEE 754
IEEE 754 Binary16 (F16) Format
Component Bits
Sign bit 1
Exponent 5
Fraction 10
Total 16 bits (2 bytes)

Overview of IEEE 754 Binary32
Field Bits Description
Sign 1 0 = positive, 1 = negative
Exponent 8 Encodes exponent with bias
Fraction (Mantissa) 23 Precision bits (fractional part)

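IEEE 754 Binary64
Field Bits Description
Sign 1 0 = positive, 1 = negative
Exponent 11 Encodes exponent with bias
Fraction (Mantissa) 52 Precision bits (fractional part)

IEEE 754 Binary128 (F128) Format
Field Bits Description
Sign 1 0 = positive, 1 = negative
Exponent 15 Encodes exponent using a bias of 16383
Fraction (Mantissa) 112 Fractional part of the significand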
