Computer Arithmetic
Representations
   Dr. S. Poornima, ASP/IT
• So far focused on:
   – Performance (seconds, cycles, instructions)
   – Instruction Set Architecture
   – Assembly Language and Machine Language
   – Instruction Execution and Datapath
• Further focusing on:
   – Implementing the Architecture
       • Representation of data
                        Arithmetic
• Operations on integers
    • Addition and Subtraction
    • ALU: arithmetic and logic unit
    • Multiply
    • Divide
• Floating Point real numbers
     • notation
     • add
     • multiply
                              Number Representations
• Decimal number system:
       $4382 = 4\times10^3 + 3\times10^2 + 8\times10^1 + 2\times10^0$
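• Binary numbers (base 2):
        $(a_{n-1}a_{n-2}\ldots a_1a_0)_{\text{two}} = a_{n-1}\times2^{n-1} + a_{n-2}\times2^{n-2} + \cdots + a_0\times2^0$
     where a is a digit, n is the number of bit positions, and 2 is the base (radix); the digit at position i has weight $2^i$
        $a_0$: least significant bit (lsb)
        $a_{n-1}$: most significant bit (msb)
• With n bits there are $2^n$ possible combinations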
      1 bit    2 bits    3 bits    4 bits    decimal
      0        00        000       0000      0
      1        01        001       0001      1
               10        010       0010      2
               11        011       0011      3
                         100       0100      4
                         101       0101      5
                         110       0110      6
                         111       0111      7
                                   1000      8
                                   1001      9
 How do we represent negative numbers?
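The positional formula above can be checked with a short Python sketch. The helper name positional_value is hypothetical (not from the slides); it simply sums each digit times the base raised to its position, taking the leftmost character as $a_{n-1}$.

```python
def positional_value(digits: str, base: int) -> int:
    """Evaluate a digit string as a_{n-1}*base^(n-1) + ... + a_0*base^0."""
    n = len(digits)
    value = 0
    for position, ch in enumerate(digits):
        a = int(ch, base)        # digit value; int() rejects digits not valid in this base
        i = n - 1 - position     # the leftmost character is a_{n-1}
        value += a * base ** i
    return value


if __name__ == "__main__":
    print(positional_value("4382", 10))  # 4x10^3 + 3x10^2 + 8x10^1 + 2x10^0
    print(positional_value("1001", 2))   # 1x2^3 + 0 + 0 + 1x2^0
    # With n = 3 bits there are 2**3 = 8 patterns, listed as in the table
    for v in range(2 ** 3):
        print(format(v, "03b"), v)
```

The same function handles any base up to 36, since it leans on int() to read one digit at a time.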
                Unsigned Integer
• Consider an n-bit vector of the form:
                $A = a_{n-1}a_{n-2}a_{n-3}\cdots a_0$
   where $a_i = 0$ or $1$ for $i$ in $[0, n-1]$.
• This vector can represent non-negative integer values V = A in the
  range 0 to $2^n - 1$, where
         $A = 2^{n-1}a_{n-1} + 2^{n-2}a_{n-2} + \cdots + 2^1a_1 + 2^0a_0$
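The unsigned value can also be computed without explicit powers of 2 by Horner's rule, a standard rewriting of the sum that is swapped in here for illustration rather than taken from the slides. unsigned_value and unsigned_range are hypothetical names, and the bit vector is assumed to be given as a string, most significant bit first.

```python
def unsigned_value(bits: str) -> int:
    """Value of the n-bit unsigned vector a_{n-1}...a_0, given most significant bit first."""
    value = 0
    for b in bits:
        if b not in "01":
            raise ValueError(f"not a bit: {b!r}")
        # Horner's rule: doubling the running value shifts every earlier bit
        # one place to the left, so a_i ends up multiplied by 2^i.
        value = 2 * value + int(b)
    return value


def unsigned_range(n: int):
    """Smallest and largest value of an n-bit unsigned integer: 0 and 2^n - 1."""
    return 0, 2 ** n - 1


if __name__ == "__main__":
    print(unsigned_value("1011"), unsigned_range(4))
```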
                  Signed Integer
• 3 major representations:
    Sign and magnitude
    One’s complement
    Two’s complement
• Assumptions:
    4-bit machine word
    16 different values can be represented
    Roughly half are positive, half are negative
                       Signed binary numbers
Possible representations:
    Sign Magnitude     One's Complement     Two's Complement
        000 = +0           000 = +0             000 = +0
        001 = +1           001 = +1             001 = +1
        010 = +2           010 = +2             010 = +2
        011 = +3           011 = +3             011 = +3
        100 = -0           100 = -3             100 = -4
        101 = -1           101 = -2             101 = -3
        110 = -2           110 = -1             110 = -2
        111 = -3           111 = -0             111 = -1
     One's complement: invert all bits (1 → 0, 0 → 1)
     Two's complement: invert all bits and add 1
     Complementing is how negative numbers are represented: the complement of a positive number's pattern is the pattern of its negative
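The two complement rules map directly onto bit operations. In this Python sketch (function names hypothetical), bit patterns are held in ordinary ints, and a mask keeps only the low n bits because Python ints are unbounded.

```python
def ones_complement_negate(x: int, n: int) -> int:
    """One's complement: invert all n bits of the pattern x."""
    mask = (1 << n) - 1        # n ones, e.g. 0b111 for n = 3
    return ~x & mask           # Python's ~ flips every bit; the mask keeps only n of them


def twos_complement_negate(x: int, n: int) -> int:
    """Two's complement: invert all n bits and add 1, discarding any carry out."""
    mask = (1 << n) - 1
    return (~x + 1) & mask


if __name__ == "__main__":
    plus3 = 0b011
    print(format(ones_complement_negate(plus3, 3), "03b"))  # 100, i.e. -3 in one's complement
    print(format(twos_complement_negate(plus3, 3), "03b"))  # 101, i.e. -3 in two's complement
```

Note that in 3-bit two's complement, negating 100 (-4) gives 100 again, because +4 has no 3-bit pattern.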
          Sign and Magnitude Representation
[Figure: 4-bit sign-magnitude number wheel: 0000 … 0111 = +0 … +7 and 1000 … 1111 = -0 … -7; e.g., 0 100 = +4 and 1 100 = -4 (only the high-order bit differs).]
High-order bit is the sign: 0 = positive (or zero), 1 = negative
Three low-order bits are the magnitude: 0 (000) through 7 (111)
Number range for n bits = $\pm(2^{n-1} - 1)$
Two representations for 0 (0000 = +0 and 1000 = -0)
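A minimal Python sketch of reading an n-bit pattern as sign and magnitude; the function names are hypothetical. The sign is the high-order bit and the magnitude is everything below it, so the range is symmetric.

```python
def sign_magnitude_value(x: int, n: int) -> int:
    """Interpret the n-bit pattern x as sign and magnitude."""
    sign = (x >> (n - 1)) & 1              # high-order bit
    magnitude = x & ((1 << (n - 1)) - 1)   # remaining n-1 low-order bits
    # Python ints have no -0, so the pattern 1000 (negative zero) decodes to 0,
    # just like 0000: the two zero patterns differ only as bit patterns.
    return -magnitude if sign else magnitude


def sign_magnitude_range(n: int):
    """Largest magnitude is 2^(n-1) - 1 in either direction."""
    m = 2 ** (n - 1) - 1
    return -m, m


if __name__ == "__main__":
    print(sign_magnitude_value(0b0100, 4), sign_magnitude_value(0b1100, 4))
    print(sign_magnitude_range(4))
```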
         One’s Complement Representation
[Figure: 4-bit one's complement number wheel: 0000 … 0111 = +0 … +7 and 1000 … 1111 = -7 … -0; e.g., 0 100 = +4 and 1 011 = -4 (every bit inverted).]
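• Subtraction is implemented as addition of the one's complement of the subtrahend
• Still two representations of 0 (0000 and 1111), so a test for zero must check both patterns
• Addition is more complex: a carry out of the most significant bit must be added back into the least significant bit (end-around carry)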
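The end-around carry can be sketched in a few lines of Python (function names hypothetical; overflow of the signed result is not checked). If the raw sum produces a carry out of the most significant bit, that carry is dropped there and added back at the least significant bit.

```python
def ones_complement_add(x: int, y: int, n: int) -> int:
    """Add two n-bit one's complement patterns using the end-around carry."""
    mask = (1 << n) - 1
    total = x + y
    if total > mask:                   # a carry came out of the most significant bit
        total = (total & mask) + 1     # drop it there and add it back in at the lsb
    return total & mask                # (overflow of the signed result is not detected)


def ones_complement_value(x: int, n: int) -> int:
    """Decode an n-bit one's complement pattern (both zeros decode to 0)."""
    mask = (1 << n) - 1
    return -(~x & mask) if (x >> (n - 1)) & 1 else x


if __name__ == "__main__":
    a, b = 0b0101, 0b1101              # +5 and -2 in 4-bit one's complement
    s = ones_complement_add(a, b, 4)   # 0101 + 1101 = 1 0010 -> 0010 + 1 = 0011
    print(format(s, "04b"), ones_complement_value(s, 4))  # 0011 3
```

Adding +3 (0011) and -3 (1100) yields 1111, the negative zero, which is why a zero test must accept both patterns.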
          Two’s Complement Representation
[Figure: 4-bit two's complement number wheel, like the one's complement wheel except the negative values are shifted one position clockwise: 0000 … 0111 = +0 … +7 and 1000 … 1111 = -8 … -1; e.g., 0 100 = +4 and 1 100 = -4.]
  • Only one representation for 0
  • One more negative number than positive number
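A Python sketch of two's complement decoding and addition (function names hypothetical). Unlike one's complement, addition is plain binary addition with the carry out simply discarded, and the 16 patterns give 16 distinct values.

```python
def twos_complement_value(x: int, n: int) -> int:
    """Decode an n-bit two's complement pattern: a set sign bit is worth -2^(n-1)."""
    return x - (1 << n) if (x >> (n - 1)) & 1 else x


def twos_complement_add(x: int, y: int, n: int) -> int:
    """Plain binary addition; the carry out of the msb is simply discarded."""
    return (x + y) & ((1 << n) - 1)


if __name__ == "__main__":
    values = sorted(twos_complement_value(x, 4) for x in range(16))
    print(values[0], values[-1], len(set(values)))  # -8 7 16: one zero, one extra negative
    s = twos_complement_add(0b0101, 0b1110, 4)      # +5 + (-2)
    print(format(s, "04b"), twos_complement_value(s, 4))  # 0011 3
```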
Binary, Signed-Integer Representations
      B                      Values represented
 b3 b2 b1 b0    Sign and magnitude   1's complement   2's complement
 0  1  1  1           +7                  +7               +7
 0  1  1  0           +6                  +6               +6
 0  1  0  1           +5                  +5               +5
 0  1  0  0           +4                  +4               +4
 0  0  1  1           +3                  +3               +3
 0  0  1  0           +2                  +2               +2
 0  0  0  1           +1                  +1               +1
 0  0  0  0           +0                  +0               +0
 1  0  0  0           -0                  -7               -8
 1  0  0  1           -1                  -6               -7
 1  0  1  0           -2                  -5               -6
 1  0  1  1           -3                  -4               -5
 1  1  0  0           -4                  -3               -4
 1  1  0  1           -5                  -2               -3
 1  1  1  0           -6                  -1               -2
 1  1  1  1           -7                  -0               -1
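The whole table can be regenerated as a check. This Python sketch uses hypothetical helper names (decode_all, _fmt) and prints the rows in the same order as above; the negative zero is spelled out as a string because Python ints have no -0.

```python
def _fmt(value: int, negative: bool) -> str:
    # Python ints have no -0, so the negative zero is spelled out explicitly
    return "-0" if negative and value == 0 else f"{value:+d}"


def decode_all(x: int, n: int = 4):
    """Return the sign-magnitude, 1's complement and 2's complement readings of x."""
    mask = (1 << n) - 1
    sign = (x >> (n - 1)) & 1
    if sign == 0:                                  # positive patterns read the same in all three
        s = _fmt(x, False)
        return s, s, s
    sm = _fmt(-(x & (mask >> 1)), True)            # drop the sign bit, negate the magnitude
    ones = _fmt(-(~x & mask), True)                # invert all bits to recover the magnitude
    twos = _fmt(x - (1 << n), True)                # subtract 2^n
    return sm, ones, twos


if __name__ == "__main__":
    print("b3b2b1b0  S&M  1's  2's")
    for x in list(range(7, -1, -1)) + list(range(8, 16)):   # same row order as the table
        sm, ones, twos = decode_all(x)
        print(f"{x:04b}      {sm:>3}  {ones:>3}  {twos:>3}")
```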
                  Comparison
• Sign and Magnitude
  – Cumbersome addition/subtraction
  – Must compare magnitudes to determine the sign of the result
• One’s Complement
  – Negation is simply a bit-wise complement
• Two’s Complement
  – Negation is simply a bit-wise complement + 1
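To see why sign-and-magnitude addition is cumbersome, here is a Python sketch of it (the function name is hypothetical; overflow is not detected). The branch on which magnitude is larger is exactly the extra comparison step mentioned above; in two's complement the same sum is a single masked addition.

```python
def sign_magnitude_add(a: int, b: int, n: int) -> int:
    """Add two n-bit sign-magnitude patterns (overflow is not detected)."""
    mag_mask = (1 << (n - 1)) - 1
    sa, ma = (a >> (n - 1)) & 1, a & mag_mask
    sb, mb = (b >> (n - 1)) & 1, b & mag_mask
    if sa == sb:            # same signs: add magnitudes, keep the common sign
        s, m = sa, ma + mb
    elif ma >= mb:          # different signs: the larger magnitude decides the sign
        s, m = sa, ma - mb
    else:
        s, m = sb, mb - ma
    if m == 0:
        s = 0               # return +0 rather than -0
    return (s << (n - 1)) | (m & mag_mask)


if __name__ == "__main__":
    print(format(sign_magnitude_add(0b0010, 0b1101, 4), "04b"))  # +2 + (-5)
```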
                                  Floating Point
•   We need a way to represent
     – numbers with fractions, e.g., 3.1416
     – very small numbers, e.g., .000000001
     – very large numbers, e.g., $3.15576 \times 10^9$
•   Representation:
     – sign, exponent, significand:
                      $(-1)^{\text{sign}} \times \text{significand} \times 2^{\text{exponent}}$
     – more bits for significand gives more accuracy
     – more bits for exponent increases range
•   IEEE 754 floating point standard:
     – Single precision: 8-bit exponent, 23-bit significand
     – Double precision: 11-bit exponent, 52-bit significand
    IEEE 754 floating point standard
         Has greatly improved portability of scientific applications
   Single precision (sp) format (“float” in C)
             S   E                   M
         1 bit   8 bits           23 bits
   Double precision (dp) format (“double” in C)
         S           E                                   M
     1 bit        11 bits                              52 bits
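Python's standard struct module can pack a value in the "f" (single) and "d" (double) formats, which makes it possible to peek at the S, E and M fields. The helper names fields_single and fields_double are hypothetical; the sketch assumes the value is within single-precision range (struct refuses to pack larger finite values with "f").

```python
import struct


def fields_single(x: float):
    """Split x, rounded to IEEE 754 single precision, into (S, E, M) bit fields."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))   # 32-bit pattern as an unsigned int
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF   # 1, 8, 23 bits


def fields_double(x: float):
    """Split x, as an IEEE 754 double, into (S, E, M) bit fields."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))   # 64-bit pattern
    return bits >> 63, (bits >> 52) & 0x7FF, bits & ((1 << 52) - 1)   # 1, 11, 52 bits


if __name__ == "__main__":
    print(fields_single(-0.75))   # (1, 126, 4194304)
    print(fields_double(1.0))     # (0, 1023, 0)
```

The E values printed are the stored (biased) exponents, and M holds only the fraction bits after the leading 1.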
FP numbers are in sign-magnitude representation of the form
$(-1)^S \times M \times B^E$
where
      S is the sign bit (0=positive, 1=negative)
      M is the mantissa (also called the significand)
      B is the base (implied)
      E is the exponent
     Example:    $22.34 \times 10^{-4}$
         S = 0
         M = 22.34
         B = 10
         E = -4
Normalizing Numbers
In Scientific Notation, we generally choose one digit to the left of the decimal
point
                  $13.25 \times 10^{10}$ becomes $1.325 \times 10^{11}$
Normalizing means
• Shifting the decimal point until we have the right number of digits to its left
  (normally one), i.e., only a single nonzero digit before the radix point
• Adding or subtracting from the exponent to reflect the shift
                     Normalizing Procedure
Decimal number 123.4567 can be normalized as $1.234567 \times 10^2$
Binary number 1010.1011 can be normalized as $1.0101011 \times 2^3$
            0.0101010 can be normalized as $1.01010 \times 2^{-2}$
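The normalizing procedure only moves the point, so it can be sketched on the digit string itself. In this Python sketch the function name normalize is hypothetical; it returns the mantissa and the exponent of the (implied) base, and it works for any base because only digit positions change.

```python
def normalize(numeral: str):
    """Normalize a numeral like '1010.1011' to (mantissa, exponent) so that exactly
    one nonzero digit precedes the point; the base itself is implied."""
    int_part, _, frac_part = numeral.partition(".")
    digits = int_part + frac_part
    point = len(int_part)                  # the point sits after this many digits
    first = next((i for i, c in enumerate(digits) if c != "0"), None)
    if first is None:
        raise ValueError("zero cannot be normalized")
    exponent = point - first - 1           # how far the point moves (left is +, right is -)
    rest = digits[first + 1:]
    mantissa = digits[first] + ("." + rest if rest else "")
    return mantissa, exponent


if __name__ == "__main__":
    for s in ("123.4567", "1010.1011", "0.0101010"):
        print(s, "->", normalize(s))
```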
                IEEE 754 Floating-Point Bias component
• Leading “1” bit of significand is implicit
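• The exponent can be positive or negative, so a bias is added to it to make the stored exponent
  an unsigned (non-negative) integer.
• Exponent is “biased” to make sorting easier: non-negative floating-point values then order the same way as their bit patterns read as unsigned integers
                    $(-1)^{\text{sign}} \times (1 + \text{significand}) \times 2^{\text{exponent} - \text{bias}}$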
     – Bias of 127 for single precision and 1023 for double precision
Example:
     – binary: $-0.11 = -1.1 \times 2^{-1}$ (that is, -0.75 in decimal)
     – floating point: exponent = -1 + bias = -1 + 127 = 126 = $01111110_2$
     – IEEE single precision:
          sign (bit 31) | exponent (bits 30-23) | fraction (bits 22-0)
          1             | 01111110              | 10000000000000000000000
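The worked example can be reproduced by assembling the three fields by hand and letting the standard struct module reinterpret the result. The function name encode_single is hypothetical, and the sketch assumes a normalized number whose fraction fits in 23 bits.

```python
import struct

BIAS = 127   # single-precision bias


def encode_single(sign: int, exponent: int, fraction_bits: str) -> int:
    """Build a 32-bit pattern from the sign, the unbiased exponent and the bits
    after the implicit leading 1 (normalized numbers only)."""
    biased = exponent + BIAS                           # E' = E + 127
    if not 1 <= biased <= 254:
        raise ValueError("exponent outside the normalized single-precision range")
    if len(fraction_bits) > 23:
        raise ValueError("more than 23 fraction bits")
    fraction = int(fraction_bits.ljust(23, "0"), 2)    # pad on the right to 23 bits
    return (sign << 31) | (biased << 23) | fraction


if __name__ == "__main__":
    bits = encode_single(1, -1, "1")                    # -1.1 x 2^-1 = -0.75
    print(f"{bits:032b}")                               # 10111111010000000000000000000000
    print(struct.unpack(">f", struct.pack(">I", bits))[0])   # -0.75
```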
Unsigned Integer (E′)
The stored exponent E′ is an unsigned integer, E′ = E + Bias:
Single precision: E′ = E + 127
Double precision: E′ = E + 1023
IEEE 754 Floating Point standard Formats
            IEEE 754 floating point Normalization
A normalized single-precision value that would need an exponent less than -126 causes underflow, and one that would need an exponent greater than +127 causes overflow.
Both are exceptions that need to be handled.
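A Python sketch of this range check, using the standard math.frexp to find the exponent that the normalized form $1.f \times 2^E$ would need. The function name is hypothetical, and the sketch ignores rounding to 24 significant bits (a value just below $2^{128}$ could still round up and overflow).

```python
import math


def single_exponent_status(x: float) -> str:
    """Classify nonzero x by the exponent its normalized form 1.f x 2^E would need."""
    if x == 0 or not math.isfinite(x):
        raise ValueError("x must be finite and nonzero")
    m, e = math.frexp(x)     # x = m * 2**e with 0.5 <= |m| < 1
    E = e - 1                # rewrite as (2m) * 2**(e - 1), with 1 <= |2m| < 2
    if E > 127:
        return "overflow"
    if E < -126:
        return "underflow"
    return "normal"


if __name__ == "__main__":
    for x in (1.0, 2.0 ** 127, 2.0 ** 128, 2.0 ** -126, 2.0 ** -127):
        print(x, single_exponent_status(x))
```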
                         Special Values
• The end values E′ = 0 and E′ = 255 are reserved for special values.
• When E′ = 0 and M = 0, the exact value 0 is represented (±0).
• When E′ = 255 and M = 0, the value ∞ is represented (±∞).
• When E′ = 0 and M ≠ 0, denormal numbers are represented. The value is
   $\pm 0.M \times 2^{-126}$.
• When E′ = 255 and M ≠ 0, the value is Not a Number (NaN).
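The four special cases can be checked directly on a 32-bit pattern. In this Python sketch (function name hypothetical), E′ is bits 30-23 and M is bits 22-0; anything with 1 ≤ E′ ≤ 254 is an ordinary normalized number.

```python
def classify_single(bits: int) -> str:
    """Name the kind of value a 32-bit single-precision pattern encodes."""
    sign = "-" if (bits >> 31) & 1 else "+"
    e_stored = (bits >> 23) & 0xFF        # E'
    m = bits & 0x7FFFFF                   # M (fraction field)
    if e_stored == 0:
        return sign + ("zero" if m == 0 else "denormal")
    if e_stored == 255:
        return (sign + "infinity") if m == 0 else "NaN"
    return sign + "normal"


if __name__ == "__main__":
    for b in (0x00000000, 0x80000000, 0x7F800000, 0x7FC00000, 0x00000001, 0x3F800000):
        print(f"{b:08X}", classify_single(b))
```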
Note:
Numbering the bits 0-31 as in the single-precision layout above, the sign is stored in bit 31. The exponent is computed from bits 23-30 by subtracting 127. The
mantissa (also known as the significand or fraction) is stored in bits 0-22. An invisible leading bit (i.e., it is not
actually stored) with value 1.0 is placed in front; then bit 22 has a value of 1/2, bit 21 has a value of 1/4, and so on. As
a result, the mantissa has a value in [1.0, 2). If the exponent field reaches its minimum (binary 00000000, which would otherwise mean -127), the
leading 1 is no longer used, enabling gradual underflow.
Underflow:
If the exponent field has its minimum value (all zeros), the special rules for denormalized values apply. The
exponent is fixed at -126 (a scale factor of $2^{-126}$) and the "invisible" leading bit of the mantissa is no longer used. The range of
the mantissa is now [0, 1).
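Putting the rules together, a 32-bit pattern can be decoded by hand and compared with the standard struct module. This Python sketch (function name hypothetical) numbers bits 31..0 as in the single-precision layout example and follows the normal, denormal and special-value rules above; the comparison against struct is only a sanity check, not part of the method.

```python
import math
import struct


def decode_single(bits: int) -> float:
    """Value of a 32-bit IEEE 754 single-precision pattern (bits numbered 31..0)."""
    sign = -1.0 if (bits >> 31) & 1 else 1.0
    e_stored = (bits >> 23) & 0xFF
    frac = bits & 0x7FFFFF            # bit 22 weighs 1/2, bit 21 weighs 1/4, ...
    if e_stored == 255:
        return sign * math.inf if frac == 0 else math.nan
    if e_stored == 0:
        # Denormal (or zero): no invisible leading 1, exponent fixed at -126
        return sign * (frac / 2 ** 23) * 2.0 ** -126
    # Normal: invisible leading 1, so the mantissa lies in [1, 2)
    return sign * (1 + frac / 2 ** 23) * 2.0 ** (e_stored - 127)


if __name__ == "__main__":
    for bits in (0xBF400000, 0x00000001, 0x7F7FFFFF):
        mine = decode_single(bits)
        ref = struct.unpack(">f", struct.pack(">I", bits))[0]   # let struct decode it too
        print(hex(bits), mine, mine == ref)
```

The pattern 0x00000001 is the smallest denormal, $2^{-23} \times 2^{-126} = 2^{-149}$, which shows how gradual underflow extends the range below $2^{-126}$.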