Introduction to Computer Organization
Computer Performance
                Overview
•   Performance evaluation
•   Limitations
•   Metrics
•   Processor performance equation
•   Performance evaluation reports
•   Amdahl’s law
Performance Evaluation
         Program
         Compiler
           ISA
     Microarchitecture
        Hardware
      Manufacturing
Performance Evaluation Concepts
• Computer performance evaluation is based on:
   – Throughput (bits per second)
   – Response time (a.k.a. execution time, elapsed time)
• Component- and system-level performance
• Processor performance evaluation
   – Based on execution time of a program:
              PerformanceX = 1/ Execution timeX
   Relative performance: n = PerformanceX / PerformanceY
              n > 1 => X is n times faster than Y
• Terminology
   – Improve performance    = increase performance
   – Improve execution time = decrease execution time
                    Example
• Impact on throughput and response time of:
  – Using a faster processor
     • Decrease in response time
     • Increase in throughput
  – Adding more processors to a system
     • Increase in throughput
     • Decrease in response time (if overhead is low)
        – In Massively Parallel Processors (MPP)
        – In Symmetric Multiprocessing Processors (SMP)
            » Only if additional processors reduce queue time
         Measuring Performance
Components of the execution time of a program:
  1. CPU execution time
     •   User CPU time
     •   System CPU time
  2. I/O time
  3. Time spent running other programs
Unix time command                   9.7u = 9.7 sec User CPU time
   time cc prog.c                   1.5s = 1.5 sec System CPU time
   9.7u 1.5s 20 56%                 20 = Total Elapsed Time
                                    56% = Percent CPU time
    CPU Performance Equation
• CPU time = CPU clock cycles * cycle time
          = CPU clock cycles / clock rate
   – CPU clock cycles = IC * CPI
      • IC: instruction count (number of instructions per program)
      • CPI: average cycles per instruction
CPU time = IC * CPI * cycle time
       seconds instructions clock cycles   seconds
              =            *             *
       program program       instruction clock cycle
• CPU clock cycles = Σi (CPIi * ICi)
   – ICi : count of instructions of class i
   – CPIi : cycles that takes to execute instructions of class i
Scope of Performance Sources
CPU time = IC * CPI * Cycle time
                Program
                Compiler
                  ISA
            Microarchitecture
               Hardware
    Abstraction level interdependence
                   Example 1
•   A program runs in 10 secs on a 2.0 GHz processor.
    A designer wants to build a new computer that can
    run the program in 6 secs by increasing the clock
    frequency. However the average “new” CPI will be
    1.2 times higher.
•   What faster clock rate should the designer use?
     10 IC * CPI / 2 GHz     (current exec. time)
       =
      6 IC * 1.2 CPI / X GHz (target execution time)
        Solve for X, to obtain   X = ___ GHz
                        Example 2
• Comparing two compiler code segments
          Instruction class i         CPIi for instruction class i
                  A                                1
                  B                                2
                  C                                3
         Code          Instruction counts (ICi) for instruction class
       sequence             A                 B                 C
          1                 2                 1                 2
          2                 4                 1                 1
• Which code sequence executes the most instructions?
• Which will be faster? S1 = 2 . 1 + 1 . 2 + 2 . 3 = 10 cyc
                        S2 = 4 . 1 + 1 . 2 + 1 . 3 = 9 cyc
Components of the CPU Equation
• IC Instruction count
  – Compiler
     • Instruction set simulator
     • Execution-based monitoring (profiling)
• CPI if pipelined execution is used
     CPIi = Pipeline CPIi + Memory CPIi
• Clock cycle time
  – Timing estimators or verifiers (complete design)
  – Target cycle time
Performance Evaluation Programs
• Ideal situation: known programs (workload)
• Benchmarks
   –   Real programs
   –   Kernels
   –   Toy benchmarks
   –   Synthetic benchmarks
• Risk: adjust design to benchmark requirements
   – (partial) solution: use real programs
        •   Engineering or scientific applications
        •   Software development tools
        •   Transaction processing
        •   Office applications
           Performance Reports
• Reproducibility
  – Include hardware / software configuration
  – Evaluation process conditions
• Summarizing performance
  –   Total time: CPU time + I/O time + Other time
  –   Arithmetic mean: AM = 1/n * Σ exec timei
  –   Harmonic mean: HM = n / Σ (1/ratei)
  –   Weighted mean: WM = Σ wi * exec timei
  –   Geometric mean: GM = (Π exec time ratioi)1/n
       GM (Xi)       Xi
                 =                      Important Stuff
       GM (Yi)       Yi
Arithmetic and Geometric Means
         A      B     C    Execution time (in seconds)
                            machines: A, B, and C
    P1   1      10    20
                            programs: P1 and P2
    P2   1000   100   20
                      Amdahl’s Law
 • Law of diminishing returns
                          Execution time before improvement
       Speedup =
                           Execution time after improvement
   Example: Two factors - F1: 75% improve, F2: 50% improve
                                                                     F1
                                                                     F2
                                            1
Speedup =
            (1 - fraction enhanced) + (fraction enhanced/factor of improvement)
                                                       Example
    • A program runs in 10 seconds
    • What is the speedup after a faster floating
      point unit is incorporated?
              FP Unit 5 times faster                                                5 seconds in FP operations
6                                                                    2
                                  1
                                                                          Speedup
    Speedup =                                                       1.9
5
                    (1 – Fraction FP) + FractionFP                  1.8
4                                                  5                1.7
                                                                    1.6
3                                                                   1.5                                               1
                                                                                         Speedup =
                                                                    1.4                                                    0.5
2
                                                                    1.3                               0.5 + FP factor of improvement
1                                                                   1.2
                                                   Fraction of FP   1.1                                                    FP Unit im provem ent
0                                                                    1
    0   0.1   0.2    0.3   0.4   0.5   0.6   0.7   0.8   0.9   1            1        2    3   4   5   6   7   8   9       10 11 12 13 14 15
               Conclusions
• Many different performance data
  – CPU time, I/O time, Other time, Total time
• Select best presentation method
  – Arithmetic mean for execution times
  – Geometric mean for performance ratios
• Watch out for Amdahl’s Law
  – Diminishing returns w/ improved performance
  – Impacts           of development