CSE340: Computer Architecture
Chapter - 1: Computer Abstractions and Technology
Understanding Performance
Hardware or Software Component | How This Component Affects Performance | Where to Learn
Algorithm | Determines both the number of source-level statements and the number of I/O operations executed | Covered in CSE221
Programming language, compiler, and architecture | Determines the number of computer instructions for each source-level statement | Chapters 2 and 3
Processor and memory system | Determines how fast instructions can be executed | Chapters 4, 5, and 6
I/O system (hardware and OS) | Determines how fast I/O operations may be executed | Chapters 4, 5, and 6
Seven Great Ideas in Computer Architecture
Idea | Explanation
Use abstraction to simplify design | Divide the system into layers to manage complexity, hiding lower-level details.
Make the common case fast | Optimize the operations that are performed most frequently.
Performance via parallelism | Run tasks simultaneously to increase speed.
Performance via pipelining | A particular form of parallelism; discussed further in Chapter 4.
Performance via prediction | Anticipate future data or values to avoid delays.
Hierarchy of memories | Organize memory into levels (registers, cache, RAM, storage) based on speed and size.
Dependability via redundancy | Add extra components so the system stays reliable when critical hardware or data fails.
Below Your Program
■ Application software
■ Written in high-level language
■ System software
■ Compiler: translates HLL code to machine code
■ Operating System: service code
■ Handling input/output
■ Managing memory and storage
■ Scheduling tasks & sharing resources
■ Hardware
■ Processor, memory, I/O controllers
Levels of Program Code
■ High-level language
■ Level of abstraction closer to problem domain
■ Provides for productivity and portability
■ Assembly language
■ Textual representation of instructions
■ Hardware representation
■ Binary digits (bits)
■ Encoded instructions and data
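As a concrete illustration of the assembly and hardware levels, the sketch below packs one assembly-language instruction into its 32-bit binary form. It assumes the RISC-V R-type format and the example instruction `add x9, x20, x21`, chosen only for illustration (the ISA used later in the course may differ); `encode_r_type` is a hypothetical helper, not a library function.

```python
# Illustrative sketch: assumes the RISC-V R-type instruction format.
# Field layout (bit 31 -> bit 0): funct7 | rs2 | rs1 | funct3 | rd | opcode

def encode_r_type(funct7: int, rs2: int, rs1: int, funct3: int, rd: int, opcode: int) -> int:
    """Pack the six R-type fields into one 32-bit instruction word (hypothetical helper)."""
    return (funct7 << 25) | (rs2 << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | opcode

# Assembly language:  add x9, x20, x21   (x9 = x20 + x21)
# Assumed RISC-V "add" encoding: funct7 = 0, funct3 = 0, opcode = 0b0110011
word = encode_r_type(funct7=0, rs2=21, rs1=20, funct3=0, rd=9, opcode=0b0110011)

print(f"{word:032b}")   # hardware representation: 32 binary digits
print(f"0x{word:08X}")  # the same word written in hexadecimal
```

Each field occupies a fixed bit range, which is what lets the hardware decode the instruction directly from the bits.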
Components of a Computer
The five classic components of a computer are:
Input
Output
Memory
Datapath
Control
■ Datapath and control are sometimes combined and called the processor.
Figure: The standard organization of a computer
Opening the Box
Figure: Components of the Apple iPhone XS Max
Figure: Logic board of the Apple iPhone XS Max
Figure: The processor integrated circuit inside the A12 package
Technologies for Building Processors
Figure: Manufacturing process of an IC
Intel 10th Gen. Wafer
This 10nm wafer contains 10th Gen Intel®
Core processors, code-named “Ice Lake”
(Courtesy Intel).
The number of dies on this 300 mm (12 inch)
wafer at 100% yield is 506.
Each die (chip) measures 11.4 mm × 10.7 mm.
Yield: The percentage of good dies from the
total number of dies on the wafer.
Die: The individual rectangular sections that
are cut from a wafer, more informally
known as chips.
Figure: A 12-inch (300mm) wafer
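A quick sanity check on these figures: the simple approximation "dies per wafer ≈ wafer area / die area" overestimates the count, partly because rectangular dies cannot tile the round edge of the wafer. A minimal sketch using only the numbers on this slide:

```python
import math

WAFER_DIAMETER_MM = 300.0        # 12-inch wafer
DIE_W_MM, DIE_H_MM = 11.4, 10.7  # die dimensions from the slide
ACTUAL_DIES = 506                # dies on the wafer at 100% yield (from the slide)

wafer_area = math.pi * (WAFER_DIAMETER_MM / 2) ** 2   # about 70,686 mm^2
die_area = DIE_W_MM * DIE_H_MM                        # about 122 mm^2

# Simple area-ratio approximation: ignores partial dies lost at the wafer edge
estimate = wafer_area / die_area

print(f"Die area:            {die_area:.2f} mm^2")
print(f"Area-ratio estimate: {estimate:.0f} dies")
print(f"Actual count:        {ACTUAL_DIES} dies")
print(f"Difference:          {estimate - ACTUAL_DIES:.0f} dies")
```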
IC Cost Calculation
■ Nonlinear relation to area and defect rate
■ Wafer cost and area are fixed
■ Defect rate determined by manufacturing process
■ Die area determined by architecture and circuit design
Note: N in the yield formula is a model parameter that determines how defects and die size impact yield. It is typically derived from real-world data and can range between 1 and 2, though it may vary outside this range depending on the technology and defect behavior.
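The cost relations can be sketched numerically. The formulas below are assumptions taken from the usual textbook model: cost per die = cost per wafer / (dies per wafer × yield), dies per wafer ≈ wafer area / die area, and yield = 1 / (1 + defects per area × die area / 2)^N, with the exponent N from the note exposed as a parameter. The wafer cost and defect density are hypothetical inputs, not real process data.

```python
import math

def dies_per_wafer(wafer_diameter_mm: float, die_area_mm2: float) -> float:
    """Approximate dies per wafer (ignores partial dies at the edge)."""
    wafer_area = math.pi * (wafer_diameter_mm / 2) ** 2
    return wafer_area / die_area_mm2

def die_yield(defects_per_mm2: float, die_area_mm2: float, n: float = 2.0) -> float:
    """Assumed yield model: 1 / (1 + D * A / 2) ** N."""
    return 1.0 / (1.0 + defects_per_mm2 * die_area_mm2 / 2.0) ** n

def cost_per_die(wafer_cost: float, wafer_diameter_mm: float,
                 die_area_mm2: float, defects_per_mm2: float, n: float = 2.0) -> float:
    """Wafer cost divided by the number of good dies on the wafer."""
    good_dies = dies_per_wafer(wafer_diameter_mm, die_area_mm2) * die_yield(
        defects_per_mm2, die_area_mm2, n)
    return wafer_cost / good_dies

# Hypothetical inputs: $10,000 per 300 mm wafer, 0.002 defects/mm^2 (0.2 per cm^2)
for area in (100.0, 200.0):
    y = die_yield(0.002, area)
    c = cost_per_die(10_000, 300.0, area, 0.002)
    print(f"die area {area:5.0f} mm^2: yield {y:.3f}, cost per die ${c:.2f}")
```

With these assumed inputs, doubling the die area multiplies the cost per die by about 2.4 rather than 2: fewer dies fit on the wafer and a smaller fraction of them work, which is the nonlinear relation listed above.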
Defining Performance
Based on the table below, which airplane has the best performance?
Figure: The capacity, range, and speed for a number of commercial airplanes
Response Time and Throughput
Response time: The time between the start and completion of a task. Also
known as execution time.
Throughput: Total work done per unit time.
e.g., tasks/transactions/… per hour
We’ll focus on response time for now…
Answer yourself:
How are response time and throughput affected by:
1. Replacing the processor with a faster version?
2. Adding additional processors to a system that uses multiple processors
for separate tasks?
Relative Performance
To maximize performance, we want to minimize the response/execution
time for a task.
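The relation between performance and execution time for computer X:
Performance_X = 1 / Execution Time_X
"X is n times faster than Y" means:
Performance_X / Performance_Y = Execution Time_Y / Execution Time_X = n
■ Example: time taken to run a program is 10 s on A and 15 s on B
■ Execution Time_B / Execution Time_A = 15 s / 10 s = 1.5
■ So A is 1.5 times faster than B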
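The same calculation written as a small sketch: performance is the reciprocal of execution time, so the "times faster" factor is a ratio of performances. The function names are illustrative only.

```python
def performance(execution_time_s: float) -> float:
    """Performance is the reciprocal of execution time."""
    return 1.0 / execution_time_s

def times_faster(time_x_s: float, time_y_s: float) -> float:
    """n in 'X is n times faster than Y' = Perf_X / Perf_Y = Time_Y / Time_X."""
    return performance(time_x_s) / performance(time_y_s)

# Example from the slide: 10 s on A, 15 s on B
n = times_faster(10.0, 15.0)
print(f"A is {n:.1f} times faster than B")  # A is 1.5 times faster than B
```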
Measuring Execution Time
Elapsed time: Total response time, including all aspects, e.g., processing, I/O, OS overhead, idle time, etc.
Determines system performance
CPU time: The actual time the CPU spends computing for a specific task.
Discounts I/O time, other jobs’ shares
Comprises user CPU time and system CPU time
Different programs are affected differently by CPU and system performance
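The difference between elapsed time and CPU time can be observed directly. A minimal Python sketch using the standard-library `time.perf_counter()` (wall-clock time) and `time.process_time()` (user plus system CPU time of the current process, which excludes time spent sleeping); the `busy_work` function and the 0.5 s sleep are hypothetical stand-ins for computation and for waiting on I/O.

```python
import time

def busy_work(n: int) -> int:
    """CPU-bound loop: consumes CPU time."""
    total = 0
    for i in range(n):
        total += i * i
    return total

wall_start = time.perf_counter()   # elapsed (wall-clock) time
cpu_start = time.process_time()    # CPU time (user + system) of this process

busy_work(2_000_000)   # computing: counts toward both measures
time.sleep(0.5)        # waiting (stand-in for I/O or idle): elapsed time only

elapsed = time.perf_counter() - wall_start
cpu = time.process_time() - cpu_start

print(f"Elapsed time: {elapsed:.3f} s")
print(f"CPU time:     {cpu:.3f} s")
```

The elapsed time includes the half-second wait, while the CPU time reflects only the loop.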
CPU Clocking
Operation of digital hardware governed by a constant-rate clock.
Figure: Clock (cycles) and clock period; data transfer and computation, then update state
■ Clock period: Time taken to complete a clock cycle
■ e.g., 250 ps = 0.25 ns = 250×10⁻¹² s
■ Clock frequency (rate): Clock cycles completed per second
■ e.g., 4.0 GHz = 4000 MHz = 4.0×10⁹ Hz
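Clock period and clock rate are reciprocals, which is why the two examples above describe the same clock. A short sketch of the conversion (function names are illustrative):

```python
def period_from_rate(rate_hz: float) -> float:
    """Clock period (s) = 1 / clock rate (Hz)."""
    return 1.0 / rate_hz

def rate_from_period(period_s: float) -> float:
    """Clock rate (Hz) = 1 / clock period (s)."""
    return 1.0 / period_s

rate = rate_from_period(250e-12)   # 250 ps
print(f"250 ps -> {rate / 1e9:.1f} GHz")                      # 250 ps -> 4.0 GHz
print(f"4.0 GHz -> {period_from_rate(4.0e9) * 1e12:.0f} ps")  # 4.0 GHz -> 250 ps
```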
CPU Performance and Its Factors
CPU Time = CPU Clock Cycles × Clock Cycle Time = CPU Clock Cycles / Clock Rate
Answer yourself:
Analyze the formula above and determine how you can improve the performance of a program.
CPU Performance Example
Problem statement:
A program runs in 10 seconds on computer A, which has a 2GHz clock. We are trying to help a
computer designer build a computer, B, which will run this program in 6 seconds. The designer has
determined that a substantial increase in the clock rate is possible, but this increase will affect the rest
of the CPU design, causing computer B to require 1.2 times as many clock cycles as computer A for
this program. What clock rate should we tell the designer to target?
Solution:
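Working the problem through: on A, clock cycles = CPU time × clock rate = 10 s × 2 GHz = 20×10⁹ cycles. B needs 1.2 times as many, i.e. 24×10⁹ cycles, and must finish them in 6 s, so clock rate_B = 24×10⁹ / 6 s = 4 GHz. The same steps as a short sketch:

```python
# CPU Time = Clock Cycles / Clock Rate  =>  Clock Cycles = CPU Time * Clock Rate
time_a_s = 10.0
rate_a_hz = 2e9
time_b_s = 6.0          # target time on B
cycle_factor_b = 1.2    # B needs 1.2x as many clock cycles as A

cycles_a = time_a_s * rate_a_hz          # 20 x 10^9 cycles
cycles_b = cycle_factor_b * cycles_a     # 24 x 10^9 cycles
rate_b_hz = cycles_b / time_b_s          # required clock rate for B

print(f"Clock cycles on A: {cycles_a:.3g}")
print(f"Clock cycles on B: {cycles_b:.3g}")
print(f"Target clock rate for B: {rate_b_hz / 1e9:.1f} GHz")  # 4.0 GHz
```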
Instruction Count and CPI
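Clock Cycles = Instruction Count × Cycles per Instruction (CPI)
CPU Time = Instruction Count × CPI × Clock Cycle Time = (Instruction Count × CPI) / Clock Rate
■ Instruction Count for a program
■ Determined by program, ISA and compiler
■ Average cycles per instruction (CPI)
■ Determined by CPU hardware
■ If different instructions have different CPI
■ Average CPI affected by instruction mix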
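Instruction count, CPI, and clock rate combine into a single CPU time. A minimal sketch; the program figures (1 billion instructions, CPI 1.5, 2 GHz) are hypothetical.

```python
def cpu_time_s(instruction_count: float, cpi: float, clock_rate_hz: float) -> float:
    """CPU Time = Instruction Count x CPI / Clock Rate."""
    clock_cycles = instruction_count * cpi
    return clock_cycles / clock_rate_hz

# Hypothetical program: 1 billion instructions, average CPI 1.5, 2 GHz clock
t = cpu_time_s(1e9, 1.5, 2e9)
print(f"CPU time = {t:.2f} s")  # CPU time = 0.75 s
```

Halving any one factor (instruction count, CPI, or clock cycle time) halves the CPU time, provided the other two stay fixed.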
Instruction Count & CPI Example
■ Computer A: Cycle Time = 250ps, CPI = 2.0
■ Computer B: Cycle Time = 500ps, CPI = 1.2
■ Same ISA
■ Which is faster, and by how much?
Answer yourself:
Why did we assume that the instruction count is the same in both cases?
So, A is 1.2 times faster than B.
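Because both computers share the same ISA, the instruction count I cancels, and only CPI × cycle time matters: 2.0 × 250 ps = 500 ps per instruction on A versus 1.2 × 500 ps = 600 ps on B. A short sketch of that comparison (function name illustrative):

```python
def time_per_instruction_ps(cycle_time_ps: float, cpi: float) -> float:
    """Average time per instruction = CPI x cycle time."""
    return cpi * cycle_time_ps

# CPU time = I x CPI x cycle time; the instruction count I is common to both
a = time_per_instruction_ps(250, 2.0)   # 500 ps per instruction
b = time_per_instruction_ps(500, 1.2)   # 600 ps per instruction

print(f"A: {a:.0f} ps x I,  B: {b:.0f} ps x I")
print(f"A is {b / a:.1f} times faster than B")  # A is 1.2 times faster than B
```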
Average CPI with Example
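■ If different instruction classes take different numbers of cycles:
Clock Cycles = Σ (CPI_i × Instruction Count_i), summed over instruction classes i = 1 … n
■ Weighted average CPI:
CPI = Clock Cycles / Instruction Count = Σ CPI_i × (Instruction Count_i / Instruction Count)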
Problem Statement:
Alternative compiled code sequences using instructions in classes A, B, C
Solution:
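The table for this problem is not recoverable here, so the sketch below uses assumed, illustrative values: classes A, B, C with CPIs of 1, 2, and 3; sequence 1 uses 2, 1, and 2 instructions of each class, and sequence 2 uses 4, 1, and 1. It computes total clock cycles as Σ(CPI_i × IC_i) and divides by the instruction count to get the weighted average CPI.

```python
# Assumed illustrative data (not recovered from the original slide)
cpi_by_class = {"A": 1, "B": 2, "C": 3}
sequences = {
    "Sequence 1": {"A": 2, "B": 1, "C": 2},
    "Sequence 2": {"A": 4, "B": 1, "C": 1},
}

def clock_cycles(counts: dict) -> int:
    """Clock cycles = sum over classes of CPI_i x IC_i."""
    return sum(cpi_by_class[c] * n for c, n in counts.items())

def average_cpi(counts: dict) -> float:
    """Weighted average CPI = clock cycles / total instruction count."""
    return clock_cycles(counts) / sum(counts.values())

for name, counts in sequences.items():
    print(f"{name}: IC = {sum(counts.values())}, cycles = {clock_cycles(counts)}, "
          f"avg CPI = {average_cpi(counts):.1f}")
```

With these assumed numbers, sequence 2 executes more instructions (6 versus 5) yet needs fewer clock cycles (9 versus 10), so instruction count alone does not predict which sequence is faster.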
Power Trends
■ In CMOS IC technology, dynamic power is:
Power ∝ 1/2 × Capacitive load × Voltage² × Frequency switched
Reducing Power
■ Suppose a new CPU has
■ 85% of capacitive load of old CPU
■ 15% voltage and 15% frequency reduction
■ The power wall
■ We can’t reduce voltage further
■ We can’t remove more heat
Answer yourself:
How else can we improve performance?
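For the new-CPU example on this slide, dynamic power scales with capacitive load × voltage² × frequency, so the constant factors cancel in the new-to-old ratio: 0.85 × 0.85² × 0.85 = 0.85⁴ ≈ 0.52. A short sketch of that ratio:

```python
def relative_power(cap_ratio: float, voltage_ratio: float, freq_ratio: float) -> float:
    """P_new / P_old for dynamic power ~ C x V^2 x f (constant factors cancel)."""
    return cap_ratio * voltage_ratio ** 2 * freq_ratio

# New CPU: 85% capacitive load, 15% lower voltage, 15% lower frequency
ratio = relative_power(0.85, 0.85, 0.85)
print(f"P_new / P_old = {ratio:.2f}")  # P_new / P_old = 0.52
```

The squared voltage term is why voltage reduction was the main lever for cutting power, and why the power wall appears once voltage cannot drop further.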
Uniprocessor Performance
Figure: Growth in processor performance since the mid-1980s
Constrained by power, instruction-level parallelism, memory latency
Multiprocessors
■ Multicore microprocessors
■ More than one processor per chip
■ Requires explicitly parallel programming
■ Compare with instruction level parallelism
■ Hardware executes multiple instructions at once
■ Hidden from the programmer
■ Hard to do
■ Programming for performance
■ Load balancing
■ Optimizing communication and synchronization
SPEC CPU Benchmark
■ What is a benchmark?
Programs used to measure performance
■ Standard Performance Evaluation Corp (SPEC)
■ Develops benchmarks for CPU, I/O, Web, …
■ SPEC CPU2017
■ Elapsed time to execute a selection of programs
■ Negligible I/O, so focuses on CPU performance
■ Normalize relative to reference machine
■ Summarize as geometric mean of performance ratios
■ SPECspeed2017 Integer and SPECspeed2017 Floating Point
SPEC Ratio & Geometric Mean Calculation
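SPECratio = Reference execution time / Measured execution time
Geometric mean = (SPECratio_1 × SPECratio_2 × … × SPECratio_n)^(1/n)
Note: Reference time is supplied by SPEC.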
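A minimal sketch of the SPECratio and geometric-mean summary. The benchmark times below are hypothetical, not real SPEC data; the geometric mean is computed through logarithms, which is equivalent to the n-th root of the product but avoids overflow for long lists.

```python
import math

def spec_ratio(reference_time_s: float, execution_time_s: float) -> float:
    """SPECratio = reference time / measured execution time (higher is better)."""
    return reference_time_s / execution_time_s

def geometric_mean(values: list[float]) -> float:
    """n-th root of the product, computed via logs to avoid overflow."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

# Hypothetical benchmark times in seconds (not real SPEC data)
reference = [1000.0, 2000.0, 1500.0]
measured = [250.0, 400.0, 500.0]

ratios = [spec_ratio(r, m) for r, m in zip(reference, measured)]
print("SPECratios:", ratios)                           # SPECratios: [4.0, 5.0, 3.0]
print(f"Geometric mean: {geometric_mean(ratios):.2f}")
```

One reason for the geometric mean: when two computers are compared by the ratio of their geometric means, the reference times cancel, so the comparison does not depend on which reference machine was chosen.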
Amdahl’s Law
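■ Pitfall: Improving an aspect of a computer and expecting a proportional improvement in overall performance
T_improved = T_affected / improvement factor + T_unaffected
■ Example: multiply accounts for 80 s of a 100 s execution time
■ How much improvement in multiply performance to get 5× overall?
5× overall means T_improved = 20 s, so 20 s = 80 s / n + 20 s
■ Can’t be done! (80 s / n would have to be 0)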
■ Corollary: make the common case fast
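Applying Amdahl's Law to the multiply example shows the limit numerically: however large the multiply speedup n, the 20 s unaffected part remains, so the overall speedup approaches but never reaches 5×. A short sketch (the list of speedup factors is arbitrary):

```python
def improved_time(affected_s: float, unaffected_s: float, improvement: float) -> float:
    """Amdahl's Law: T_improved = T_affected / improvement + T_unaffected."""
    return affected_s / improvement + unaffected_s

# Example from the slide: multiply takes 80 s of a 100 s program
for factor in (2, 4, 10, 100, 1_000_000):
    t = improved_time(80.0, 20.0, factor)
    print(f"multiply {factor:>9,}x faster -> {t:8.4f} s, overall speedup {100.0 / t:.3f}x")

# A 5x overall speedup would need T_improved = 20 s, i.e. 80 / n + 20 = 20,
# which holds only as n -> infinity: the 20 s unaffected part is a hard floor.
```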
MIPS as a Performance Metric
■ MIPS: Millions of Instructions Per Second
■ Doesn’t account for
■ Differences in ISAs between computers
■ Differences in complexity between instructions
■ CPI varies between programs on a given CPU
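A hypothetical illustration of why MIPS can mislead. Using MIPS = Instruction count / (Execution time × 10⁶), the sketch compares the same program compiled for two different ISAs on 4 GHz machines; all instruction counts and CPIs are made up for illustration.

```python
def cpu_time_s(instruction_count: float, cpi: float, clock_rate_hz: float) -> float:
    """CPU Time = Instruction Count x CPI / Clock Rate."""
    return instruction_count * cpi / clock_rate_hz

def mips(instruction_count: float, execution_time_s: float) -> float:
    """MIPS = Instruction count / (Execution time x 10^6)."""
    return instruction_count / (execution_time_s * 1e6)

# Hypothetical: the same program compiled for two different ISAs, both at 4 GHz
machines = {
    "X (simpler ISA)": {"ic": 10e9, "cpi": 1.0},
    "Y (richer ISA)":  {"ic": 5e9,  "cpi": 1.5},
}
for name, m in machines.items():
    t = cpu_time_s(m["ic"], m["cpi"], 4e9)
    print(f"{name}: time = {t:.3f} s, MIPS = {mips(m['ic'], t):,.0f}")
```

Here X reports the higher MIPS rating (4,000 versus about 2,667), yet Y finishes the program sooner (1.875 s versus 2.5 s), because Y needs fewer instructions to do the same work.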
Concluding Remarks
■ Cost/performance is improving
■ Due to underlying technology development
■ Hierarchical layers of abstraction
■ In both hardware and software
■ Instruction set architecture
■ The hardware/software interface
■ Execution time: the best performance measure
■ Power is a limiting factor
■ Use parallelism to improve performance