Introduction to Computer Architecture: unit 1
Velammal Engineering College
Department of Computer Science
and Engineering
Welcome…
Course Objectives
• This course aims to teach the basic structure and operations of
a computer.
• The course is intended to cover the ALU, pipelined execution,
parallelism and multi-core processors.
• The course will enable the students to understand memory
hierarchies, cache memories and virtual memory.
Course Outcomes
CO 1 Discuss the basic structure of computers, operations and
instructions.
CO 2 Design arithmetic and logic unit.
CO 3 Analyze pipelined execution and design control unit.
CO 4 Analyze parallel processing architectures.
CO 5 Examine the performance of various memory systems
CO 6 Organize the various I/O communications.
Syllabus
Unit Titles:
• Unit I Basic Structure of a Computer System
• Unit II Arithmetic for Computers
• Unit III Processor and Control Unit
• Unit IV Parallelism
• Unit V Memory & I/O Systems
Syllabus – Unit I
UNIT-I BASIC STRUCTURE OF A COMPUTER
SYSTEM
Functional Units – Basic operational concepts – Instructions:
Operations, Operands – Instruction representation – Instruction
Types – MIPS addressing, Performance
Syllabus – Unit II
UNIT-II ARITHMETIC FOR COMPUTERS
Addition and Subtraction – Multiplication – Division – Floating
Point Representation – Floating Point Addition and Subtraction.
Syllabus – Unit III
UNIT-III PROCESSOR AND CONTROL UNIT
A Basic MIPS implementation – Building a Datapath – Control
Implementation Scheme – Pipelining – Pipelined datapath and
control – Handling Data Hazards & Control Hazards.
Syllabus – Unit IV
UNIT-IV PARALLELISM
Introduction to Multicore processors and other shared memory
multiprocessors – Flynn’s classification: SISD, MIMD, SIMD,
SPMD and Vector – Hardware multithreading – GPU
architecture.
Syllabus – Unit V
UNIT-V MEMORY & I/O SYSTEMS
Memory Hierarchy – memory technologies – Cache Memory –
Performance Considerations, Virtual Memory, TLBs – Accessing
I/O devices – Interrupts – Direct Memory Access – Bus Structure
– Bus operation.
Text Books
• Book 1:
o Name: Computer Organization and Design: The
Hardware/Software Interface
o Authors: David A. Patterson and John L. Hennessy
o Publisher: Morgan Kaufmann / Elsevier
o Edition: Fifth Edition, 2014
• Book 2:
o Name: Computer Organization and Embedded Systems Interface
o Authors: Carl Hamacher, Zvonko Vranesic, Safwat Zaky and Naraig
Manjikian
o Publisher: Tata McGraw Hill
o Edition: Sixth Edition, 2012
Introduction
• What is meant by Computer Architecture?
Hardware parts
Instruction set
Interface between hardware &
software
Introduction
ISA: a + b -> add a, b -> 000100110101010
Instruction Set Architecture
(ISA)
ISA: The interface, or contract, between the hardware and
the software
Rules about how to code and interpret machine
instructions:
Execution model (program counter)
Operations (instructions)
Data formats (sizes, addressing modes)
Processor state (registers)
Input and Output (memory, etc.)
Introduction
• What is meant by Computer
Architecture?
Computer architecture encompasses
the specification of an instruction set
and the functional behavior of the
hardware units that implement the
instructions.
Introduction
Technology Evolution
UNIT-I
BASIC STRUCTURE OF A
COMPUTER SYSTEM
Topics:
• Functional Units
• Basic operational concepts
• Instructions: Operations, Operands
• Instruction representation
• Instruction Types
• MIPS addressing mode
• Performance
Functional Units
Also called the datapath
Functional Units
• Input unit
• Output unit
• Memory unit
• Arithmetic Logic unit
• Control unit
Functional Units
• Input unit
Functional Units
• Output unit
Functional Units
• Memory unit
Functional Units
Arithmetic & Logic unit and Control unit
Basic Operational Concepts
Unit I
Connection between the processor and the main memory
Code snippet:
Load R2, LOC
Add R4, R3, R2
Store LOC, R4
IR & PC
• Instruction Register:
The instruction register (IR) holds the
instruction that is currently being executed.
• Program Counter:
The program counter (PC) contains the
memory address of the next instruction to be
fetched and executed.
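To make the roles of the PC and IR concrete, here is a minimal C sketch of a fetch-decode-execute loop running the three-instruction snippet above (Load R2, LOC; Add R4, R3, R2; Store LOC, R4). The toy instruction struct, the opcode names, and the 8-register file are hypothetical; real MIPS instructions are 32-bit encodings, and a real MIPS PC advances by 4 bytes rather than by 1.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy machine: instructions are structs, not real encodings. */
enum opcode { OP_LOAD, OP_ADD, OP_STORE, OP_HALT };

struct instr {
    enum opcode op;
    int rd;      /* destination register (Load, Add) */
    int rs, rt;  /* sources (Add); rs is also the register written out by Store */
    size_t loc;  /* memory location used by Load and Store */
};

struct cpu {
    size_t pc;        /* PC: index of the next instruction to fetch */
    struct instr ir;  /* IR: the instruction currently being executed */
    int reg[8];       /* R0..R7; the register count is an assumption */
};

/* Fetch-decode-execute loop; stops at OP_HALT. */
void run(struct cpu *c, const struct instr *program, int *memory) {
    for (;;) {
        c->ir = program[c->pc];  /* fetch: IR <- instruction at PC */
        c->pc += 1;              /* PC now names the next instruction */
        assert(c->ir.rd >= 0 && c->ir.rd < 8);
        assert(c->ir.rs >= 0 && c->ir.rs < 8);
        assert(c->ir.rt >= 0 && c->ir.rt < 8);
        switch (c->ir.op) {      /* decode and execute */
        case OP_LOAD:  c->reg[c->ir.rd] = memory[c->ir.loc]; break;
        case OP_ADD:   c->reg[c->ir.rd] = c->reg[c->ir.rs] + c->reg[c->ir.rt]; break;
        case OP_STORE: memory[c->ir.loc] = c->reg[c->ir.rs]; break;
        case OP_HALT:  return;
        }
    }
}
```

For example, with LOC = 5, memory[5] = 10 and R3 = 7, running the program leaves 17 in memory[5] and the PC at 4 (one past the halt instruction).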
Memory Locations and Addresses
Examples of encoded information in a
32-bit word.
Instructions
Steps in program
translation
ISA
Translations
Machine vs Assembly Language
Machine Language:
• A particular set of instructions that the CPU can directly execute; these are ones and zeros
• Ex: 0100001010101
Assembly Language:
• Assembly language is a symbolic version of the equivalent machine language, written with mnemonics
• Ex: add a,b
Instructions
• Instruction Set:
o The vocabulary of commands understood by a
given architecture.
• Some ISA:
o ARM
o Intel x86
o IBM Power
o MIPS
o SPARC
• Different CPUs implement different sets of
instructions.
MIPS
MIPS - Microprocessor with Interlocked Pipeline Stages
Features:
• five-stage execution pipeline: fetch, decode,
execute, memory-access, write-result
• regular instruction set, all instructions are 32-bit
• three-operand arithmetical and logical instructions
• 32 general-purpose registers of 32-bits each
• only the load and store instructions access memory
• flat address space of 4 GBytes of main memory
(2^32 bytes)
MIPS Assembly Language
• Categories:
o Arithmetic – Only processor and registers
involved (sum of two registers)
o Data transfer – Interacts with memory (load
and store)
o Logical – Only processor and registers involved
(and, sll)
o Conditional branch – Change flow of
execution (branch instructions)
o Unconditional jump – Change flow of
execution (jump to a subroutine)
MIPS Registers
Arithmetic
Data Transfer
Load & Store Instructions
• Load:
o Transfer data from memory to a register
• Store:
o Transfer data from a register to memory
• Load and store instructions must specify a memory address
(Figure: LOAD moves data from memory to the processor; STORE moves data from the processor to memory)
Logical
Conditional
Unconditional Jump
MIPS Arithmetic
• All MIPS arithmetic instructions have 3 operands
• Operand order is fixed (e.g., destination first)
• Example:
C code: A = B + C
MIPS code: add $s0, $s1, $s2
It is the compiler’s job to associate
variables with registers
MIPS Arithmetic
• Design Principle 1: simplicity favors regularity.
Translation: Regular instructions make for simple hardware!
• Simpler hardware reduces design time and manufacturing cost.
• Of course, this complicates some things...
C code: A = B + C + D;
E = F - A;
MIPS code (arithmetic):
add $t0, $s1, $s2
add $s0, $t0, $s3
sub $s4, $s5, $s0
• Performance penalty: one high-level statement may translate into several machine instructions.
• Allowing a variable number of operands would simplify the assembly code but complicate the hardware.
MIPS Arithmetic
Register mapping: a -> $s0, b -> $s1, c -> $s2, f -> $s3, g -> $s4, h -> $s5, i -> $s6, j -> $s7
a = b - c;
f = (g + h) - (i + j);
sub $s0, $s1, $s2
add $t0, $s4, $s5
add $t1, $s6, $s7
sub $s3, $t0, $t1
Try:
1. f = g + (h - 5)
2. f = (i + j) - (k - 20)
Registers vs. Memory
(Figure: the classic components of a computer: processor (control and datapath), memory, input, and output)
• Operands of arithmetic instructions must be in registers
o MIPS has 32 registers
• The compiler associates variables with registers
• What about programs with lots of variables (arrays, etc.)?
Use memory: load/store operations transfer data between memory and registers; if there are not enough registers, the compiler spills registers to memory
• MIPS is a load/store architecture
Memory Organization
• Viewed as a large single-dimension array with access by
address
• A memory address is an index into the memory array
• Byte addressing means that the index points to a byte of
memory, and that the smallest unit a load/store can access is a byte
(Figure: byte addresses 0, 1, 2, 3, 4, 5, 6, ..., each holding 8 bits of data)
Memory Organization
• Bytes are load/store units, but most data items use larger words
• For MIPS, a word is 32 bits or 4 bytes.
• 2^32 bytes with byte addresses from 0 to 2^32 - 1
• 2^30 words with byte addresses 0, 4, 8, ..., 2^32 - 4
o i.e., words are aligned
o what are the 2 least significant bits of a word address?
(Figure: word addresses 0, 4, 8, 12, ..., each holding 32 bits of data)
Registers correspondingly hold 32 bits of data
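The alignment rule answers the question above: every word address is a multiple of 4, so its two least significant bits are always 00. A minimal C sketch (the function names are illustrative, not from any library) checks alignment and computes the byte address of an array element, which is where an offset such as 32 for A[8] (8 * 4) comes from.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A MIPS word is 4 bytes, so aligned word addresses are multiples of 4. */
bool is_word_aligned(uint32_t byte_addr) {
    return (byte_addr & 0x3u) == 0;  /* the 2 least significant bits are 00 */
}

/* Byte address of element i of a word array starting at `base`. */
uint32_t word_element_addr(uint32_t base, uint32_t i) {
    assert(is_word_aligned(base));
    return base + 4u * i;            /* same as base + (i << 2) */
}

/* Number of the word that holds a byte address: 0 .. 2^30 - 1. */
uint32_t word_index(uint32_t byte_addr) {
    return byte_addr >> 2;
}
```

Dropping the two always-zero bits (word_index) is why 2^32 byte addresses correspond to only 2^30 words.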
The Endian Question
• MIPS can also load and store 4-byte words and 2-byte halfwords.
• The endian question: when you read a word, in what order do the bytes appear?
o Little Endian: Intel, DEC, et al.
o Big Endian: Motorola, IBM, Sun, et al.
o MIPS can do either; SPIM adopts its host’s convention.
Big Endian (bits 31 ... 0): byte 0 byte 1 byte 2 byte 3
Little Endian (bits 31 ... 0): byte 3 byte 2 byte 1 byte 0
The Endian Question
x = 0x01234567
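A small C sketch, assuming the word is stored at a word-aligned address and the 4 bytes are numbered by offset 0 to 3, shows where each byte of x = 0x01234567 lands under each convention. byte_at_offset and host_is_little_endian are hypothetical helper names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Byte found at offset n (0..3) from the word's address when word x is stored
   big-endian (most significant byte first) or little-endian (least significant first). */
uint8_t byte_at_offset(uint32_t x, int n, bool big_endian) {
    assert(n >= 0 && n <= 3);
    int shift = big_endian ? 24 - 8 * n : 8 * n;
    return (uint8_t)(x >> shift);
}

/* Byte order of the machine running this code (compare: SPIM follows its host). */
bool host_is_little_endian(void) {
    uint32_t probe = 1;
    return *(const unsigned char *)&probe == 1;  /* does the lowest address hold 0x01? */
}
```

For x = 0x01234567, big endian places 01 23 45 67 at offsets 0 to 3, while little endian places 67 45 23 01.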
Load/Store Instructions
• Load and store instructions
• Example:
C code: A[8] = h + A[8];
MIPS code (load): lw $t0, 32($s3)
(arithmetic): add $t0, $s2, $t0
(store): sw $t0, 32($s3)
• Load word has destination first, store has destination last
• Remember MIPS arithmetic operands are registers, not
memory locations
o therefore, words must first be moved from memory to registers
using loads before they can be operated on; then result can be
stored back to memory
(In lw $t0, 32($s3), 32 is the offset, $s3 holds the base address, and the loaded value goes into $t0.)
So far we’ve learned:
• MIPS
o loading words but addressing bytes
o arithmetic on registers only
• Instruction Meaning
add $s1, $s2, $s3 $s1 = $s2 + $s3
sub $s1, $s2, $s3 $s1 = $s2 – $s3
lw $s1, 100($s2) $s1 = Memory[$s2+100]
sw $s1, 100($s2) Memory[$s2+100]= $s1
• Try: Find the assembly code of B[8] = A[i] + A[j];
The base addresses of A and B are in $s6 and $s7
respectively; $s0-$s4 hold the values f-j
Exercise
Q: For the following C statement, what is the
corresponding MIPS assembly code? Assume that the
variables f, g, h, and i are given and could be
considered 32-bit integers as declared in a C
program. Use a minimal number of MIPS assembly
instructions. f = g + (h − 5);
Solution:
f -> $s1, g -> $s2, h -> $s3
addi $t0, $s3,-5
add $s1, $s2, $t0
Representing Instructions
in the Computer
• Instruction format:
o A form of representation of an instruction
composed of fields of binary numbers.
• All MIPS instructions are 32 bits long.
• Three types of instruction formats:
o R-type (for register) or R-format
o I-type (for immediate) or I-format
o J-type (for jump) or J-format
R-type (for register)
• MIPS fields:
• op: Basic operation of the instruction (opcode)
• rs: The first register source operand
• rt: The second register source operand
• rd: The register destination operand
• shamt: Shift amount
• funct: Function. It selects the specific variant of the
operation in the op field (function code).
Ex: add $t0, $s1, $s2
I-type (for immediate)
• MIPS fields:
• op: Basic operation of the instruction (opcode)
• rs: The register source operand
• rt: destination register, which receives the result of
the load
• constant or address: It contains a 16-bit constant or
address value.
I-type (for immediate)
• MIPS fields:
Ex: addi $t1, $s0, 10
lw $t0, 40($s4)
bne $s5,$s6, 100
J-type (for jump)
• MIPS fields:
• op: Basic operation of the instruction (opcode)
• address: It contains a 26-bit address value.
• Ex:
j 10000
Instruction formats for
MIPS architecture
MIPS instruction
encoding
MIPS Registers
Mapping register names
to register numbers
t0 t1 t2 t3 t4 t5 t6 t7
8 9 10 11 12 13 14 15
s0 s1 s2 s3 s4 s5 s6 s7
16 17 18 19 20 21 22 23
Translating a MIPS Assembly
Instruction into a Machine Instruction
Given instruction: add $t0,$s1,$s2
• Solution:
• Identify the instruction format type: R-type
• Format: Operation rd, rs, rt
• rs -> $s1, rt -> $s2, rd -> $t0, shamt -> 0 (not used)
• op -> 0, funct -> 32
• Decimal representation:
op rs rt rd shamt funct
0 17 18 8 0 32
• Binary representation:
op rs rt rd shamt funct
000000 10001 10010 01000 00000 100000
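The field packing above can be checked with a short C sketch that shifts each field into place: op in bits 31-26, rs in 25-21, rt in 20-16, rd in 15-11, shamt in 10-6 and funct in 5-0. encode_r is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

/* Packs the six R-type fields: op(6) rs(5) rt(5) rd(5) shamt(5) funct(6). */
uint32_t encode_r(uint32_t op, uint32_t rs, uint32_t rt,
                  uint32_t rd, uint32_t shamt, uint32_t funct) {
    assert(op < 64 && rs < 32 && rt < 32 && rd < 32 && shamt < 32 && funct < 64);
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct;
}
```

encode_r(0, 17, 18, 8, 0, 32) gives 0x02324020 for add $t0,$s1,$s2, and encode_r(0, 20, 21, 11, 0, 34) gives 0x02955822 for sub $t3,$s4,$s5; both match the binary rows on these slides.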
Exercise
Q: Translate the following MIPS Assembly code
into binary code.
sub $t3,$s4,$s5
op rs rt rd shamt funct
0 20 21 11 0 34
000000 10100 10101 01011 00000 100010
Translating a MIPS Assembly
Instruction into a Machine Instruction
Given instruction: lw $t0,32($s3)
• Solution:
• Identify the instruction format type: I-type
• Format: Operation rt, address(rs)
• rs -> $s3, rt -> $t0, immediate -> 32
• Decimal representation:
op rs rt address
35 19 8 32
• Binary representation:
op rs rt address
100011 10011 01000 0000 0000 0010 0000
Exercise
Q: Translate the following MIPS Assembly code
into binary code.
sw $t2,58($s5)
101011 10101 01010 0000 0000 0011 1010
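The same idea works for I-type: op in bits 31-26, rs in 25-21, rt in 20-16 and the 16-bit immediate in 15-0. This sketch (encode_i is a hypothetical name) stores the immediate in two's complement so that negative constants, such as the -5 in addi $t0, $s3, -5, also fit.

```c
#include <assert.h>
#include <stdint.h>

/* Packs I-type fields: op(6) rs(5) rt(5) immediate(16).
   The immediate is a signed 16-bit value kept in two's complement. */
uint32_t encode_i(uint32_t op, uint32_t rs, uint32_t rt, int32_t imm) {
    assert(op < 64 && rs < 32 && rt < 32);
    assert(imm >= -32768 && imm <= 32767);
    return (op << 26) | (rs << 21) | (rt << 16) | ((uint32_t)imm & 0xFFFFu);
}
```

encode_i(35, 19, 8, 32) gives 0x8E680020 for lw $t0,32($s3), and encode_i(43, 21, 10, 58) gives 0xAEAA003A for sw $t2,58($s5).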
Translating High level Language
into Machine Language
Q: Consider the following high level statement
A[300] = h + A[300];
If $t1 has the base of the array A and $s2 corresponds to
h, what is the MIPS machine language code?
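One way to work this exercise: A[300] is 300 words past the base, so its byte offset is 300 * 4 = 1200, and the statement compiles to lw $t0,1200($t1); add $t0,$s2,$t0; sw $t0,1200($t1), using $t0 as the temporary (an assumption). The C sketch below packs those three instructions with the field layouts and register numbers given in these slides ($t0 = 8, $t1 = 9, $s2 = 18); the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* R-type: op(6) rs(5) rt(5) rd(5) shamt(5) funct(6) */
static uint32_t r_type(uint32_t op, uint32_t rs, uint32_t rt, uint32_t rd,
                       uint32_t shamt, uint32_t funct) {
    assert(op < 64 && rs < 32 && rt < 32 && rd < 32 && shamt < 32 && funct < 64);
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct;
}

/* I-type: op(6) rs(5) rt(5) immediate(16); only non-negative offsets here */
static uint32_t i_type(uint32_t op, uint32_t rs, uint32_t rt, uint32_t imm) {
    assert(op < 64 && rs < 32 && rt < 32 && imm <= 0x7FFFu);
    return (op << 26) | (rs << 21) | (rt << 16) | imm;
}

/* A[300] = h + A[300];  $t1 = base of A (reg 9), $s2 = h (reg 18), $t0 = reg 8 */
void compile_a300(uint32_t out[3]) {
    const uint32_t offset = 300 * 4;      /* byte offset of A[300] = 1200 */
    out[0] = i_type(35, 9, 8, offset);    /* lw  $t0, 1200($t1) */
    out[1] = r_type(0, 18, 8, 8, 0, 32);  /* add $t0, $s2, $t0  */
    out[2] = i_type(43, 9, 8, offset);    /* sw  $t0, 1200($t1) */
}
```

In decimal the fields are 35 9 8 1200, then 0 18 8 8 0 32, then 43 9 8 1200; in hex the three words are 0x8D2804B0, 0x02484020 and 0xAD2804B0.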
Logical Operations
Shift operations
• Shifts allow bits to be moved around inside of a
register.
• Shift left logical
Example: sll $t2,$s0,4 # reg $t2 = reg $s0 << 4 bits
Machine Code:
op rs rt rd shamt funct
000000 00000 10000 01010 00100 000000
Shift Left Logical(sll)
• Example: sll $t2,$s0,4 # reg $t2 = reg $s0 << 4 bits
• If $s0=10
• Value of $t2=???
Shift operations
• Shift right logical
Example: srl $t5,$s3,2 # reg $t5 = reg $s3 >> 2 bits
Machine Code:
op rs rt rd shamt funct
000000 00000 10011 01101 00010 000010
op rs rt rd shamt funct
0 0 19 13 2 2
Shift Right Logical(srl)
Example: srl $t5,$s3,2 # reg $t5 = reg $s3 >> 2 bits
• If $s3=12
• Value of $t5=???
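Both exercises follow from one observation: shifting left by n multiplies by 2^n, and a logical right shift by n divides an unsigned value by 2^n, discarding the remainder. A C sketch on 32-bit unsigned values (the function names are hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* sll: shift left, filling the vacated low bits with zeros. */
uint32_t sll(uint32_t rt, unsigned shamt) {
    assert(shamt < 32);  /* shamt is a 5-bit field */
    return rt << shamt;
}

/* srl: logical shift right; unsigned, so zeros (not sign bits) fill from the left. */
uint32_t srl(uint32_t rt, unsigned shamt) {
    assert(shamt < 32);
    return rt >> shamt;
}
```

sll(10, 4) is 160, so $t2 = 160; srl(12, 2) is 3, so $t5 = 3.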
Logical Operations –
AND, OR & NOT
• A logical bit-by-bit operation with two operands.
• EX:
and $t0,$t1,$t2 # reg $t0 = reg $t1 & reg $t2
or $t0,$t1,$t2 # reg $t0 = reg $t1 | reg $t2
nor $t0,$t1,$t3 # reg $t0 = ~ (reg $t1 | reg $t3)
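These instructions work bit by bit, just like C's &, | and ~ operators on 32-bit unsigned integers. The sketch also shows how NOT is obtained: MIPS has no NOT instruction, so nor with the always-zero register $zero is used instead. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

uint32_t mips_and(uint32_t a, uint32_t b) { return a & b; }     /* and */
uint32_t mips_or(uint32_t a, uint32_t b)  { return a | b; }     /* or  */
uint32_t mips_nor(uint32_t a, uint32_t b) { return ~(a | b); }  /* nor */

/* NOT via nor with $zero (which always reads as 0): ~(a | 0) == ~a */
uint32_t mips_not(uint32_t a) { return mips_nor(a, 0); }
```

With sample values $t1 = 0x00003C00 and $t2 = 0x00000DC0, and gives 0x00000C00 and or gives 0x00003DC0.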
Example
Instructions for Making
Decisions
• Sequences that allow programs to execute statements in order one
after another.
• Branches that allow programs to jump to other points in a
program.
• Loops that allow a program to execute a fragment of code
multiple times.
• MIPS Instructions:
beq register1, register2, L1
bne register1, register2, L1
• beq and bne are mnemonics
• Conditional branches
Instructions for Making
Decisions
Q: In the following code segment, f, g, h, i, and j are
variables. If the five variables f through j
correspond to the five registers $s0 through $s4,
what is the compiled MIPS code for this C if
statement?
if (i == j) f = g + h; else f = g - h;
Instructions for Making
Decisions
• Solution:
Instructions for Making
Decisions
High level code:
if (i == j)
f = g + h;
else
f = g - h;
MIPS code:
bne $s3,$s4,Else # go to Else if i ≠ j
add $s0,$s1,$s2 # f = g + h (skipped if i ≠ j)
j Exit # go to Exit
Else: sub $s0,$s1,$s2 # f = g - h (skipped if i = j)
Exit:
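The compiled code has no "else" construct: it inverts the test (bne) and uses an unconditional jump to skip the else part. The same control flow can be written in C with goto, as a sketch (the function name is hypothetical; each statement is annotated with the MIPS instruction it mirrors).

```c
#include <assert.h>

/* Same control flow as the compiled MIPS, written with goto. */
int if_else_as_compiled(int g, int h, int i, int j) {
    int f;
    if (i != j) goto Else;  /* bne $s3,$s4,Else */
    f = g + h;              /* add $s0,$s1,$s2 */
    goto Exit;              /* j Exit */
Else:
    f = g - h;              /* sub $s0,$s1,$s2 */
Exit:
    return f;               /* f lives in $s0 */
}
```

With g = 5 and h = 3, equal i and j give f = 8, and unequal i and j give f = 2, exactly as the original if/else would.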
Compiling a while Loop
in C
while (save[i] == k)
i += 1;
Assume that i and k correspond to registers $s3 and
$s5 and the base of the array save is in $s6. What is
the MIPS assembly code corresponding to this C
segment?
Compiling a while Loop
in C
while (save[i] == k)
i += 1;
1. Load save[i] into a temporary register
o add i (scaled by 4) to the base of array save to form the address
2. Perform the loop test
o go to Exit if save[i] ≠ k
3. Add 1 to i
4. Go back to the while test at the top of the loop
5. Exit
Solution:
Loop: sll $t1,$s3,2 # Temp reg $t1 = i * 4
add $t1,$t1,$s6 # $t1 = address of save[i]
lw $t0,0($t1) # Temp reg $t0 = save[i]
bne $t0,$s5, Exit # go to Exit if save[i] ≠ k
addi $s3,$s3,1 # i = i + 1
j Loop # go to Loop
Exit:
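The compiled loop can be mirrored in C to check each step, including the byte offset i * 4 formed by sll. This sketch (hypothetical function name) reads the array as raw bytes so the address arithmetic is visible; like the original C loop, it assumes some element differs from k, otherwise it would run past the end of the array.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Mirrors the compiled while loop; returns the final value of i ($s3). */
int32_t while_as_compiled(const int32_t *save, int32_t i, int32_t k) {
    assert(save != NULL && i >= 0);
    const unsigned char *s6 = (const unsigned char *)save;  /* $s6 = base of save */
    int32_t t0;
Loop:
    /* sll $t1,$s3,2 ; add $t1,$t1,$s6 ; lw $t0,0($t1) */
    memcpy(&t0, s6 + ((uint32_t)i << 2), sizeof t0);
    if (t0 != k) goto Exit;  /* bne $t0,$s5,Exit */
    i = i + 1;               /* addi $s3,$s3,1 */
    goto Loop;               /* j Loop */
Exit:
    return i;
}
```

With save = {7, 7, 7, 3} and k = 7, starting from i = 0 the loop exits with i = 3.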
MIPS Addressing Mode
• The different ways for specifying the locations
of instruction operands are known as
addressing mode.
• The MIPS addressing modes are the following:
1. Immediate addressing mode
2. Register addressing mode
3. Base or displacement addressing mode
4. PC-relative addressing mode
5. Pseudodirect addressing mode
Immediate addressing mode
• Def:
o the operand is a constant within the instruction itself
• Ex:
o addi $s1, $s2, 20 #$s1=$s2+20
• Illustration:
Register addressing mode
• Def:
o the source and destination operands are processor registers
o also called direct addressing mode
• Ex:
o add $s1, $s2, $s3 #$s1=$s2+$s3
• Illustration:
Base or displacement
addressing mode
• Def:
o the operand is at the memory location whose address is the
sum of a register and a constant in the instruction
o also called indirect addressing mode
• Ex:
o lw $s1, 20($s3) #$s1= Memory[$s3+20]
• Illustration:
PC-relative addressing mode
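• Def:
o the branch address is the sum of the updated PC (PC + 4) and
the constant in the instruction multiplied by 4
• Ex:
o bne $s4, $s5, 25 # if ($s4 != $s5), go to PC + 4 + 25*4;
if this bne is at address 12, the target is 12 + 4 + 100 = 116
• Illustration: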
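A minimal C sketch of the PC-relative target computation, assuming (as in MIPS) that the 16-bit constant counts words and is added to PC + 4; branch_target is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Target of beq/bne: PC + 4 + (sign-extended word offset * 4). */
uint32_t branch_target(uint32_t branch_pc, int16_t offset_words) {
    assert((branch_pc & 0x3u) == 0);  /* instructions are word aligned */
    return branch_pc + 4u + (uint32_t)((int32_t)offset_words * 4);
}
```

branch_target(12, 25) returns 116, matching the example; a negative offset branches backward.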
Pseudodirect addressing
mode
• Def:
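o the jump address is the 26 bits of the instruction
concatenated with the upper bits of the PC (the 26-bit field is shifted left 2 bits to form a byte address, and the upper 4 bits come from PC + 4)
• Ex:
o j 1000
• Illustration: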
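For pseudodirect addressing, the 26-bit field is a word address: it is shifted left 2 bits and combined with the upper 4 bits of PC + 4. This sketch treats addr26 as the raw field value (an assumption, since an assembler may let you write the target as a byte address or a label instead); jump_target is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Jump target: upper 4 bits of PC + 4, followed by the 26-bit field shifted left 2. */
uint32_t jump_target(uint32_t jump_pc, uint32_t addr26) {
    assert(addr26 < (1u << 26));
    return ((jump_pc + 4u) & 0xF0000000u) | (addr26 << 2);
}
```

When the upper 4 PC bits are 0, a field value of 250 jumps to byte address 1000 and a field value of 20000 jumps to 80000.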
Decoding Machine Code
• Q: What is the assembly language statement
corresponding to this machine instruction?
0x00af8020
Solution:
converting hexadecimal to binary
Binary instruction format
Assembly instruction
Machine Language to Assembly
Language
• Translate the following machine language code into
assembly language.
0x02F34022
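Both decoding exercises follow the same steps: split the 32-bit word into fields, look up funct (32 = add, 34 = sub), and map register numbers to names. The sketch below handles only add and sub. Register names outside the mapping table shown earlier (for example $a1 = register 5) follow the standard MIPS register convention, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

static const char *const REG_NAMES[32] = {
    "$zero", "$at", "$v0", "$v1", "$a0", "$a1", "$a2", "$a3",
    "$t0",   "$t1", "$t2", "$t3", "$t4", "$t5", "$t6", "$t7",
    "$s0",   "$s1", "$s2", "$s3", "$s4", "$s5", "$s6", "$s7",
    "$t8",   "$t9", "$k0", "$k1", "$gp", "$sp", "$fp", "$ra"
};

/* Decodes an R-type add/sub word into assembly text (rd, rs, rt order).
   Returns 0 on success, -1 for anything this sketch does not handle. */
int decode_add_sub(uint32_t word, char *out, size_t out_len) {
    uint32_t op    = word >> 26;
    uint32_t rs    = (word >> 21) & 0x1Fu;
    uint32_t rt    = (word >> 16) & 0x1Fu;
    uint32_t rd    = (word >> 11) & 0x1Fu;
    uint32_t shamt = (word >> 6) & 0x1Fu;
    uint32_t funct = word & 0x3Fu;
    const char *mnemonic;

    if (op != 0 || shamt != 0) return -1;  /* not a plain R-type add/sub */
    if (funct == 32) mnemonic = "add";
    else if (funct == 34) mnemonic = "sub";
    else return -1;

    assert(out != NULL && out_len > 0);
    snprintf(out, out_len, "%s %s, %s, %s",
             mnemonic, REG_NAMES[rd], REG_NAMES[rs], REG_NAMES[rt]);
    return 0;
}
```

0x00af8020 decodes to add $s0, $a1, $t7 and 0x02F34022 decodes to sub $t0, $s7, $s3.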
• Performance is the key to understanding the underlying
motivation for the hardware and its organization
• Measure, report, and summarize performance to enable users to
o make intelligent choices
o see through the marketing hype!
• Why is some hardware better than others for different programs?
• What factors of system performance are hardware related?
(e.g., do we need a new machine, or a new operating system?)
• How does the machine's instruction set affect performance?
Performance
• Response Time (elapsed time, latency):
o how long does it take for my job to run?
o how long does it take to execute (start to
finish) my job?
o how long must I wait for the database query?
• Throughput:
o how many jobs can the machine run at once?
o what is the average execution rate?
o how much work is getting done?
• If we upgrade a machine with a new processor what do we increase?
• If we add a new machine to the lab what do we increase?
Computer Performance:
TIME, TIME, TIME!!!
Response time: individual user concerns…
Throughput: systems manager concerns…
• Elapsed Time
o counts everything (disk and memory accesses, waiting for I/O,
running other programs, etc.) from start to finish
o a useful number, but often not good for comparison purposes
elapsed time = CPU time + wait time (I/O, other programs, etc.)
• CPU time
o doesn't count waiting for I/O or time spent running other programs
o can be divided into user CPU time and system CPU time (OS calls)
CPU time = user CPU time + system CPU time
Therefore, elapsed time = user CPU time + system CPU time + wait time
• Our focus: user CPU time (CPU execution time or, simply,
execution time)
o time spent executing the lines of code that are in our program
Execution Time
• For some program running on machine X:
PerformanceX = 1 / Execution timeX
• Consider two machines X and Y. If the performance of X is greater than the performance of Y, then
PerformanceX > PerformanceY
i.e., 1 / Execution timeX > 1 / Execution timeY
i.e., Execution timeY > Execution timeX
• X is n times faster than Y means:
PerformanceX / PerformanceY = n
PerformanceX / PerformanceY = Execution timeY / Execution timeX = n
Definition of Performance
Q: If computer A runs a program in 10 sec
and computer B runs the same program in
15 sec, how much faster is A than B?
• We know that,
PerformanceA / PerformanceB
= Execution timeB / Execution timeA = n
Thus the performance ratio is,
Execution timeB / Execution timeA = 15 / 10 = 1.5
i.e., PerformanceA / PerformanceB = 1.5
Therefore, A is 1.5 times faster than B.
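The definition translates directly into code: compute each performance as 1 / execution time and take the ratio. A C sketch with a hypothetical function name:

```c
#include <assert.h>

/* "X is n times faster than Y": n = Performance_X / Performance_Y. */
double times_faster(double exec_time_x, double exec_time_y) {
    assert(exec_time_x > 0.0 && exec_time_y > 0.0);
    double perf_x = 1.0 / exec_time_x;  /* Performance = 1 / Execution time */
    double perf_y = 1.0 / exec_time_y;
    return perf_x / perf_y;             /* equals exec_time_y / exec_time_x */
}
```

times_faster(10.0, 15.0) returns 1.5 (up to floating-point rounding), matching the worked answer.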
Clock Cycles
• Instead of reporting execution time in seconds, we often use
cycles. In modern computers hardware events progress cycle
by cycle: in other words, each event, e.g., multiplication,
addition, etc., is a sequence of cycles
• Clock ticks indicate start and end of cycles:
• cycle time = time between ticks = seconds per cycle
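• clock rate (frequency) = clock cycles per second (1 Hz = 1
cycle/sec, 1 MHz = 10^6 cycles/sec)
• Example: A 200 MHz clock has a cycle time of ????
$$\frac{\text{seconds}}{\text{program}} = \frac{\text{cycles}}{\text{program}} \times \frac{\text{seconds}}{\text{cycle}}$$
(Figure: clock waveform; each cycle runs from one tick to the next)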
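The cycle-time definition and the seconds-per-program identity can be written as two short functions (names hypothetical). For the example, a 200 MHz clock gives 1 / (200 * 10^6) s = 5 ns per cycle.

```c
#include <assert.h>

/* cycle time (seconds per cycle) = 1 / clock rate (cycles per second) */
double cycle_time_s(double clock_rate_hz) {
    assert(clock_rate_hz > 0.0);
    return 1.0 / clock_rate_hz;
}

/* seconds/program = cycles/program * seconds/cycle */
double exec_time_s(double cycles, double clock_rate_hz) {
    return cycles * cycle_time_s(clock_rate_hz);
}
```

exec_time_s(20e9, 2e9) gives 10 seconds, the same relationship used in the clock-rate exercise that follows.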
Performance Equation I
• So, to improve performance one can either:
o reduce the number of cycles for a program, or
o reduce the clock cycle time, or, equivalently,
o increase the clock rate
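$$\frac{\text{seconds}}{\text{program}} = \frac{\text{cycles}}{\text{program}} \times \frac{\text{seconds}}{\text{cycle}}$$
$$\text{CPU execution time for a program} = \text{CPU clock cycles for a program} \times \text{Clock cycle time}$$
equivalently,
$$\text{CPU execution time for a program} = \frac{\text{CPU clock cycles for a program}}{\text{Clock rate}}$$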
Our favorite program runs in 10 seconds on computer A, which has a 2
GHz clock. We are trying to help a computer designer build a computer,
B, which will run this program in 6 seconds. The designer has determined
that a substantial increase in the clock rate is possible, but this increase
will affect the rest of the CPU design, causing computer B to require 1.2
times as many clock cycles as computer A for this program. What clock
rate should we tell the designer to target?
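$$\text{CPU time}_A = \frac{\text{CPU clock cycles}_A}{\text{Clock rate}_A}$$
$$10\ \text{s} = \frac{\text{CPU clock cycles}_A}{2 \times 10^9\ \text{cycles/s}}$$
$$\text{CPU clock cycles}_A = 10\ \text{s} \times 2 \times 10^9\ \text{cycles/s} = 20 \times 10^9\ \text{cycles}$$
$$\text{CPU time}_B = \frac{1.2 \times \text{CPU clock cycles}_A}{\text{Clock rate}_B}$$
$$6\ \text{s} = \frac{1.2 \times 20 \times 10^9\ \text{cycles}}{\text{Clock rate}_B}$$
$$\text{Clock rate}_B = \frac{1.2 \times 20 \times 10^9\ \text{cycles}}{6\ \text{s}} = 4 \times 10^9\ \text{Hz} = 4\ \text{GHz}$$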
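The same calculation as a C sketch, parameterized so other targets can be tried; required_clock_rate is a hypothetical name.

```c
#include <assert.h>

/* Clock rate B needs to finish in time_b seconds when it uses
   cycle_factor times as many cycles as computer A. */
double required_clock_rate(double time_a, double rate_a,
                           double cycle_factor, double time_b) {
    assert(time_a > 0.0 && rate_a > 0.0 && cycle_factor > 0.0 && time_b > 0.0);
    double cycles_a = time_a * rate_a;          /* 10 s * 2e9 = 20e9 cycles */
    double cycles_b = cycle_factor * cycles_a;  /* B needs 1.2x as many cycles */
    return cycles_b / time_b;                   /* cycles per second */
}
```

required_clock_rate(10, 2e9, 1.2, 6) returns 4e9, i.e., 4 GHz.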
Instruction Performance
• The previous equation makes no reference to the number of
instructions
• The execution time depends on the number of
instructions in the program
Clock cycles per instruction (CPI)
• Average number of clock cycles per instruction for
a program or program fragment
Suppose we have two implementations of the same
instruction set architecture. Computer A has a clock cycle
time of 250 ps and a CPI of 2.0 for some program, and
computer B has a clock cycle time of 500 ps and a CPI of 1.2
for the same program. Which computer is faster for this
program and by how much?
• The same number of instructions is executed on both
computers
Instruction Performance
$$\text{CPU execution time for a program} = \text{Instruction count} \times \text{Average CPI} \times \text{Clock cycle time}$$
Or
$$\text{CPU execution time for a program} = \frac{\text{Instruction count} \times \text{Average CPI}}{\text{Clock rate}}$$
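Applying this equation to the CPI question: per instruction, A needs 2.0 * 250 ps = 500 ps and B needs 1.2 * 500 ps = 600 ps. Because both computers execute the same instruction count I, the ratio is 600 I / 500 I = 1.2, so A is 1.2 times faster. A sketch with a hypothetical I = 10^9 instructions (the count cancels in the ratio):

```c
#include <assert.h>

/* CPU time = instruction count * average CPI * clock cycle time */
double cpu_time_s(double instr_count, double cpi, double cycle_time_s) {
    assert(instr_count >= 0.0 && cpi > 0.0 && cycle_time_s > 0.0);
    return instr_count * cpi * cycle_time_s;
}
```

cpu_time_s(1e9, 2.0, 250e-12) is 0.5 s for A and cpu_time_s(1e9, 1.2, 500e-12) is 0.6 s for B.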
Instruction Performance
Which code sequence
executes the most instructions?
• Sequence 1 executes
2 + 1 + 2 = 5 instructions
• Sequence 2 executes
4 + 1 + 1 = 6 instructions
Sequence 2 executes the most instructions
Which will be faster?
• Code sequence 2 is faster, because it needs fewer total clock cycles even though it executes more instructions
What is the CPI for each
sequence?
• Sequence 2 has the lower CPI: it takes fewer clock
cycles but executes more instructions
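The per-class CPI table for this question is not reproduced in the text, so this sketch assumes hypothetical CPIs of 1, 2 and 3 for instruction classes A, B and C, together with the counts 2, 1, 2 (sequence 1) and 4, 1, 1 (sequence 2) from above. Total cycles are the sum of count * CPI, and average CPI is total cycles / instruction count.

```c
#include <assert.h>

/* Hypothetical per-class CPIs for classes A, B, C (assumed values). */
static const double CLASS_CPI[3] = {1.0, 2.0, 3.0};

/* Total clock cycles = sum over classes of count_i * CPI_i. */
double total_cycles(const int counts[3]) {
    double cycles = 0.0;
    for (int c = 0; c < 3; c++) {
        assert(counts[c] >= 0);
        cycles += counts[c] * CLASS_CPI[c];
    }
    return cycles;
}

/* Average CPI = total cycles / instruction count. */
double average_cpi(const int counts[3]) {
    int n = counts[0] + counts[1] + counts[2];
    assert(n > 0);
    return total_cycles(counts) / n;
}
```

Under these assumed CPIs, sequence 1 takes 10 cycles (CPI 2.0) and sequence 2 takes 9 cycles (CPI 1.5), consistent with sequence 2 being faster despite executing more instructions.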
Basic components of
Performance
Factors affecting
Performance

Introduction to Computer Architecture: unit 1

  • 1.
    Velammal Engineering College Departmentof Computer Science and Engineering Welcome…
  • 2.
    Course Objectives • Thiscourse aims to learn the basic structure and operations of a computer. • The course is intended to learn ALU, pipelined execution, parallelism and multi-core processors. • The course will enable the students to understand memory hierarchies, cache memories and virtual memories.
  • 3.
    Course Outcomes CO 1 Discussthe basics structure of computers, operations and instructions. CO 2 Design arithmetic and logic unit. CO 3 Analyze pipelined execution and design control unit. CO 4 Analyze parallel processing architectures. CO 5 Examine the performance of various memory systems CO 6 Organize the various I/O communications.
  • 4.
    Syllabus Unit Titles: • UnitI Basic Structure of a Computer System • Unit II Arithmetic for Computers • Unit III Processor and Control Unit • Unit IV Parallelism • Unit V Memory & I/O Systems
  • 5.
    Syllabus – UnitI UNIT-I BASIC STRUCTURE OF A COMPUTER SYSTEM Functional Units – Basic operational concepts –– Instructions: Operations, Operands – Instruction representation – Instruction Types – MIPS addressing, Performance
  • 6.
    Syllabus – UnitII UNIT-II ARITHMETIC FOR COMPUTERS Addition and Subtraction – Multiplication – Division – Floating Point Representation – Floating Point Addition and Subtraction.
  • 7.
    Syllabus – UnitIII UNIT-III PROCESSOR AND CONTROL UNIT A Basic MIPS implementation – Building a Datapath – Control Implementation Scheme – Pipelining – Pipelined datapath and control – Handling Data Hazards & Control Hazards.
  • 8.
    Syllabus – UnitIV UNIT-IV PARALLELISM Introduction to Multicore processors and other shared memory multiprocessors – Flynn’s classification: SISD, MIMD, SIMD, SPMD and Vector – Hardware multithreading – GPU architecture.
  • 9.
    Syllabus – UnitV •UNIT-V MEMORY & I/O SYSTEMS Memory Hierarchy – memory technologies – Cache Memory – Performance Considerations, Virtual Memory,TLB’s – Accessing I/O devices – Interrupts – Direct Memory Access – Bus Structure – Bus operation.
  • 10.
    Text Books • Book1: o Name: Computer Organization and Design: The Hardware/Software Interface o Authors: David A. Patterson and John L. Hennessy o Publisher: Morgan Kaufmann / Elsevier o Edition: Fifth Edition, 2014 • Book 2: o Name: Computer Organization and Embedded Systems Interface o Authors: Carl Hamacher, Zvonko Vranesic, Safwat Zaky and Naraig Manjikian o Publisher: Tata McGraw Hill o Edition: Sixth Edition, 2012
  • 11.
    Introduction • What ismean by Computer Architecture? Hardware parts Instruction set Interface between hardware & software
  • 12.
    Introduction ISA: a+b ->add a,b ->000100110101010
  • 13.
    Instruction Set Architecture (ISA) ISA:The interface or contact between the hardware and the software Rules about how to code and interpret machine instructions: Execution model (program counter) Operations (instructions) Data formats (sizes, addressing modes) Processor state (registers) Input and Output (memory, etc.)
  • 14.
    Introduction • What ismeant by Computer Architecture? Computer architecture encompasses the specification of an instruction set and the functional behavior of the hardware units that implement the instructions.
  • 15.
  • 16.
  • 17.
    UNIT-I BASIC STRUCTURE OFA COMPUTER SYSTEM Topics: • Functional Units • Basic operational concepts • Instructions: Operations, Operands • Instruction representation • Instruction Types • MIPS addressing mode • Performance
  • 18.
  • 19.
  • 20.
    Functional Units • Inputunit • Output unit • Memory unit • Arithmetic Logic unit • Control unit
  • 21.
  • 22.
  • 23.
  • 24.
  • 25.
  • 26.
    Functional Units Arithmetic &Logic unit and Control unit
  • 27.
  • 28.
    Connection between theprocessor and the main memory Code Snippet: Load R2, LOC Add R4, R3, R2 Store LOC, R4
  • 29.
    IR & PC •Instruction Register: The instruction register (IR) holds the instruction that is currently being executed. • Program Counter: The program counter (PC) contains the memory address of the next instruction to be fetched and executed.
  • 30.
  • 31.
    Examples of encodedinformation in a 32-bit word.
  • 32.
  • 33.
  • 34.
  • 35.
    Machine vs Assembly Language MachineLanguage Assembly Language • A particular set of instructions that the CPU can directly execute – but these are ones and zeros • Ex: 0100001010101 • Assembly language is a symbolic version of the equivalent machine language • Ex: add a,b
  • 36.
  • 37.
    Instructions • Instruction Set: oThe vocabulary of commands understand by a given architecture. • Some ISA: o ARM o Intel x86 o IBM Power o MIPS o SPARC • Different CPUs implement different set of instructions.
  • 38.
    MIPS MIPS - Microprocessorwith Interlocked Pipeline Stages Features: • five-stage execution pipeline: fetch, decode, execute, memory-access, write-result • regular instruction set, all instructions are 32-bit • three-operand arithmetical and logical instructions • 32 general-purpose registers of 32-bits each • only the load and store instruction access memory • flat address space of 4 GBytes of main memory (2^32 bytes)
  • 39.
    MIPS Assembly Language •Categories: oArithmetic – Only processor and registers involved (sum of two registers) oData transfer – Interacts with memory (load and store) oLogical - Only processor and registers involved (and, sll) oConditional branch – Change flow of execution (branch instructions) oUnconditional Jump – Change flow of execution (jump to a subroutine)
  • 40.
  • 41.
  • 42.
  • 43.
    Load & StoreInstructions • Load: o Transfer data from memory to a register • Store: o Transfer a data from a register to memory • Memory address must be specified by load and store • Processor Memory STORE LOAD
  • 44.
  • 45.
  • 46.
  • 48.
    MIPS Arithmetic • AllMIPS arithmetic instructions have 3 operands • Operand order is fixed (e.g., destination first) • Example: C code: A = B + C MIPS code: add $s0, $s1, $s2 compiler’s job to associate variables with registers
  • 49.
    MIPS Arithmetic • DesignPrinciple 1: simplicity favors regularity. Translation: Regular instructions make for simple hardware! • Simpler hardware reduces design time and manufacturing cost. • Of course this complicates some things... C code: A = B + C + D; E = F - A; MIPS code add $t0, $s1, $s2 (arithmetic): add $s0, $t0, $s3 sub $s4, $s5, $s0 • Performance penalty: high-level code translates to denser machine code. Allowing variable number of operands would simplify the assembly code but complicate the hardware.
  • 50.
    MIPS Arithmetic a bc f g h i j $ s 0 $ s 1 $ s 2 $ s 3 $ s 4 $ s 5 $ s 6 $ s 7 a = b - c ; f = ( g + h ) – ( i + j ) ; s u b $ s 0 , $ s 1 , $ s 2 a d d $ t 0 , $ s 4 , $ s 5 a d d $ t 1 , $ s 6 , $ s 7 s u b $ s 3 , $ t 0 , $ t 1 1 9 / 6 7 T r y : 1. f = g + ( h – 5 ) 2. f = ( i + j ) – ( k – 2 0 )
  • 51.
    Registers vs. Memory ProcessorI/O Control Datapath Memory Input Output • Arithmetic instructions operands must be in registers o MIPS has 32 registers • Compiler associates variables with registers • What about programs with lots of variables (arrays, etc.)? Use memory, load/store operations to transfer data from memory to register – if not enough registers spill registers to memory • MIPS is a load/store architecture
  • 52.
    Memory Organization • Viewedas a large single-dimension array with access by address • A memory address is an index into the memory array • Byte addressing means that the index points to a byte of memory, and that the unit of memory accessed by a load/store is a byte 0 1 2 3 4 5 6 ... 8 bits of data 8 bits of data 8 bits of data 8 bits of data 8 bits of data 8 bits of data 8 bits of data
  • 53.
    Memory Organization • Bytesare load/store units, but most data items use larger words • For MIPS, a word is 32 bits or 4 bytes. • 232 bytes with byte addresses from 0 to 232 -1 • 230 words with byte addresses 0, 4, 8, ... 232 -4 o i.e., words are aligned o what are the least 2 significant bits of a word address? 0 4 8 12 ... 32 bits of data 32 bits of data 32 bits of data 32 bits of data Registers correspondingly hold 32 bits of data
  • 54.
    The Endian Question BigEndian 31 0 MIPS can also load and store 4-byte words and 2-byte halfwords. The endian question: when you read a word, in what order do the bytes appear? Little Endian: Intel, DEC, et al. Big Endian: Motorola, IBM, Sun, et al. MIPS can do either SPIM adopts its host’s convention byt e 0 byt e 1 byt e 2 byt e 3 Little Endian 31 0 byt e 3 byt e 2 byt e 1 byt e 0 3 2 / 6 7
  • 55.
  • 56.
    Load/Store Instructions • Loadand store instructions • Example: C code: A[8] = h + A[8]; MIPS code (load): lw $t0, 32($s3) (arithmetic): add $t0, $s2, $t0 (store): sw $t0, 32($s3) • Load word has destination first, store has destination last • Remember MIPS arithmetic operands are registers, not memory locations o therefore, words must first be moved from memory to registers using loads before they can be operated on; then result can be stored back to memory offset address value
  • 57.
    So far we’velearned: • MIPS o loading words but addressing bytes o arithmetic on registers only • Instruction Meaning add $s1, $s2, $s3 $s1 = $s2 + $s3 sub $s1, $s2, $s3 $s1 = $s2 – $s3 lw $s1, 100($s2) $s1 = Memory[$s2+100] sw $s1, 100($s2) Memory[$s2+100]= $s1 • Try:Find the assembly code of B[8]=A[i]+A[j]; A and B available in $s6 and $s7 respectively $so-$s5 consists of the values f-j
  • 58.
    Exercise Q: For the following C statement, what is the corresponding MIPS assembly code? Assume that the variables f, g, h, and i are given and could be considered 32-bit integers as declared in a C program. Use a minimal number of MIPS assembly instructions. f = g + (h − 5); Solution: f -> $s1, g -> $s2, h -> $s3 addi $t0, $s3, -5 # $t0 = h - 5 add $s1, $s2, $t0 # f = g + (h - 5)
  • 59.
    Representing Instructions in the Computer • Instruction format: o A form of representation of an instruction composed of fields of binary numbers. • All MIPS instructions are 32 bits long. • Three types of instruction formats: o R-type (for register) or R-format o I-type (for immediate) or I-format o J-type (for jump) or J-format
  • 60.
    R-type (for register) • MIPS fields (bit widths in parentheses): op (6) | rs (5) | rt (5) | rd (5) | shamt (5) | funct (6) • op: Basic operation of the instruction (opcode) • rs: The first register source operand • rt: The second register source operand • rd: The register destination operand • shamt: Shift amount • funct: Function. It selects the specific variant of the operation in the op field (function code) • Ex: add $t0, $s1, $s2
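Packing the six R-type fields into one 32-bit word is just shifting each field to its bit position and OR-ing the pieces together. The sketch below does that in Python; `encode_r` is a hypothetical helper for illustration, not part of any MIPS toolchain.

```python
def encode_r(op, rs, rt, rd, shamt, funct):
    """Pack the six R-type fields (6, 5, 5, 5, 5, 6 bits) into a 32-bit word."""
    for value, bits in ((op, 6), (rs, 5), (rt, 5), (rd, 5), (shamt, 5), (funct, 6)):
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{value} does not fit in {bits} bits")
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct

# add $t0, $s1, $s2: op=0, rs=$s1 (17), rt=$s2 (18), rd=$t0 (8), shamt=0, funct=32
word = encode_r(0, 17, 18, 8, 0, 32)
print(f"{word:032b}")  # 00000010001100100100000000100000
print(f"{word:08x}")   # 02324020
```

The shift amounts (26, 21, 16, 11, 6, 0) follow directly from the field widths, counted from the least significant end.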
  • 61.
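    I-type (for immediate) • MIPS fields (bit widths in parentheses): op (6) | rs (5) | rt (5) | constant or address (16) • op: Basic operation of the instruction (opcode) • rs: The register source operand (the base register for loads and stores) • rt: For loads and addi, the destination register, which receives the result; for stores and branches, a second source register • constant or address: A 16-bit constant or address value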
  • 62.
    I-type (for immediate) • MIPS fields: • Ex: o addi $t1, $s0, 10 o lw $t0, 40($s4) o bne $s5, $s6, 100
  • 63.
    J-type (for jump) • MIPS fields (bit widths in parentheses): op (6) | address (26) • op: Basic operation of the instruction (opcode) • address: A 26-bit address value • Ex: j 10000
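A J-type word is only two fields. The sketch below assumes, as on real MIPS hardware, that the 26-bit field stores the target's word address (the byte address divided by 4) and that the target is word-aligned; some textbook figures show the byte address directly, so treat the 2500 below as following that assumption. `encode_j` is a hypothetical helper.

```python
def encode_j(op, target_byte_address):
    """J-type word: 6-bit opcode followed by a 26-bit address field.

    Assumption: the field holds the word address (byte address / 4),
    so the target must be word-aligned.
    """
    if target_byte_address % 4:
        raise ValueError("jump targets must be word-aligned")
    field = (target_byte_address >> 2) & 0x3FFFFFF  # keep the low 26 bits
    return (op << 26) | field

word = encode_j(2, 10000)   # j 10000 (opcode 2 = j)
print(word & 0x3FFFFFF)     # 2500
print(f"{word:08x}")        # 080009c4
```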
  • 64.
  • 65.
  • 66.
  • 67.
    Mapping register names to register numbers
    Name    t0 t1 t2 t3 t4 t5 t6 t7
    Number   8  9 10 11 12 13 14 15
    Name    s0 s1 s2 s3 s4 s5 s6 s7
    Number  16 17 18 19 20 21 22 23
  • 68.
    Translating a MIPS Assembly Instruction into a Machine Instruction Given instruction: add $t0, $s1, $s2 • Solution: • Identify the instruction format type: R-type • Format: Operation rd, rs, rt • rs -> $s1 (17), rt -> $s2 (18), rd -> $t0 (8), shamt -> 0 (not used) • op -> 0, funct -> 32 • Decimal representation:
    op      rs     rt     rd     shamt  funct
    0       17     18     8      0      32
    • Binary representation:
    op      rs     rt     rd     shamt  funct
    000000  10001  10010  01000  00000  100000
  • 69.
    Exercise Q: Translate the following MIPS assembly code into binary code. sub $t3, $s4, $s5
    op      rs     rt     rd     shamt  funct
    0       20     21     11     0      34
    000000  10100  10101  01011  00000  100010
  • 70.
  • 71.
    Translating a MIPS Assembly Instruction into a Machine Instruction Given instruction: lw $t0, 32($s3) • Solution: • Identify the instruction format type: I-type • Format: Operation rt, offset(rs) • rs -> $s3 (19), rt -> $t0 (8), immediate -> 32 • op -> 35 • Decimal representation:
    op      rs     rt     address
    35      19     8      32
    • Binary representation:
    op      rs     rt     address
    100011  10011  01000  0000 0000 0010 0000
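The same shift-and-OR idea encodes I-type instructions, with one extra detail: a negative immediate is stored as a 16-bit two's-complement value, so it is masked to 16 bits. `encode_i` is a hypothetical helper written for this sketch.

```python
def encode_i(op, rs, rt, immediate):
    """Pack op (6 bits), rs (5), rt (5) and a 16-bit immediate into 32 bits.

    Negative immediates are stored in 16-bit two's complement.
    """
    if not -(1 << 15) <= immediate < (1 << 16):
        raise ValueError("immediate does not fit in 16 bits")
    return (op << 26) | (rs << 21) | (rt << 16) | (immediate & 0xFFFF)

# lw $t0, 32($s3): op=35, rs=$s3 (19), rt=$t0 (8), immediate=32
print(f"{encode_i(35, 19, 8, 32):032b}")  # 10001110011010000000000000100000

# sw $t2, 58($s5): op=43, rs=$s5 (21), rt=$t2 (10), immediate=58
print(f"{encode_i(43, 21, 10, 58):08x}")  # aeaa003a

# addi $t0, $s3, -5: op=8; -5 is stored as 0xfffb
print(f"{encode_i(8, 19, 8, -5):08x}")    # 2268fffb
```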
  • 72.
    Exercise Q: Translate the following MIPS assembly code into binary code. sw $t2, 58($s5)
    op      rs     rt     address
    101011  10101  01010  0000 0000 0011 1010
  • 73.
    Translating High-Level Language into Machine Language Q: Consider the following high-level statement: A[300] = h + A[300]; If $t1 has the base of the array A and $s2 corresponds to h, what is the MIPS machine language code?
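Assuming $t0 serves as the temporary register (the question leaves that choice open), the statement compiles to `lw`, `add`, `sw`, and the byte offset of A[300] is 300 × 4 = 1200. The sketch below encodes that three-instruction sequence with simplified, hypothetical `encode_r`/`encode_i` helpers (no range checks) so the machine words can be checked against a hand translation.

```python
def encode_r(op, rs, rt, rd, shamt, funct):
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct

def encode_i(op, rs, rt, immediate):
    return (op << 26) | (rs << 21) | (rt << 16) | (immediate & 0xFFFF)

T0, T1, S2 = 8, 9, 18   # register numbers for $t0, $t1, $s2
OFFSET = 300 * 4        # A[300] lies 1200 bytes past the base address in $t1

program = [
    encode_i(35, T1, T0, OFFSET),    # lw  $t0, 1200($t1)
    encode_r(0, S2, T0, T0, 0, 32),  # add $t0, $s2, $t0
    encode_i(43, T1, T0, OFFSET),    # sw  $t0, 1200($t1)
]
for word in program:
    print(f"{word:08x}")
# 8d2804b0
# 02484020
# ad2804b0
```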
  • 74.
  • 75.
    Shift operations • Shifts allow bits to be moved around inside a register. • Shift left logical Example: sll $t2, $s0, 4 # reg $t2 = reg $s0 << 4 bits Machine Code:
    op      rs     rt     rd     shamt  funct
    000000  00000  10000  01010  00100  000000
  • 76.
    Shift Left Logical (sll) • Example: sll $t2, $s0, 4 # reg $t2 = reg $s0 << 4 bits • If $s0 = 10 • Value of $t2 = ???
  • 77.
    Shift operations • Shift right logical Example: srl $t5, $s3, 2 # reg $t5 = reg $s3 >> 2 bits Machine Code:
    op      rs     rt     rd     shamt  funct
    0       0      19     13     2      2
    000000  00000  10011  01101  00010  000010
  • 78.
    Shift Right Logical (srl) • Example: srl $t5, $s3, 2 # reg $t5 = reg $s3 >> 2 bits • If $s3 = 12 • Value of $t5 = ???
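Logical shifts on a 32-bit register can be modelled with Python's shift operators plus a 32-bit mask: bits pushed out of the register are lost, and `srl` fills the vacated high bits with zeros. The function names mirror the mnemonics but are only illustrative.

```python
MASK32 = 0xFFFFFFFF

def sll(value, shamt):
    """Shift left logical on a 32-bit register; bits shifted out are lost."""
    return (value << shamt) & MASK32

def srl(value, shamt):
    """Shift right logical; zeros enter from the left (value treated as unsigned)."""
    return (value & MASK32) >> shamt

print(sll(10, 4))  # 160  (10 * 2**4)
print(srl(12, 2))  # 3    (12 // 2**2)
```

For values that do not overflow, `sll` by n multiplies by 2^n and `srl` by n divides an unsigned value by 2^n, discarding the remainder.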
  • 79.
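    Logical Operations – AND, OR & NOT • A logical bit-by-bit operation with two operands. • EX: o and $t0, $t1, $t2 # reg $t0 = reg $t1 & reg $t2 o or $t0, $t1, $t2 # reg $t0 = reg $t1 | reg $t2 o nor $t0, $t1, $t3 # reg $t0 = ~(reg $t1 | reg $t3) • MIPS has no separate NOT instruction: NOT is obtained with nor and $zero as one operand, e.g., nor $t0, $t1, $zero # reg $t0 = ~reg $t1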
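A small Python sketch of the three bitwise operations on 32-bit values, including NOT built from `nor` with a zero operand. The register contents are arbitrary example values, and the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def mips_and(a, b):
    return a & b

def mips_or(a, b):
    return a | b

def mips_nor(a, b):
    # Python ints are unbounded, so mask ~ back down to 32 bits.
    return ~(a | b) & MASK32

t1, t2 = 0x00003C00, 0x00000DC0  # example register contents

print(f"{mips_and(t1, t2):08x}")  # 00000c00
print(f"{mips_or(t1, t2):08x}")   # 00003dc0
print(f"{mips_nor(t1, 0):08x}")   # ffffc3ff  (NOT $t1, with $zero as 2nd operand)
```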
  • 80.
  • 81.
    Instructions for Making Decisions • Sequences that allow programs to execute statements in order, one after another. • Branches that allow programs to jump to other points in a program. • Loops that allow a program to execute a fragment of code multiple times. • MIPS instructions: o beq register1, register2, L1 # go to L1 if register1 == register2 o bne register1, register2, L1 # go to L1 if register1 != register2 • beq (branch if equal) and bne (branch if not equal) are mnemonics • Conditional branches
  • 82.
    Instructions for Making Decisions Q: In the following code segment, f, g, h, i, and j are variables. If the five variables f through j correspond to the five registers $s0 through $s4, what is the compiled MIPS code for this C if statement? if (i == j) f = g + h; else f = g - h;
  • 83.
  • 84.
    Instructions for Making Decisions High-level code: if (i == j) f = g + h; else f = g - h; MIPS code:
           bne $s3, $s4, Else   # go to Else if i ≠ j
           add $s0, $s1, $s2    # f = g + h (skipped if i ≠ j)
           j   Exit             # go to Exit
    Else:  sub $s0, $s1, $s2    # f = g - h (skipped if i = j)
    Exit:
  • 85.
    Compiling a while Loop in C while (save[i] == k) i += 1; Assume that i and k correspond to registers $s3 and $s5 and the base of the array save is in $s6. What is the MIPS assembly code corresponding to this C segment?
  • 86.
    Compiling a while Loop in C while (save[i] == k) i += 1;
    1. Load save[i] into a temporary register
       o First form the address: multiply i by 4 (byte addressing) and add it to the base of array save
    2. Perform the loop test
       o Go to Exit if save[i] ≠ k
    3. Add 1 to i
    4. Go back to the while test at the top of the loop
    5. Exit
  • 87.
    while (save[i] == k) i += 1; Assume that i and k correspond to registers $s3 and $s5 and the base of the array save is in $s6. What is the MIPS assembly code corresponding to this C segment? Solution:
    Loop: sll  $t1, $s3, 2     # Temp reg $t1 = i * 4
          add  $t1, $t1, $s6   # $t1 = address of save[i]
          lw   $t0, 0($t1)     # Temp reg $t0 = save[i]
          bne  $t0, $s5, Exit  # go to Exit if save[i] ≠ k
          addi $s3, $s3, 1     # i = i + 1
          j    Loop            # go to Loop
    Exit:
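One way to convince yourself that the loop code is right is to run it on a tiny model machine. The Python sketch below is a hypothetical toy interpreter, not a real MIPS simulator: it supports only the six instructions used here, its program counter counts instructions rather than bytes, and the array base address 0x1000 is an arbitrary choice. It assumes some element at or after the starting i differs from k; otherwise the model runs off the array and raises `KeyError`.

```python
def run_while_loop(save, i, k, base=0x1000):
    """Execute the Loop/Exit code for `while (save[i] == k) i += 1;`.

    Returns the final value of i ($s3).
    """
    mem = {base + 4 * n: v for n, v in enumerate(save)}  # byte address -> word
    reg = {"$s3": i, "$s5": k, "$s6": base, "$t0": 0, "$t1": 0}
    program = [
        ("sll", "$t1", "$s3", 2),       # Loop: $t1 = i * 4
        ("add", "$t1", "$t1", "$s6"),   # $t1 = address of save[i]
        ("lw", "$t0", 0, "$t1"),        # $t0 = save[i]
        ("bne", "$t0", "$s5", "Exit"),  # go to Exit if save[i] != k
        ("addi", "$s3", "$s3", 1),      # i = i + 1
        ("j", "Loop"),                  # go to Loop
    ]
    labels = {"Loop": 0, "Exit": len(program)}  # Exit = just past the last instruction
    pc = 0
    while pc < len(program):
        op, *args = program[pc]
        pc += 1                                  # default: fall through to next instruction
        if op == "sll":
            rd, rt, shamt = args
            reg[rd] = (reg[rt] << shamt) & 0xFFFFFFFF
        elif op == "add":
            rd, rs, rt = args
            reg[rd] = reg[rs] + reg[rt]
        elif op == "lw":
            rt, offset, rs = args
            reg[rt] = mem[reg[rs] + offset]
        elif op == "bne":
            rs, rt, label = args
            if reg[rs] != reg[rt]:
                pc = labels[label]
        elif op == "addi":
            rt, rs, imm = args
            reg[rt] = reg[rs] + imm
        elif op == "j":
            (label,) = args
            pc = labels[label]
    return reg["$s3"]

print(run_while_loop([7, 7, 7, 3], i=0, k=7))  # 3
```

Tracing the branch and jump by hand against this model is a good check: the `bne` is the only way out of the loop, and the `j` always returns to the address computation at `Loop`.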
  • 88.
    MIPS Addressing Mode • The different ways of specifying the locations of instruction operands are known as addressing modes. • The MIPS addressing modes are the following: 1. Immediate addressing mode 2. Register addressing mode 3. Base or displacement addressing mode 4. PC-relative addressing mode 5. Pseudodirect addressing mode
  • 89.
    Immediate addressing mode • Def: o the operand is a constant within the instruction itself • Ex: o addi $s1, $s2, 20 # $s1 = $s2 + 20
  • 90.
    Register addressing mode • Def: o the source and destination operands are all processor registers o also called register direct addressing • Ex: o add $s1, $s2, $s3 # $s1 = $s2 + $s3
  • 91.
    Base or displacement addressing mode • Def: o the operand is at the memory location whose address is the sum of a register and a constant in the instruction o a form of register indirect addressing (with a displacement) • Ex: o lw $s1, 20($s3) # $s1 = Memory[$s3 + 20]
  • 92.
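    PC-relative addressing mode • Def: o the branch address is the sum of the PC and a constant in the instruction; the constant counts words and is added to PC + 4 (the address of the next instruction) • Ex: o bne $s4, $s5, 25 # if ($s4 != $s5), go to PC + 4 + 25 × 4; with the bne at address 12, the target is 12 + 4 + 100 = 116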
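The branch-target arithmetic in the example can be written out as a short Python sketch: sign-extend the 16-bit field, multiply by 4 (shift left by 2), and add it to PC + 4. The helper names are hypothetical.

```python
def sign_extend16(field):
    """Interpret a 16-bit field as a signed two's-complement number."""
    field &= 0xFFFF
    return field - 0x10000 if field & 0x8000 else field

def branch_target(pc, offset_field):
    """Target of a beq/bne located at address pc.

    The offset counts words relative to PC + 4, so it is multiplied by 4.
    """
    return pc + 4 + (sign_extend16(offset_field) << 2)

print(branch_target(12, 25))       # 116  (12 + 4 + 100)
print(branch_target(100, 0xFFFD))  # 92   (backward branch: field 0xFFFD means -3)
```

Sign extension is what lets the same 16-bit field encode both forward and backward branches, such as the `bne` that exits a loop and the jump back to its top.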
  • 93.
    Pseudodirect addressing mode • Def: o the jump address is formed from the upper 4 bits of the PC (strictly, of PC + 4), followed by the 26-bit address field of the instruction, followed by two 0 bits • Ex: o j 1000
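A sketch of how the pseudodirect target is assembled, assuming the 26-bit field holds a word address (so `j 1000`, read as a byte address, would store 250 in the field). The PC values below are arbitrary examples and `jump_target` is a hypothetical helper.

```python
def jump_target(pc, address_field):
    """Pseudodirect target of a j instruction located at address pc.

    Upper 4 bits of PC + 4, then the 26-bit field, then two 0 bits.
    """
    upper4 = (pc + 4) & 0xF0000000
    return upper4 | ((address_field & 0x3FFFFFF) << 2)

print(jump_target(0x00400000, 250))         # 1000
print(hex(jump_target(0x10000000, 0x100)))  # 0x10000400
```

Because the top 4 bits come from the PC, a single `j` can only reach targets inside the same 256 MB region as the jump itself.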
  • 94.
    Decoding Machine Code • Q: What is the assembly language statement corresponding to this machine instruction? 00af8020 (hex) • Solution steps: o convert the hexadecimal value to binary o split the binary into the fields of the instruction format o write the assembly instruction
  • 95.
    Machine Language to Assembly Language • Translate the following machine language code into assembly language: 0x02F34022
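Decoding is the encoding process run backwards: shift each field down and mask off its width, then look up the register names and the funct code. The sketch below handles only R-type words (op = 0) with the funct codes used in this unit; `decode_r` and the lookup tables are hypothetical helpers, though the register numbering and funct values are the standard MIPS ones.

```python
REGISTER_NAMES = (
    ["$zero", "$at", "$v0", "$v1"]
    + [f"$a{n}" for n in range(4)]    # 4-7
    + [f"$t{n}" for n in range(8)]    # 8-15
    + [f"$s{n}" for n in range(8)]    # 16-23
    + ["$t8", "$t9", "$k0", "$k1", "$gp", "$sp", "$fp", "$ra"]  # 24-31
)

# funct codes of the R-type instructions covered in this unit (all have op = 0)
FUNCT = {32: "add", 34: "sub", 36: "and", 37: "or", 39: "nor", 0: "sll", 2: "srl"}

def decode_r(word):
    """Split a 32-bit R-type word into fields and return assembly text."""
    op = word >> 26
    rs = (word >> 21) & 0x1F
    rt = (word >> 16) & 0x1F
    rd = (word >> 11) & 0x1F
    shamt = (word >> 6) & 0x1F
    funct = word & 0x3F
    if op != 0 or funct not in FUNCT:
        raise ValueError(f"not an R-type instruction handled here: {word:08x}")
    name, r = FUNCT[funct], REGISTER_NAMES
    if name in ("sll", "srl"):  # shifts use rd, rt and shamt; rs is unused
        return f"{name} {r[rd]}, {r[rt]}, {shamt}"
    return f"{name} {r[rd]}, {r[rs]}, {r[rt]}"

print(decode_r(0x00AF8020))  # add $s0, $a1, $t7
print(decode_r(0x02F34022))  # sub $t0, $s7, $s3
```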
  • 96.
    Performance • Performance is the key to understanding the underlying motivation for the hardware and its organization • Measure, report, and summarize performance to enable users to o make intelligent choices o see through the marketing hype! • Why is some hardware better than others for different programs? • What factors of system performance are hardware related? (e.g., do we need a new machine, or a new operating system?) • How does the machine's instruction set affect performance?
  • 97.
    Computer Performance: TIME, TIME, TIME!!! • Response Time (elapsed time, latency): individual user concerns… o how long does it take for my job to run? o how long does it take to execute (start to finish) my job? o how long must I wait for the database query? • Throughput: systems manager concerns… o how many jobs can the machine run at once? o what is the average execution rate? o how much work is getting done? • If we upgrade a machine with a new processor, what do we increase? • If we add a new machine to the lab, what do we increase?
  • 98.
    Execution Time • Elapsed Time o counts everything (disk and memory accesses, waiting for I/O, running other programs, etc.) from start to finish o a useful number, but often not good for comparison purposes o elapsed time = CPU time + wait time (I/O, other programs, etc.) • CPU time o doesn't count waiting for I/O or time spent running other programs o can be divided into user CPU time and system CPU time (OS calls) o CPU time = user CPU time + system CPU time, so elapsed time = user CPU time + system CPU time + wait time • Our focus: user CPU time (CPU execution time or, simply, execution time) o time spent executing the lines of code that are in our program
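The difference between elapsed time and CPU time is easy to observe from a program. In the Python sketch below, `time.perf_counter()` measures wall-clock (elapsed) time and `os.times()` reports this process's user and system CPU time; `time.sleep` stands in for waiting on I/O, and the workload size and 0.2 s wait are arbitrary choices. The CPU-time counters have coarse resolution on some systems, so treat the exact numbers as rough.

```python
import os
import time

def busy_work():
    # CPU-bound stand-in for "the lines of code in our program"
    return sum(i * i for i in range(200_000))

def measure(work, wait_seconds):
    """Run work(), then wait; return (elapsed, user CPU, system CPU) in seconds."""
    wall_start = time.perf_counter()  # wall clock: counts everything
    cpu_start = os.times()            # user/system CPU time of this process
    work()
    time.sleep(wait_seconds)          # waiting: adds elapsed time, almost no CPU time
    cpu_end = os.times()
    elapsed = time.perf_counter() - wall_start
    user = cpu_end.user - cpu_start.user
    system = cpu_end.system - cpu_start.system
    return elapsed, user, system

elapsed, user, system = measure(busy_work, wait_seconds=0.2)
print(f"elapsed {elapsed:.3f} s, user {user:.3f} s, system {system:.3f} s")
```

The wait time shows up in the elapsed figure but not in user + system, which is exactly why elapsed time is a poor basis for comparing processors.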
  • 99.
    Definition of Performance • For some program running on machine X: $$\text{Performance}_X = \frac{1}{\text{Execution time}_X}$$ • If there are two machines X and Y, and the performance of X is greater than the performance of Y: $$\text{Performance}_X > \text{Performance}_Y$$ i.e., $$\frac{1}{\text{Execution time}_X} > \frac{1}{\text{Execution time}_Y}$$ • "X is n times faster than Y" means: $$\frac{\text{Performance}_X}{\text{Performance}_Y} = \frac{\text{Execution time}_Y}{\text{Execution time}_X} = n$$
  • 100.
    Q: If computer A runs a program in 10 s and computer B runs the same program in 15 s, how much faster is A than B? • We know that $$\frac{\text{Performance}_A}{\text{Performance}_B} = \frac{\text{Execution time}_B}{\text{Execution time}_A} = n$$ Thus the performance ratio is $$\frac{\text{Execution time}_B}{\text{Execution time}_A} = \frac{15}{10} = 1.5$$ i.e., $$\frac{\text{Performance}_A}{\text{Performance}_B} = 1.5$$ Therefore, A is 1.5 times faster than B.
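The definition boils down to a ratio of execution times. A minimal sketch, with illustrative function names:

```python
def performance(execution_time_s):
    """Performance is the reciprocal of execution time."""
    return 1.0 / execution_time_s

def times_faster(time_x_s, time_y_s):
    """n such that X is n times faster than Y: Perf_X / Perf_Y = Time_Y / Time_X."""
    return time_y_s / time_x_s

print(times_faster(10, 15))               # 1.5
print(performance(10) / performance(15))  # the same ratio, up to float rounding
```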
  • 101.
    Clock Cycles • Instead of reporting execution time in seconds, we often use cycles. In modern computers, hardware events progress cycle by cycle: in other words, each event (e.g., a multiplication or an addition) is a sequence of cycles $$\frac{\text{seconds}}{\text{program}} = \frac{\text{cycles}}{\text{program}} \times \frac{\text{seconds}}{\text{cycle}}$$ • Clock ticks indicate the start and end of cycles [Figure: clock signal over time, with a tick at the start and end of each cycle] • cycle time = time between ticks = seconds per cycle • clock rate (frequency) = clock cycles per second (1 Hz = 1 cycle/s, 1 MHz = 10^6 cycles/s) • Example: A 200 MHz clock has a cycle time of ????
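Cycle time and clock rate are reciprocals, so the 200 MHz example is a one-line calculation. A sketch with a hypothetical helper that reports the result in nanoseconds:

```python
def cycle_time_ns(clock_rate_hz):
    """Cycle time = 1 / clock rate, converted to ns (1 s = 1e9 ns)."""
    return 1e9 / clock_rate_hz

print(cycle_time_ns(200e6))  # 5.0  (a 200 MHz clock has a 5 ns cycle)
print(cycle_time_ns(2e9))    # 0.5  (a 2 GHz clock has a 0.5 ns cycle)
```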
  • 102.
    Performance Equation I $$\frac{\text{seconds}}{\text{program}} = \frac{\text{cycles}}{\text{program}} \times \frac{\text{seconds}}{\text{cycle}}$$ • So, to improve performance one can either: o reduce the number of cycles for a program, or o reduce the clock cycle time, or, equivalently, o increase the clock rate • Equivalently: $$\text{CPU execution time for a program} = \text{CPU clock cycles for a program} \times \text{Clock cycle time}$$ • Also: $$\text{CPU execution time for a program} = \frac{\text{CPU clock cycles for a program}}{\text{Clock rate}}$$
  • 103.
    Our favorite program runs in 10 seconds on computer A, which has a 2 GHz clock. We are trying to help a computer designer build a computer, B, which will run this program in 6 seconds. The designer has determined that a substantial increase in the clock rate is possible, but this increase will affect the rest of the CPU design, causing computer B to require 1.2 times as many clock cycles as computer A for this program. What clock rate should we tell the designer to target?
    $$\text{CPU time}_A = \frac{\text{CPU clock cycles}_A}{\text{Clock rate}_A}$$
    $$10\ \text{s} = \frac{\text{CPU clock cycles}_A}{2 \times 10^9\ \text{cycles/s}}$$
    $$\text{CPU clock cycles}_A = 10\ \text{s} \times 2 \times 10^9\ \text{cycles/s} = 20 \times 10^9\ \text{cycles}$$
    $$\text{CPU time}_B = \frac{1.2 \times \text{CPU clock cycles}_A}{\text{Clock rate}_B}$$
    $$6\ \text{s} = \frac{1.2 \times 20 \times 10^9\ \text{cycles}}{\text{Clock rate}_B}$$
    $$\text{Clock rate}_B = \frac{1.2 \times 20 \times 10^9\ \text{cycles}}{6\ \text{s}} = 4 \times 10^9\ \text{Hz} = 4\ \text{GHz}$$
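The same three steps (cycles on A, scale by the cycle factor, divide by B's target time) as a small Python function. The parameter names are made up for this sketch.

```python
def required_clock_rate(time_a_s, clock_rate_a_hz, time_b_s, cycle_factor):
    """Clock rate B needs so that B runs the program in time_b_s seconds.

    cycles_A = time_A * clock_rate_A
    cycles_B = cycle_factor * cycles_A
    clock_rate_B = cycles_B / time_B
    """
    cycles_a = time_a_s * clock_rate_a_hz
    cycles_b = cycle_factor * cycles_a
    return cycles_b / time_b_s

rate_b = required_clock_rate(10, 2e9, 6, 1.2)
print(f"{rate_b / 1e9:.1f} GHz")  # 4.0 GHz
```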
  • 104.
    Instruction Performance • The previous equation makes no reference to the number of instructions • Yet the execution time also depends on the number of instructions in the program • Clock cycles per instruction (CPI): the average number of clock cycles per instruction for a program or program fragment
  • 105.
    Suppose we have two implementations of the same instruction set architecture. Computer A has a clock cycle time of 250 ps and a CPI of 2.0 for some program, and computer B has a clock cycle time of 500 ps and a CPI of 1.2 for the same program. Which computer is faster for this program, and by how much? • The same number of instructions is executed on both computers
  • 106.
    Instruction Performance $$\text{CPU execution time for a program} = \text{Instruction count for a program} \times \text{average CPI} \times \text{Clock cycle time}$$ or $$\text{CPU execution time for a program} = \frac{\text{Instruction count for a program} \times \text{average CPI}}{\text{Clock rate}}$$
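Applying this equation to the two-computer question (250 ps with CPI 2.0 versus 500 ps with CPI 1.2): since both computers execute the same instruction count, any count cancels out of the ratio. The 1,000,000 below is an arbitrary placeholder, and `cpu_time` is a hypothetical helper.

```python
def cpu_time(instruction_count, cpi, cycle_time_s):
    """CPU time = Instruction count x CPI x Clock cycle time."""
    return instruction_count * cpi * cycle_time_s

instructions = 1_000_000                       # arbitrary; cancels in the ratio
time_a = cpu_time(instructions, 2.0, 250e-12)  # A: 500 ps per instruction on average
time_b = cpu_time(instructions, 1.2, 500e-12)  # B: 600 ps per instruction on average
print(time_b / time_a)                         # about 1.2, so A is 1.2 times faster
```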
  • 107.
  • 108.
    Which code sequence executes the most instructions? • Sequence 1 executes 2 + 1 + 2 = 5 instructions • Sequence 2 executes 4 + 1 + 1 = 6 instructions • Sequence 2 executes more instructions
  • 109.
    Which will be faster? • Code sequence 2 is faster, because it takes fewer clock cycles
  • 110.
    What is the CPI for each sequence? • Sequence 2 has the lower CPI: it executes more instructions but takes fewer clock cycles
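The table of per-class CPIs that these slides rely on is not reproduced here, so the sketch below rests on a loudly stated ASSUMPTION: instruction classes A, B and C with CPIs 1, 2 and 3, and per-class counts (2, 1, 2) and (4, 1, 1), as in the common textbook version of this example. Under those assumptions it reproduces all three conclusions above: sequence 2 runs more instructions, needs fewer cycles, and has the lower CPI.

```python
# ASSUMPTION: per-class CPIs and counts follow the usual textbook version of
# this example; the original table is not included in these slides.
CPI = {"A": 1, "B": 2, "C": 3}
sequences = {
    1: {"A": 2, "B": 1, "C": 2},  # 2 + 1 + 2 = 5 instructions
    2: {"A": 4, "B": 1, "C": 1},  # 4 + 1 + 1 = 6 instructions
}

results = {}
for name, counts in sequences.items():
    instructions = sum(counts.values())
    cycles = sum(CPI[cls] * n for cls, n in counts.items())
    results[name] = (instructions, cycles, cycles / instructions)
    print(name, *results[name])
# 1 5 10 2.0
# 2 6 9 1.5
```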
  • 111.
  • 112.

Editor's Notes

  • #97 Have them raise their hands when answering questions