Tikrit University, academic year 2019-2020
College of Petroleum Process Eng.
Petroleum and Control Eng. Dept. Course: Computer Architecture 1
Chapter 7: Parallel and Pipelined Processing
Basic Ideas
• Parallel processing                    • Pipelined processing
      time →                                   time →
  P1: a1 a2 a3 a4                          P1: a1 b1 c1 d1
  P2: b1 b2 b3 b4                          P2: a2 b2 c2 d2
  P3: c1 c2 c3 c4                          P3: a3 b3 c3 d3
  P4: d1 d2 d3 d4                          P4: a4 b4 c4 d4
  Less inter-processor communication       More inter-processor communication
  Complicated processor hardware           Simpler processor hardware
1, 2, 3, 4: different types of operations performed
a, b, c, d: different data streams processed
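The two schedules above can be sketched in code. This is an illustrative sketch (not from the slides): data streams a..d each need operations 1..4 applied in order.

```python
# Illustrative sketch of the two work assignments above.
streams = ["a", "b", "c", "d"]
n_ops = 4

# Parallel: processor Pi performs every operation of one data stream,
# so processors never exchange intermediate results.
parallel = {f"P{i+1}": [f"{s}{op}" for op in range(1, n_ops + 1)]
            for i, s in enumerate(streams)}

# Pipelined: processor Pi performs operation i on every stream, so each
# intermediate result is handed from Pi to Pi+1 (inter-processor traffic).
pipelined = {f"P{i+1}": [f"{s}{i+1}" for s in streams]
             for i in range(n_ops)}

print(parallel["P1"])   # ['a1', 'a2', 'a3', 'a4']  stream a through ops 1..4
print(pipelined["P1"])  # ['a1', 'b1', 'c1', 'd1']  op 1 over streams a..d
```

This mirrors the trade-off on the slide: the parallel assignment needs no communication but each processor must implement all four operations; the pipelined assignment keeps each processor simple but forces results to flow between processors.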
Data Dependence
• Parallel processing requires NO data dependence between processors
• Pipelined processing will involve inter-processor communication
[Figure: task timelines on processors P1-P4 over time for each case]
Basic Pipeline
Five stage “RISC” load-store architecture
1. Instruction fetch (IF)
• get instruction from memory, increment PC
2. Instruction Decode (ID)
• translate opcode into control signals and read registers
3. Execute (EX)
• perform ALU operation, compute jump/branch targets
4. Memory (MEM)
• access memory if needed
5. Writeback (WB)
• update register file
Time Graphs
Clock cycle   1    2    3    4    5    6    7    8    9
add           IF   ID   EX   MEM  WB
lw                 IF   ID   EX   MEM  WB
                        IF   ID   EX   MEM  WB
                             IF   ID   EX   MEM  WB
                                  IF   ID   EX   MEM  WB

Latency: 5 cycles
Throughput: 1 instruction/cycle
Concurrency: 5 instructions in flight; CPI = 1
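A time graph like the one above can be generated programmatically. This is a minimal sketch assuming an ideal five-stage pipeline with no stalls:

```python
# Each instruction enters the pipeline one cycle after the previous one.
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def time_graph(n_instr, stages=STAGES):
    """One row per instruction; instruction i starts at cycle i."""
    return [["  "] * i + list(stages) for i in range(n_instr)]

for row in time_graph(3):
    print(" ".join(f"{s:>3}" for s in row))
```

Instruction i finishes at cycle i + 5, so the latency of any single instruction is 5 cycles, yet once the pipeline is full one instruction retires every cycle (CPI = 1).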
Cycles Per Instruction (CPI)
• Instruction mix for some program P, assume:
• 25% load/store ( 3 cycles / instruction)
• 60% arithmetic ( 2 cycles / instruction)
• 15% branches ( 1 cycle / instruction)
• Multi-Cycle performance for program P:
• 3 * .25 + 2 * .60 + 1 * .15 = 2.1
• average cycles per instruction (CPI) = 2.1
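The weighted-average CPI computation above can be written out directly:

```python
# Instruction mix for program P from the slide: fraction, cycles/instruction.
mix = {
    "load/store": (0.25, 3),
    "arithmetic": (0.60, 2),
    "branch":     (0.15, 1),
}

cpi = sum(frac * cycles for frac, cycles in mix.values())
print(round(cpi, 2))  # 2.1
```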
SIX STAGES OF INSTRUCTION PIPELINING
Fetch Instruction (FI)
Read the next expected instruction into a buffer.
Decode Instruction (DI)
Determine the opcode and the operand specifiers.
Calculate Operands (CO)
Calculate the effective address of each source operand.
Fetch Operands (FO)
Fetch each operand from memory. Operands in registers need
not be fetched.
Execute Instruction (EI)
Perform the indicated operation and store the result.
Write Operand (WO)
Store the result in memory.
[Figure: Timing diagram for instruction pipeline operation, six-stage CPU instruction pipeline]
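The timing diagram follows the standard result that, with no stalls, a k-stage pipeline needs k + (n - 1) cycles to complete n instructions, versus n * k cycles without pipelining. A quick sketch:

```python
# Standard ideal-pipeline cycle counts (no stalls assumed).
def pipelined_cycles(n, k):
    """First instruction takes k cycles; each later one adds 1."""
    return k + (n - 1)

def unpipelined_cycles(n, k):
    """Each instruction occupies all k stages sequentially."""
    return n * k

print(pipelined_cycles(9, 6))    # 14 cycles for 9 instructions, 6 stages
print(unpipelined_cycles(9, 6))  # 54 cycles
```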
Pipeline Performance: Clock & Timing
Each stage Si has a maximum stage delay τm; stages Si and Si+1 are
separated by a latch with delay d.
Clock cycle of the pipeline: τ = max{τm} + d
Pipeline frequency: f = 1/τ
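The clock-period formula above can be evaluated numerically. The stage and latch delays below are made-up example values, not from the slides:

```python
# tau = max(stage delays) + latch delay; f = 1/tau.
stage_delays_ns = [1.0, 0.8, 1.2, 0.9]  # tau_m per stage (assumed values)
latch_delay_ns = 0.1                    # d (assumed value)

tau = max(stage_delays_ns) + latch_delay_ns  # clock period, ns
f_ghz = 1 / tau                              # 1/ns = GHz

print(round(tau, 2))    # 1.3
print(round(f_ghz, 3))  # 0.769
```

Note that the slowest stage sets the clock for every stage, which is why pipeline designers try to balance stage delays.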
Advantages
• Pipelining makes efficient use of resources.
• Faster execution of a large number of instructions
• The parallelism is invisible to the programmer.
Speed Up Equation for Pipelining
For a simple RISC pipeline with base CPI = 1:
Speedup = Pipeline depth / (1 + pipeline stall cycles per instruction)
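A sketch of this speedup formula, with an assumed stall rate for illustration:

```python
# Speedup over the unpipelined machine when the base pipelined CPI is 1.
def pipeline_speedup(depth, stalls_per_instr):
    return depth / (1 + stalls_per_instr)

print(pipeline_speedup(5, 0.0))   # 5.0  ideal: speedup equals pipeline depth
print(pipeline_speedup(5, 0.25))  # 4.0  stalls erode the ideal speedup
```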
Reduced Instruction Set Computers (RISC) Pipelining
• Key Features of RISC
– Limited and simple instruction set
– Memory access instructions limited to memory <-> registers
– Operations are register to register
– Large number of general purpose registers
(and use of compiler technology to optimize register use)
– Emphasis on optimising the instruction pipeline
(& memory management)
– Hardwired for speed (no microcode)
Memory to Memory vs Register to Memory
Operations
• (RISC uses only register-to-memory and memory-to-register operations, i.e. load and store)
RISC Pipelining Basics
• Define two phases of execution for register based instructions
– I: Instruction fetch
– E: Execute
• ALU operation with register input and output
• For load and store there will be three phases
– I: Instruction fetch
– E: Execute
• Calculate memory address
– D: Memory
• Register to memory or memory to register operation
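The phase breakdown above can be summarized in a small sketch (the instruction classification is an assumption for illustration):

```python
# Phases per instruction type in the basic RISC pipeline described above.
def phases(instr_type):
    if instr_type in ("load", "store"):
        return ["I", "E", "D"]  # fetch, address calculation, memory access
    return ["I", "E"]           # fetch, register-to-register ALU operation

print(phases("add"))   # ['I', 'E']
print(phases("load"))  # ['I', 'E', 'D']
```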
Effects of RISC Pipelining
[Figure: (b) Three-stage pipelined timing]
Optimization of RISC Pipelining
• Delayed branch
– Leverages a branch that does not take effect until after
execution of the following instruction
– The following instruction becomes the delay slot
Normal vs Delayed Branch
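The contrast can be illustrated with an assumed instruction sequence: the compiler moves an independent instruction from before the branch into the delay slot, so the hardware does useful work while the branch takes effect.

```python
# Illustrative (assumed) instruction sequences, as strings.
normal = [
    "ADD  R1, R2, R3",  # independent of the branch
    "JUMP target",
    "NOP",              # wasted cycle while the branch takes effect
]

delayed = [
    "JUMP target",
    "ADD  R1, R2, R3",  # delay slot: executes before the jump takes effect
]

print(len(normal) - len(delayed), "cycle saved")  # 1 cycle saved
```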