Pipelining Basic Concepts
Partition instruction execution into smaller functions (called stages)
and give each stage its own concurrently operating hardware, so that at
any given time several instructions are in various stages of execution.
Instruction fetch (IF) -> Operand fetch (OF) -> Execute (EX)
Typical Non-Pipelined Execution
Cycle: 1    2    3    4    5    6    7    8    9    10
IF     I0   -    -    I1   -    -    I2   -    -    I3
OF     -    I0   -    -    I1   -    -    I2   -    -
EX     -    -    I0   -    -    I1   -    -    I2   -
Time to execute n instructions: 3nt (t = time per pipeline stage)
Ideal Pipelined Execution
Cycle: 1    2    3    4    5    6    7    8    9    10
IF     I0   I1   I2   I3   I4   I5   I6   I7   I8   I9
OF     -    I0   I1   I2   I3   I4   I5   I6   I7   I8
EX     -    -    I0   I1   I2   I3   I4   I5   I6   I7
Time to execute n instructions: (2 + n)t
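The two formulas can be checked with a quick calculation (a sketch; the
three-stage pipeline and per-stage time t follow the slides above):

```python
def non_pipelined_time(n, t, stages=3):
    # Every instruction occupies all stages before the next one starts.
    return stages * n * t

def pipelined_time(n, t, stages=3):
    # (stages - 1) cycles to fill the pipeline, then one completion per cycle.
    return (stages - 1 + n) * t

n, t = 1000, 1
print(non_pipelined_time(n, t))  # 3000
print(pipelined_time(n, t))      # 1002
# The speedup approaches the number of stages (3) as n grows.
```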
Pipeline Turbulence
Consider the presence of a branch instruction in the pipeline:
Cycle: 1    2    3    4    5    6    7    8    9    10
IF     I0   I1   I2   I3   Ibr  -    -    Ik   Ik+1 Ik+2
OF     -    I0   I1   I2   I3   Ibr  -    -    Ik   Ik+1
EX     -    -    I0   I1   I2   I3   Ibr  -    -    Ik
(fetch of the branch target Ik cannot begin until Ibr resolves in EX
during cycle 7, creating a two-cycle bubble)
Branch instructions introduce control hazards into the pipeline,
negatively impacting pipeline performance.
Multicycle Execution Units
[Figure: the DLX pipeline IF-ID-EX-MEM-WB, with the EX stage replicated
as four parallel functional units: an integer unit, an FP/integer
multiplier, an FP adder, and an FP/integer divider.]
FIGURE 3.42 The DLX pipeline with three additional unpipelined,
floating-point functional units.
SuperScalar/Multiple Issue
[Figure: an issue unit feeds an instruction buffer; instructions are
dispatched to a floating-point unit and two integer units, which reach
two memory modules through a crossbar switch.]
Hazards
Structural Hazards: arising from resource conflicts when the
hardware cannot support all possible combinations of instructions in
simultaneous overlapped execution.
Data Hazards: arising when an instruction in the pipeline depends on
the results of a previous instruction still in the pipeline.
Control Hazards: arising from the pipelining of branches or other
instructions that change the PC.
Hazards can always be resolved by stalling, but stalls reduce pipeline
efficiency.
Structural hazards can be removed by adding hardware resources, a
cost/performance tradeoff.
Load/store architectures substantially reduce the complexity of hazard
detection and resolution.
Structural Hazards
• Split I/D caches
• Pipelined Execute Units
• Buffers on Execute Units
• Hardware replication
Pipelined & Buffered Execution Units
[Figure: the DLX pipeline IF-ID-...-MEM-WB in which the single EX stage
is replaced by an integer unit (EX), a fully pipelined FP/integer
multiplier (stages M1-M7), a pipelined FP adder (stages A1-A4), and an
unpipelined FP/integer divider (DIV).]
FIGURE 3.44 A pipeline that supports multiple outstanding FP operations.
Data Hazards
RAW: read after write (an instruction reads an operand before an earlier
instruction has written it; a true dependence)
WAW: write after write (only present when writes occur at different
stages in a pipeline)
WAR: write after read (only possible when writes may occur earlier than
some reads)
Note: RAR (read after read) is not a hazard.
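A minimal sketch of detecting the three hazard types between a pair of
instructions, each represented as a destination register plus its source
registers (the register-tuple encoding is an illustrative assumption):

```python
def classify_hazards(first, second):
    """Return the data hazards 'second' has on an earlier instruction 'first'.

    Each instruction is (dest_reg, [src_regs]); dest_reg may be None
    (e.g. for branches, which write no register).
    """
    d1, srcs1 = first
    d2, srcs2 = second
    hazards = []
    if d1 in srcs2:                       # second reads what first writes
        hazards.append("RAW")
    if d1 == d2 and d1 is not None:       # both write the same register
        hazards.append("WAW")
    if d2 in srcs1:                       # second writes what first reads
        hazards.append("WAR")
    return hazards

# ADD r1, r2, r3  followed by  SUB r4, r1, r5  -> RAW on r1
print(classify_hazards(("r1", ["r2", "r3"]), ("r4", ["r1", "r5"])))  # ['RAW']
```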
Data Hazards: Potential Solutions
• Pipeline interlocking
• Forwarding
• Compiler optimizations
Pipeline Interlocking
• Stall pipeline until data hazard is eliminated
• Loses performance benefits of pipelining
Pipeline Interlocking
Assume RAW conflict between I3 and I4.
Cycle: 1    2    3    4    5    6    7    8    9    10
IF     I0   I1   I2   I3   I4   I5   I5   I6   I7   I8
OF     -    I0   I1   I2   I3   I4   I4   I5   I6   I7
EX     -    -    I0   I1   I2   I3   -    I4   I5   I6
(I4 is held in OF for one cycle until I3's result is available; the
bubble propagates through EX, and I5 repeats in IF during the stall)
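The cost of interlock stalls can be folded into the earlier timing
formula; a sketch (the stall count is assumed to be given):

```python
def pipelined_time_with_stalls(n, t, stalls, stages=3):
    # Each stall inserts one bubble: time = (stages - 1 + n + stalls) * t
    return (stages - 1 + n + stalls) * t

# One RAW stall, as in the diagram above, adds exactly one cycle:
print(pipelined_time_with_stalls(9, 1, 1))  # 12 instead of the ideal 11
```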
Data Forwarding
• Organize the data path with routes back from later pipe stages into
earlier pipe stages.
• Efficient operation is best supported when operand encoding is simple,
so source and destination register fields are easy to locate and compare.
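The comparison the forwarding hardware performs can be sketched as
follows (rd/rs1/rs2 follow the slides' register-field naming; the
register-file dict is an illustrative assumption):

```python
def forward_operands(ex_rd, ex_result, of_rs1, of_rs2, regfile):
    # Compare the destination (rd) of the instruction in EX against the
    # sources (rs1, rs2) of the instruction in OF; on a match, bypass
    # the register file and use the in-flight result directly.
    op1 = ex_result if of_rs1 == ex_rd else regfile[of_rs1]
    op2 = ex_result if of_rs2 == ex_rd else regfile[of_rs2]
    return op1, op2

regs = {"r1": 10, "r2": 20}
# Instruction in EX is writing 99 into r1; instruction in OF reads r1, r2.
print(forward_operands("r1", 99, "r1", "r2", regs))  # (99, 20)
```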
Data Forwarding
[Figure: the rd field of the instruction in the Execute stage is compared
against the rs1 and rs2 fields of the instruction in Operand fetch; on a
match, the Execute-stage result is forwarded directly to Operand fetch.]
Static Scheduling
• A compiler optimization that orders instructions to avoid data hazards.
• Organize code into basic blocks (single entry/single exit).
• Within each block, schedule instructions to minimize data hazards
between adjacent instructions.
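A toy sketch of scheduling within a basic block: greedily pick a ready
instruction that does not read the result of the instruction just placed
(the (dest, srcs) tuples, single-assignment assumption, and greedy
heuristic are all illustrative assumptions, not the slides' algorithm):

```python
def schedule(block):
    """Greedy list scheduling within a basic block (illustrative sketch).

    Each instruction is (dest, srcs); registers are assumed written at
    most once in the block, so dependences form a DAG.
    """
    producers = {d for d, _ in block}
    remaining, done, out = list(block), set(), []
    while remaining:
        # Candidates: every in-block producer of the sources is scheduled.
        cands = [(d, ss) for d, ss in remaining
                 if all(s in done or s not in producers for s in ss)]
        # Prefer a candidate that does not read the previous result.
        pick = next((c for c in cands if not (out and out[-1][0] in c[1])),
                    cands[0])
        out.append(pick)
        done.add(pick[0])
        remaining.remove(pick)
    return out

# Two dependent chains (r0->r1->r2 and r3->r4->r5) get interleaved.
block = [("r1", ["r0"]), ("r2", ["r1"]), ("r4", ["r3"]), ("r5", ["r4"])]
print([d for d, _ in schedule(block)])  # ['r1', 'r4', 'r2', 'r5']
```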
Dynamic Scheduling
• Scoreboarding
• Tomasulo’s Algorithm
• VLIW
An Architecture for Scoreboarding
[Figure: registers and data buses serve an integer unit, two FP
multipliers, an FP divider, and an FP adder; the scoreboard exchanges
control/status information with the registers and every functional unit.]
FIGURE 4.3 The basic structure of a DLX processor with a scoreboard.
An Architecture for Tomasulo
[Figure: a floating-point operation queue (fed from the instruction unit)
and six load buffers (fed from memory) supply the FP registers; operand
buses deliver values to three reservation stations in front of the FP
adders and two in front of the FP multipliers; three store buffers hold
results bound for memory; every unit broadcasts its result on the common
data bus (CDB).]
FIGURE 4.8 The basic structure of a DLX FP unit using Tomasulo's
algorithm.
Structural/Data Hazards and the Original Pentium
[Figure: an issue unit feeds an instruction buffer; instructions are
dispatched to a floating-point unit and two integer units, which reach
two memory modules through a crossbar switch.]
Control Hazards
• Evaluate branches earlier
• Stalls/pipelined interlocks
• Delayed Branch
• Statically predict taken/not-taken & flush when not
• Canceling (or nullifying) branch: compiler prediction/flush
• Dynamic (hardware) prediction
• Conditional Instructions
Delayed Branch
Redefine the architecture so that a branch takes effect only after the n
instructions that follow it.
• Where to get instructions to fill the branch delay slot?
  – A no-op
  – From before the branch
  – From after the branch (from both destinations or, if needed/possible,
    only one)
• Compiler effectiveness
  – Fills about 60% of delay slots (when there is one slot)
  – About 80% of the instructions placed in delay slots do useful work
• Complications: deep pipelines and superscalar implementations
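The "from before the branch" strategy can be sketched as follows: move
the latest instruction that neither the branch nor any later instruction
depends on into the delay slot (the (dest, srcs) encoding and the
single-delay-slot assumption are illustrative, not the slides' method):

```python
def fill_delay_slot(block, branch):
    """Try to fill the delay slot with an instruction from before the branch.

    block: (dest, srcs) instructions preceding the branch;
    branch: (None, srcs) conditional branch reading some registers.
    Returns (shortened block, delay-slot instruction or None).
    """
    _, br_srcs = branch
    for i in range(len(block) - 1, -1, -1):
        dest, srcs = block[i]
        later = block[i + 1:]
        safe = (dest not in br_srcs                    # branch doesn't need it
                and all(dest != d and dest not in ss   # no later use of dest
                        and d not in srcs              # sources not overwritten
                        for d, ss in later))
        if safe:
            return block[:i] + later, block[i]
    return block, None  # no safe candidate: fall back to a no-op

before = [("r5", ["r6"]), ("r2", ["r1"])]
print(fill_delay_slot(before, (None, ["r2"])))
# the r5 instruction is independent of the branch condition and moves down
```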
Predict Taken/Not-Taken, Statically or Dynamically
Assume I3 is a conditional branch, resolved in the EX stage.
Correct Prediction:
Cycle: 1    2    3    4    5    6    7    8    9    10
IF     I0   I1   I2   I3   I4   I5   I6   I7   I8   I9
OF     -    I0   I1   I2   I3   I4   I5   I6   I7   I8
EX     -    -    I0   I1   I2   I3   I4   I5   I6   I7
Incorrect Prediction:
Cycle: 1    2    3    4    5    6    7    8    9    10
IF     I0   I1   I2   I3   I4   I5   Ik   Ik+1 Ik+2 Ik+3
OF     -    I0   I1   I2   I3   I4   -    Ik   Ik+1 Ik+2
EX     -    -    I0   I1   I2   I3   -    -    Ik   Ik+1
(the wrongly fetched I4 and I5 are squashed when the branch resolves at
the end of cycle 6)
Branch Prediction
[Figure: a four-state machine with two "predict taken" states and two
"predict not taken" states; each taken outcome moves one state toward
strongly taken, each not-taken outcome one state toward strongly not
taken, so two consecutive mispredictions are needed to change the
prediction.]
FIGURE 4.13 The states in a two-bit prediction scheme.
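The two-bit scheme can be modeled as a saturating counter (a minimal
sketch; the 0-3 state encoding and initial state are implementation
choices, not taken from the figure):

```python
class TwoBitPredictor:
    # States 0,1 predict not taken; states 2,3 predict taken.
    # Two consecutive mispredictions are needed to flip the prediction.
    def __init__(self, state=2):
        self.state = state

    def predict(self):
        return self.state >= 2  # True = predict taken

    def update(self, taken):
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)

p = TwoBitPredictor()
hits = 0
for taken in [True, True, False, True]:   # e.g. a loop branch
    hits += (p.predict() == taken)
    p.update(taken)
print(hits)  # 3 of 4 predicted correctly: one not-taken does not flip it
```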
Branch Prediction w/ History
[Figure: four low-order bits of the branch address select an entry of
four 2-bit per-branch predictors; a 2-bit global branch history selects
which of the four predictors provides the prediction.]
FIGURE 4.20 A (2,2) branch-prediction buffer uses a two-bit global
history to choose from among four predictors for each branch address.
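The (2,2) lookup can be sketched as a table of saturating counters: the
branch address picks the row, the global history picks the column (the
table size, initial counter values, and modulo indexing are illustrative
assumptions):

```python
class CorrelatingPredictor:
    """(2,2) predictor: 2 global history bits choose among four 2-bit
    counters per branch-address entry."""
    def __init__(self, entries=16):
        self.entries = entries
        self.table = [[2] * 4 for _ in range(entries)]  # start weakly taken
        self.history = 0  # outcomes of the last two branches, as 2 bits

    def predict(self, pc):
        return self.table[pc % self.entries][self.history] >= 2

    def update(self, pc, taken):
        row = self.table[pc % self.entries]
        h = self.history
        row[h] = min(3, row[h] + 1) if taken else max(0, row[h] - 1)
        self.history = ((h << 1) | int(taken)) & 0b11

# An alternating taken/not-taken branch is learned perfectly, because
# each history pattern gets its own counter.
p = CorrelatingPredictor()
for i in range(20):
    p.update(0x40, i % 2 == 0)
```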
Branch Target Buffers
[Figure: the PC of the instruction to fetch is compared against the
entries of the branch-target buffer. If no entry matches, the
instruction is not predicted to be a branch and fetch proceeds normally.
If an entry matches, the instruction is a branch, the stored predicted
PC is used as the next PC, and the branch is predicted taken or untaken.]
FIGURE 4.22 A branch-target buffer.
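The fetch-stage lookup can be sketched with a dictionary standing in for
the buffer's tag-matched entries (the dict, the 4-byte instruction size,
and the fall-through computation are modeling assumptions):

```python
def next_fetch_pc(pc, btb, instr_size=4):
    """Return the PC to fetch next: the predicted target on a BTB hit,
    otherwise the fall-through PC."""
    if pc in btb:            # entry found: predicted-taken branch
        return btb[pc]       # use the stored predicted PC as the next PC
    return pc + instr_size   # not predicted to be a branch: proceed normally

btb = {0x1000: 0x2000}       # branch at 0x1000 predicted to jump to 0x2000
print(hex(next_fetch_pc(0x1000, btb)))  # 0x2000
print(hex(next_fetch_pc(0x1004, btb)))  # 0x1008
```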
Step-by-step use of Branch Target Buffers
[Figure: IF - send the PC to memory and to the branch-target buffer; if
an entry is found, send out the predicted PC. ID - determine whether the
fetched instruction is actually a taken branch. EX - if the branch was
not in the buffer but is taken, enter its PC and next PC into the
branch-target buffer; if it was in the buffer but mispredicted, kill the
fetched instruction, restart the fetch at the other target, and delete
the entry from the target buffer; if correctly predicted, continue
execution with no stalls.]
FIGURE 4.23 The steps involved in handling an instruction with a
branch-target buffer.
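The EX-stage cases can be sketched as follows (the dict-based buffer,
boolean flags, and the 2-cycle penalty value are modeling assumptions):

```python
def resolve_branch(btb, pc, was_in_btb, predicted_taken, actually_taken,
                   target):
    """EX-stage BTB bookkeeping; returns the fetch penalty in cycles."""
    if not was_in_btb:
        if actually_taken:
            btb[pc] = target   # enter branch PC and next PC into the buffer
            return 2           # fall-through was fetched; restart at target
        return 0               # normal instruction execution
    if predicted_taken == actually_taken:
        return 0               # correctly predicted: continue with no stalls
    # Mispredicted: kill the fetched instruction, restart at the other target.
    del btb[pc]                # delete the entry from the target buffer
    return 2
```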
Exceptions
1. Synchronous versus Asynchronous
2. User requested versus coerced
3. User maskable (or not)
4. Within or between instructions
5. Resumption versus Termination
6. Precise/Imprecise