PIPELINING
By
Dr. Mausumi Maitra
Professor
Dept. of Information Technology
Govt. College of Engg. and Ceramic Technology
What is Pipelining
A technique used in advanced microprocessors
where the microprocessor begins executing a second
instruction before the first has been completed.
- A Pipeline is a series of stages, where some work is
done at each stage. The work is not finished until it
has passed through all stages.
With pipelining, the computer architecture allows the
next instructions to be fetched while the processor is
performing arithmetic operations, holding them in a
buffer close to the processor until each instruction
operation can be performed.
How Pipeline Works
The pipeline is divided into segments and each
segment can execute it’s operation concurrently
with the other segments.
Once a segment completes an operation, it
passes the result to the next segment in the
pipeline and fetches the next operations from
the preceding segment.
Pipeline
A common analogue for a pipeline is a factory
assembly line. Assume that there are three
stages:
1. Welding
2. Painting
3. Polishing
For simplicity, assume that each task takes one
hour.
Characteristics Of Pipelining
If the stages of a pipeline are not balanced and one
stage is slower than another, the entire throughput of
the pipeline is affected.
In terms of a pipeline within a CPU, each instruction is
broken up into different stages. Ideally if each stage
is balanced (all stages are ready to start at the same
time and take an equal amount of time to execute.)
the time taken per instruction (pipelined) is defined
as:
Time per instruction (unpipelined) / Number of stages
Contd..
The previous expression is ideal. We will see later
that there are many ways in which a pipeline
cannot function in a perfectly balanced fashion.
In terms of a CPU, the implementation of
pipelining has the effect of reducing the average
instruction time, therefore reducing the average
CPI.
EX: If each instruction in a microprocessor takes
5 clock cycles (unpipelined) and we have a 4
stage pipeline, the ideal average CPI with the
pipeline will be 1.25 .
Example
Instruction 1 Instruction 2
X X
Instruction 4 Instruction 3
X X
Four sample instructions, executed linearly
5
IF ID EX M W
1
IF ID EX M W
1
IF ID EX M W
1
IF ID EX M W
Four Pipelined Instructions
Instruction Fetch
The instruction Fetch (IF) stage is responsible
for obtaining the requested instruction from
memory. The instruction and the program
counter (which is incremented to the next
instruction) are stored in the IF/ID pipeline
register as temporary storage so that may be
used in the next stage at the start of the next
clock cycle.
Instruction Decode
The Instruction Decode (ID) stage is responsible
for decoding the instruction and sending out the
various control lines to the other parts of the
processor. The instruction is sent to the control
unit where it is decoded and the registers are
fetched from the register file.
Execution
The Execution (EX) stage is where any
calculations are performed. The main
component in this stage is the ALU. The ALU is
made up of arithmetic, logic units.
Memory and IO
The Memory and IO (MEM) stage is responsible
for storing and loading values to and from
memory. It is also responsible for input or
output from the processor.
Write Back
The Write Back (WB) stage is responsible for
writing the result of a calculation, memory
access or input into the register file.
Operation Timings
Estimated timings for Instruction 2ns
each of the stages: Fetch
Instruction 1ns
Decode
Execution 2ns
Memory 2ns
and IO
Write Back 1ns
Advantages/Disadvantages
Advantages:
More efficient use of processor
Quicker time of execution of large number of
instructions
Disadvantages:
Pipelining involves adding hardware to the chip
Inability to continuously run the pipeline
at full speed because of pipeline hazards
which disrupt the smooth execution of the
pipeline.
Instructions in the pipeline stages
Computer Performance
1. Latency – The amount of time that a single
operation takes to execute .
2. Throughput – The rate at which operations
get executed. Generally expressed as
operations/second or operations/cycle
For Normal Processor
Throughput = 1 / Latency
In Pipeline Architecture
Throughput > 1 / Latency
Since instruction execution is overlapped, which is
called Temporal Parallelism. It is appropriate if
1. The jobs to be carried out are identical
2. A job can be divided into many independent tasks
i.e. each task can be done independent of other tasks
3. The time taken for performing each task is the
same.
4. The time taken to transmit a job from one
processing stage to the next is negligible
compared to the time needed to execute a task.
5. The number of tasks into which a job is broken
up is much smaller compared to the number of
jobs to be carried out.
To Find the Speed of Pipeline :
Let us consider a Pipeline system, where
No. of segments in Pipeline = K
Clock Cycle = tp
No. of jobs = n
1st task requires a time = k. tp
Remaining (n – 1) jobs comes out one job per
clock cycle = (n – 1) tp
Total time = {k+(n-1)} tp
Let us consider a system without pipeling :
Time required for each job = tn = k.tp
Total time required for n job = n. tn = n.k.tp
Speed up of pipeline process is defined as the ratio
S = (Time taken without pipeline) / (Time taken with
pipeline)
=n.k.tp / (k+n-1) tp
As the no. of job increases n >> k-1
S = n.k.tp / n.tp ~ k
Therefore, theoretically maximum speed up is k, where k is
the number of stages in the pipeline.
Ex.
Let tp = 20 ns, k = 4 and n = 100 jobs
For pipeline system, time required
(k + n -1) tp = (4 + 99) x 20 ns = 2060 ns
For non pipeline system, time required
n. k. tp = 100 X 4 X 20 ns = 8000 ns.
Speed up ratio S = 8000 / 2060 = 3.88
If n = 1000, what will be the speed up ratio ?
As the number of job increases s -> 4, which is equal to the
number of stages in the pipeline.
Ex. 2
A job consists of 4 tasks. The time taken by the 4 tasks are respectively
20 ns, 10 ns, 15 ns and 20 ns. Pipelining is used to reduce processing
time. If the no. of job entering the pipeline is 120 . Find the efficiency of
pipelining.
Time taken by each job = 65 ns
Without pipelining time taken by 120 jobs = 120 X 65 = 7800 ns.
If pipelining is used all tasks must be allotted equal time which should
be the maximum time for a task = 20 ns.
Therefore, time taken to complete 120 jobs
= 80 + 119 X 20 = 2460 ns.
Therefore, speed up = 7800 / 2460 = 3.17
Ideal speed up = 4 (No. of stages)
Therefore pipeline Efficiency
= Actual speed up / Ideal speed up = 3.17 / 4 = 0.7925
% Efficiency = 0.7925 X 100 % = 79.25 %
Problems in Implementing Pipeline
1. Synchronization – Each stage must take equal
amount of time so that the job can flow
between stages without hold up.
2. Bubbles in Pipeline – If some tasks are absent
in a job “bubbles” form in the pipeline.
3. Fault Tolerance – System does not tolerate faults.
If one of the stages in the pipeline fails for some
reason, the entire pipeline is upset.
4. Intertask Communication – The time to transmit a
job between pipeline stages should be much smaller
compared to the time taken to do a task.
Ex. 3
A 4 stage pipeline adder is to be designed. The time
taken by each of the pipeline stages are 3 ns, 3 ns,
10 ns and 5 ns.
a)What should be the clock frequency to drive the
adder ?
b) What is the time taken to add 100 pairs of
operands ?
c)If 20 operands are zero (at random), what is the
time taken to add 100 pairs of operands ?
d) What is the efficiency of pipeline addition if all the
operands are non-zero ?
e)What is the efficiency in case (c) ?
a) We should design for the slowest option in the pipeline.
Therefore, Clock Frequency = 1 / 10 ns = 100 MHz.
b) Time taken = (k + n-1) tp = (4 + 100 -1) X 10 ns = 1030 ns.
c) Same as (b) as there is no method of detecting zeros in pipelining.
d)Efficiency = Actual Speed up / Ideal Speed up
Now, Ideal Speed up = No. of stages in the pipeline = 4
Actual Speed up =Time taken without pipeline / Time taken with pipeline
= 2100 / 1030 = 2.04
Therefore, Efficiency = 2.04 / 4 = 0.5097
% Efficiency = 51 %
e) In case of ( c ) : Actual speed up = 21 X 80 / 1030 = 1.63
Therefore, Efficiency = 1.63 / 4 = 0.408
% Efficiency = 40.8 %
Thank You